Standards on electrophysiology acquisition¶
Version: 0.0.2
Introduction¶
This document describes the standards for the acquisition of electrophysiology data in the Allen Institute for Neural Dynamics, which primarily consists of extracellular electrophysiology signals acquired using Neuropixels probes.
Raw Data Format¶
Following SciComp standards, ecephys data from experiments should be saved to the "ecephys" modality folder.
File format¶
The raw data format is a folder produced by the Open Ephys GUI (version>=0.6.0) in binary format and is organized as outlined in the official Open Ephys Documentation.
In addition, the Open Ephys folder must be temporally aligned using the generate_report or the align_timestamps functions from the aind-ephys-rig-qc package, so that the different streams are synchronized with each other, either locally or to the HARP clock (see HARP format). Time alignment produces additional timestamps files for each continuous/event stream:
- original_timestamps.npy
- localsync_timestamps.npy (only if HARP clock is detected)
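As a minimal sketch (not part of aind-ephys-rig-qc), the following checks which of the alignment files listed above are present in a stream folder. The helper name find_alignment_files is hypothetical; only the two file names come from this standard.

```python
from pathlib import Path

# File names listed in this standard; the helper itself is hypothetical.
ALIGNMENT_FILES = ("original_timestamps.npy", "localsync_timestamps.npy")


def find_alignment_files(stream_folder):
    """Map each alignment file name to True if it exists in the stream folder."""
    folder = Path(stream_folder)
    return {name: (folder / name).is_file() for name in ALIGNMENT_FILES}
```

A folder where localsync_timestamps.npy is reported as missing is not necessarily invalid, since that file is only expected when the HARP clock was detected.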
Application notes¶
The raw data can be directly read using the open-ephys-python-tools or the SpikeInterface package.
Relationship to aind-data-schema¶
The rig.json will contain relevant metadata from the Open Ephys settings, such as the probe names and serial numbers.
File Quality Assurance and Assumptions¶
The electrophysiology data is visually inspected during the experiments. In addition, after a session is acquired, a quality control report is generated using the aind-ephys-rig-qc generate_qc_report function. This report is used to assess the quality of the data and to identify potential issues with the acquisition, including timestamp misalignments, abnormal noise levels and power spectra, and excessive drift.
Primary Data Format¶
File format¶
Before the data is uploaded to the cloud, it undergoes a data transformation to compress the raw data using aind-data-transformation. The original Open Ephys folder structure is kept, but the .dat files are clipped to reduce the file size. This clipped Open Ephys folder is saved in the ecephys_clipped folder. The full .dat files are compressed to zarr using the SpikeInterface library, and the compressed data is saved in the ecephys_compressed folder.
The primary data format is organized as follows:
ecephys
┣ ecephys_clipped
┃ ┗ <Record Node *I*>
┃   ┗ <experiment*J*>
┃     ┣ <recording*K*>
┃     ┃ ┣ <continuous>
┃     ┃ ┃ ┗ <StreamName>
┃     ┃ ┃   ┣ clipped continuous.dat
┃     ┃ ┃   ┗ other intact files
┃     ┃ ┣ <events>
┃     ┃ ┃ ┗ <StreamName>
┃     ┃ ┃   ┗ original event files
┃     ┃ ┗ structure.oebin
┃     ┗ settings.xml
┗ ecephys_compressed
  ┣ <experiment*J*_Record Node*I*#StreamName.zarr>
  ┃ ┣ channel_ids
  ┃ ┣ properties
  ┃ ┣ times_seg*K-1*
  ┃ ┣ traces_seg*K-1*
  ┃ ┗ .zattrs
  ┗ ...
The ecephys_clipped folder contains all the original Open Ephys folders and files; the only difference is that the .dat files are clipped to 100 samples. This folder can therefore be opened by the usual software tools that read Open Ephys data, such as the open-ephys-python-tools and the SpikeInterface library.
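To illustrate what clipping means at the file level, here is a minimal sketch that keeps only the first 100 samples of a .dat file. It is not the aind-data-transformation implementation: the function name clip_dat is hypothetical, and it assumes the Open Ephys binary layout of interleaved int16 samples (2 bytes per channel per sample).

```python
from pathlib import Path

BYTES_PER_VALUE = 2  # assumption: Open Ephys binary .dat files store int16 values


def clip_dat(src, dst, num_channels, num_samples=100):
    """Copy only the first `num_samples` samples (all channels) of a .dat file.

    Returns the number of complete samples actually written, which is smaller
    than `num_samples` if the source file is shorter.
    """
    bytes_per_sample = num_channels * BYTES_PER_VALUE
    with open(src, "rb") as f_in:
        head = f_in.read(num_samples * bytes_per_sample)
    Path(dst).write_bytes(head)
    return len(head) // bytes_per_sample
```

For a 384-channel stream, the clipped file is 100 × 384 × 2 = 76,800 bytes regardless of the original recording length, which is why the clipped folder stays small while remaining readable by Open Ephys-aware tools.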
The ecephys_compressed folder contains the compressed data in the zarr format. Data from different Open Ephys "experiments" and different streams are saved in different .zarr files. Each .zarr file is produced by the spikeinterface.BaseRecording.save_to_zarr() function, which saves and compresses the data and its metadata. In case of multiple Open Ephys "recordings", the data is saved in different "segments" (e.g. traces_seg0, traces_seg1, etc.).
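The naming convention can be sketched as two small helpers. Both function names are hypothetical and only encode the pattern described in this section: Open Ephys experiments and recordings are numbered from 1, while segments are numbered from 0. The stream name is assumed to already include the Record Node prefix (for example "Record Node 101#ProbeA", an illustrative value).

```python
def zarr_folder_name(experiment_index, stream_name):
    """Name of the .zarr file for a 1-based experiment index and a full stream name."""
    return f"experiment{experiment_index}_{stream_name}.zarr"


def segment_keys(recording_index):
    """Keys of the traces and times arrays for a 1-based Open Ephys recording index."""
    segment = recording_index - 1  # recording K is stored as segment K-1
    return f"traces_seg{segment}", f"times_seg{segment}"
```

For example, the second recording of an experiment maps to traces_seg1 and times_seg1 inside the corresponding .zarr file.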
Here is a brief description of the content of the .zarr file:
- channel_ids: list of channel ids from the original Open Ephys data
- properties: dictionary with the properties of the recording (e.g. channel_locations, gains, etc.)
- times_seg*K-1*: Open Ephys timestamps for each segment
- traces_seg*K-1*: compressed traces for each segment
Compression details¶
The traces_seg*K-1* array is stored in chunks of shape (30000, 384), so that each chunk corresponds to 1 second of data for all channels. By default, we use the wavpack codec, implemented as a zarr compressor in the wavpack-numcodecs package. Prior to compression, the LSB (least significant bit) offset is subtracted from each channel, using the spikeinterface.preprocessing.correct_lsb() function. This is done to ensure that the data is centered around 0, which improves the compression ratio. For more information, see Buccino et al. 2023.
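The idea behind the offset subtraction can be illustrated with a toy example in plain Python. This is a sketch of the concept only, not the implementation of correct_lsb(): the function names are hypothetical, the LSB is estimated here as the greatest common divisor of the gaps between distinct values, and the offset is taken as the low median of the channel so that it is always an actual sample value.

```python
from functools import reduce
from math import gcd
from statistics import median_low


def estimate_lsb(samples):
    """Greatest common divisor of the gaps between distinct sample values."""
    values = sorted(set(samples))
    gaps = [b - a for a, b in zip(values, values[1:])]
    return reduce(gcd, gaps) if gaps else 1


def center_channel(samples):
    """Subtract a per-channel offset so that values are centered on 0.

    The offset is an actual sample value, so the centered values remain
    integer multiples of the LSB.
    """
    offset = median_low(samples)
    return [s - offset for s in samples], offset


# Toy channel: values step in multiples of 12 on top of a constant offset of 5
channel = [5 + 12 * k for k in (-2, 0, 1, -1, 3, 0, 2)]
lsb = estimate_lsb(channel)
centered, offset = center_channel(channel)
print(lsb, offset, centered)
```

After centering, every value is an exact multiple of the estimated LSB and the most frequent values sit near 0, which is the property that makes the traces easier to compress.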
Application notes¶
Here is a snippet of code that shows how to read the clipped and compressed data using the SpikeInterface library:
import spikeinterface as si
import spikeinterface.extractors as se

# read number of blocks (experiments) and streams (probes) from the clipped data
num_blocks = se.get_neo_num_blocks("openephysbinary", "ecephys/ecephys_clipped")
stream_names, stream_ids = se.get_neo_streams("openephysbinary", "ecephys/ecephys_clipped")

# read the first block and stream
block_index = 0
stream_name = stream_names[0]
recording_clipped = se.read_openephys(
    "ecephys/ecephys_clipped/",
    block_index=block_index,
    stream_name=stream_name,
)

# read the openephys events
events = se.read_openephys_events("ecephys/ecephys_clipped/")

# read the compressed data for the first block and stream
# (Open Ephys experiments are numbered from 1, hence block_index + 1)
recording_compressed = si.read_zarr(
    f"ecephys/ecephys_compressed/experiment{block_index + 1}_{stream_name}.zarr"
)
Relationship to aind-data-schema¶
The processing.json includes the compression details and parameters.
File Quality Assurance and Assumptions¶
No further quality assurance is performed during the conversion from the raw data to the primary data format.