
Standards on <data-format/modality> acquisition

Version

<SemVer> (e.g. 0.1.0). See Contributing.md for more information on versioning.

Introduction

This section should briefly introduce the data format and its purpose.

Raw Data Format

File format

This section describes the raw data format of the asset. Data is considered to be in its raw format when it is acquired directly from the hardware and logged without any lossy or compression transformation. The resulting data asset is considered immutable.

The section should also include a brief description of the folder structure that results from generating the data asset. We recommend using the file-tree-generator VS Code extension or the tree command in the terminal to generate a tree view of the data asset.

e.g.:

📦behavior
┣ 📂foo
┃  ┗ 📂bar_datetime
┃     ┣ 📜baz.dat
┃     ┗ 📜baz_metadata.txt
...
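If neither tool is at hand, a tree in the same style can be produced with a few lines of Python. This is only a minimal sketch using the standard library; the function names `render_tree` and `format_asset_tree` are illustrative and not part of any existing tool, and the icons are chosen to mimic the example above.

```python
from pathlib import Path


def render_tree(root: Path, prefix: str = "") -> list[str]:
    """Return the lines of a text tree for everything under `root`."""
    lines = []
    # List directories first, then files, each group sorted by name.
    entries = sorted(root.iterdir(), key=lambda p: (p.is_file(), p.name))
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "┗ " if is_last else "┣ "
        icon = "📂" if entry.is_dir() else "📜"
        lines.append(f"{prefix}{connector}{icon}{entry.name}")
        if entry.is_dir():
            # Keep drawing the vertical rule only while siblings remain below.
            child_prefix = prefix + ("   " if is_last else "┃  ")
            lines.extend(render_tree(entry, child_prefix))
    return lines


def format_asset_tree(root: Path) -> str:
    """Return the full tree, headed by the name of the asset folder."""
    return "\n".join([f"📦{root.name}", *render_tree(root)])
```

Calling `print(format_asset_tree(Path("behavior")))` on a local copy of the asset prints a tree that can be pasted into this section.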

Application notes

This section provides additional information on how to acquire the data in the format described above. It can include information about the hardware (e.g. supported models) and the software interface (e.g. SoftwareFoo with version >= 1.0.0) and, ideally, some easy-to-deploy or easy-to-follow examples that allow anyone to reproduce the data format.

Relationship to aind-data-schema

This section describes how the data format relates to, and is represented by, the aind-data-schema library. Examples include how hardware and software metadata should be encoded in the schemas, how acquired data should be represented as Epochs, and other critical conventions. This section should create a one-way dependency: data formats should NOT depend on the schema and should instead stand on their own. aind-data-schema should be used to provide extra information, context and validation for the data asset.

File Quality Assurances

This section describes the properties of the data format that must hold for the data asset to be considered valid. Conceptually, these should be properties that can be easily tested and validated by unit tests. Examples include:

- "There will always be two files: data.dat and metadata.txt"
- "For each frame in video.avi, there will be a corresponding row in metadata.csv"
- "Field Bar will always be a positive integer"
- "The first timestamp value in metadata_camera.csv will always be greater than the first one in metadata_behavior.csv"
- "The Time column in file.bin is assumed to be aligned (sharing the same time domain) with the Time column of another_file.csv"
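Assurances like the examples above translate fairly directly into small check functions that a unit test can call. The following is a minimal sketch, not a prescribed implementation: the file and column names are the placeholder names from the examples, the function names are hypothetical, and the expected number of video frames is passed in as an argument because reading video.avi would require a third-party library.

```python
import csv
from pathlib import Path


def read_column(csv_path: Path, column: str) -> list[str]:
    """Return every value of `column` in a CSV file that has a header row."""
    with csv_path.open(newline="") as handle:
        return [row[column] for row in csv.DictReader(handle)]


def check_required_files(asset_dir: Path, required=("data.dat", "metadata.txt")) -> None:
    """'There will always be two files: data.dat and metadata.txt'."""
    missing = [name for name in required if not (asset_dir / name).is_file()]
    assert not missing, f"missing files: {missing}"


def check_row_count(csv_path: Path, expected_rows: int) -> None:
    """'For each frame in video.avi, there will be a corresponding row'."""
    with csv_path.open(newline="") as handle:
        n_rows = sum(1 for _ in csv.DictReader(handle))
    assert n_rows == expected_rows, f"{n_rows} rows, expected {expected_rows}"


def check_positive_integers(csv_path: Path, column: str) -> None:
    """'Field Bar will always be a positive integer'."""
    for value in read_column(csv_path, column):
        try:
            number = int(value)
        except ValueError:
            raise AssertionError(f"{column}={value!r} is not an integer")
        assert number > 0, f"{column}={number} is not positive"


def check_starts_after(later_csv: Path, earlier_csv: Path, column: str = "Time") -> None:
    """The first timestamp of `later_csv` must exceed that of `earlier_csv`."""
    later = float(read_column(later_csv, column)[0])
    earlier = float(read_column(earlier_csv, column)[0])
    assert later > earlier, f"{later} is not greater than {earlier}"
```

Each function raises `AssertionError` with a short message on failure, so it can be called as-is from a pytest or unittest test case.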

Primary Data Format

File format

This section describes the primary data format of the asset, which is the format of the data as it is uploaded. Primary data can have minimal processing applied, usually a compression or file format transformation. This section describes that transformation (if any) and the format of the resulting data. Like the raw data format, the primary data format is considered immutable.
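As an illustration of such a transformation, the sketch below compresses one raw file into a primary file and confirms that decompressing it reproduces the raw bytes exactly. It assumes gzip as the compression scheme and a `.gz` suffix purely for the sake of example; the function names and directory layout are hypothetical, and an actual data format should document whichever transformation it really uses.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def sha256_of(stream) -> str:
    """Hash a binary stream in 1 MiB chunks so large files fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(1 << 20), b""):
        digest.update(chunk)
    return digest.hexdigest()


def compress_to_primary(raw_path: Path, primary_dir: Path) -> Path:
    """Write a gzip copy of `raw_path` into `primary_dir` and verify it."""
    primary_dir.mkdir(parents=True, exist_ok=True)
    primary_path = primary_dir / (raw_path.name + ".gz")

    # Lossless transformation: the raw bytes are only re-encoded.
    with raw_path.open("rb") as src, gzip.open(primary_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Verify the round trip before treating the primary file as valid.
    with raw_path.open("rb") as src:
        raw_digest = sha256_of(src)
    with gzip.open(primary_path, "rb") as src:
        restored_digest = sha256_of(src)
    if raw_digest != restored_digest:
        raise ValueError(f"round trip failed for {raw_path.name}")
    return primary_path
```

Checking the digest at conversion time is one way to back up the claim that the primary format loses no information relative to the raw format.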

Application notes

Identical to the Application notes section in the Raw Data Format section. In addition, it should ideally describe how to generate the primary data format from the raw data format.

Relationship to aind-data-schema

Identical to the Relationship to aind-data-schema section in the Raw Data Format section.

File Quality Assurances

Identical to the File Quality Assurances section in the Raw Data Format section.

Derived Data Format

File format

This section describes any data format derived from the primary data format. Derived data formats are post-processed data assets (potentially lossy) generated from the raw or primary data format, generally after the data has been acquired and uploaded. While immutable once created, derived data can be regenerated from primary data assets.

Application notes

Identical to the Application notes section in the Raw Data Format section. In addition, it should ideally describe how to generate the derived data format from the raw and/or primary data format.

Relationship to aind-data-schema

Identical to the Relationship to aind-data-schema section in the Raw Data Format section.

File Quality Assurances

Identical to the File Quality Assurances section in the Raw Data Format section.