Standard conventions

Version

0.1.0-draft

Introduction

The goal of this document is NOT to be exhaustive and opinionated. Instead, it should include a minimal list of common patterns that can be easily referenced and re-used across all data formats standards guaranteeing a minimal level of consistency and quality.

Filenames

In general, filename conventions will be defined by the specific data format standard. However, some general rules will be enforced:

  • Filenames must not contain spaces or special characters. Use this as a reference for special characters.

  • “Underscore” _ should be used instead of “-” or any other special character to separate words.

  • Filenames should always contain a file extension. If no extension is provided, the file will be considered a flat binary file.

  • ANY file name can be suffixed with a datetime. This suffix will ALWAYS be the last suffix in the filename, in case multiple suffixes are used, and will follow the ISO 8601 format and always be timezone aware, or UTC if no tz information is provided. If a datetime field is added we will adopt the format YYYY-MM-DDTHHMMSS e.g. 2023-12-25T133015 (Reference). See the datetime: section for more details.

  • As an example, if two files (data_stream.bin) are generated as part of two different acquisition streams:

📂Modality
┣ 📜data_stream_2023-12-25T133015Z.bin
┗ 📜data_stream_2023-12-25T145235Z.bin
  • This rule can be generalized to container-like file formats by adding the suffix to the container:

📂Modality
┣ 📂FileContainer_2023-12-25T133015Z
┃ ┣ 📜file1.bin
┃ ┗ 📜file2.csv
┣ 📂FileContainer_2023-12-25T145235Z
┃ ┣ 📜file1.bin
┗ ┗ 📜file2.csv

Datetime

All datetime used in data formats should follow the ISO 8601 standard. This standard is widely used and supported by most programming languages and libraries. All datetime are expected to be timezone aware, or UTC if no timezone information is provided. We encourage the following pattern:

  • YYYY-MM-DDTHHMMSS e.g. 2023-12-25T133015

  • YYYY-MM-DDTHHMMSSZ e.g. 2023-12-25T133015Z

  • YYYY-MM-DDTHHMMSS±HHMM e.g. 2023-12-25T133015+1200

The following examples show how to parse these formats in Python:

import datetime

implicit_utc = "2023-12-25T133015"
utc = "2023-12-25T133015Z"
tz_aware = "2023-12-25T133015+1200"

print(datetime.datetime.fromisoformat(implicit_utc))
#  2023-12-25 13:30:15
print(datetime.datetime.fromisoformat(utc))
#  2023-12-25 13:30:15+00:00
print(datetime.datetime.fromisoformat(tz_aware))
#  2023-12-25 13:30:15+12:00

Tabular formats

The supported tabular formats are:

Comma-separated values (CSV)

CSV files will follow a subset of the RFC 4180 standard. The following rules will be enforced:

  • The first row will always be the header row.

  • The separator will always be a comma (,).

  • The file will always be encoded in UTF-8.

  • The extension of the file will always be .csv.