Meeting Notes

Agenda

Follow-up discussion on how to organize the first data release paper.

Led by @jeromelecoq, we aim to better assign sub-tasks for this paper.

The draft is available here, along with the list of tasks.

Meeting Recording

Meeting Notes

Data Release Paper Organization and Collaborative Workflow: Jerome led the discussion on the structure and collaborative workflow for the upcoming data release paper, referencing contributions from Lucas, Sarah, Alex, and others, and emphasizing the use of Google Docs, Excel sheets, and Google Slides for transparent task assignment and authorship credit.

Collaborative Document Setup: Jerome explained that the team will use a combination of Google Docs, Excel sheets, and Google Slides to organize the paper, assign tasks, and track authorship, with the Excel sheet serving as both a to-do list and a record of contributions.

Figure and File Management: Jerome described how figures are currently placeholders linked to Google Slides, which Lucas is assembling, and the intent is to embed finalized figures into the manuscript for clarity and consistency.

Authorship Transparency: Jerome emphasized that anyone contributing feedback, comments, or edits should be listed as a co-author, ensuring transparency and recognition for all contributors.

Reference Management: Jerome mentioned the continued use of Paperpile for reference management within Google Docs, unless there are strong objections from the team.

Manuscript Framing and Section Content: Jerome outlined the framing of the manuscript, including the abstract, background rationale, statement of problems, and the importance of multimodal data, with input from Sarah regarding the operational definition of 'multimodal' and the need for clarity in terminology.

Section Structure: Jerome described the traditional structure for a data release paper, including sections for abstract, background rationale, statement of problems, and the relationship to the review, with placeholders added to clarify each section's purpose.

Multimodal Definition: Sarah raised concerns about the term 'multimodal,' suggesting it should be clearly defined as referring to different recording techniques rather than sensory modalities, and Jerome agreed to include this clarification in the manuscript and figures.

Gap Addressed by Dataset: Jerome discussed the need to explain which scientific gaps the dataset is intended to fill, particularly emphasizing cross-context comparability and multimodal aspects as central threads throughout the paper.

Methods Section Detailing and Feedback: Jerome and the team discussed the critical importance of the methods section, covering experimental design, recording modalities, and the need for comprehensive details, with Sarah and Stefan providing feedback on including LFP data, transgenic mouse line explanations, and code versioning.

Experimental Design Coverage: Jerome explained that the methods section will detail experimental animals, surgery, behavior training, stimulus parameters, and recording modalities, with figures illustrating CAD designs and exemplary assets for each technique.

Inclusion of LFP Data: Sarah asked about the inclusion of LFP data from Neuropixels probes, and Jerome confirmed that both spike and LFP data are recorded and will be included in the methods section.

Transgenic Mouse Line Clarification: Sarah highlighted the confusion students face regarding transgenic mouse lines and sensor usage, prompting Jerome to agree on expanding the mouse table with descriptions and reasons for sensor choices, and to solicit student feedback for further clarity.

Code Versioning and Reproducibility: Stefan stressed the importance of providing frozen versions of the code used for data processing, and Jerome agreed to release public capsules containing both data and code, with tables indicating pipeline versions for each asset.
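The per-asset pipeline-version tables discussed above could take a form like the following minimal sketch; the asset names, pipeline names, and version strings here are illustrative placeholders, not actual entries from the release:

```python
# Hypothetical sketch of a frozen pipeline-version record per data asset.
# All names and versions below are made up for illustration.
PIPELINE_VERSIONS = {
    "ecephys_session_001": {"pipeline": "spike-sorting", "version": "1.2.0"},
    "mesoscope_session_001": {"pipeline": "suite2p-wrapper", "version": "0.9.3"},
}

def version_table(versions):
    """Render a simple tab-separated table of asset -> pipeline -> version."""
    rows = ["asset\tpipeline\tversion"]
    for asset, info in sorted(versions.items()):
        rows.append(f"{asset}\t{info['pipeline']}\t{info['version']}")
    return "\n".join(rows)
```

A table like this, frozen alongside the released capsules, lets users trace exactly which code version produced each file.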

Data Processing and Unit Extraction: Jerome, Lucas, Sarah, Stefan, and Carter discussed the data processing pipelines for each modality, the extraction and validation of units, and the importance of linking code and providing both raw and processed data, with attention to terminology and reproducibility.

Processing Pipeline Explanation: Jerome described that each modality will have its own processing pipeline, and the code used for these pipelines will be linked via Code Ocean, allowing users to review the exact steps taken from raw to processed data.

Terminology Clarification: Sarah pointed out the ambiguity in terms like 'ROI,' prompting Jerome to agree on the need for a glossary to clarify terminology across modalities, ensuring users understand what each unit or ROI represents.

Code and Data Release Strategy: Carter explained that before the paper is released, all files across modalities will be re-uploaded with consistent versions to ensure compatibility, and both raw and processed data, along with the processing code, will be made available.

Validation Metrics: Jerome discussed the inclusion of quality control metrics such as signal-to-noise ratio, stability, and unit yield for each modality, with figures illustrating these aspects to help users assess data quality.
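As one hedged illustration of such a quality metric, a per-unit signal-to-noise ratio is commonly computed as the peak-to-peak amplitude of the mean waveform divided by the standard deviation of baseline noise. The function below is a generic sketch of that definition, not the dataset's actual QC code:

```python
import numpy as np

def waveform_snr(mean_waveform, noise_samples):
    """One common SNR definition: peak-to-peak amplitude of the mean
    waveform divided by the standard deviation of baseline noise."""
    peak_to_peak = np.max(mean_waveform) - np.min(mean_waveform)
    return peak_to_peak / np.std(noise_samples)
```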

Data Records, Tables, and Glossary: Jerome and Sarah discussed the organization of data records, the creation of modality-specific tables, and the inclusion of a glossary or acronym list to aid users in navigating the dataset, with Carter providing updates on data upload timelines.

Data Table Organization: Jerome described plans for extensive tables listing experiments, animals, and modality-specific details, with Carter tasked to explain file organization through DANDI and raw files.

Glossary and Acronym List: Sarah and Jerome agreed on the need for a glossary or separate document listing acronyms and definitions, especially for terms like DMD, to reduce confusion for new users and students.

Data Upload Status: Carter updated the team on the status of data uploads, noting that electrophysiology files are awaiting QC, mesoscope files are nearly ready, and SLAP2 packaging will take several more weeks, with raw data to be released via Code Ocean.

Data Validation and Analysis Plans: Jerome, Sarah, Lucas, Karim, and Nicholas discussed the validation section, including unit extraction, receptive field analysis, behavior data, and stimulus-evoked responses, with input on cell type classification, motion correction metrics, and figure design.

Validation Themes: Jerome proposed organizing validation into unit extraction, receptive field across modalities, behavior across modalities, and stimulus-evoked responses, with figures designed to present analyses in parallel across modalities.

Cell Type Classification: Sarah suggested including cell type classification via waveform clustering for electrophysiology, and Jerome agreed this could be provided as example code in the usage notes or as a supplemental section.
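A minimal sketch of what such example code might look like, assuming a simple trough-to-peak-width split into putative fast-spiking and regular-spiking units rather than full waveform clustering (the 0.4 ms threshold and the labels are illustrative only):

```python
import numpy as np

def classify_by_waveform(trough_to_peak_ms, threshold_ms=0.4):
    """Split units into putative fast-spiking vs regular-spiking by
    trough-to-peak duration; the threshold here is illustrative."""
    labels = np.where(np.asarray(trough_to_peak_ms) < threshold_ms,
                      "fast-spiking", "regular-spiking")
    return labels.tolist()
```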

Motion Correction Metrics: Sarah asked about motion correction metrics, and Jerome confirmed that motion correction algorithms are run, with outputs available for plotting motion across modalities, noting differences in correction needs between techniques.

Behavioral Data Inclusion: Sarah and Lucas discussed the inclusion of behavioral metrics such as pupil tracking, running, and motion energy of the face, with Jerome noting the availability of face camera data and the challenge of processing pipelines for these metrics.

Figure Layout Consistency: Lucas and Stefan emphasized the importance of consistent figure layouts, suggesting a grid with columns for modalities and rows for analysis types, and Jerome encouraged the team to use Google Slides for collaborative figure development.

Task Assignment and Excel Sheet Usage: Jerome reviewed the Excel sheet for task assignment, encouraging team members to self-organize, express interest, and transparently document their contributions, with Carter providing updates on data access timelines.

Task Assignment Process: Jerome explained that the Excel sheet lists manuscript sections and contributors, and team members should add their names to tasks they wish to lead, ensuring transparency and clear documentation of contributions.

Data Access Timeline: Carter informed the team that data access for some assets is expected within one to two weeks, and Jerome advised revisiting the sheet once data is available to help team members get started.

Upcoming AI Collaboration Presentation: Jerome announced an upcoming presentation by the Allen Institute for Artificial Intelligence (AI2) team, including Pavithra and Bodhisattwa, who will demonstrate AI tools for scientific analysis, with Karim raising questions about the types of AI agents used.

AI Tool Demonstration: Jerome shared that AI2 will present tools such as Semantic Scholar and Asta, which use large language models to assist in analysis and report generation, and invited the team to engage with the presenters and explore potential collaborations.

AI Agent Types: Karim asked whether the tools use only large language models or other AI agents, and Jerome suggested saving the question for the presenters, acknowledging the experiment's exploratory nature.