Meeting Notes

Agenda

@pavi-rajes @majumderb

Introduction to Asta, the AI-assisted research tool from the Allen Institute for Artificial Intelligence (AI2)

Discussion of the potential to use this tool in the analysis for the community project.

Meeting Recording

Meeting Notes

Overview of AI Tools for Scientific Discovery: Bodhi provided a comprehensive introduction to the suite of AI tools developed at AI2, including Paper Finder, Data Voyager, and Auto Discovery, explaining their roles in literature-driven and data-driven scientific discovery, with Pavithra later demonstrating their application in neuroscience research.

Paper Finder Functionality: Bodhi described Paper Finder as a tool for exploring the scientific literature with natural-language queries. It applies in-depth reasoning to select relevant papers and returns links to them along with evidence summaries, which can be used for literature surveys and comparative reviews.

Data Voyager Capabilities: Bodhi explained that Data Voyager lets researchers upload datasets and ask questions about them. The system generates Python code to analyze the data, reports its results together with the assumptions it made so that the analysis is reproducible, and supports privacy-preserving deployment across institutions.

Auto Discovery and Hypothesis Search: Bodhi introduced Auto Discovery as a tool for open-ended hypothesis generation that uses Bayesian surprise and Monte Carlo tree search to identify and prioritize surprising hypotheses, and described its successful application to Alzheimer's disease and cancer research datasets.

Integration and Workflow: Bodhi outlined how Auto Discovery and Data Voyager can be used in tandem, starting with open-ended exploration to generate hypotheses and then using goal-driven analysis for secondary verification, creating a cyclical workflow for scientific inquiry.

Application of AI Tools to Neuroscience Data: Pavithra demonstrated the use of Auto Discovery and Data Voyager on neuroscience datasets, detailing the setup, hypothesis generation, analysis plans, code validation, and the iterative process for extracting meaningful findings from complex neural recordings.

Dataset Setup and Context: Pavithra described uploading NWB files representing individual subjects, providing context about the data, known gaps, and task details, and emphasized the importance of specifying intent to guide hypothesis generation.

Hypothesis Generation and Analysis: Pavithra illustrated how Auto Discovery generates hypotheses, creates analysis plans, and produces code to test hypotheses, including loading data, applying quality control, and performing statistical tests across multiple subjects.
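
The per-subject pipeline described above (quality control, then a statistical test across subjects) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code Auto Discovery generates: it assumes each subject has already been reduced to a list of per-unit effect values (for example, a response difference between two task conditions), applies a hypothetical quality-control rule (`MIN_UNITS`), and then runs an exact two-sided sign-flip permutation test on the per-subject means. The data are toy values, not real recordings.

```python
import itertools
import math
import statistics

MIN_UNITS = 5  # hypothetical QC threshold: minimum usable (non-NaN) units per subject


def passes_qc(unit_effects: list[float]) -> bool:
    """Illustrative QC rule: keep a subject only if it has enough non-NaN unit values."""
    return sum(not math.isnan(x) for x in unit_effects) >= MIN_UNITS


def subject_mean(unit_effects: list[float]) -> float:
    """Average the per-unit effects for one subject, ignoring NaNs."""
    return statistics.fmean(x for x in unit_effects if not math.isnan(x))


def sign_flip_p_value(means: list[float]) -> float:
    """Exact two-sided sign-flip test of H0: subject effects are symmetric about zero."""
    observed = abs(statistics.fmean(means))
    combos = list(itertools.product((1, -1), repeat=len(means)))
    hits = 0
    for signs in combos:
        flipped = abs(statistics.fmean(s * m for s, m in zip(signs, means)))
        if flipped >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return hits / len(combos)


def analyze_across_subjects(data: dict[str, list[float]]) -> tuple[dict[str, float], float]:
    """Apply QC, reduce each subject to a mean effect, then test across subjects."""
    kept = {sid: subject_mean(v) for sid, v in data.items() if passes_qc(v)}
    return kept, sign_flip_p_value(list(kept.values()))


# Toy per-unit effects for six hypothetical subjects (illustrative values only).
example = {
    "sub-01": [0.20, 0.30, 0.10, 0.40, 0.25],
    "sub-02": [0.10, 0.20, 0.15, 0.30, 0.20, 0.10],
    "sub-03": [0.30, 0.35, 0.20, 0.40, 0.30],
    "sub-04": [0.05, 0.10, 0.20, 0.15, 0.10],
    "sub-05": [0.20, 0.10, 0.30, 0.25, 0.20],
    "sub-06": [0.90, float("nan"), 0.80],  # fails QC: too few usable units
}
kept, p_value = analyze_across_subjects(example)
print(sorted(kept), p_value)
```

Here `sub-06` is excluded by QC and the remaining five subjects all show positive effects, giving p = 2/32 = 0.0625. With five subjects that is the smallest two-sided p-value this test can produce, which is one reason cohort size matters when hypotheses are tested across subjects.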

Validation and Iteration: Pavithra explained the built-in validation system that reviews whether generated code faithfully tests the hypothesis, prompting iteration if necessary, and highlighted the collaborative interaction between human scientists and AI agents.

Handling Complex Data: Pavithra discussed the challenges of analyzing complex neuroscience data, the importance of data curation, feature extraction, and metric selection, and noted that the AI tools facilitate iterative analysis, reducing the time required for discovery.

Mechanism of Prior Belief and Bayesian Surprise: David raised a question about how the system establishes prior belief before experiments, which Bodhi and Pavithra addressed by explaining the use of language models trained on scientific literature to estimate prior probabilities, and the potential for recalibration based on human input or literature evidence.

Prior Belief Estimation: Bodhi clarified that prior belief is derived from the language model's parametric knowledge, which reflects its training on scientific literature and serves as a proxy for human intuition in hypothesis evaluation.
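
One way a prior could be read off a language model's parametric knowledge is to ask the model the same yes/no question several times and treat the fraction of "true" answers as the prior probability. The sketch below shows that idea under loose assumptions: `ask_model` is a hypothetical stand-in (here it replays canned answers), not an Asta or AI2 API, and the sampling scheme is an illustration rather than the documented Auto Discovery procedure.

```python
from typing import Callable, Iterable


def elicit_prior(ask_model: Callable[[str], bool], hypothesis: str, n_samples: int = 20) -> float:
    """Estimate P(hypothesis is true) from repeated yes/no judgments.

    Add-one (Laplace) smoothing keeps the estimate strictly between 0 and 1,
    which matters if the prior is later used inside a logarithm.
    """
    prompt = f"Is the following hypothesis likely to be true? {hypothesis}"
    votes = sum(ask_model(prompt) for _ in range(n_samples))  # bools sum as 0/1
    return (votes + 1) / (n_samples + 2)


def make_canned_model(answers: Iterable[bool]) -> Callable[[str], bool]:
    """Hypothetical stand-in for sampling an LLM: replays a fixed answer sequence."""
    it = iter(answers)
    return lambda prompt: next(it)


model = make_canned_model([True] * 14 + [False] * 6)
prior = elicit_prior(model, "example hypothesis H", n_samples=20)
print(round(prior, 4))
```

With 14 "true" answers out of 20, the smoothed prior is 15/22, roughly 0.68.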

Calibration and Surprisal: Bodhi noted that the prior may not always align with human expectations, and the system measures Bayesian surprise as the epistemic shift between prior and posterior beliefs after data analysis, with the possibility of recalibrating surprisal scores using user-provided beliefs.
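
To make "epistemic shift" concrete, the sketch below measures Bayesian surprise as the Kullback-Leibler divergence of the posterior belief from the prior belief, which is the standard definition of Bayesian surprise. Treating each belief as a single Bernoulli probability that the hypothesis is true, and the `user_prior` override used for recalibration, are simplifying assumptions; whether Auto Discovery uses exactly this form is not stated in these notes.

```python
import math
from typing import Optional


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bern(p) || Bern(q)) in bits; p and q must lie strictly in (0, 1)."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))


def bayesian_surprise(prior: float, posterior: float) -> float:
    """Surprise = KL(posterior || prior): how far the analysis moved the belief."""
    return bernoulli_kl(posterior, prior)


def recalibrated_surprise(model_prior: float, posterior: float,
                          user_prior: Optional[float] = None) -> float:
    """Hypothetical recalibration: a scientist-supplied prior replaces the model's prior."""
    prior = user_prior if user_prior is not None else model_prior
    return bayesian_surprise(prior, posterior)


# The model expected the hypothesis to hold (0.8); after the analysis the belief drops to 0.2.
model_view = bayesian_surprise(prior=0.8, posterior=0.2)
# A scientist who already expected 0.2 finds the same result unsurprising.
user_view = recalibrated_surprise(model_prior=0.8, posterior=0.2, user_prior=0.2)
print(round(model_view, 3), user_view)
```

The same result scores 1.2 bits of surprise against the model's prior but zero against the scientist's prior, which is the kind of mismatch that recalibration with user-provided beliefs is meant to address.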

Literature-Based Recalibration: Pavithra described using Paper Finder to search for supporting or opposing evidence in the literature for generated hypotheses, allowing human scientists to adjust their belief based on available evidence and contextual relevance.

Collaborative Use and Data Sharing Strategies: Jerome, Bodhi, Pavithra, and Alexander discussed approaches for collaborative use of AI tools, including preloading datasets, sharing analysis notebooks via cloud platforms like Google Colab, and handling data access and privacy concerns, with plans to make OpenScope data accessible for group exploration.

Collaboration Features: Bodhi explained that both Auto Discovery and Data Voyager support sharing results, but collaborative editing requires careful management of data access and user consent, especially for public-facing versions.

Preloading and Access Control: Bodhi suggested preloading OpenScope datasets to facilitate shared access, and described technical solutions such as pre-signed URLs for private S3 buckets and public access via the DANDI Archive for large datasets.
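
The idea behind a pre-signed URL is that the data owner signs a specific object path and an expiry time with a secret key, so anyone holding the link can read that one object until it expires without needing credentials of their own. The sketch below illustrates the mechanism with a plain HMAC-SHA256 signature. It is deliberately simplified and is not the AWS Signature Version 4 scheme that S3 actually uses; with boto3, one would instead call `generate_presigned_url` on an S3 client. The host, secret, object path, and query parameter names are placeholders.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"placeholder-secret"     # hypothetical; a real deployment keeps this server-side
BASE_URL = "https://data.example.org"  # placeholder host


def _signature(path: str, expires: int) -> str:
    """Sign the object path together with its expiry time."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(path: str, ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Return a URL granting read access to `path` until now + ttl_seconds."""
    now = time.time() if now is None else now
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{BASE_URL}{path}?{query}"


def verify(url: str, now: Optional[float] = None) -> bool:
    """Server-side check: the signature matches the path and the link has not expired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError, IndexError):
        return False
    expected = _signature(parsed.path, expires)
    return hmac.compare_digest(signature, expected) and now < expires


url = presign("/openscope/sub-01.nwb", ttl_seconds=600, now=1_000_000)  # hypothetical path
print(verify(url, now=1_000_100), verify(url, now=1_000_700))
```

The link works inside its ten-minute window, stops working afterwards, and cannot be edited to point at a different file, since changing the path invalidates the signature.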

Cloud-Based Analysis: Alexander recommended using Google Colab for reproducible, dependency-managed analysis, and Bodhi confirmed that their tools operate in cloud environments with fixed execution settings, supporting integration with cloud notebooks.

Next Steps for OpenScope Data: Jerome proposed working together to make OpenScope predictive processing data accessible to the AI tools, encouraging group members to generate hypotheses and share outputs via Google Colab or GitHub for iterative exploration.

Guidance on Intent Specification and Hypothesis Selection: Jerome, Marcel, Bodhi, and Pavithra discussed the importance of specifying intent to focus hypothesis generation, the use of Monte Carlo tree search to prioritize surprising hypotheses, and strategies for iterative narrowing of research directions based on initial findings.

Intent Box Usage: Pavithra and Bodhi explained that specifying intent in Auto Discovery helps narrow the scope of hypothesis generation, allowing researchers to guide the system toward relevant questions and improve the quality of generated hypotheses.

Monte Carlo Tree Search: Bodhi detailed how Monte Carlo tree search rewards nodes with surprising hypotheses, balancing exploration and exploitation to efficiently sample the hypothesis space and prioritize follow-up experiments.
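
The exploration/exploitation balance described above is commonly implemented with the UCT rule (UCB1 applied to trees). The sketch below runs Monte Carlo tree search over a toy hypothesis tree in which each node's reward is its surprise score. The tree (`CHILDREN`), the scores (`TOY_SURPRISE`), the exploration constant `c`, and the use of a direct node evaluation in place of a random rollout are all illustrative assumptions, not Auto Discovery's implementation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical hypothesis tree: each hypothesis can be refined into child hypotheses.
CHILDREN = {
    "root": ["H1", "H2", "H3"],
    "H1": ["H1a", "H1b"], "H2": ["H2a"], "H3": [],
    "H1a": [], "H1b": [], "H2a": [],
}
# Hypothetical surprise observed when each hypothesis is tested (0 = unsurprising).
TOY_SURPRISE = {"root": 0.0, "H1": 0.1, "H2": 0.3, "H3": 0.05,
                "H1a": 0.2, "H1b": 0.1, "H2a": 0.9}


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def untried(self) -> list:
        """Child hypotheses not yet added to the search tree."""
        tried = {ch.name for ch in self.children}
        return [n for n in CHILDREN[self.name] if n not in tried]


def uct(child: Node, c: float) -> float:
    """Mean surprise (exploitation) plus a bonus for rarely visited nodes (exploration)."""
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore


def mcts(iterations: int = 200, c: float = 1.0, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node("root")
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCT score.
        while not node.untried() and node.children:
            node = max(node.children, key=lambda ch: uct(ch, c))
        # 2. Expansion: add one untested child hypothesis, if any remain.
        if node.untried():
            child = Node(rng.choice(node.untried()), parent=node)
            node.children.append(child)
            node = child
        # 3. Evaluation: "test" the hypothesis and use its surprise as the reward.
        reward = TOY_SURPRISE[node.name]
        # 4. Backpropagation: credit the reward to the node and all its ancestors.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return root


root = mcts(iterations=200, c=1.0, seed=0)
best = max(root.children, key=lambda n: n.visits)
print(best.name, [(ch.name, ch.visits) for ch in root.children])
```

Because H2 leads to the highly surprising H2a, most of the search budget flows into that branch, while the exploration bonus still guarantees that H1 and H3 are sampled occasionally.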

Iterative Refinement: Bodhi described running Auto Discovery without intent to identify promising directions, then iteratively refining the search by specifying those directions as intent in subsequent runs.
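
The two-stage procedure can be sketched as a loop: an open-ended first run with no intent, followed by runs whose intent is the most surprising direction found so far. `run_auto_discovery` is a hypothetical stub that returns canned (direction, surprise) pairs; it stands in for a real Auto Discovery run and is not its API.

```python
from typing import Optional

# Hypothetical canned results, keyed by the intent each run was given.
CANNED_RUNS = {
    None: [("direction A", 0.4), ("direction B", 0.7)],
    "direction B": [("direction B.1", 0.8), ("direction B.2", 0.3)],
}


def run_auto_discovery(intent: Optional[str]) -> list[tuple[str, float]]:
    """Hypothetical stub for one run: returns (direction, surprise) pairs."""
    return CANNED_RUNS.get(intent, [])


def refine(max_rounds: int = 3) -> list[tuple[Optional[str], list[tuple[str, float]]]]:
    """Start with no intent, then re-run focused on the most surprising direction."""
    intent: Optional[str] = None
    history = []
    for _ in range(max_rounds):
        results = run_auto_discovery(intent)
        history.append((intent, results))
        if not results:
            break  # nothing new found under this intent
        intent = max(results, key=lambda r: r[1])[0]  # narrow to the top direction
    return history


history = refine()
print([intent for intent, _ in history])
```

Each round narrows the search, so the sequence of intents traces the path from open-ended exploration toward a specific research direction.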

Contact and Follow-Up: Jerome asked about the best way to reach out for further collaboration, and Bodhi and Pavithra recommended email communication for follow-up and support.

Email Communication: Bodhi and Pavithra advised participants to contact them via email for questions, collaboration, or support regarding the AI tools and ongoing projects.