BrainScape: An Open-Source Framework for Integrating and Preprocessing Anatomical MRI Datasets
Item type
Journal Article
Publisher
Massachusetts Institute of Technology Press
Abstract
MRI has revolutionized our ability to investigate and understand brain structure and function in health and disease. A large amount of MRI data is widely available to researchers, both from large-scale multi-site consortia and smaller site-specific datasets. This wealth of MRI data offers opportunities to advance our understanding of the brain, particularly through machine learning and deep learning approaches that rely on large sample sizes to reveal complex associations between brain organization and its behavioral and clinical correlates. Many large-scale initiatives provide extensive datasets with sufficient statistical power to support reproducibility, but reproducibility alone does not ensure clinical relevance or broad generalizability due to narrow demographic representation and minimized dataset variability. Recent work highlights the need to embrace dataset variability and open-science collaborations for pooling heterogeneous datasets. Nevertheless, effectively integrating these diverse resources remains a significant challenge. Inconsistencies in organization, data formatting, acquisition protocols, and metadata persist, especially for smaller, site-specific datasets, despite ongoing efforts within the neuroimaging community to standardize data sharing practices. To address these issues, we introduce BrainScape: a curated collection of 160 publicly available MRI datasets packaged with an open-source, plugin-based Python framework that automates the download, organization, preprocessing, and demographic attachment of the MRI data. Each dataset includes a detailed configuration file capturing all dataset-specific parameters, enabling other researchers to regenerate the BrainScape dataset. The current BrainScape dataset integrates 160 datasets, encompassing a total of 27227 subjects and 46583 multimodal MRI scans after quality control.
The BrainScape framework’s pipeline effectively aggregates these heterogeneous datasets while preserving the original dataset structure and demographic details. Its modular design allows integration into data pipelines, supporting large-scale studies involving diverse cohorts and targeted research on rare phenotypes. The BrainScape framework employs an easy-to-use plugin-based architecture with distinct modules for data downloading, file mapping, validation, preprocessing, and demographics attachment. Furthermore, each MRI image can be traced to its source project and repository, and subjects excluded from datasets are documented in dedicated dataset-specific configuration files, providing transparent and reproducible exclusion criteria. The BrainScape dataset includes multiple MRI modalities such as T1-weighted (T1w), T2-weighted (T2w), gadolinium-enhanced T1-weighted (T1Gd), and fluid-attenuated inversion recovery (FLAIR) from diverse sources and integrates key demographic fields, such as age, sex, and handedness, for large-scale studies. This unified workflow reduces manual labor and minimizes the risk of data duplication and biases. By providing automated, transparent, and configurable workflows, BrainScape aims to address open science challenges, accelerate data-driven investigations, and promote inclusivity and reproducibility in neuroscience research.
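As an illustration of the plugin-based, configuration-driven design described in the abstract, the following is a minimal sketch and not BrainScape's actual API. The registry, the stage names, the `DatasetState` class, and the configuration keys (`stages`, `subjects`, `exclude`) are all hypothetical and chosen only to show how per-dataset configuration could drive pluggable stages and document exclusions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DatasetState:
    """Hypothetical container passed from one pipeline stage to the next."""
    config: dict
    subjects: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


# Hypothetical registry mapping a stage name to its plugin function.
PLUGINS: Dict[str, Callable[[DatasetState], None]] = {}


def plugin(name: str):
    """Decorator that registers a function as the plugin for a named stage."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@plugin("download")
def download(state: DatasetState) -> None:
    # Stand-in for a real download: copy the subject IDs listed in the config.
    state.subjects = list(state.config["subjects"])
    state.log.append("download")


@plugin("validate")
def validate(state: DatasetState) -> None:
    # Drop subjects that the dataset-specific config marks as excluded.
    excluded = set(state.config.get("exclude", []))
    state.subjects = [s for s in state.subjects if s not in excluded]
    state.log.append("validate")


def run_pipeline(config: dict) -> DatasetState:
    """Run the stages named in the config, in order, on a fresh state."""
    state = DatasetState(config=config)
    for stage in config["stages"]:
        PLUGINS[stage](state)
    return state


# Hypothetical per-dataset configuration (assumed keys, not BrainScape's schema).
example_config = {
    "name": "example_dataset",
    "stages": ["download", "validate"],
    "subjects": ["sub-01", "sub-02", "sub-03"],
    "exclude": ["sub-02"],
}

result = run_pipeline(example_config)
print(result.subjects)
```

Keeping exclusions in the configuration rather than in code means the same pipeline can be rerun from the configuration alone, which is the kind of reproducibility the abstract attributes to the dataset-specific configuration files.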
Keywords
BrainScape, MRI data integration, MRI preprocessing, MRI data pooling, deep learning MRI, multimodal MRI dataset, 46 Information and Computing Sciences, 31 Biological Sciences, 4611 Machine Learning, Clinical Research, Biomedical Imaging, Aging, Neurosciences, Bioengineering, Machine Learning and Artificial Intelligence, Networking and Information Technology R&D (NITRD), Data Science, 1.4 Methodologies and measurements, Generic health relevance
Source
Imaging Neuroscience, ISSN: 2837-6056 (Print); 2837-6056 (Online), Massachusetts Institute of Technology Press, 3, IMAG.a.944-. doi: 10.1162/imag.a.944
Rights statement
© 2025 The Authors. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. All articles published Open Access will be immediately and permanently free for everyone to read, download, copy and distribute.
