Sharing Data to Solve the Riddle of Autism

IEEE Pulse

Worldwide, at least one in 100 people has autism spectrum disorder (ASD). In the United States, the Centers for Disease Control and Prevention put the number at one in 68 [1]. Despite this high prevalence and the increased public awareness of autism in recent years, the underlying mechanisms of ASD remain unclear.
In an attempt to shed some much-needed light on this condition, a grassroots effort called the Autism Brain Imaging Data Exchange (ABIDE) [2] is bringing together labs from around the world to contribute resting state functional magnetic resonance imaging (R-fMRI) data sets that provide information about brain activity, along with anatomical and phenotypic data sets. The idea is to maintain an open-access resource so scientists from many disciplines can delve into the data to understand why ASD occurs in one person but not the next and to gain insight into the considerable diversity within the disorder itself.
The first iteration of the effort—now known as ABIDE I—began gathering previously collected data in December 2011; in August 2012, it released 1,112 data sets from 17 international sites. The data sets were nearly evenly split between individuals who had ASD and control individuals who did not. Since then, a second phase—ABIDE II—has received funding from the National Institute of Mental Health to carry on with the work; as of June 2017, it has aggregated more than 1,100 additional data sets, which were released to the broader scientific community last summer.
To learn more about the initiative, IEEE Pulse spoke with Adriana Di Martino, M.D., who cofounded and now coordinates ABIDE, and Michael Milham, M.D., Ph.D., a member of the ABIDE team tasked with aggregating and organizing imaging data. Di Martino is an associate professor in the Department of Child and Adolescent Psychiatry at the New York University (NYU) Langone Medical Center. Milham is director of the Center for the Developing Brain at the nonprofit Child Mind Institute in New York City, as well as director of the Center for Biomedical Imaging and Neuromodulation at the Nathan S. Kline Institute for Psychiatric Research in Orangeburg, New York.
IEEE Pulse: Talk a bit about what we know, or don’t know, about ASD.

Adriana Di Martino, M.D., who cofounded and now coordinates the Autism Brain Imaging Data Exchange (ABIDE), is an associate professor in the Department of Child and Adolescent Psychiatry at New York University (NYU) Langone Medical Center. Photo courtesy of NYU Langone Medical Center.

Di Martino: What we know today is that autism is a neurodevelopmental condition that is characterized by impairment in social communication and by a pattern of restricted, stereotyped interests or behaviors that can be seen very early on. During play, for instance, a young child with autism may spend more time lining up toys in a particular order than engaging in pretend play. Later on, in individuals who are verbal and have typical intelligence, this might [manifest] as being particularly interested in specific topics, such as trains or maps, to the point that these interests can interfere with their learning and their ability to interact.
In addition, ASD affects the ability to have typical social interactions, including the ability to form and maintain typical friendships. This doesn’t mean that individuals with autism will not develop friendships at all; autism just makes it harder to navigate the social world. That said, what we also know is that the earlier the diagnosis is made, the more impactful the interventions.

Michael Milham, M.D., Ph.D., a member of the ABIDE team, is director of the Center for the Developing Brain at the Child Mind Institute in New York City and director of the Center for Biomedical Imaging and Neuromodulation at the Nathan S. Kline Institute for Psychiatric Research in Orangeburg, New York. Photo courtesy of the Child Mind Institute.

To date, however, we do not know the exact etiology or causes of ASD, and, while we know that it is a neurodevelopmental condition, we don’t know the specific mechanisms involved.
Milham: In terms of mechanisms, there is general agreement that patterns of brain connectivity in autism—whether you’re looking structurally or functionally—are different. The question now is, what is the nature of these differences? And as you can imagine, the brain is highly complex and heterogeneous, so maybe there’s one network or set of networks that shows increased connectivity in autism and another that may show decreased connectivity.
IEEE Pulse: How does ABIDE fit into the quest for a greater understanding of ASD?
Di Martino: Although we don’t know the underlying mechanisms of autism, the evidence from multiple disciplines (genetics, pathology, and neuroimaging) suggests that the connectivity of brain circuits, which we call the connectome, is involved. The awareness that autism may be a disconnection syndrome was one of the main motivations for the initial grassroots initiative. The other motivator was the realization that very large-scale databanks and collaborations have proven feasible and successful for studying complex disorders, as shown in genetics and in prior data-sharing initiatives, including one to investigate attention deficit hyperactivity disorder (ADHD) [3].
For autism, if you think about the millions of connections existing in the brain, combined with the significant and remarkable heterogeneity of its presentation (and likely of its underlying mechanisms), we have to deal with a formidable challenge. Before we started ABIDE, neuroimaging data in particular were collected by single labs, generally yielding studies with relatively small samples. So while each individual lab may have had great ideas on how to address important questions about the brain connectome in autism, the cost of these studies limited the sample size each lab could afford to collect.
So we started with a group of colleagues who had already participated in data-sharing initiatives, and we began spreading the word and the invitation among other colleagues. We were pleased to find they were ready to provide access to data for the purpose of accelerating the pace of discovery about the brain connectome in autism.
IEEE Pulse: What led up to ABIDE?
Milham: ABIDE is essentially part of a larger set of data-sharing initiatives, including the 1000 Functional Connectomes Project [4], that started back in 2009. More than 30 sites from around the world each contributed R-fMRI data sets. The idea was that we would collect preexisting data sets, aggregate them, and share them so researchers could do functional connectivity analyses and various sorts of morphometry analyses. That was the beginning.
A year later, we founded the International Neuroimaging Data-Sharing Initiative [5], which was basically the next phase of the 1000 Functional Connectomes Project. The idea there was to start shifting the community toward more clinically enriched data sets, meaning data sets with more comprehensive phenotyping and psychiatric characterization that also included clinical populations.
After that came the ADHD-200 data-sharing project Dr. Di Martino mentioned, which launched in 2011, and then ABIDE came along to focus on autism data. For that, Dr. Di Martino, Stewart Mostofsky (director of the Laboratory for Neurocognitive and Imaging Research, Kennedy Krieger Institute in Baltimore), and I worked to pull together previously collected autism imaging data, aggregate the data, openly share them, and put minimal restrictions on users. For instance, people don’t have to register their analyses with us, but we ask them to appropriately cite where they got the data.
IEEE Pulse: With the success of ABIDE I in providing more than 1,100 data sets, why was it important to continue with the second iteration of ABIDE?
Milham: There are multiple answers to that. First, at the most basic level, you want both discovery data sets and replication data sets, so you need more and more data. Second, because these sites weren’t working together when they generated the data, there’s a lot of variation in the protocols used and in the age groups represented in the contributed data (some were from adults, some from children), so, again, more data help account for these variations. Beyond that, we know that autism, like most psychiatric disorders, is quite heterogeneous and has a range of presentations. So overall, the major driving force for data-sharing initiatives, whether ABIDE or something else, is to create large-scale data sets that capture heterogeneity as well as the varying sources of confounding artifacts, so that we can arrive at more meaningful scientific findings.
IEEE Pulse: What are you learning from ABIDE I and II in terms of data collection and analysis?
Di Martino: We learned pretty quickly from ABIDE I that even though it was an unprecedentedly large sample, the data were still not sufficient to parse the heterogeneity in ASD, and even with ABIDE II, we may find that a still larger data set is needed to define more homogeneous subgroups within this disorder. As an obvious example, there are far more data from males than from females. While autism is more frequent in males, by a ratio of about 3:1 to 4:1, it is still extremely important to learn about the brain connectome in females so that we can understand the mechanisms underlying this sex difference. There are other things that we cannot see today but that we may be able to see with more data.
Overall, I have to say that ABIDE I and ABIDE II have been successful in many ways, not only because we showed that such an effort is feasible, but also because, as of June 2017, more than 77 peer-reviewed manuscripts have already used ABIDE data. In addition, this initiative has opened up the data not only to experts in autism but also to applied mathematicians, statisticians, and others who generally do not have easy access to these data.
Milham: One of the interesting findings actually came from ABIDE collaborators Alexandre Abraham and Gaël Varoquaux (of Inria Saclay-Île-de-France, Saclay), who have been looking into the challenges arising from data sets collected using different protocols. Their group developed predictive classifiers for identifying the presence or absence of autism in an individual that were robust to the site at which the data were collected [6].
IEEE Pulse: In other words, from these multiple sites that collected the data using different protocols, they still managed to develop a way to tease out sufficient patterns of connectivity so that they could identify which subjects had autism and which were controls.
Milham: Yes. The accuracy of the predictions is still somewhere in the mid-to-high 60 percent range, so there’s obviously more work to be done, but they showed it was possible. They also found that the more data were included, the better the classifiers performed. Many thought more data would mean more noise because the data sets come from different sites, but instead, when the classifiers were presented with data from a site they had never seen before, they were able to extract enough “signal” to find commonalities and make a prediction.
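The test Milham describes, in which a classifier is trained on some sites and then scored on a site it has never seen, is commonly called leave-one-site-out cross-validation. The Python sketch below illustrates only that splitting logic, under stated assumptions. It is not the pipeline used by Abraham et al. [6]: the nearest-centroid classifier is a deliberately simple stand-in, the function names are hypothetical, and each feature vector is a made-up placeholder for per-subject connectivity values (for example, correlations between pairs of brain regions).

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx), holding out each site once."""
    by_site = defaultdict(list)
    for i, site in enumerate(sites):
        by_site[site].append(i)
    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Training set: every subject scanned at any *other* site.
        train_idx = [i for i, site in enumerate(sites) if site != held_out]
        yield held_out, train_idx, test_idx


def fit_centroids(X, y):
    """Hypothetical stand-in classifier: the mean feature vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], x)]
        counts[label] += 1
    return {label: [v / counts[label] for v in sums[label]] for label in sums}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def evaluate_unseen_sites(X, y, sites):
    """Train on all other sites, score on the held-out site; accuracy per site."""
    scores = {}
    for site, train_idx, test_idx in leave_one_site_out(sites):
        centroids = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
        correct = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
        scores[site] = correct / len(test_idx)
    return scores


if __name__ == "__main__":
    # Toy, made-up feature vectors for illustration only; NOT ABIDE data.
    X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.85, 0.15], [0.15, 0.85]]
    y = ["ASD", "ASD", "control", "control", "ASD", "control"]
    sites = ["A", "A", "B", "B", "C", "C"]
    print(evaluate_unseen_sites(X, y, sites))  # {'A': 1.0, 'B': 1.0, 'C': 1.0}
```

Because each site is held out exactly once and never contributes to its own training set, the per-site scores indicate how well a model transfers to data collected under a protocol it was not trained on, which is the kind of site robustness described above. Real studies would substitute a proper classifier and real connectivity features.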
IEEE Pulse: Have ABIDE I and II contributed to our understanding of autism yet?
Milham: There has been a range of findings. One thing researchers are working to reconcile is the seemingly conflicting evidence about whether the brain is more or less connected in autism. Looking at autism more broadly through the initial ABIDE I data set, we in the ABIDE consortium published a paper showing that, in autism, cortical–cortical connectivity is decreased, whereas subcortical–cortical connectivity is increased [7].
Di Martino: A couple of other leads are emerging, suggesting that, along with the usual suspects such as the cortical circuits involved in social interaction, circuits supporting sensory and motor processes are also affected in autism.
IEEE Pulse: What do you ultimately hope that ABIDE I and II will accomplish?
Di Martino: One area of great interest is identifying and clarifying mechanisms, particularly those underlying subtypes of ASD. If it is true that this heterogeneity has a biological underpinning, perhaps ABIDE can contribute some of the needed insight.
Milham: My hope for ABIDE and the broader data-sharing initiatives is that they will advance our understanding of the phenomenology associated with autism and of various brain differences, and will also foster more of a neurodevelopmental perspective by including increasingly younger children.
My other hope is that it will push the scientific community to develop innovative methodologies for more sophisticated imaging-data analysis and predictive tools. As we start to develop predictive tools, we’ll be able not only to take on the challenge of differentiating individuals into subtypes based on patterns of brain connectivity, but also to start making predictions about prognosis and risk. That’s where we can start having an impact on early identification and intervention.


  1. Centers for Disease Control and Prevention, “Autism Spectrum Disorder (ASD): Data & Statistics.”
  2. ABIDE, “Welcome to the Autism Brain Imaging Data Exchange!”
  3. ADHD-200 Consortium, “The ADHD-200 Consortium: A model to advance the translational potential of neuroimaging in clinical neuroscience,” Front Syst Neurosci, vol. 6 (September 2012): 62.
  4. “1000 Functional Connectomes Project.”
  5. M. Mennes, B. B. Biswal, F. X. Castellanos, and M. P. Milham, “Making data sharing work: The FCP/INDI experience,” NeuroImage, vol. 82 (November 15, 2013): 683-691.
  6. A. Abraham, M. Milham, A. Di Martino, R. C. Craddock, D. Samaras, B. Thirion, and G. Varoquaux, “Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example,” NeuroImage, vol. 147 (February 15, 2017): 736-745.
  7. A. Di Martino, C. G. Yan, Q. Li, E. Denio, F. X. Castellanos, K. Alaerts, J. S. Anderson, M. Assaf, S. Y. Bookheimer, M. Dapretto, B. Deen, S. Delmonte, I. Dinstein, B. Ertl-Wagner, D. A. Fair, L. Gallagher, D. P. Kennedy, C. L. Keown, C. Keysers, J. E. Lainhart, C. Lord, B. Luna, V. Menon, N. J. Minshew, C. S. Monk, S. Mueller, R. A. Müller, M. B. Nebel, J. T. Nigg, K. O’Hearn, K. A. Pelphrey, S. J. Peltier, J. D. Rudie, S. Sunaert, M. Thioux, J. M. Tyszka, L. Q. Uddin, J. S. Verhoeven, N. Wenderoth, J. L. Wiggins, S. H. Mostofsky, and M. P. Milham, “The autism brain imaging data exchange: Towards a large-scale evaluation of the intrinsic brain architecture in autism,” Molecular Psychiatry, vol. 19, no. 6 (June 2014): 659-667.