When undergraduate student Arnelle Etienne (Figure 1) joined a research group at Carnegie Mellon University (CMU), Pittsburgh, PA, USA, to help with the development of electroencephalograph (EEG) electrodes, her first task was to do some background work and learn about them. What she found was both surprising and dismaying: EEG electrodes had never worked on, and still did not work on, a large segment of the population, and clinicians and researchers knew about the oversight.
Etienne is part of that segment of the population, and so are others in the research group. One of them is Jasmine Kwasa, who did her doctoral project on neuropsychiatric functioning using high-density EEG, but couldn’t record EEG data on herself. For both women, the electrodes were unable to make good contact with their scalps because they had naturally thick, coarse, and curly hair, a characteristic they share with the nearly 15% of U.S. residents who are also African-American.
“It is really unfortunate that all of our neuroscience understanding is based on sampling very little from more than a billion people in the world, so it’s so important to learn that these kinds of biases exist,” said research group leader Pulkit Grover, Ph.D. (Figure 2), associate professor of electrical and computer engineering from the Neuroscience Institute, CMU. He and his group had already been working on devices suited to low-income medical settings, so they quickly took up the job of correcting the long-standing failure. They are now preparing to commercialize the solution and make it widely accessible by “keeping the price point low,” he said.
Disparities in devices and other biomedical technologies are all too common. “It’s called implicit bias. No one intends to be biased; they just have assumptions from their own experiences and those assumptions don’t really cover everyone,” said Londa Schiebinger, Ph.D. (Figure 3), who chaired the European Commission expert group behind the 2020 report “Gendered Innovations 2: How Inclusive Analysis Contributes to Research and Innovation,” which defines the issue, presents case studies, and describes clear methods to reduce bias in research design.
Bias can affect research and health care in many ways. The Gendered Innovations report took special note of drug studies. While women have historically been underrepresented and sometimes completely unrepresented in such studies, sex differences in drug metabolism and elimination are associated with higher risks for adverse drug reactions, side effects, and overdose in women, according to the report. It also points out other research demonstrating that body size, hormones, and other sex/gender differences are important factors, and states, “If women are not included in clinical trials, the real-world effects of a medicine will not be detected before it is released to the market.”
Gender bias is not the only issue, said Schiebinger, who is also the director of Gendered Innovations in Science, Health & Medicine, Engineering, and Environment, and John L. Hinds Professor of History of Science at Stanford University (Figure 3). Age, race, and culture are also important factors to consider. “The U.S. Food and Drug Administration (FDA) today asks that studies have a certain percent population of African Americans, but this rarely rises to statistical significance unless you’re really focusing on that demographic.” Bias is now more transparent with the addition of an FDA online Drug Trials Snapshots, or “dashboard” as Schiebinger describes it, which reveals the sex, race, and age distribution of each study. That is a positive step, but examination of the inclusiveness on the dashboard reveals that many studies still fall short, she said. As a 65-plus person, she has scanned the dashboard to check drug studies for participants in her age group, and often finds them lacking. “And yet who takes more drugs?” she asked.
“Everybody thinks it’s more expensive to do inclusive research, but if you make a drug that is developed on a male pipeline all the way through, it’s going to fail when you get to using it on real live women,” said Schiebinger, asserting that the same holds true for research that underrepresents any population. “So, you can say it’s more difficult or expensive, but really for whom?”
Biases in AI
One place where bias can be particularly problematic is artificial intelligence (AI), which is becoming an increasingly common part of health care devices, analytics, and decision-making, said James Zou, Ph.D. (Figure 4), assistant professor of biomedical data science at Stanford University and the faculty director of its AI for Health Program. “It’s especially important to think about how we can ensure these AI algorithms are reliable, trustworthy, and unbiased, and can work across diverse demographic settings and environments, and with diverse patients.”
Zou and his group have developed a systematic framework to audit the data being used to train the AI algorithm. He gave the example of training data for an algorithm that makes diagnoses from X-ray images. “Here, each X-ray image would actually have a lot of metadata attached to it, including: the demographics of the patient, such as race, gender, age, and comorbidities; how the image was taken, such as type of machine, day the image was taken, and name of the technician who took it; and how the image is pre-processed, such as cropping or down-sampling, before it’s stored in the electronic health record, and before it’s given to the algorithm. All of those things could introduce different biases into the model, so our framework looks at metadata that might cause drop-offs in an algorithm’s performance.”
His research group has used this framework to audit a variety of algorithms, including some of the increasingly popular AI applications used to detect skin cancer. “These could be really important because there are billions of people around the world who lack access to skin cancer/skin care, so they could just use a phone to take a photo of a mole, and the algorithm tells you if it is malignant or benign,” Zou said. While the developers may report greater than 90% accuracy, his group’s audits have shown considerable drop-off: in some cases, down to 60% among some populations, notably people with darker skin tones. A major contributor to the decrease was the training datasets, which contained few to no images of dark skin, he said. “That’s an example of the kinds of disparity in performance that we find in these AI algorithms, and which could be very damaging if they’re not caught and are actually applied to patients.”
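The idea of auditing performance against image metadata can be illustrated with a short Python sketch. This is not the Stanford group's framework, which the article does not describe in detail; it is a minimal illustration that assumes each test record carries a true label, a model prediction, and a metadata dictionary. The function name `audit_by_metadata`, the field name `skin_tone`, the 10-point gap threshold, and the toy records are all hypothetical and synthetic, chosen only to mirror the kind of drop-off described above.

```python
from collections import defaultdict


def audit_by_metadata(records, field, min_gap=0.10):
    """Compare accuracy within each metadata group to overall accuracy.

    records: iterable of dicts with "label", "pred", and "meta" keys
             (a hypothetical record layout, assumed for this sketch).
    field:   metadata key to group by, e.g. "skin_tone" or "machine_type".
    min_gap: flag a group when its accuracy is at least this far
             below the overall accuracy.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record["meta"].get(field, "unknown")
        totals[group] += 1
        correct[group] += int(record["pred"] == record["label"])

    overall = sum(correct.values()) / sum(totals.values())
    per_group = {group: correct[group] / totals[group] for group in totals}
    flagged = sorted(
        group for group, acc in per_group.items() if overall - acc >= min_gap
    )
    return overall, per_group, flagged


def make_records(group, n, n_correct):
    """Build synthetic records: the first n_correct predictions are right."""
    return [
        {"label": 1, "pred": 1 if i < n_correct else 0, "meta": {"skin_tone": group}}
        for i in range(n)
    ]


# Synthetic toy data, not real audit results.
records = make_records("lighter", 10, 9) + make_records("darker", 5, 3)
overall, per_group, flagged = audit_by_metadata(records, "skin_tone")

print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"  {group}: {acc:.2f}")
print("groups with a notable drop-off:", flagged)
```

The same function can be pointed at any other metadata field, such as machine type or pre-processing step, to look for drop-offs along those dimensions. A headline accuracy figure hides this kind of gap when the affected group is a small share of the test set.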
Zou and his group have also found biases arising from the text data used to train AI algorithms. In health care, the bulk of that text data comes from clinical notes. “There are different levels of detail the patients provide, how clinicians take that data from patients, and how that data is curated, so there are opportunities for bias all down the pipeline,” he said. “AI is a super-exciting area and can be transformative for biomedicine and for health care,” he added, “but we need to make sure it works for diverse patients.”
EEG for all
The traditional EEG electrode is a clear example of bias in devices, but rather than ignoring the issue as researchers and clinicians had been doing for decades, the CMU group decided to do something about it.
“EEG is a 100-year-old technology, and our first line of defense for so many disorders, including the gold standard for epilepsy. Yet, if you talk to clinicians, EEG technologists, or neuroscientists, they are well aware that the electrodes don’t work with certain hair types, especially those that are common in black people,” Grover said. In fact, he noted, his group’s publication on their new EEG device is the first mention of the problem in the research literature.
The design for the new device, called Sevo, is a laser-cut, polymer clip with a 3D-printed adapter that holds a standard electrode (Gold-Cup or BioStim). “We cornrow the hair in accordance with the ‘10-20 system’ so we can put the electrodes in proper places, add the electrode to the clip, and slide the clip between the corn rows,” Etienne described. The design of the clip, which has small wings to attach to either side of the cornrow braids, uses the strength of the hair to push it down onto the scalp and provide a good signal (Figure 5). When designing the clip, she found that it worked well on her hair, but not on slightly less-full hair. “I ended up excluding a different subsection of people with my hair type, so I had to check my own bias and we had to offer more than one size clip,” she said.
While the group continues doing final testing on the clip and collaborates on a clinical study with UPMC Children’s Hospital of Pittsburgh, it is working toward commercialization through Precision Neuroscopics, a startup designed to bring CMU medical devices to market. As that progresses, the group is also conducting research on other noninvasive brain-signal sensing and neurostimulation projects. “Our obsession is with disorders, such as epilepsy, brain injuries, and stroke, that can really benefit from better diagnosis,” Grover said, “so we want to make that happen in a way that the hardware, the algorithms, the portability, and everything else works for everybody.”
How to cut bias
The answer is to eliminate bias in biomedical technologies. That might not be simple, but it is possible and worth the effort, Schiebinger reiterated. “It’s just like doing any research: you need to do a literature review, you need to consider what methods you’re going to use, what your target population is – maybe you can’t target everyone, but you want to make sure you have included people who need this technology, this medical device or this drug.”
She recommended that biomedical innovators take a look at the “Detailed Methods” section of the Gendered Innovations 2 report for a step-by-step approach to identifying the potential sources of bias in design, data collection, and analysis; and to explain how bias was considered. That section also includes a wealth of basic literature to further describe each step. “We also have new methods to demonstrate how to consider this in advance, because it’s all about designing the research correctly from the very beginning,” Schiebinger said. “So many people just do something and then plan to fix it later, but that is unnecessary and expensive.”
When it comes to reducing bias in AI algorithms and models, Zou urged innovators to employ training databases that are appropriately diverse, but noted that is not an easy task because so few are available. “Creating a training dataset is a very expensive process, so once somebody does it, a lot of companies and researchers will use the same one. My group right now is investing a lot of resources to come up with and curate more diverse datasets,” he said. One that his group has just curated is the Stanford Diverse Dermatology Image Dataset, which will be available soon. He described it as “a large collection of skin images, which are confirmed with biopsy samples. And images come from darker-skinned patients.” In addition, innovators must test AI algorithms comprehensively once they are deployed, and keep testing over time, to make sure they are functioning properly.
Although Schiebinger is pleased with the increasing awareness about bias in research and biomedical technologies, she said that reducing it requires constant vigilance on all three pillars of the science infrastructure. “We need the funding agencies on board at the beginning of the project to help make sure research is inclusive. We need peer-reviewed journals on board at the end of the project to make sure the manuscript took bias into account, or it shouldn’t be published,” she said, noting that both funding agencies and journals have been making good progress. “The third pillar of the scientific infrastructure is universities, and we just aren’t doing our part yet. Methods to reduce bias are not taught systematically in the engineering curriculum, and the medical curriculum does not include everything that is important for sex and gender, let alone race and ethnicity. These are variables that need to be part of undergraduate, graduate, and professional preparation.”
She added, “I think we’re in a period of huge change surrounding sex, gender, and diversity in research and design. We need people to continue working on this, and calling attention to it.”
- L. Schiebinger and I. Klinge, Eds., “Gendered innovations 2: How inclusive analysis contributes to research and innovation,” Publications Office Eur. Union, Luxembourg City, Luxembourg, Tech. Rep., 2020. Accessed: Sep. 30, 2021, doi: 10.2777/53572. [Online]. Available: https://ec.europa.eu/info/publications/genderedinnovation-2-how-inclusive-analysis-contributes-research-and-innovation_en
- A. C. Freire, A. W. Basit, R. Choudhary, C. W. Piong, and H. A. Merchant, “Does sex matter? The influence of gender on gastrointestinal physiology and drug delivery,” Int. J. Pharmaceutics, vol. 415, nos. 1–2, pp. 15–28, Aug. 2011.
- F. Franconi and I. Campesi, “Sex and gender influences on pharmacological response: An overview,” Expert Rev. Clin. Pharmacol., vol. 7, no. 4, pp. 469–485, Jul. 2014.
- I. Zucker and B. J. Prendergast, “Sex differences in pharmacokinetics predict adverse drug reactions in women,” Biol. Sex Differences, vol. 11, no. 1, pp. 1–14, Jun. 2020. Accessed: Sep. 30, 2021. [Online]. Available: https://bsd.biomedcentral.com/articles/10.1186/s13293-020-00308-5
- L. Schiebinger, I. Klinge, I. S. de Madariaga, H. Y. Paik, M. Schraudner, and M. Stefanick, Eds. (2021). Gendered Innovations in Science, Health & Medicine, Engineering and Environment. Accessed: Oct. 4, 2021. [Online]. Available: http://genderedinnovations.stanford.edu/
- U. S. Food and Drug Administration. Drug Trials Snapshots. Accessed: Oct. 4, 2021. [Online]. Available: https://www.fda.gov/drugs/drug-approvals-and-databases/drug-trials-snapshots
- E. Wu, K. Wu, R. Daneshjou, D. Ouyang, D. E. Ho, and J. Zou, “How medical AI devices are evaluated: Limitations and recommendations from an analysis of FDA approvals,” Nature Med., vol. 27, no. 4, pp. 582–584, Apr. 2021.
- A. Etienne et al., “Novel electrodes for reliable EEG recordings on coarse and curly hair,” in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Jul. 2020, pp. 6151–6154.
- C. Tannenbaum, R. P. Ellis, F. Eyssel, J. Zou, and L. Schiebinger, “Sex and gender analysis improves science and engineering,” Nature, vol. 575, no. 7781, pp. 137–146, Nov. 2019.