The Case for Big Data
Cigarette smoking is tied to lung cancer, but people still smoke. Why do people start smoking in the first place? That is one of the many complex, interdisciplinary questions behind the Kavli HUMAN Project, a massive data-collection endeavor with the goal of learning how everything—from biology to behavior and environment—affects the human condition.
A collaboration between the Kavli Foundation, the Institute for the Interdisciplinary Study of Decision Making at New York University (NYU), and the NYU Center for Urban Science and Progress, the Kavli HUMAN Project will gather a detailed array of measurements from 10,000 New York City residents over a 20-year span.
A team of experts with backgrounds ranging from computer science to economics and psychology to biology started the Kavli HUMAN Project as a way to learn about the wide-ranging factors that affect health and well-being, says Andrew Caplin (Figure 1), director of the scientific agenda for the project and Silver Professor of Economics at NYU. Those factors include the things people can’t alter, such as their genetics or family history, and the things they can, such as how much they exercise and what they eat. Caplin believes a more complete understanding of how those changeable and unchangeable aspects interact will reveal greater truths about health and the human condition.
“We at the Kavli HUMAN Project are nearing the end of a three-year development cycle that involves the deliberative design of the population sample, data-measurement devices, apps, and the scientific hypotheses we will be addressing with the data we collect,” he says. The process began with a framework that combined biological, behavioral, and environmental measurements and has evolved as more and more researchers have heard about the project and submitted white papers and suggestions.
“We’re also simultaneously working to find out what would make people in the community want to participate in a study of this nature and stay with us for as much of the 20-year study period as possible,” Caplin explains, noting that participants will be providing all kinds of personal data through such avenues as genome sequencing and other health measurements, daily app interactions, and location sensors that identify how much time they spend near other family members in the home. Focus groups are under way to help the researchers refine the methodology so that it provides sufficiently detailed information without being onerous to the participants.
“One thing that is pleasantly surprising is that people so far are really enthusiastic about taking part in the project,” says Hannah Bayer (Figure 2), chief scientist for the Kavli HUMAN Project and research professor of decision sciences at NYU. For instance, some are concerned about a family history of a certain disease and what they might be able to do to prevent it in themselves or their children; others wonder how they can alter environmental conditions to decrease asthma symptoms, she explains. “I think people generally want to make the world a better place, and we’re trying to tap into that because we feel the project can do that.”
Although a few of the fine details may change, Bayer says, participants will receive a briefing on the study, including what they can expect and an explanation of the security measures that keep their data anonymous and secure. They will also receive an initial home visit from a study team comprising a psychologist, a trained health professional, and a tech specialist. The psychologist will perform IQ and mental-health testing; the health professional will conduct a basic medical exam and take blood and other samples; and the tech specialist will describe and install home instrumentation, including sensors to detect environmental conditions, devices to pick up the physical locations of different family members within the home relative to one another, and the core data-collection technology, which is a smartphone app.
The study team provides a smartphone and cellphone contract to those participants who don’t have one, Bayer adds. “This is important because we want a representative sample of all New Yorkers, not just those with smartphones. That’s critical for making sense of the data.” Developing the app was time-consuming because project organizers wanted it to gather extremely diverse data, especially on behaviors, while remaining fun and quick enough that participants would continue to use it. To that end, a team that specializes in gamification is designing the app to deliver game-type questionnaires and other interactions and to provide small incentives, such as points participants can earn as they play. All interactions take three minutes or less and total no more than 20 minutes a week. “It actually winds up being quite a bit of time that we spend with them,” Bayer notes, “but it’s in these small chunks so hopefully they’ll play once a day and it won’t be a burden.”
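To put “quite a bit of time” in perspective, the weekly cap can be multiplied out over the study period. The short Python sketch below is only a back-of-the-envelope estimate: the 20-minute weekly limit and the 20-year span come from the project’s description, while the 52-week year, the assumption of uninterrupted participation, and the function name `max_app_hours` are illustrative and not part of the project’s design.

```python
# Back-of-the-envelope upper bound on app time per participant.
# From the article: at most 20 minutes a week, over a 20-year study.
# Assumed here (not from the article): 52 weeks per year and
# uninterrupted participation for the full study period.

MAX_MINUTES_PER_WEEK = 20
STUDY_YEARS = 20
WEEKS_PER_YEAR = 52  # assumption


def max_app_hours(minutes_per_week: int = MAX_MINUTES_PER_WEEK,
                  years: int = STUDY_YEARS,
                  weeks_per_year: int = WEEKS_PER_YEAR) -> float:
    """Return the maximum hours one participant could spend in the app."""
    total_minutes = minutes_per_week * weeks_per_year * years
    return total_minutes / 60


if __name__ == "__main__":
    print(f"Up to about {max_app_hours():.0f} hours per participant")
```

Under those assumptions, the ceiling works out to roughly 347 hours per participant over two decades, which is why keeping each interaction short matters for retention.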
The Kavli HUMAN Project will recruit a pilot group of participants early in 2017, begin enrolling its first study participants by the second half of 2017, and continue signing up people for about two to three years (Figure 3). The accumulating data will become a public resource, Bayer says. “We think that the way that we will have the most impact is by setting all the best minds free on the data, so we are building a cutting-edge security plan that will ensure the privacy of participants while also allowing researchers to access it and make the most of it.”
Because the project starts in New York City, its findings will be representative of people there, says Caplin. “The Kavli HUMAN Project’s initial focus on New York City will allow researchers to learn almost everything about the city thanks to its representative study population,” he continues. “But, of course, it is also a template for similar surveys that can be representative of other populations.”
Adds Bayer, “We see this as a demonstration project. We believe that if we can get all of the technology and other pieces working in New York, you could roll out a similar study in Chicago or in Dallas, or in a suburban or rural area. I think the sky’s the limit.”