Frequently Asked Questions
The Human Project is led by top scientists from New York University, with support from highly regarded experts based at MIT, UCLA, The University of Michigan, and other leading educational and scientific organizations. It is overseen by advisory boards that govern ethics, privacy and security, and the scientific agenda. You can learn more about the leadership and staff of The Human Project by visiting the “Our Team” page on this website.
The Human Project aims to use data as the foundation to solve our community’s problems. The study’s research platform will enable investigators to explore and better understand the root causes of illnesses and social challenges, so we can find solutions to them. Past studies of this nature led to major discoveries, such as the fact that obesity and smoking increase the risk of heart disease, while exercise improves heart health. The Human Project is different, however, because it will look at a much broader range of information, using supercomputers to discover hidden patterns and learn how our bodies, environment, and behavior interact to shape our lives. Putting those puzzle pieces together will create a fuller picture of human health, with the potential to yield major advances in medicine and public policy.
You can only join the study if you are invited to do so. The reason for this is simple: we are aiming to build a study population that “looks like” New York City, with participants representing the diverse communities that live here. To accomplish that, we are in the process of identifying 100 “micro-neighborhoods” across all five boroughs that together mirror the population of the city-at-large. We then randomly select addresses within those target areas, and invite people living in those households to be part of the study. You never know if you may be selected to make history as part of The Human Project!
We will enroll the first official participant in early 2018, and we hope to complete enrollment of all 10,000 participants over the next two to three years.
One way to get involved in the study is by joining The Sentinel Group. To join, you only need to be at least 13 years old, own a smartphone, and be a resident of New York City. Sentinel Group members play a key role in the study by beta-testing tools, completing surveys and games, and sharing feedback with NYU researchers.
To learn more, call (212)-998-3620.
Research and measurement
“Big Data” describes information that is enormous in volume and variety, making it difficult to classify into neat categories. So, yes, The Human Project will gather a wide array of different data, from social, to physical, to medical, and more. We will then use powerful supercomputers to make sense of it all, with the goal of solving some of the toughest health and public policy challenges we face today.
Everything! Okay, that’s impossible, but our goal is to gather as much information about our participants as possible, with a particular focus on information that they already share with their cell phone providers or other commercial data companies. This includes financial, educational, criminal justice, and health records, as well as information about their daily habits, such as what they eat, where they go, when they sleep, how they exercise, and their patterns of communication and interaction. The information will be collected via wearable activity trackers, smartphone apps, special scales, air quality sensors, surveys, games, and more. This information will allow scientists to get a full view of participants’ lives.
A critical element of The Human Project is integrating environmental data into the study. The Center for Urban Science and Progress (CUSP) at NYU holds unique access to key data through a partnership with the New York City government. Our goal is to capture relevant aspects of that dataflow, expand it to the national and international level, and embed it into the database in a way that makes examining the relationship between biology, behavior, and the environment simple to achieve. Once we build our research platform in New York City, we expect to be able to implement it in other cities that are working on urban data collections (e.g. Chicago and London), but which are not currently quite as far along in the process as New York.
Some data gaps are easier to catch than others. For example, we will monitor incoming data streams from participant mobile phones to identify any service disruptions. Our staff will follow up to determine the source of the disruption, then work with participants to resolve it, whether it’s a dead phone battery, a lost device, or a vacation out of the country. Of course, there will also be missing data that is more difficult to remediate. For example, a participant might have a second mobile phone for work and choose not to carry the project smartphone some of the time. In these cases, we may not be able to do anything about it.
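As a rough illustration of how a service disruption might be flagged, here is a minimal sketch that scans the timestamps received from one device and reports any silence longer than a threshold. The function name `find_gaps` and the six-hour threshold are assumptions made for the example, not a description of the study's actual monitoring system.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(hours=6)):
    """Return (start, end) pairs where consecutive readings are more than
    max_gap apart. The six-hour default is a hypothetical threshold."""
    ordered = sorted(timestamps)
    # Compare each reading with the next one in time order.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]

# Example: readings at 0h, 1h, 10h and 11h on the same day.
day = datetime(2018, 1, 1)
readings = [day + timedelta(hours=h) for h in (0, 1, 10, 11)]
gaps = find_gaps(readings)
```

A gap flagged this way only says that data stopped arriving. Working out why (a dead battery, a lost device, travel) would still require the staff follow-up described above.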
As we design the study, we are mindful that nearly all the technology we use today will become obsolete. At the same time, it is likely to be replaced by improved tools that hold the potential to collect even better-quality data. Given this reality, we focus on what we plan to measure, rather than the tools or methods used to measure it. For example, when we talk about gathering location data today that means using technologies like GPS, wireless, or Bluetooth. In the future, we still expect to gather location data, but hope to use tools that are cheaper, simpler and ever more reliable. The key to combining these data sets over time will be to ensure that the accuracy and precision of all measurements are well documented. To ensure that our data collection systems remain state-of-the-art, we plan to replace our software and hardware at least once every three years. We are also working hand-in-hand with Data Cubed to develop the next generation of research tools, which will allow us to continuously anticipate and lead technological innovation rather than simply keeping up with an evolving marketplace.
Our study is being implemented under the oversight of an NYU-selected Institutional Review Board (IRB), an independent administrative group established to protect the rights and welfare of human subjects involved in any university research project. Under IRB regulations, children are a specially protected class, so they cannot be involved in certain data collection procedures. We will, of course, follow all of those regulations. In addition, it would not be developmentally appropriate for children under ten to carry cell phones, so certain forms of data collection that depend on smartphones, such as location tracking and communication partners, will not be possible with the youngest children in the study. We do plan to use wearable Bluetooth beacons to examine interactions between children and adults in the study, as well as gathering information about the home environment in which children reside and all of their education data. This will result in a very rich set of data for studying child developmental outcomes.
Households will be randomly selected using an address-based, multi-stage area probability sample design. Addresses will come from the US Postal Service Computerized Delivery Sequence File, in which there is an entry for every address in New York City that receives mail. Additional data sources will be used to supplement this file. Given that the study frame we use will select households using a specific statistical method, a key to mitigating bias risk will be ensuring that a large proportion of households invited into the study are ultimately willing to enroll.
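The general idea of a multi-stage area sample can be sketched in a few lines. The example below is a simplified two-stage illustration, assuming a hypothetical address frame stored as a dictionary of area identifiers to address lists. It draws areas with equal probability, whereas a real design would likely stratify or weight areas by size, so it should not be read as the study's actual procedure.

```python
import random

def two_stage_sample(frame, n_areas, n_per_area, seed=0):
    """Two-stage sample from a hypothetical frame: {area_id: [addresses]}."""
    rng = random.Random(seed)
    # Stage 1: draw areas without replacement (equal probability here,
    # which is a simplifying assumption).
    areas = rng.sample(sorted(frame), n_areas)
    # Stage 2: draw addresses without replacement within each chosen area.
    return {
        area: rng.sample(frame[area], min(n_per_area, len(frame[area])))
        for area in areas
    }

# Example: 10 made-up areas with 20 made-up addresses each.
frame = {
    f"area{i}": [f"area{i}-addr{j}" for j in range(20)] for i in range(10)
}
selected = two_stage_sample(frame, n_areas=3, n_per_area=5, seed=1)
```

Fixing the seed makes the draw reproducible, which helps when a selection needs to be audited later.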
We can only estimate the attrition rate for the study at this point, as no previous study has employed our planned technology to both engage and track participants. Large-scale studies such as the Health and Retirement Study (HRS) and the Framingham Heart Study have a relatively high loss of participants after the first interaction, but tend to be quite stable thereafter, as people begin to identify with the study and become personally invested in contributing to its mission. We expect a similar pattern of participation for The Human Project. In many longitudinal surveys, however, a main source of attrition is people who move without providing a forwarding address. We anticipate that this will be less of a challenge for us, as we will be able to contact our participants via their cell phones if necessary.
Some people will inevitably relocate, but the outmigration rate in New York City is surprisingly low. Although thousands of people move out every year, they represent only 5 percent of the total city population. As a result, we expect that we would lose about that percentage of our study population to outmigration each year. Fortunately, these levels of outmigration are not necessarily detrimental to the survey, particularly as much of the data collection will be automated. We expect to remain able to follow those who move to nearby suburbs or who leave the area temporarily (such as elders traveling to warmer climates for the winter or students traveling to college), as we can maintain face-to-face contact with these participants without too much difficulty. Even those who permanently leave the New York City area can be retained to some degree, so long as they allow us to continue all passive and non-physical data collection. Only those who migrate out of the country will be completely lost from the study.
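To put the 5 percent figure in perspective, the arithmetic can be sketched as follows. The example assumes a constant annual outmigration rate, a cohort of 10,000 (the enrollment target), and no replacement of participants. These are simplifying assumptions, and many of those who move would still be followed, as described above.

```python
def expected_movers_per_year(cohort_size, annual_rate):
    """Expected number of participants who move out in a single year."""
    return cohort_size * annual_rate

def expected_never_moved(cohort_size, annual_rate, years):
    """Expected number who have not moved out after `years` years,
    assuming the same rate applies independently each year."""
    return cohort_size * (1 - annual_rate) ** years

movers = round(expected_movers_per_year(10_000, 0.05))
stayers_after_5 = round(expected_never_moved(10_000, 0.05, 5))
print(movers)           # 500
print(stayers_after_5)  # 7738
```

Under these assumptions, roughly 500 participants a year would move out of the city, and a little over three quarters of the original cohort would still be living in the city after five years.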
It depends on why they leave. People who resign from the study, leave the country, or pass away will be replaced by new participants selected based on our dynamic model of New York City. We do expect to be able to retain people who move out of New York City but remain in the New York metropolitan area or return periodically. For those who move permanently outside the New York area but stay in the United States, we may also be able to continue to gather most kinds of data, though there may be some limits to the amount of face-to-face contact and technical support that will be economically feasible.
Yes. Initially, participants will receive several hundred dollars to compensate them for their time. If they stick with the study, they will periodically receive cash compensation for completing certain study activities. Finding appropriate levels of cash incentives can be difficult. For lower-income participants, it might compromise their eligibility for social programs; for our highest-income participants, the amount of cash that would be motivating is likely more than a reasonable budget can bear. To address that challenge, The Human Project is pairing more traditional incentives with innovative perks, such as the opportunity to earn points toward funding community improvement projects. Most importantly, participants will know that they are making a lasting difference that has the potential to benefit their community, the city of New York, and the wider world. They will be remembered for that contribution for many generations to come.
In general, participants joining the study will commit to sharing most forms of data being collected. There are certain exceptions: participants can opt out of providing a blood sample, certain financial data, and a few other kinds of data. Age also plays a role in determining what data we collect. For example, it would not be developmentally appropriate for children under 10 to carry a smartphone, and children under 10 do not have credit scores. We also never attempt to gather criminal justice-related data about children. Still, we are encouraging participants to share as much data as possible, because gaps in our data set will make it difficult to conduct meaningful research. This is especially true where those gaps are likely to result in biases. Such biases would make it difficult for us to find the kinds of solutions we hope to identify, and to ensure that those policy solutions benefit all communities.
Privacy and research access
It is essential that we protect participant data against third-party requests. Studies that collect sensitive data (for example, research on drug abuse or any health-related topic) typically use a Certificate of Confidentiality granted by the NIH. This Certificate allows researchers to refuse to disclose information in response to legal demands. The Human Project will apply for a Certificate of Confidentiality, and this will protect the data from most third-party requests. This means that our participants’ data will be safe with us unless a judge rules that one of the participants is an immediate threat to national security. If that happens, then we are required to share that participant’s data with the FBI. Even in that case, most of the data collected by the Project is directly available from the original data source. For example, cell phone location data is easily subpoenaed from wireless providers, who are not protected by research regulations. Such requests are made and fulfilled on a regular basis, so there is little incentive for lawyers to engage in a long process of obtaining records from us given those circumstances.
The Human Project takes participant privacy so seriously that we have a council of experts dedicated specifically to this aspect of the study. Much of the founding investment in the project is focused on data security, which will exceed protections used at banks and other high-security institutions. Our data vault won’t communicate with the Internet, and all information will be anonymized and encrypted, so participants cannot be identified. Only researchers whose proposals meet our scientific merit and ethics standards can work with data from our system, and even they cannot see the identity of our participants. Finally, nobody can take or keep any data, ever — the only thing researchers get to keep are the findings from their studies.
Approved, supervised researchers must come on site to study the data at The Human Project’s secure lab. They will conduct their work at the facility, and will be able to leave with their analysis, but not the de-identified data. Participant data is private, and we will never sell or distribute it. All research requests will be evaluated by advisory councils composed of university scholars and experts in issues such as ethics, data security, human study population issues, and more. Their goal will be to ensure that the data is used in accordance with The Human Project’s mission and core values, to develop insights that help to improve health outcomes and advance public policies that make New York City a healthier, safer, happier place to live.
Will you build in safeguards if crises are detected (e.g., suicidal indicators or tumor detected in brain scan)?
It will be standard procedure to provide some sort of intervention for any life-endangering finding. For example, if the medical personnel performing the physical exam measure an unsafe blood-pressure level, they will call an ambulance or provide a referral to a doctor, depending on the severity of the finding. For other kinds of data, identifying circumstances that require intervention will be complicated because the data will not be examined in real time. No one watches the data from our participants as it comes in because that information is private.
Numerous features have been put in place to allow researchers to access data without compromising the security and anonymity of our participants. Data will go through a staging process as it is ingested into the facility for storage. Then, when researchers want to run queries or perform analyses, temporary specialized “data marts” will be created containing only the specific data the researcher needs. The data marts are handled with the same degree of security as the actual database, and researchers can only work with the data inside our secure facility.
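The idea of a data mart that holds only what a researcher needs can be illustrated with a small sketch. The field names, the `IDENTIFYING_FIELDS` set, and the `build_data_mart` function below are all hypothetical and chosen for the example. They do not describe the project's actual schema or software.

```python
# Hypothetical set of fields that must never leave the main database.
IDENTIFYING_FIELDS = {"name", "address", "phone"}

def build_data_mart(records, requested_fields):
    """Return copies of the records limited to the requested fields,
    silently dropping any identifying field even if it was requested."""
    allowed = [f for f in requested_fields if f not in IDENTIFYING_FIELDS]
    return [{f: r[f] for f in allowed if f in r} for r in records]

# Example with made-up records.
records = [
    {"name": "A. Participant", "age": 30, "steps": 5000, "borough": "Queens"},
    {"name": "B. Participant", "age": 41, "steps": 7200, "borough": "Bronx"},
]
mart = build_data_mart(records, ["age", "steps", "name"])
```

Because the mart is a separate, reduced copy, it can be discarded once the analysis is finished, and the full records stay untouched in the main database.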
We employ dozens of techniques to protect our participants’ data from hackers. The data center is located behind multiple secure firewalls, we monitor every piece of information that moves within the facility, everyone who works with the data undergoes regular criminal background checks, and fingerprints are checked every time someone enters the data facility. Video cameras monitor the facility 24 hours a day, and to enter the core of the facility each worker has to be fingerprinted twice. We also test our facility regularly by hiring “red teams” to simulate hackers trying to break in. Finally, the data center has been physically designed so that there is no way out for data, only a way in. We never allow USB sticks or removable disks, and there is no internet connection going out from the facility.
We take the consent process very seriously, and have designed the process to ensure that consent is truly informed. First, a video will describe the basic issues for a given topic, including possible risks. The video will be divided into segments, with comprehension quizzes at key intervals. The videos are simple and engaging, covering all the essential elements of a traditional legal consent form in terms that could be understood by anyone with sixth-grade comprehension skills. Separate adaptations are tailored to children (7-12) and teenagers (13-17). After the video, participants will be invited to provide oral consent. If they do so, the final step will be the presentation of a traditional legal form for participants to sign.
For reasons of security and quality control, data will not be accessible in real time. The data will be automatically integrated into our research platform as it is ingested, de-identified, and encrypted. Researchers can run queries or perform analyses shortly after new information is integrated.