A study finds that a representative sample of people given wearable data-collection devices provides more equitable and accurate health data than larger convenience samples of people who already own wearable devices. Leveraging the smartwatches and other data-logging wearables that people already have is a tempting way to gather data, but such groups overrepresent the wealthy, urban, White, and fit people who tend to buy these products. Ritika Chaturvedi and colleagues recruited 1,038 participants for American Life in Realtime (ALiR), a longitudinal health study that provided Fitbits and tablets to participants from the Understanding America Study, a probability sample of adults. Unlike typical convenience samples, which underrepresent minorities, older adults, and lower-income groups, ALiR achieved broad demographic representation across race, education, and income levels. The authors compared a COVID-19 detection model trained on ALiR data with one trained on data from NIH's All of Us program, comprising 14,133 participants who already owned a wearable device. ALiR's model performed consistently across demographic subgroups, while the All of Us model showed 22–40% worse performance in older women and non-White populations. According to the authors, probability sampling and providing devices to participants remove participation barriers and create a better data source, which could enable the development of AI health tools that work equally well across all populations.
Journal
PNAS Nexus
Article Title
American Life in Realtime: Benchmark, publicly available person-generated health data for equity in precision health
Article Publication Date
7-Oct-2025
COI Statement
M.J., A.B., E.D., and A.M. are affiliated with Evidation Health, Inc. Their contributions were made as part of the NLM-funded grant supporting this study. All other authors declare no competing interests.