Telehealth has become a critical way for doctors to keep providing care while minimizing in-person contact during the COVID-19 pandemic. But with phone or Zoom appointments, it's harder for doctors to measure a patient's vital signs, such as pulse or respiration rate, in real time.
A University of Washington-led team has developed a method that uses the camera on a person's smartphone or computer to measure their pulse and respiration signals from real-time video of their face. The researchers presented this state-of-the-art system in December at the Neural Information Processing Systems conference.
Now the team is proposing an improved system to measure these physiological signals. This system is less likely to be tripped up by different cameras, lighting conditions or facial features, such as skin color. The researchers will present these findings April 8 at the ACM Conference on Health, Inference, and Learning.
"Machine learning is pretty good at classifying images. If you give it a series of photos of cats and then tell it to find cats in other images, it can do it. But for machine learning to be helpful in remote health sensing, we need a system that can identify the region of interest in a video that holds the strongest source of physiological information -- pulse, for example -- and then measure that over time," said lead author Xin Liu, a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering.
"Every person is different," Liu said. "So this system needs to be able to quickly adapt to each person's unique physiological signature, and separate this from other variations, such as what they look like and what environment they are in."
The team's system is privacy-preserving because it runs on the device instead of in the cloud. It uses machine learning to capture subtle changes in how light reflects off a person's face, changes that correlate with blood flow. It then converts these changes into both pulse and respiration rate.
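The step from "subtle changes in reflected light" to a rate can be illustrated with a classical signal-processing baseline rather than the team's neural network. The sketch below assumes a per-frame brightness trace has already been extracted from the face (for example, the average green-channel value of the face pixels in each frame, a common simple baseline in camera-based pulse research). It then picks the strongest frequency in a plausible heart-rate band and a plausible breathing band. The function names and band limits are illustrative choices, not taken from the researchers' system.

```python
import numpy as np


def dominant_rate_per_minute(trace, fps, low_hz, high_hz):
    """Return the strongest periodic rate (cycles per minute) of `trace` in [low_hz, high_hz]."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()            # remove the constant (DC) skin brightness
    x = x * np.hanning(len(x))  # taper the ends to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz       # Hz -> cycles per minute


def estimate_vitals(trace, fps):
    """Hypothetical helper: pulse and respiration rate from one brightness trace."""
    pulse_bpm = dominant_rate_per_minute(trace, fps, 0.7, 4.0)  # about 42-240 beats/min
    resp_bpm = dominant_rate_per_minute(trace, fps, 0.1, 0.5)   # about 6-30 breaths/min
    return pulse_bpm, resp_bpm
```

With a 60-second clip at 30 frames per second (1,800 samples), the frequency resolution is 30 / 1800 = 1/60 Hz, or one beat or breath per minute. Longer windows give finer resolution but respond more slowly to changes.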
The first version of this system was trained with a dataset that contained both videos of people's faces and "ground truth" information: each person's pulse and respiration rate measured by standard instruments in the field. The system then used spatial and temporal information from the videos to calculate both vital signs. It outperformed similar machine learning systems on videos where subjects were moving and talking.
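Comparisons like "outperformed similar systems" are typically scored by how far each estimate lands from the ground-truth reading. The article does not say which metrics the team used. The sketch below shows two error measures that are widely used for this kind of comparison, mean absolute error and root-mean-square error, both expressed in beats (or breaths) per minute. The function names are illustrative.

```python
import numpy as np


def _as_pair(predicted, reference):
    """Convert inputs to float arrays and check that they line up one-to-one."""
    p = np.asarray(predicted, dtype=float)
    r = np.asarray(reference, dtype=float)
    if p.shape != r.shape:
        raise ValueError("predicted and reference must have the same shape")
    return p, r


def mean_absolute_error(predicted, reference):
    """Average size of the error, in the same units as the rates."""
    p, r = _as_pair(predicted, reference)
    return float(np.mean(np.abs(p - r)))


def root_mean_squared_error(predicted, reference):
    """Like MAE, but penalizes occasional large misses more heavily."""
    p, r = _as_pair(predicted, reference)
    return float(np.sqrt(np.mean((p - r) ** 2)))
```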
But while the system worked well on some datasets, it still struggled with others that contained different people, backgrounds and lighting. This is a common problem known as "overfitting," the team said.
The researchers improved the system by having it produce a personalized machine learning model for each individual. Specifically, the system looks for the areas of a video frame most likely to contain physiological features correlated with changing blood flow in the face, across different contexts such as skin tones, lighting conditions and environments. It can then focus on those areas to measure pulse and respiration rate.
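The article does not describe exactly how the personalized model is built. One generic way to personalize a model is to start from weights trained on many people and then fine-tune them on a few labeled samples from the new person. The sketch below does this with a toy linear model and plain gradient descent on mean squared error. It stands in for the idea only: the team's system uses a deep network, and the `features`, `targets` and `personalize` names here are hypothetical.

```python
import numpy as np


def mse(w, X, y):
    """Mean squared error of the linear prediction X @ w against targets y."""
    return float(np.mean((X @ w - y) ** 2))


def personalize(w_general, features, targets, steps=50, lr=0.1):
    """Fine-tune general weights on a few samples from one person (toy linear model)."""
    w = np.array(w_general, dtype=float)  # copy, so the shared general model is untouched
    X = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    n = len(y)
    for _ in range(steps):
        residual = X @ w - y
        grad = (2.0 / n) * X.T @ residual  # gradient of the mean squared error
        w -= lr * grad
    return w
```

Keeping the general weights intact and adapting a copy means each new person costs only a handful of labeled samples and a few update steps, instead of training a model from scratch.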
While this new system outperforms its predecessor on more challenging datasets, especially for people with darker skin tones, there's still more work to do, the team said.
"We acknowledge that there is still a trend toward inferior performance when the subject's skin type is darker," Liu said. "This is in part because light reflects differently off of darker skin, resulting in a weaker signal for the camera to pick up. Our team is actively developing new methods to solve this limitation."
The researchers are also working on a variety of collaborations with doctors to see how this system performs in the clinic.
"Any ability to sense pulse or respiration rate remotely provides new opportunities for remote patient care and telemedicine. This could include self-care, follow-up care or triage, especially when someone doesn't have convenient access to a clinic," said senior author Shwetak Patel, a professor in both the Allen School and the electrical and computer engineering department. "It's exciting to see academic communities working on new algorithmic approaches to address this with devices that people have in their homes."
Ziheng Jiang, a doctoral student in the Allen School; Josh Fromm, a UW graduate who now works at OctoML; Xuhai Xu, a doctoral student in the Information School; and Daniel McDuff at Microsoft Research are also co-authors on this paper. This research was funded by the Bill & Melinda Gates Foundation, Google and the University of Washington.