In 2012, a group of UCLA researchers set out to mine thousands of electronic health records for a more accurate and less expensive way to identify people who have undiagnosed Type 2 diabetes. The researchers got much more than they bargained for.
Not only did they develop a screening algorithm with the potential to vastly increase the number of correct diagnoses of the disease by refining the pool of candidates who are put forward for screening; they also uncovered several previously unknown risk factors for diabetes, including a history of sexual and gender identity disorders, intestinal infections and a category of illnesses that includes such sexually transmitted diseases as chlamydia.
The findings appear February 16 in the Journal of Biomedical Informatics.
"With widespread implementation, these discoveries have the potential to dramatically decrease the number of undetected cases of Type 2 diabetes, prevent complications from the disease and save lives," said Ariana Anderson, the study's lead author and an assistant research professor and statistician at UCLA's Semel Institute for Neuroscience and Human Behavior.
Anderson and Mark Cohen, a Semel Institute professor in residence, led a team that examined electronic records for 9,948 people from hospitals, clinics and doctors' offices in all 50 states. Although the patients themselves were not identifiable, the records included their vital signs, prescription medications and reported ailments, categorized according to the International Classification of Diseases diagnostic codes.
The researchers used half of the records to refine an algorithm that allowed them to predict the likelihood of an individual having diabetes, and then tested this pre-screening tool on the other half. They found that having any diagnosis of sexual and gender identity disorders increased the risk for Type 2 diabetes by roughly 130 percent -- about the same as high blood pressure, which is a leading risk factor.
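The split-half design described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the team's actual procedure: the article does not say how the records were divided, so the random shuffle, the `split_half` helper and the `seed` parameter are assumptions, and the integer IDs are hypothetical stand-ins for the de-identified records.

```python
import random


def split_half(records, seed=0):
    """Shuffle a copy of the records and split it into two halves.

    Hypothetical helper: the article does not describe how the
    researchers actually assigned records to each half.
    """
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Stand-in data: integer IDs in place of the 9,948 de-identified records.
records = list(range(9948))

# One half would be used to refine the model, the other held out for testing.
train, test = split_half(records)
print(len(train), len(test))  # 4974 4974
```

Keeping the test half untouched while the algorithm is refined is what lets the second half serve as a fair check on how well the tool predicts diabetes in records it has not seen.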
Other health conditions proved to be nearly as important as risk factors for the disease. Among them were a history of viral infections and chlamydia (which increased the risk for diabetes by 82 percent) and a history of intestinal infections such as colitis, enteritis and gastroenteritis (an 88 percent increase). In fact, those predictors were nearly as strong as having a high body mass index (a 101 percent increase).
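To put those percentages on a common scale, the short Python sketch below converts each reported increase into a multiple of baseline risk. It assumes the figures are simple relative increases over baseline; the article does not specify whether the study reported relative risks or odds ratios, and the `risk_multiplier` helper is a hypothetical name used only for illustration.

```python
def risk_multiplier(percent_increase):
    """Convert a percent increase in risk into a multiple of baseline risk.

    Assumes a simple relative increase: +100 percent means twice the baseline.
    """
    return 1 + percent_increase / 100


# Percent increases as reported in the article.
reported = {
    "sexual and gender identity disorders": 130,
    "intestinal infections": 88,
    "viral infections and chlamydia": 82,
    "high body mass index": 101,
}

for condition, pct in reported.items():
    print(f"{condition}: +{pct}% is {risk_multiplier(pct):.2f} times baseline")
```

Under that assumption, a 130 percent increase corresponds to 2.3 times the baseline risk, and a 101 percent increase to roughly double.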
Herpes zoster had previously been shown to have a link to diabetes, and the project confirmed that connection (finding that it increases the risk by about 90 percent) -- along with some other lesser-known risk factors. Chicken pox, shingles and a range of other viral infections (which are grouped together under one ICD diagnostic code) increased the risk for Type 2 diabetes as much as high cholesterol, the team found.
Researchers also identified certain factors that appear to be related to a lower risk for diabetes. Being prone to migraines, for instance, reduced an individual's risk for the disease by the same amount as being 29 years younger. And people taking anti-anxiety and anti-seizure medications such as clonazepam and diazepam had a significantly lower risk.
"The overall message is that ordinary record keeping that doctors do is a very, very rich source of information," said Cohen. "If you use a computerized approach to studying patterns in that data, you can greatly improve diagnosis and medical care."
The researchers are affiliated with a laboratory run by Cohen that uses mathematical modeling to analyze large quantities of brain images. The team has applied similar techniques to predict diseases, including epilepsy and irritable bowel syndrome. In this case, they targeted Type 2 diabetes because so many Americans with diabetes have yet to be diagnosed.
Additional research will be required to determine the medical reasons that certain factors correlate with greater or lesser risk. And because the analysis was based largely on diagnostic codes, rather than actual individual diagnoses, the findings are not fine-grained enough to tell precisely which conditions are linked to diabetes.
For instance, the ICD code for sexual and gender identity disorders covers a wide range of conditions, from transsexualism to exhibitionism, and the researchers do not know which one or ones are most important for a diabetes diagnosis. Similarly, the code for viral and chlamydial infections encompasses many conditions, including the human papillomavirus, chlamydia and coxsackie virus, which causes conjunctivitis and hand, foot and mouth disease.
Traditionally, medical providers have determined whom to screen for the disease based on a limited range of factors, including blood pressure, BMI, age, gender and whether the patient smokes. But the pre-screening tool based on the entirety of a patient's electronic health record proved 2.5 percent better at identifying people with diabetes than the standard approach, and 14 percent better at identifying those who do not have it. The researchers calculated that if the new method were used nationally, it would identify 400,000 people who have not yet been diagnosed with the disease.
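The two comparisons above correspond to what statisticians usually call sensitivity (the share of people with the disease who are correctly flagged) and specificity (the share of people without it who are correctly cleared). The article does not name the metrics the researchers used, nor whether the 2.5 and 14 percent gains are relative or in percentage points, so that reading is an assumption. The Python sketch below shows how the two rates are computed, using a made-up set of ten patients that is purely illustrative and not drawn from the study.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for paired True/False labels.

    actual[i] is True if patient i has diabetes;
    predicted[i] is True if the screening tool flags patient i.
    """
    tp = fn = tn = fp = 0
    for has_disease, flagged in zip(actual, predicted):
        if has_disease and flagged:
            tp += 1   # correctly flagged
        elif has_disease:
            fn += 1   # missed case
        elif flagged:
            fp += 1   # false alarm
        else:
            tn += 1   # correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 patients with diabetes, 6 without.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# sensitivity = 0.75, specificity = 0.67
```

A pre-screening tool with higher specificity sends fewer healthy people for unnecessary blood tests, while higher sensitivity means fewer undiagnosed cases slip through.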
"Given that 1 in 4 people with diabetes don't know they have the disease," Anderson said, "it's very important to be able to say, 'This person has all these other diagnoses, so we're a little bit more confident that she is likely to have diabetes. We need to be sure to give her the formal laboratory test, even if she's asymptomatic.'"
Mining big data for ways to improve medical care emerged as a national trend following the 2009 economic stimulus package, which included incentives for digitizing medical records. Advocates argue that using computers to uncover unexpected patterns in vast amounts of data -- or machine learning -- has the power to revolutionize medicine.
Left untreated, diabetes can cause blindness or lead to problems with the feet and legs that necessitate amputation. Although current approaches to screening for the disease are generally accurate, they can be costly and onerous because they involve blood draws and fasting for lengthy periods of time.
"There's so much more information available in the medical record that could be used to determine whether a patient needs to be screened, and this information isn't currently being used," said Cohen, who also is the director of UCLA's Laboratory of Integrative Neuroimaging Technology. "This is a treasure trove of information that has not begun to be exploited to the full extent possible."
The research was supported by the Burroughs Wellcome Fund.