Being stuck in bed, waiting for the flu to run its course, is pretty unpleasant. And it’s also really boring. What else is there to do but search for symptoms online, read entries about the flu on Wikipedia or WebMD, or post messages on Facebook and Twitter about how sick you are?
A lot of people get the flu every year and many of them do exactly that: they search for relevant information, and share their misery with the rest of us. The consequence is remarkable: a description of their symptoms, time-stamped and perhaps even geo-tagged, is online. Which means that the internet has a rather detailed picture of the health of the population, coming from digital sources, through all of our connected devices, including smartphones.
This is digital epidemiology: the idea that the health of a population can be assessed through digital traces, in real time.
It has the potential to be a powerful boon for traditional epidemiology. Researchers have already started to develop methods and strategies for using digital epidemiology to support infectious disease monitoring and surveillance, and to understand attitudes and concerns about infectious diseases. But much more needs to be done to integrate digital epidemiology with existing practices, and to address ethical concerns about privacy. With a projected 6.1 billion smartphone users by 2020, it is high time to get serious about digital epidemiology.
Digital epidemiology goes mainstream: Google Flu Trends
Google Flu Trends was one of the first popular examples of digital epidemiology. Launched in 2008 to help predict flu epidemics, it was based on a very simple idea: when people come down with the flu, they will often turn to the internet and search for information about their symptoms.
In 2009, researchers from Google and the US Centers for Disease Control and Prevention (CDC) published a paper with the apt title “Detecting influenza epidemics using search engine query data,” outlining a method for using search queries to recognize flu outbreaks.
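To make the idea of turning search queries into a flu estimate concrete, here is a minimal sketch. It assumes a straight-line relationship, on the log-odds scale, between the share of searches that are flu-related and the share of doctor visits for influenza-like illness. The weekly numbers are made up for illustration, and the function names are hypothetical; this is not Google's production system.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up weekly values, for illustration only (not real data).
query_fraction = [0.0010, 0.0015, 0.0022, 0.0030, 0.0041]  # flu-related share of searches
ili_visit_fraction = [0.012, 0.017, 0.024, 0.031, 0.040]   # flu-like share of doctor visits

b0, b1 = fit_line(
    [logit(q) for q in query_fraction],
    [logit(p) for p in ili_visit_fraction],
)


def estimate_ili(q):
    """Estimate the flu-like visit share from a new week's query share."""
    return inv_logit(b0 + b1 * logit(q))


print(estimate_ili(0.0020))
```

Once fitted on past weeks, such a model can produce an estimate as soon as the current week's search data are in, which is what makes the approach attractive for real-time monitoring.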
For many years, Google Flu Trends has served as a prime example of digital epidemiology. It embodies both the opportunities and the challenges the field faces. While it has undoubtedly popularized the idea of using digital data to derive epidemiological insights, Google Flu Trends has also demonstrated that this is no easy task.
For starters, its estimates were not always very accurate. Indeed, during the 2012-2013 flu season in the northern hemisphere, it overestimated the flu prevalence by up to 100% (relative to CDC numbers). And the estimates cannot be reproduced easily – Google controls access to Google data, of course.
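An overestimate of 100% relative to CDC numbers means the estimate was twice the reference value. The small sketch below spells out that arithmetic; the function name and the figures are hypothetical and chosen only to show the calculation.

```python
def relative_error(estimate, reference):
    """Signed error of an estimate as a fraction of the reference value."""
    return (estimate - reference) / reference


# Hypothetical figures: share of doctor visits for flu-like illness, in percent.
cdc_reported = 3.0
search_based_estimate = 6.0

overshoot = relative_error(search_based_estimate, cdc_reported)
print(f"Overestimate: {overshoot:.0%}")
```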
For this reason alone, many researchers have in the past few years turned to alternative data sources. Twitter has been a particularly popular source, because tweets are public by default, and because Twitter data can be accessed by anyone.
Twitter and Wikipedia are becoming data sources for digital epidemiology
For instance, a study from 2011 used data from Twitter to measure public interest and concern about the influenza H1N1 virus and to track disease activity. Another study from 2014 showed that incorporating data from Twitter into CDC influenza-like illness models can reduce forecasting errors. Twitter has also been used to assess health sentiments such as those about vaccination, and to monitor drug safety.
And Wikipedia access logs – openly accessible data about how often certain Wikipedia pages were accessed on the web – have recently provided a rich data source for disease monitoring and forecasting. Research suggests that examining Wikipedia access logs could support traditional disease surveillance for influenza.
The doctor is in your pocket: epidemiology goes mobile
But it’s not just publicly accessible data from Twitter and Wikipedia that have been harnessed for epidemiology. Anonymized mobile phone data have provided unparalleled insights into how the movement of people affects disease dynamics.
For example, cell phone data have been used to measure how human travel patterns spread malaria and to rapidly estimate population movements during disasters and outbreaks, such as the earthquake and subsequent cholera outbreak in Haiti in 2010.
Apps that allow people to diagnose diseases themselves are not too far away. With the help of a small attachment, a smartphone can already be turned into a mobile clinic able to diagnose multiple infectious diseases in minutes.
Traditional + digital = a better picture
Public health is traditionally based on data collected from health-care providers, who collect data from sick patients. This produces a very patchy picture. It only includes those populations who have access to health care or who decide to go to the doctor in the first place. And it mostly includes information about reportable diseases, missing out on a huge array of other illnesses. Last but not least, it largely misses out on information about health behaviors, sentiments and opinions.
Digital epidemiology can add more information to that picture and fill in some of the blanks. Of course, digital epidemiology won’t capture the entire population. But neither do traditional ways of gathering epidemiological data. With the vast majority of the world getting online, populations that have slipped under the radar of public health will become more visible, which is crucial in a world where diseases anywhere today are diseases everywhere tomorrow. And it will also enable us to fulfill the mantra of “early detection, early response” by building digital warning systems designed to stop pandemics in their tracks.
Don’t forget privacy and surveillance
Digital epidemiology faces ethical challenges about surveillance and privacy as well. Ill health is stigmatized – socially and economically – in all societies. And people are more and more concerned about surveillance and information privacy. As digital epidemiology grows, we need to keep these ethical considerations at the forefront.