Respiratory syncytial virus tracking using internet search engine data

Eyal Oren, Justin Frere, Eran Yom-Tov, Elad Yom-Tov

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: Respiratory syncytial virus (RSV) is the leading cause of hospitalization in children less than 1 year of age in the United States. Internet search engine queries may provide high-resolution temporal and spatial data for estimating and predicting disease activity.

Methods: After filtering an initial list of 613 symptoms using high-resolution Bing search logs, we used Google Trends data between 2004 and 2016 for a smaller list of 50 terms to build predictive models of RSV incidence for five states where long-term surveillance data were available. We then used domain adaptation to model RSV incidence for the 45 remaining US states.

Results: Surveillance data sources (hospitalization and laboratory reports) were highly correlated, as were laboratory reports with search engine data. The four terms whose time series were most often statistically significantly correlated with the surveillance data in the five state models were RSV, flu, pneumonia, and bronchiolitis. Using our models, we tracked the spread of RSV by observing the time of peak use of the search term in different states. In general, the RSV peak moved from the south-east (Florida) to the north-west US.

Conclusions: Our study represents the first time that RSV has been tracked using Internet data and highlights the successful use of search filters and domain adaptation techniques, using data at multiple resolutions. Our approach may assist in identifying the spread of both local and more widespread RSV transmission and may be applicable to other seasonal conditions where comprehensive epidemiological data are difficult to collect or obtain.
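The two measurements the abstract describes, correlating a search-term time series with surveillance counts and ordering states by the time of peak search activity, can be illustrated with a minimal sketch. This is not the authors' code or model: the function names (pearson, peak_week, order_states_by_peak) are hypothetical, the plain Pearson correlation is an assumed stand-in for whatever significance testing the study used, and every number is made-up toy data rather than a result from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has undefined correlation")
    return cov / sqrt(vx * vy)


def peak_week(series):
    """Index of the week with the highest value (the first one on ties)."""
    return max(range(len(series)), key=lambda i: series[i])


def order_states_by_peak(search_by_state):
    """State names sorted from earliest to latest peak week."""
    return sorted(search_by_state, key=lambda s: peak_week(search_by_state[s]))


# Made-up weekly search volumes per state; NOT data from the study.
toy_search = {
    "FL": [10, 40, 90, 50, 20, 10, 5, 5],
    "TX": [5, 10, 30, 80, 60, 20, 10, 5],
    "WA": [5, 5, 10, 20, 40, 85, 50, 15],
}
# Made-up weekly laboratory report counts for one state.
toy_lab_reports_fl = [2, 9, 20, 12, 5, 2, 1, 1]

print(order_states_by_peak(toy_search))
print(round(pearson(toy_search["FL"], toy_lab_reports_fl), 3))
```

With the toy values, the states come out ordered by the week in which their series peaks, which mirrors the idea of following the peak across states; real Google Trends series would likely need smoothing and seasonal alignment before a comparison like this is meaningful.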

Original language: English (US)
Article number: 445
Journal: BMC Public Health
Volume: 18
Issue number: 1
DOI: 10.1186/s12889-018-5367-z
State: Published - Apr 3 2018

Fingerprint

Search Engine
Respiratory Syncytial Viruses
Internet
Hospitalization
Bronchiolitis
Information Storage and Retrieval
Incidence
Pneumonia

Keywords

  • Domain adaptation
  • Google trends
  • Internet data
  • RSV

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health

Cite this

Respiratory syncytial virus tracking using internet search engine data. / Oren, Eyal; Frere, Justin; Yom-Tov, Eran; Yom-Tov, Elad.

In: BMC Public Health, Vol. 18, No. 1, 445, 03.04.2018.

Oren, Eyal; Frere, Justin; Yom-Tov, Eran; Yom-Tov, Elad. / Respiratory syncytial virus tracking using internet search engine data. In: BMC Public Health. 2018; Vol. 18, No. 1.
@article{9870fe23fae24abf92f381aa9f0fab58,
title = "Respiratory syncytial virus tracking using internet search engine data",
keywords = "Domain adaptation, Google trends, Internet data, RSV",
author = "Eyal Oren and Justin Frere and Eran Yom-Tov and Elad Yom-Tov",
year = "2018",
month = "4",
day = "3",
doi = "10.1186/s12889-018-5367-z",
language = "English (US)",
volume = "18",
journal = "BMC Public Health",
issn = "1471-2458",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Respiratory syncytial virus tracking using internet search engine data

AU - Oren, Eyal

AU - Frere, Justin

AU - Yom-Tov, Eran

AU - Yom-Tov, Elad

PY - 2018/4/3

Y1 - 2018/4/3

KW - Domain adaptation

KW - Google trends

KW - Internet data

KW - RSV

UR - http://www.scopus.com/inward/record.url?scp=85044729674&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044729674&partnerID=8YFLogxK

U2 - 10.1186/s12889-018-5367-z

DO - 10.1186/s12889-018-5367-z

M3 - Article

VL - 18

JO - BMC Public Health

JF - BMC Public Health

SN - 1471-2458

IS - 1

M1 - 445

ER -