Time-series modeling of COVID-19 cases in the United States with Google search trends
Biometrics & Biostatistics International Journal
Mohamed S. Mohamed, Leah Vaidya, Masuma Mannan, Evrim Oral
Abstract
Syndromic surveillance offers a rapid, low-cost approach to monitoring emerging health threats, complementing traditional case-based systems. This study investigates the utility of Google Trends data as a proxy for incident COVID-19 cases in the United States between March 2020 and April 2023. Weekly search interest for terms including covid, COVID-19, fever, mask, flu, and COVID-19 vaccine was analyzed alongside reported cases. Time-series modeling compared vector autoregressive (VAR), transfer function (TFM), and web-search-only (WSO) approaches. VAR models produced the most accurate forecasts of weekly cases and epidemic peaks, and TFM showed moderate accuracy. WSO models, although they overestimated case magnitudes, were useful in identifying epidemic onset, peak timing, and decline. Our findings highlight the promise of integrating web-based search data into surveillance frameworks, especially in settings with limited diagnostic or reporting capacity, while also underscoring limitations such as news bias, confounding from overlapping symptoms, and the need for early calibration in novel outbreaks.


