Large-scale analysis of free-text data for mental health surveillance with topic modelling

Yang Gu, Gondy Leroy

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

Autism spectrum disorder (ASD) affects 1 in 59 children in the US and costs the US economy $66 billion annually. The Centers for Disease Control and Prevention (CDC) has collected a large set of electronic health records (EHR) as part of its surveillance in the US. In Arizona, the dataset contains 4,480 EHR with 10 million free-text tokens collected over ten years, including detailed descriptions of children with ASD-like behaviors. Although knowledge about ASD and its diagnostic criteria have evolved, the data collected in earlier years have not been re-evaluated. To leverage these data more efficiently and uncover causes for the increase in ASD prevalence observed in epidemiological surveillance, we use Latent Dirichlet Allocation (LDA) to analyze the content of the text data automatically. Preliminary results suggest that LDA can model topics in EHR content and show variations in content that are consistent with changes in the data collection effort.
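To illustrate the kind of analysis the abstract describes, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is not the authors' implementation: the function name `fit_lda`, the hyperparameters `alpha` and `beta`, the number of topics, and the toy token lists are all assumptions made for illustration, and the toy documents are invented placeholders rather than EHR content.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns per-document topic counts and per-topic word counts.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Invented toy documents (hypothetical tokens, not taken from any record).
docs = [
    ["speech", "delay", "language", "speech", "words"],
    ["language", "words", "speech", "delay"],
    ["school", "teacher", "classroom", "school"],
    ["teacher", "classroom", "school", "report"],
]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [w for w, _ in counts.most_common(3)]
    print(f"topic {k}: {top_words}")
```

In practice, a corpus of the size mentioned in the abstract would more plausibly be handled with an established library and careful preprocessing; the sketch only shows how topic assignments are resampled token by token.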

Original language: English (US)
Title of host publication: 26th Americas Conference on Information Systems, AMCIS 2020
Publisher: Association for Information Systems
ISBN (Electronic): 9781733632546
State: Published - 2020
Event: 26th Americas Conference on Information Systems, AMCIS 2020 - Salt Lake City, Virtual, United States
Duration: Aug 10 2020 - Aug 14 2020

Publication series

Name: 26th Americas Conference on Information Systems, AMCIS 2020

Conference

Conference: 26th Americas Conference on Information Systems, AMCIS 2020
Country/Territory: United States
City: Salt Lake City, Virtual
Period: 8/10/20 - 8/14/20

Keywords

  • ASD
  • Autism
  • Healthcare analytics
  • LDA
  • NLP
  • Natural language processing
  • Topic modelling

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computer Networks and Communications
  • Library and Information Sciences
