Towards using social media to identify individuals at risk for preventable chronic illness

Dane Bell, Daniel Fried, Luwen Huangfu, Mihai Surdeanu, Stephen G Kobourov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We describe a strategy for the acquisition of training data necessary to build a social-media-driven early detection system for individuals at risk for (preventable) type 2 diabetes mellitus (T2DM). The strategy uses a game-like quiz with data and questions acquired semi-automatically from Twitter. The questions are designed to inspire participant engagement and collect relevant data to train a public-health model applied to individuals. Prior systems designed to use social media such as Twitter to predict obesity (a risk factor for T2DM) operate on entire communities such as states, counties, or cities, based on statistics gathered by government agencies. Because there is considerable variation among individuals within these groups, training data on the individual level would be more effective, but this data is difficult to acquire. The approach proposed here aims to address this issue. Our strategy has two steps. First, we trained a random forest classifier on data gathered from (public) Twitter statuses and state-level statistics with state-of-the-art accuracy. We then converted this classifier into a 20-questions-style quiz and made it available online. In doing so, we achieved high engagement with individuals that took the quiz, while also building a training set of voluntarily supplied individual-level data for future classification.
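The abstract does not say how the classifier was turned into a quiz, so the following is only a minimal sketch of one way a tree ensemble could drive a 20-questions-style quiz. It assumes each internal tree node can be phrased as a yes/no question. The feature names, question texts, class labels, and the `run_quiz` helper are all invented for illustration and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class predicted when a walk ends here


@dataclass
class Question:
    feature: str                  # hypothetical feature name
    text: str                     # yes/no question shown to the participant
    yes: Union["Question", Leaf]  # subtree followed on a "yes" answer
    no: Union["Question", Leaf]   # subtree followed on a "no" answer


Node = Union[Question, Leaf]


def run_quiz(forest: List[Node],
             ask: Callable[[str], bool]) -> Tuple[str, Dict[str, bool]]:
    """Walk every tree, asking each distinct question once, then majority-vote."""
    answers: Dict[str, bool] = {}
    votes: Counter = Counter()
    for tree in forest:
        node = tree
        while isinstance(node, Question):
            # Reuse an earlier answer if another tree already asked this.
            if node.feature not in answers:
                answers[node.feature] = ask(node.text)
            node = node.yes if answers[node.feature] else node.no
        votes[node.label] += 1
    prediction = votes.most_common(1)[0][0]
    return prediction, answers


# Toy forest: features, questions, and labels are assumptions, not paper data.
FOREST: List[Node] = [
    Question("mentions_fast_food", "Do you often post about fast food?",
             yes=Leaf("higher risk"),
             no=Question("mentions_running", "Do you often post about running?",
                         yes=Leaf("lower risk"), no=Leaf("higher risk"))),
    Question("mentions_running", "Do you often post about running?",
             yes=Leaf("lower risk"), no=Leaf("higher risk")),
    Question("mentions_cooking", "Do you often post about cooking at home?",
             yes=Leaf("lower risk"), no=Leaf("higher risk")),
]
```

In this sketch the returned `answers` dictionary is the voluntarily supplied individual-level record: stored alongside a self-reported outcome, it could serve as one training example for a later individual-level classifier.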

Original language: English (US)
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Publisher: European Language Resources Association (ELRA)
Pages: 2957-2964
Number of pages: 8
ISBN (Electronic): 9782951740891
State: Published - Jan 1 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: May 23 2016 to May 28 2016

Other

Other: 10th International Conference on Language Resources and Evaluation, LREC 2016
Country: Slovenia
City: Portoroz
Period: 5/23/16 to 5/28/16

Fingerprint

social media
chronic illness
twitter
quiz
statistics
government agency
public health
Diabetes Mellitus
Classifier
Type 2 Diabetes
community
Group

Keywords

  • Machine learning
  • Obesity detection
  • Social media

ASJC Scopus subject areas

  • Linguistics and Language
  • Library and Information Sciences
  • Language and Linguistics
  • Education

Cite this

Bell, D., Fried, D., Huangfu, L., Surdeanu, M., & Kobourov, S. G. (2016). Towards using social media to identify individuals at risk for preventable chronic illness. In Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016 (pp. 2957-2964). European Language Resources Association (ELRA).

@inproceedings{4843c65dc1524cfc81dd274073c932c8,
title = "Towards using social media to identify individuals at risk for preventable chronic illness",
keywords = "Machine learning, Obesity detection, Social media",
author = "Dane Bell and Daniel Fried and Luwen Huangfu and Mihai Surdeanu and Kobourov, {Stephen G}",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
pages = "2957--2964",
booktitle = "Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016",
publisher = "European Language Resources Association (ELRA)",

}

TY - GEN
T1 - Towards using social media to identify individuals at risk for preventable chronic illness
AU - Bell, Dane
AU - Fried, Daniel
AU - Huangfu, Luwen
AU - Surdeanu, Mihai
AU - Kobourov, Stephen G
PY - 2016/1/1
Y1 - 2016/1/1
AB - We describe a strategy for the acquisition of training data necessary to build a social-media-driven early detection system for individuals at risk for (preventable) type 2 diabetes mellitus (T2DM). The strategy uses a game-like quiz with data and questions acquired semi-automatically from Twitter. The questions are designed to inspire participant engagement and collect relevant data to train a public-health model applied to individuals. Prior systems designed to use social media such as Twitter to predict obesity (a risk factor for T2DM) operate on entire communities such as states, counties, or cities, based on statistics gathered by government agencies. Because there is considerable variation among individuals within these groups, training data on the individual level would be more effective, but this data is difficult to acquire. The approach proposed here aims to address this issue. Our strategy has two steps. First, we trained a random forest classifier on data gathered from (public) Twitter statuses and state-level statistics with state-of-the-art accuracy. We then converted this classifier into a 20-questions-style quiz and made it available online. In doing so, we achieved high engagement with individuals that took the quiz, while also building a training set of voluntarily supplied individual-level data for future classification.
KW - Machine learning
KW - Obesity detection
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=84992629161&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84992629161&partnerID=8YFLogxK
M3 - Conference contribution
SP - 2957
EP - 2964
BT - Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
PB - European Language Resources Association (ELRA)
ER -