Feature importance and predictive modeling for Multisource healthcare data with missing values

Karthik Srinivasan, Faiz Currim, Sudha Ram, Colin Foe-Parker, Nicole Goebel, Reuben Herzl, Casey Lindberg, Esther Sternberg, Perry Skeath, Matthias R. Mehl, Bijan Najafi, Javad Razjouyan, Hyo Ki Lee, Brian Gilligan, Judith Heerwagen, Kevin Kampschroer, Kelli Canada

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

With the rapid development of sensor technologies and the internet of things, research in the area of connected health is increasing in importance and complexity, with wide-reaching impacts for public health. As data sources such as mobile (wearable) sensors get cheaper, smaller, and smarter, important research questions can be answered by combining information from multiple data sources. However, integration of multiple heterogeneous data streams often results in a dataset with several empty cells or missing values. The challenge is to use such sparsely populated integrated datasets without compromising model performance. Naïve approaches for dataset modification, such as discarding observations or ad-hoc replacement of missing values, often lead to misleading results. In this paper, we discuss and evaluate current best practices for modeling such data with missing values and then propose an ensemble-learning-based sparse-data modeling framework. We develop a predictive model using this framework and compare it with existing models using a study in a healthcare setting. Instead of generating a single score on variable/feature importance, our framework enables the user to understand the importance of a variable based on the existing data values and their localized impact on the outcome.

Original language: English (US)
Title of host publication: DH 2016 - Proceedings of the 2016 Digital Health Conference
Publisher: Association for Computing Machinery, Inc
Pages: 47-54
Number of pages: 8
ISBN (Electronic): 9781450342247
DOI: https://doi.org/10.1145/2896338.2896347
State: Published - Apr 11 2016
Event: 6th International Conference on Digital Health, DH 2016 - Montreal, Canada
Duration: Apr 11 2016 to Apr 13 2016

Publication series

Name: DH 2016 - Proceedings of the 2016 Digital Health Conference

Other

Other: 6th International Conference on Digital Health, DH 2016
Country: Canada
City: Montreal
Period: 4/11/16 to 4/13/16

Keywords

  • Data science with missing data
  • Mobile-sensors
  • Multi-source data
  • Well-being analysis

ASJC Scopus subject areas

  • Health Information Management
  • Computer Science Applications
  • Health Informatics

Citation

Srinivasan, K., Currim, F., Ram, S., Foe-Parker, C., Goebel, N., Herzl, R., Lindberg, C., Sternberg, E., Skeath, P., Mehl, M. R., Najafi, B., Razjouyan, J., Lee, H. K., Gilligan, B., Heerwagen, J., Kampschroer, K., & Canada, K. (2016). Feature importance and predictive modeling for Multisource healthcare data with missing values. In DH 2016 - Proceedings of the 2016 Digital Health Conference (pp. 47-54). (DH 2016 - Proceedings of the 2016 Digital Health Conference). Association for Computing Machinery, Inc. https://doi.org/10.1145/2896338.2896347