Information theoretic feature selection for high dimensional metagenomic data

Gregory Ditzler, Gail Rosen, Robi Polikar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Extremely high-dimensional data sets are common in genomic classification scenarios, but they are particularly prevalent in metagenomic studies that represent samples as abundances of taxonomic units. Furthermore, the data dimensionality is typically much larger than the number of observations collected, a situation associated with the curse of dimensionality and a particularly challenging problem for most machine learning algorithms. Biologists who collect and analyze these data need efficient methods to determine the relationships between the classes in a data set and to identify the variables that can differentiate between groups in a study. The most common methods of metagenomic data analysis are based on α- and β-diversity tests; however, neither of these tests allows scientists to identify the organisms most responsible for differentiating between the categories in a study. In this paper, we present an analysis of information-theoretic feature selection methods for improving classification accuracy on metagenomic data.
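The abstract does not list which information-theoretic criteria the paper evaluates. As a rough illustration of the general idea, the sketch below implements one common criterion, mutual information maximization (MIM), which scores each taxon j by the mutual information I(X_j; Y) between its (discretized) abundance and the class label, then keeps the top k taxa. The presence/absence binarization, the function names, and the toy abundance table are all hypothetical choices made for this example, not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences of equal length."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)


def discretize(column, threshold=0.0):
    """Hypothetical binarization: 1 if the taxon's abundance exceeds threshold, else 0."""
    return [1 if v > threshold else 0 for v in column]


def mim_rank(X, y, k):
    """Rank features by I(X_j; Y) and return the indices of the top-k features.

    X is a list of samples (rows), each a list of taxon abundances (columns);
    y is the list of class labels. Ties are broken by lower feature index.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = discretize([row[j] for row in X])
        scores.append((mutual_information(column, y), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Toy (made-up) abundance table: 4 samples x 3 taxa, two classes.
X = [
    [5, 0, 3],
    [7, 0, 0],
    [0, 4, 2],
    [0, 6, 0],
]
y = [0, 0, 1, 1]

print(mim_rank(X, y, 2))  # taxa 0 and 1 each carry 1 bit about y; taxon 2 carries 0
```

MIM scores each feature independently, so it can select redundant taxa; criteria such as mRMR or JMI add terms that penalize redundancy with already-selected features.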

Original language: English (US)
Title of host publication: Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012
Pages: 143-146
Number of pages: 4
DOIs: https://doi.org/10.1109/GENSIPS.2012.6507749
State: Published - Dec 1 2012
Externally published: Yes
Event: 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012 - Washington, DC, United States
Duration: Dec 2 2012 to Dec 4 2012

Publication series

Name: Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics
ISSN (Print): 2150-3001
ISSN (Electronic): 2150-301X

Conference

Conference: 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012
Country: United States
City: Washington, DC
Period: 12/2/12 to 12/4/12

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Computational Theory and Mathematics
  • Signal Processing
  • Biomedical Engineering


Cite this

    Ditzler, G., Rosen, G., & Polikar, R. (2012). Information theoretic feature selection for high dimensional metagenomic data. In Proceedings 2012 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS 2012 (pp. 143-146). [6507749] (Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics). https://doi.org/10.1109/GENSIPS.2012.6507749