Automated lexicon and feature construction using word embedding and clustering for classification of ASD diagnoses using EHR

Research output: Research - Conference contribution

Abstract

Using electronic health records of children evaluated for Autism Spectrum Disorders, we are developing a decision support system for automated diagnostic criteria extraction and case classification. We manually created 92 lexicons, which we tested as features for classification and compared with features created automatically using word embedding. The expert annotations used for manual lexicon creation provided seed terms that were expanded with the 15 most similar terms (Word2Vec). The resulting 2,200 terms were clustered into 92 clusters, parallel to the manually created lexicons. We compared both sets of features to classify case status with a FF/BP neural network (NN) and a C5.0 decision tree. For manually created lexicons, classification accuracy was 76.92% for the NN and 84.60% for C5.0. For the automatically created lexicons, accuracy was 79.78% for the NN and 86.81% for C5.0. Automated lexicon creation required a much shorter development time and brought similarly high-quality outcomes.
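
The pipeline summarized in the abstract (seed terms, expansion with the most similar embedding terms, clustering, lexicon-based features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: toy 2-D vectors stand in for Word2Vec embeddings, k-means is an assumed clustering algorithm (the abstract does not name one), and every function name and example term is hypothetical.

```python
import math

# Toy 2-D "embeddings" (assumption); a real pipeline would use vectors
# trained with Word2Vec on the clinical notes.
EMBEDDINGS = {
    "rocking": (1.0, 0.1), "flapping": (0.9, 0.2), "spinning": (0.95, 0.05),
    "echolalia": (0.1, 1.0), "nonverbal": (0.2, 0.9), "babbling": (0.05, 0.95),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand(seeds, k):
    """Return the seeds plus each seed's k most similar vocabulary terms."""
    terms = set(seeds)
    for seed in seeds:
        ranked = sorted(
            (w for w in EMBEDDINGS if w != seed),
            key=lambda w: cosine(EMBEDDINGS[seed], EMBEDDINGS[w]),
            reverse=True,
        )
        terms.update(ranked[:k])
    return sorted(terms)

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(xs) / len(vectors) for xs in zip(*vectors))

def kmeans(terms, init_terms, iterations=10):
    """Plain k-means over term vectors, one cluster per initial term."""
    centroids = [EMBEDDINGS[t] for t in init_terms]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for term in terms:
            dists = [
                sum((a - b) ** 2 for a, b in zip(EMBEDDINGS[term], c))
                for c in centroids
            ]
            clusters[dists.index(min(dists))].append(term)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean([EMBEDDINGS[t] for t in cl]) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters

def features(text, clusters):
    """One feature per cluster: how many tokens match a term in it."""
    tokens = text.lower().split()
    return [sum(tokens.count(t) for t in cl) for cl in clusters]

seeds = ["rocking", "echolalia"]          # stand-ins for annotated seed terms
terms = expand(seeds, k=2)                # the paper uses the 15 most similar
clusters = kmeans(terms, init_terms=seeds)  # the paper uses 92 clusters
print(features("Child was rocking and flapping hands; some echolalia noted", clusters))
# -> [2, 1]
```

Each record then becomes a fixed-length count vector, one entry per cluster, which is the kind of input a neural network or decision tree classifier can consume.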

Language: English (US)
Title of host publication: Natural Language Processing and Information Systems - 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Proceedings
Publisher: Springer Verlag
Pages: 34-37
Number of pages: 4
Volume: 10260 LNCS
ISBN (Print): 9783319595689
DOI: 10.1007/978-3-319-59569-6_4
State: Published - 2017
Event: 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017 - Liege, Belgium
Duration: Jun 21 2017 to Jun 23 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10260 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017
Country: Belgium
City: Liege
Period: 6/21/17 to 6/23/17

Keywords

  • Autism spectrum disorders
  • Classification
  • Clustering
  • EHR
  • Electronic health records
  • Natural language processing
  • NLP
  • Word embedding

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Leroy, G., Gu, Y., Pettygrove, S., & Kurzius-Spencer, M. (2017). Automated lexicon and feature construction using word embedding and clustering for classification of ASD diagnoses using EHR. In Natural Language Processing and Information Systems - 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Proceedings (Vol. 10260 LNCS, pp. 34-37). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10260 LNCS). Springer Verlag. DOI: 10.1007/978-3-319-59569-6_4

@inbook{d435e551186842c3819f48b1b81e37f1,
title = "Automated lexicon and feature construction using word embedding and clustering for classification of {ASD} diagnoses using {EHR}",
abstract = "Using electronic health records of children evaluated for Autism Spectrum Disorders, we are developing a decision support system for automated diagnostic criteria extraction and case classification. We manually created 92 lexicons, which we tested as features for classification and compared with features created automatically using word embedding. The expert annotations used for manual lexicon creation provided seed terms that were expanded with the 15 most similar terms (Word2Vec). The resulting 2,200 terms were clustered into 92 clusters, parallel to the manually created lexicons. We compared both sets of features to classify case status with a FF/BP neural network (NN) and a C5.0 decision tree. For manually created lexicons, classification accuracy was 76.92% for the NN and 84.60% for C5.0. For the automatically created lexicons, accuracy was 79.78% for the NN and 86.81% for C5.0. Automated lexicon creation required a much shorter development time and brought similarly high-quality outcomes.",
keywords = "Autism spectrum disorders, Classification, Clustering, EHR, Electronic health records, Natural language processing, NLP, Word embedding",
author = "Gondy Leroy and Yang Gu and Sydney Pettygrove and Margaret Kurzius-Spencer",
year = "2017",
doi = "10.1007/978-3-319-59569-6_4",
isbn = "9783319595689",
volume = "10260 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "34--37",
booktitle = "Natural Language Processing and Information Systems - 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Proceedings",
address = "Germany"
}

TY  - CHAP
T1  - Automated lexicon and feature construction using word embedding and clustering for classification of ASD diagnoses using EHR
AU  - Leroy, Gondy
AU  - Gu, Yang
AU  - Pettygrove, Sydney
AU  - Kurzius-Spencer, Margaret
PY  - 2017
Y1  - 2017
AB  - Using electronic health records of children evaluated for Autism Spectrum Disorders, we are developing a decision support system for automated diagnostic criteria extraction and case classification. We manually created 92 lexicons, which we tested as features for classification and compared with features created automatically using word embedding. The expert annotations used for manual lexicon creation provided seed terms that were expanded with the 15 most similar terms (Word2Vec). The resulting 2,200 terms were clustered into 92 clusters, parallel to the manually created lexicons. We compared both sets of features to classify case status with a FF/BP neural network (NN) and a C5.0 decision tree. For manually created lexicons, classification accuracy was 76.92% for the NN and 84.60% for C5.0. For the automatically created lexicons, accuracy was 79.78% for the NN and 86.81% for C5.0. Automated lexicon creation required a much shorter development time and brought similarly high-quality outcomes.
KW  - Autism spectrum disorders
KW  - Classification
KW  - Clustering
KW  - EHR
KW  - Electronic health records
KW  - Natural language processing
KW  - NLP
KW  - Word embedding
UR  - http://www.scopus.com/inward/record.url?scp=85021769981&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85021769981&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-59569-6_4
DO  - 10.1007/978-3-319-59569-6_4
M3  - Conference contribution
SN  - 9783319595689
VL  - 10260 LNCS
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 34
EP  - 37
BT  - Natural Language Processing and Information Systems - 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Proceedings
PB  - Springer Verlag
ER  -