Deep learning based topic identification and categorization: Mining diabetes-related topics on Chinese health websites

Xinhuan Chen, Yong Zhang, Jennifer Xu, Chunxiao Xing, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

4 Citations (Scopus)

Abstract

As millions of people are diagnosed with diabetes every year, the demand for information about diabetes continues to increase. China is one of the countries with a large population of diabetes patients, and many Chinese health websites provide diabetes-related news and articles. However, because most of these online articles are uncategorized or lack a clear topic and theme, users often cannot find their topics of interest effectively and efficiently. The problem of health topic identification and categorization on Chinese websites cannot be easily addressed by straightforwardly applying existing approaches and methods that have been used for English documents. To address this problem and meet users’ demand for diabetes-related information, we propose a deep learning based framework to identify and categorize topics related to diabetes in online Chinese articles. Our experiments using datasets with over 19,000 online articles showed that the framework achieved higher effectiveness and accuracy in categorizing diabetes-related topics than most of the state-of-the-art benchmark approaches.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 481-500
Number of pages: 20
Volume: 9642
ISBN (Print): 9783319320243
DOIs: https://doi.org/10.1007/978-3-319-32025-0_30
State: Published - 2016
Event: 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016 - Dallas, United States
Duration: Apr 16 2016 to Apr 19 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9642
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st International Conference on Database Systems for Advanced Applications, DASFAA 2016
Country: United States
City: Dallas
Period: 4/16/16 to 4/19/16

Fingerprint

Diabetes
Categorization
Medical problems
Websites
Mining
Health
Learning
Deep learning
China
Continue
Benchmark
Experiment

Keywords

  • Chinese
  • Deep learning
  • Healthcare
  • Text classification

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Chen, X., Zhang, Y., Xu, J., Xing, C., & Chen, H. (2016). Deep learning based topic identification and categorization: Mining diabetes-related topics on Chinese health websites. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9642, pp. 481-500). Springer Verlag. https://doi.org/10.1007/978-3-319-32025-0_30
