Sentiment analysis on Chinese health forums: A preliminary study of different language models

Yan Zhang, Yong Zhang, Jennifer Xu, Chunxiao Xing, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Sentiment analysis on Chinese health forums is challenging because of language, platform, and domain characteristics. Our research investigates the impact of three factors on sentiment analysis: sentiment polarity distribution, language models, and model settings. We manually labeled a large sample of Chinese health forum posts and found an extremely unbalanced polarity distribution, with only a very small percentage of negative posts; training on a balanced set produced higher accuracy than training on the unbalanced one. We also found that hybrid approaches combining multiple language-model-based approaches performed better than the individual approaches. Finally, we evaluated the effects of different model settings and improved overall accuracy by using the hybrid approaches in their optimal settings. Findings from this preliminary study provide deeper insight into the problem of sentiment analysis on Chinese health forums and will inform future sentiment analysis studies.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 68-81
Number of pages: 14
Volume: 9545
ISBN (Print): 9783319291741
DOIs: https://doi.org/10.1007/978-3-319-29175-8_7
State: Published - 2016
Event: International Conference for Smart Health, ICSH 2015 - Phoenix, United States
Duration: Nov 17 2015 to Nov 18 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9545
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: International Conference for Smart Health, ICSH 2015
Country: United States
City: Phoenix
Period: 11/17/15 to 11/18/15

Keywords

  • Chinese health forum
  • Language model
  • Sentiment analysis

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Zhang, Y., Zhang, Y., Xu, J., Xing, C., & Chen, H. (2016). Sentiment analysis on Chinese health forums: A preliminary study of different language models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9545, pp. 68-81). Springer Verlag. https://doi.org/10.1007/978-3-319-29175-8_7

@inproceedings{f23d0364a7d74ad19450ffdd3978fa4f,
title = "Sentiment analysis on {C}hinese health forums: A preliminary study of different language models",
keywords = "Chinese health forum, Language model, Sentiment analysis",
author = "Yan Zhang and Yong Zhang and Jennifer Xu and Chunxiao Xing and Hsinchun Chen",
year = "2016",
doi = "10.1007/978-3-319-29175-8_7",
language = "English (US)",
isbn = "9783319291741",
volume = "9545",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "68--81",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Sentiment analysis on Chinese health forums

T2 - A preliminary study of different language models

AU - Zhang, Yan

AU - Zhang, Yong

AU - Xu, Jennifer

AU - Xing, Chunxiao

AU - Chen, Hsinchun

PY - 2016

Y1 - 2016

KW - Chinese health forum

KW - Language model

KW - Sentiment analysis

UR - http://www.scopus.com/inward/record.url?scp=84958552886&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84958552886&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-29175-8_7

DO - 10.1007/978-3-319-29175-8_7

M3 - Conference contribution

AN - SCOPUS:84958552886

SN - 9783319291741

VL - 9545

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 68

EP - 81

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -