Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums

Ahmed Abbasi, Hsinchun Chen, Arab Salem

Research output: Contribution to journal › Article

539 Citations (Scopus)

Abstract

The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
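
The abstract names information gain as the heuristic that EWGA incorporates but does not say how it is combined with the genetic algorithm. As a rough illustration of that heuristic only (not of EWGA itself, and not the authors' implementation), the sketch below scores a binary feature by how much it reduces class entropy. The function names and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_present, labels):
    """Entropy reduction obtained by splitting documents on a binary feature."""
    total = len(labels)
    with_f = [y for x, y in zip(feature_present, labels) if x]
    without_f = [y for x, y in zip(feature_present, labels) if not x]
    remainder = (
        (len(with_f) / total) * entropy(with_f)
        + (len(without_f) / total) * entropy(without_f)
    )
    return entropy(labels) - remainder


# Toy data invented for illustration: four documents, two candidate features.
labels = ["pos", "pos", "neg", "neg"]
perfect = [True, True, False, False]  # present only in positive documents
useless = [True, False, True, False]  # present equally often in both classes

ig_perfect = information_gain(perfect, labels)
ig_useless = information_gain(useless, labels)
```

A feature that separates the classes exactly recovers the full class entropy, while one that appears equally often in both classes gains nothing. A feature selector could rank or weight candidate features by such scores; how EWGA actually does so is described in the article, not here.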

Original language: English (US)
Article number: 12
Journal: ACM Transactions on Information Systems
Volume: 26
Issue number: 3
DOI: 10.1145/1361684.1361685
State: Published - Jun 1 2008

Fingerprint

Feature extraction
Genetic algorithms
Entropy
Syntactics
Testbeds
Linguistics
World Wide Web
Sentiment analysis
Feature selection
Genetic algorithm
Language
Internet
Benchmark

Keywords

  • Feature selection
  • Opinion mining
  • Sentiment analysis
  • Text classification

ASJC Scopus subject areas

  • Information Systems

Cite this

Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums. / Abbasi, Ahmed; Chen, Hsinchun; Salem, Arab.

In: ACM Transactions on Information Systems, Vol. 26, No. 3, 12, 01.06.2008.

@article{b33948381d224eb3b4c7939906245af5,
title = "Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums",
abstract = "The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91{\%} on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.",
keywords = "Feature selection, Opinion mining, Sentiment analysis, Text classification",
author = "Ahmed Abbasi and Hsinchun Chen and Arab Salem",
year = "2008",
month = "6",
day = "1",
doi = "10.1145/1361684.1361685",
language = "English (US)",
volume = "26",
journal = "ACM Transactions on Information Systems",
issn = "1046-8188",
publisher = "Association for Computing Machinery (ACM)",
number = "3",
}

TY - JOUR

T1 - Sentiment analysis in multiple languages

T2 - Feature selection for opinion classification in Web forums

AU - Abbasi, Ahmed

AU - Chen, Hsinchun

AU - Salem, Arab

PY - 2008/6/1

Y1 - 2008/6/1

N2 - The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.

KW - Feature selection

KW - Opinion mining

KW - Sentiment analysis

KW - Text classification

UR - http://www.scopus.com/inward/record.url?scp=46249095180&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=46249095180&partnerID=8YFLogxK

U2 - 10.1145/1361684.1361685

DO - 10.1145/1361684.1361685

M3 - Article

AN - SCOPUS:46249095180

VL - 26

JO - ACM Transactions on Information Systems

JF - ACM Transactions on Information Systems

SN - 1046-8188

IS - 3

M1 - 12

ER -