Evaluation of Naive Bayes and Support Vector Machines for Wikipedia

Sridhar Mocherla, Alexander Danehy, Christopher D Impey

Research output: Contribution to journal › Article

Abstract

Wikipedia has become the de facto source for information on the web, and it has experienced exponential growth since its inception. Text classification with Wikipedia, with the goal of studying and evaluating different classification techniques, has seen limited research in the past. To this end, we compare and illustrate the effectiveness of two standard classifiers in the text classification literature, Naive Bayes (Multinomial) and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets using subject matter experts and Wikipedia portals and then evaluate Precision/Recall values using a random sampling approach. Our results show that SVM (linear kernel) performs exceptionally well across all categories, while Naive Bayes is less accurate in some categories but has generalizing capability on par with SVM.
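To make the evaluation described in the abstract concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier with per-category precision and recall, written in plain Python. It is not the authors' implementation: the toy documents, the category names ("astronomy", "other"), the add-one smoothing, and all function names are assumptions chosen for illustration, and the paper's random sampling procedure and SVM baseline are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha smoothing (assumed, not from the paper)."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # token counts per class
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in class_docs.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
            },
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen words are ignored."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_likelihood"][w]
            for w in tokens
            if w in params["log_likelihood"]
        )
    return max(model, key=score)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one category treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data standing in for tokenized Wikipedia articles.
train_docs = [
    ["galaxy", "star", "telescope"],
    ["planet", "orbit", "star"],
    ["football", "goal", "league"],
    ["election", "vote", "party"],
]
train_labels = ["astronomy", "astronomy", "other", "other"]
model = train_multinomial_nb(train_docs, train_labels)

# Small held-out set; the third document is deliberately ambiguous.
test_docs = [["star", "orbit"], ["vote", "goal"], ["galaxy", "league", "party"]]
test_labels = ["astronomy", "other", "astronomy"]
predictions = [predict(model, doc) for doc in test_docs]

precision, recall = precision_recall(test_labels, predictions, positive="astronomy")
print(predictions, precision, recall)
```

On this toy data the ambiguous third document is assigned to "other", so precision for "astronomy" stays at 1.0 while recall drops to 0.5, which shows why the two measures are reported separately for each category.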

Original language: English (US)
Pages (from-to): 1-12
Number of pages: 12
Journal: Applied Artificial Intelligence
DOI: 10.1080/08839514.2018.1440907
State: Accepted/In press - Feb 22 2018

Fingerprint

Support vector machines
Classifiers
Sampling

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Evaluation of Naive Bayes and Support Vector Machines for Wikipedia. / Mocherla, Sridhar; Danehy, Alexander; Impey, Christopher D.

In: Applied Artificial Intelligence, 22.02.2018, p. 1-12.

@article{4e45aaed0e444260a44e56d25d59d988,
title = "Evaluation of Naive Bayes and Support Vector Machines for Wikipedia",
abstract = "Wikipedia has become the de facto source for information on the web, and it has experienced exponential growth since its inception. Text Classification with Wikipedia has seen limited research in the past with the goal of studying and evaluating different classification techniques. To this end, we compare and illustrate the effectiveness of two standard classifiers in the text classification literature, Naive Bayes (Multinomial) and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets using subject matter experts and Wikipedia portals and then evaluate Precision/Recall values using a random sampling approach. Our results show that SVM (linear kernel) performs exceptionally across all categories, and the accuracy of Naive Bayes is inferior in some categories, whereas its generalizing capability is on par with SVM.",
author = "Sridhar Mocherla and Alexander Danehy and Impey, {Christopher D}",
year = "2018",
month = "2",
day = "22",
doi = "10.1080/08839514.2018.1440907",
language = "English (US)",
pages = "1--12",
journal = "Applied Artificial Intelligence",
issn = "0883-9514",
publisher = "Taylor and Francis Ltd.",

}

TY - JOUR

T1 - Evaluation of Naive Bayes and Support Vector Machines for Wikipedia

AU - Mocherla, Sridhar

AU - Danehy, Alexander

AU - Impey, Christopher D

PY - 2018/2/22

Y1 - 2018/2/22

N2 - Wikipedia has become the de facto source for information on the web, and it has experienced exponential growth since its inception. Text Classification with Wikipedia has seen limited research in the past with the goal of studying and evaluating different classification techniques. To this end, we compare and illustrate the effectiveness of two standard classifiers in the text classification literature, Naive Bayes (Multinomial) and Support Vector Machines (SVM), on the full English Wikipedia corpus for six different categories. For each category, we build training sets using subject matter experts and Wikipedia portals and then evaluate Precision/Recall values using a random sampling approach. Our results show that SVM (linear kernel) performs exceptionally across all categories, and the accuracy of Naive Bayes is inferior in some categories, whereas its generalizing capability is on par with SVM.

UR - http://www.scopus.com/inward/record.url?scp=85042209376&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85042209376&partnerID=8YFLogxK

U2 - 10.1080/08839514.2018.1440907

DO - 10.1080/08839514.2018.1440907

M3 - Article

SP - 1

EP - 12

JO - Applied Artificial Intelligence

JF - Applied Artificial Intelligence

SN - 0883-9514

ER -