Applying authorship analysis to Arabic web content

Ahmed Abbasi, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

27 Citations (Scopus)

Abstract

The advent and rapid proliferation of internet communication has allowed the realization of numerous security issues. The anonymous nature of online mediums such as email, web sites, and forums provides an attractive communication method for criminal activity. Increased globalization and the boundless nature of the internet have further amplified these concerns due to the addition of a multilingual dimension. The world's social and political climate has caused Arabic to draw a great deal of attention. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic en route to the development of an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset.

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science
Editors: P. Kantor, G. Muresan, F. Roberts, D.D. Zeng, F.-Y. Wang, H. Chen, R.C. Merkle
Pages: 183-197
Number of pages: 15
Volume: 3495
State: Published - 2005
Event: IEEE International Conference on Intelligence and Security Informatics, ISI 2005 - Atlanta, GA, United States
Duration: May 19 2005 to May 20 2005

Other

Other: IEEE International Conference on Intelligence and Security Informatics, ISI 2005
Country: United States
City: Atlanta, GA
Period: 5/19/05 to 5/20/05

Fingerprint

Internet
Communication
Electronic mail
Syntactics
Websites
Experiments

ASJC Scopus subject areas

  • Computer Science (miscellaneous)

Cite this

Abbasi, A., & Chen, H. (2005). Applying authorship analysis to Arabic web content. In P. Kantor, G. Muresan, F. Roberts, D. D. Zeng, F-Y. Wang, H. Chen, & R. C. Merkle (Eds.), Lecture Notes in Computer Science (Vol. 3495, pp. 183-197)

Applying authorship analysis to Arabic web content. / Abbasi, Ahmed; Chen, Hsinchun.

Lecture Notes in Computer Science. ed. / P. Kantor; G. Muresan; F. Roberts; D.D. Zeng; F.-Y. Wang; H. Chen; R.C. Merkle. Vol. 3495 2005. p. 183-197.

Abbasi, A & Chen, H 2005, Applying authorship analysis to Arabic web content. in P Kantor, G Muresan, F Roberts, DD Zeng, F-Y Wang, H Chen & RC Merkle (eds), Lecture Notes in Computer Science. vol. 3495, pp. 183-197, IEEE International Conference on Intelligence and Security Informatics, ISI 2005, Atlanta, GA, United States, 5/19/05.
Abbasi A, Chen H. Applying authorship analysis to Arabic web content. In Kantor P, Muresan G, Roberts F, Zeng DD, Wang F-Y, Chen H, Merkle RC, editors, Lecture Notes in Computer Science. Vol. 3495. 2005. p. 183-197
Abbasi, Ahmed ; Chen, Hsinchun. / Applying authorship analysis to Arabic web content. Lecture Notes in Computer Science. editor / P. Kantor ; G. Muresan ; F. Roberts ; D.D. Zeng ; F.-Y. Wang ; H. Chen ; R.C. Merkle. Vol. 3495 2005. pp. 183-197
@inproceedings{54727f6b3729452e86eb446b449dca16,
title = "Applying authorship analysis to Arabic web content",
abstract = "The advent and rapid proliferation of internet communication has allowed the realization of numerous security issues. The anonymous nature of online mediums such as email, web sites, and forums provides an attractive communication method for criminal activity. Increased globalization and the boundless nature of the internet have further amplified these concerns due to the addition of a multilingual dimension. The world's social and political climate has caused Arabic to draw a great deal of attention. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic en route to the development of an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset.",
author = "Ahmed Abbasi and Hsinchun Chen",
year = "2005",
language = "English (US)",
volume = "3495",
pages = "183--197",
editor = "P. Kantor and G. Muresan and F. Roberts and D.D. Zeng and F.-Y. Wang and H. Chen and R.C. Merkle",
booktitle = "Lecture Notes in Computer Science",

}

TY - GEN

T1 - Applying authorship analysis to Arabic web content

AU - Abbasi, Ahmed

AU - Chen, Hsinchun

PY - 2005

Y1 - 2005

N2 - The advent and rapid proliferation of internet communication has allowed the realization of numerous security issues. The anonymous nature of online mediums such as email, web sites, and forums provides an attractive communication method for criminal activity. Increased globalization and the boundless nature of the internet have further amplified these concerns due to the addition of a multilingual dimension. The world's social and political climate has caused Arabic to draw a great deal of attention. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic en route to the development of an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset.

UR - http://www.scopus.com/inward/record.url?scp=24944516697&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=24944516697&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:24944516697

VL - 3495

SP - 183

EP - 197

BT - Lecture Notes in Computer Science

A2 - Kantor, P.

A2 - Muresan, G.

A2 - Roberts, F.

A2 - Zeng, D.D.

A2 - Wang, F.-Y.

A2 - Chen, H.

A2 - Merkle, R.C.

ER -