Applying authorship analysis to Arabic web content

Ahmed Abbasi, Hsinchun Chen

Research output: Contribution to journal / Conference article

29 Scopus citations

Abstract

The advent and rapid proliferation of internet communication has allowed the realization of numerous security issues. The anonymous nature of online mediums such as email, web sites, and forums provides an attractive communication method for criminal activity. Increased globalization and the boundless nature of the internet have further amplified these concerns due to the addition of a multilingual dimension. The world's social and political climate has caused Arabic to draw a great deal of attention. In this study we apply authorship identification techniques to Arabic web forum messages. Our research uses lexical, syntactic, structural, and content-specific writing style features for authorship identification. We address some of the problematic characteristics of Arabic en route to the development of an Arabic language model that provides a respectable level of classification accuracy for authorship discrimination. We also run experiments to evaluate the effectiveness of different feature types and classification techniques on our dataset.

Original language: English (US)
Pages (from-to): 183-197
Number of pages: 15
Journal: Lecture Notes in Computer Science
Volume: 3495
State: Published - Jan 1 2005
Event: IEEE International Conference on Intelligence and Security Informatics, ISI 2005 - Atlanta, GA, United States
Duration: May 19 2005 to May 20 2005

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
