A framework for stylometric similarity detection in online settings

Ahmed Abbasi, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Online marketplaces and communication media such as email, web sites, forums, and chat rooms have been ubiquitously integrated into our everyday lives. Unfortunately, the anonymous nature of these channels makes them an ideal avenue for online fraud, hackers, and cybercrime. Anonymity and the sheer volume of online content make cyber identity tracing an essential yet strenuous endeavor for Internet users and human analysts. In order to address these challenges, we propose a framework for online stylometric analysis to assist in distinguishing authorship in online communities based on writing style. Our framework includes the use of a scalable identity-level similarity detection technique coupled with an extensive stylistic feature set and an identity database. The framework is intended to support stylometric authentication for Internet users as well as provide support for forensic investigations. The proposed technique and extended feature set were evaluated on a test bed encompassing thousands of feedback comments posted by 100 electronic market traders. The method outperformed benchmark stylometric techniques with an accuracy of approximately 95% when differentiating between 200 trader identities. The results indicate that the proposed stylometric analysis approach may help mitigate the effects of online anonymity abuse.

Original language: English (US)
Title of host publication: Association for Information Systems - 13th Americas Conference on Information Systems, AMCIS 2007: Reaching New Heights
Pages: 1442-1451
Number of pages: 10
Volume: 2
State: Published - 2007
Event: 13th Americas Conference on Information Systems, AMCIS 2007 - Keystone, CO, United States
Duration: Aug 10 2007 to Aug 12 2007

Other

Other: 13th Americas Conference on Information Systems, AMCIS 2007
Country: United States
City: Keystone, CO
Period: 8/10/07 to 8/12/07

Fingerprint

Internet
anonymity
Electronic mail
Authentication
Websites
electronic market
Feedback
hacker
communication media
Communication
internet community
chat
fraud
everyday life
abuse

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications
  • Information Systems
  • Library and Information Sciences

Cite this

Abbasi, A., & Chen, H. (2007). A framework for stylometric similarity detection in online settings. In Association for Information Systems - 13th Americas Conference on Information Systems, AMCIS 2007: Reaching New Heights (Vol. 2, pp. 1442-1451)

@inproceedings{7d3cd99a4d5242b6aaa138f9f0ad919b,
title = "A framework for stylometric similarity detection in online settings",
abstract = "Online marketplaces and communication media such as email, web sites, forums, and chat rooms have been ubiquitously integrated into our everyday lives. Unfortunately, the anonymous nature of these channels makes them an ideal avenue for online fraud, hackers, and cybercrime. Anonymity and the sheer volume of online content make cyber identity tracing an essential yet strenuous endeavor for Internet users and human analysts. In order to address these challenges, we propose a framework for online stylometric analysis to assist in distinguishing authorship in online communities based on writing style. Our framework includes the use of a scalable identity-level similarity detection technique coupled with an extensive stylistic feature set and an identity database. The framework is intended to support stylometric authentication for Internet users as well as provide support for forensic investigations. The proposed technique and extended feature set were evaluated on a test bed encompassing thousands of feedback comments posted by 100 electronic market traders. The method outperformed benchmark stylometric techniques with an accuracy of approximately 95{\%} when differentiating between 200 trader identities. The results indicate that the proposed stylometric analysis approach may help mitigate the effects of online anonymity abuse.",
author = "Ahmed Abbasi and Hsinchun Chen",
year = "2007",
language = "English (US)",
isbn = "9781604233810",
volume = "2",
pages = "1442--1451",
booktitle = "Association for Information Systems - 13th Americas Conference on Information Systems, AMCIS 2007: Reaching New Heights",

}

TY - GEN

T1 - A framework for stylometric similarity detection in online settings

AU - Abbasi, Ahmed

AU - Chen, Hsinchun

PY - 2007

Y1 - 2007

N2 - Online marketplaces and communication media such as email, web sites, forums, and chat rooms have been ubiquitously integrated into our everyday lives. Unfortunately, the anonymous nature of these channels makes them an ideal avenue for online fraud, hackers, and cybercrime. Anonymity and the sheer volume of online content make cyber identity tracing an essential yet strenuous endeavor for Internet users and human analysts. In order to address these challenges, we propose a framework for online stylometric analysis to assist in distinguishing authorship in online communities based on writing style. Our framework includes the use of a scalable identity-level similarity detection technique coupled with an extensive stylistic feature set and an identity database. The framework is intended to support stylometric authentication for Internet users as well as provide support for forensic investigations. The proposed technique and extended feature set were evaluated on a test bed encompassing thousands of feedback comments posted by 100 electronic market traders. The method outperformed benchmark stylometric techniques with an accuracy of approximately 95% when differentiating between 200 trader identities. The results indicate that the proposed stylometric analysis approach may help mitigate the effects of online anonymity abuse.

UR - http://www.scopus.com/inward/record.url?scp=84870155622&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84870155622&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781604233810

VL - 2

SP - 1442

EP - 1451

BT - Association for Information Systems - 13th Americas Conference on Information Systems, AMCIS 2007: Reaching New Heights

ER -