Artificial immune system for illicit content identification in social media

Ming Yang, Melody Kiang, Hsinchun Chen, Yijun Li

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties in the illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance.
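
The abstract does not spell out the labeling heuristic, so the following is only a minimal sketch of the general idea of partially supervised labeling, not the authors' algorithm. It assumes a small set of known positive posts, builds a term-count profile from them, and keeps unlabeled posts that match the profile strongly as likely positives and posts that barely match as likely negatives. The function names, the cosine-similarity scoring, and both thresholds are hypothetical choices made for illustration.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase a document and split it on whitespace."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_unlabeled(seed_positives, unlabeled,
                    pos_threshold=0.5, neg_threshold=0.1):
    """Split unlabeled posts into likely positives, likely negatives, undecided.

    Both thresholds are illustrative assumptions, not values from the article.
    """
    # Build one aggregate term profile from the known positive posts.
    profile = Counter()
    for doc in seed_positives:
        profile.update(tokenize(doc))

    positives, negatives, undecided = [], [], []
    for doc in unlabeled:
        score = cosine(Counter(tokenize(doc)), profile)
        if score >= pos_threshold:
            positives.append(doc)    # strong match to the positive profile
        elif score <= neg_threshold:
            negatives.append(doc)    # almost no overlap with the profile
        else:
            undecided.append(doc)    # left unlabeled
    return positives, negatives, undecided
```

Under these assumptions, the two labeled lists could then serve as training data for an ordinary classifier, while the undecided posts stay unlabeled.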

Original language: English (US)
Pages (from-to): 256-269
Number of pages: 14
Journal: Journal of the American Society for Information Science and Technology
Volume: 63
Issue number: 2
DOI: 10.1002/asi.21673
State: Published - Feb 2012

Fingerprint

Immune system
social media
Supervised learning
Labeling
Labels
hate
propaganda
Artificial immune system
Social media
heuristics
evaluation
learning
performance

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Information Systems
  • Human-Computer Interaction
  • Computer Networks and Communications

Cite this

Artificial immune system for illicit content identification in social media. / Yang, Ming; Kiang, Melody; Chen, Hsinchun; Li, Yijun.

In: Journal of the American Society for Information Science and Technology, Vol. 63, No. 2, 02.2012, p. 256-269.


@article{c8fb3dc11c10432abd792733535ad72e,
title = "Artificial immune system for illicit content identification in social media",
abstract = "Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties in the illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance.",
author = "Ming Yang and Melody Kiang and Hsinchun Chen and Yijun Li",
year = "2012",
month = "2",
doi = "10.1002/asi.21673",
language = "English (US)",
volume = "63",
pages = "256--269",
journal = "Journal of the Association for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "2",
}

TY - JOUR

T1 - Artificial immune system for illicit content identification in social media

AU - Yang, Ming

AU - Kiang, Melody

AU - Chen, Hsinchun

AU - Li, Yijun

PY - 2012/2

Y1 - 2012/2

N2 - Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties in the illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance.

AB - Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties in the illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance.

UR - http://www.scopus.com/inward/record.url?scp=84857361278&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857361278&partnerID=8YFLogxK

U2 - 10.1002/asi.21673

DO - 10.1002/asi.21673

M3 - Article

AN - SCOPUS:84857361278

VL - 63

SP - 256

EP - 269

JO - Journal of the Association for Information Science and Technology

JF - Journal of the Association for Information Science and Technology

SN - 2330-1635

IS - 2

ER -