Categorization and analysis of text in computer mediated communication archives using visualization

Ahmed Abbasi, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

18 Citations (Scopus)

Abstract

Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to improve the analysis and navigation of computer-mediated communication (CMC) archives focus primarily on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set composed of stylistic, topical, and sentiment-related features to enable richer content representation. The system also includes the Ink Blot technique, which uses decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and highlighted the effectiveness of the extended feature set for improved text categorization.

Original language: English (US)
Title of host publication: Proceedings of the ACM International Conference on Digital Libraries
Pages: 11-18
Number of pages: 8
DOI: https://doi.org/10.1145/1255175.1255178
State: Published - 2007
Event: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment - Vancouver, BC, Canada
Duration: Jun 18 2007 to Jun 23 2007

Other

Other: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
Country: Canada
City: Vancouver, BC
Period: 6/18/07 to 6/23/07

Fingerprint

computer-mediated communication
Ink
visualization
Visualization
Digital libraries
Communication
Decision trees
Support vector machines
systems analysis
Navigation
Classifiers
Systems analysis
discourse
experiment
interaction
Experiments

Keywords

  • Computer mediated communication
  • Text mining
  • Visualization

ASJC Scopus subject areas

  • Computer Science(all)
  • Social Sciences(all)

Cite this

Abbasi, A., & Chen, H. (2007). Categorization and analysis of text in computer mediated communication archives using visualization. In Proceedings of the ACM International Conference on Digital Libraries (pp. 11-18) https://doi.org/10.1145/1255175.1255178

@inproceedings{43b9047a504740b18769a1ff3415d707,
title = "Categorization and analysis of text in computer mediated communication archives using visualization",
keywords = "Computer mediated communication, Text mining, Visualization",
author = "Ahmed Abbasi and Hsinchun Chen",
year = "2007",
doi = "10.1145/1255175.1255178",
language = "English (US)",
isbn = "1595936440",
pages = "11--18",
booktitle = "Proceedings of the ACM International Conference on Digital Libraries",

}

TY - GEN

T1 - Categorization and analysis of text in computer mediated communication archives using visualization

AU - Abbasi, Ahmed

AU - Chen, Hsinchun

PY - 2007

Y1 - 2007

N2 - Digital libraries (DLs) for online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to facilitate improved CMC archive analysis and navigation primarily focus on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set comprised of stylistic, topical, and sentiment related features to enable richer content representation. The system also includes the Ink Blot technique which utilizes decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, messages, and over time. The proposed system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and also highlighted the effectiveness of the extended feature set for improved text categorization.

KW - Computer mediated communication

KW - Text mining

KW - Visualization

UR - http://www.scopus.com/inward/record.url?scp=36349003655&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36349003655&partnerID=8YFLogxK

U2 - 10.1145/1255175.1255178

DO - 10.1145/1255175.1255178

M3 - Conference contribution

SN - 1595936440

SN - 9781595936448

SP - 11

EP - 18

BT - Proceedings of the ACM International Conference on Digital Libraries

ER -