A Comprehensive methodology for extracting signal from social media text using natural language processing and machine learning

Wenli Zhang, Sudha Ram

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Scopus citations

Abstract

There has been increasing interest in using data from social media, search engines, and other web sources for predictive analytics in many different domains. Although using these datasets in different contexts has shown significant promise, mounting evidence suggests that many of the results being produced could be misrepresented because of the loosely structured textual data and noise caused by anomalous media spikes and the use of misleading terms and phrases. We introduce a novel and efficient framework combining natural language processing (NLP) and machine learning classification techniques to extract signal from social media text. Our methodology was tested using two different large real-world datasets from social media and resulted in an overall accuracy of 88% and high per-class precision and recall. The methodology described in this paper can be used for a variety of purposes to yield improved analyses of social media and web text with a view to enabling improved predictions.

Original language: English (US)
Title of host publication: 25th Annual Workshop on Information Technologies and Systems, WITS 2015
Publisher: University of Texas at Dallas
State: Published - 2015
Event: 25th Annual Workshop on Information Technologies and Systems, WITS 2015 - Dallas, United States
Duration: Dec 12 2015 to Dec 13 2015

Other

Other: 25th Annual Workshop on Information Technologies and Systems, WITS 2015
Country: United States
City: Dallas
Period: 12/12/15 to 12/13/15

ASJC Scopus subject areas

  • Information Systems

Cite this

Zhang, W., & Ram, S. (2015). A Comprehensive methodology for extracting signal from social media text using natural language processing and machine learning. In 25th Annual Workshop on Information Technologies and Systems, WITS 2015. University of Texas at Dallas.