A Comprehensive methodology for extracting signal from social media text using natural language processing and machine learning

Wenli Zhang, Sudha Ram

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

There has been increasing interest in using data from social media, search engines, and other web sources for predictive analytics in many different domains. Although using these datasets in different contexts has shown significant promise, mounting evidence suggests that many of the results being produced could be misrepresented because of the loosely structured textual data and noise caused by anomalous media spikes and use of misleading terms and phrases. We introduce a novel and efficient framework combining natural language processing (NLP) and machine learning classification techniques to extract signal from social media text. Our methodology was tested using two different large real-world datasets from social media and resulted in an overall accuracy of 88% and high per-class precision and recall. The methodology described in this paper can be used for a variety of purposes to yield improved analyses of social media and web text with a view to enabling improved predictions.
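
The abstract reports results as overall accuracy together with per-class precision and recall. As a minimal sketch of how those three quantities relate, the Python below computes them from two label lists. The function name `per_class_metrics`, the class labels "signal" and "noise", and the toy data are all hypothetical choices for illustration; they are not taken from the paper and do not reproduce its datasets, classifier, or results.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return (overall accuracy, {label: (precision, recall)})."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and the same length")

    correct = Counter()    # correct predictions, keyed by class
    predicted = Counter()  # how often each class was predicted
    actual = Counter()     # how often each class truly occurs

    for truth, guess in zip(y_true, y_pred):
        actual[truth] += 1
        predicted[guess] += 1
        if truth == guess:
            correct[truth] += 1

    # Accuracy: share of all items labeled correctly.
    accuracy = sum(correct.values()) / len(y_true)

    metrics = {}
    for label in sorted(set(actual) | set(predicted)):
        # Precision: of the items predicted as this class, how many were right.
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        # Recall: of the items truly in this class, how many were found.
        recall = correct[label] / actual[label] if actual[label] else 0.0
        metrics[label] = (precision, recall)
    return accuracy, metrics


# Hypothetical toy labels (assumption): "signal" marks relevant posts, "noise" the rest.
y_true = ["signal", "signal", "noise", "noise", "signal", "noise", "noise", "signal"]
y_pred = ["signal", "noise", "noise", "noise", "signal", "noise", "noise", "signal"]

accuracy, metrics = per_class_metrics(y_true, y_pred)
for label, (precision, recall) in metrics.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f}")
print(f"overall accuracy={accuracy:.3f}")
```

Reporting per-class figures alongside accuracy matters when classes are imbalanced: a classifier can score high accuracy by favoring the majority class while recall on the minority class stays low.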

Original language: English (US)
Title of host publication: 25th Annual Workshop on Information Technologies and Systems, WITS 2015
Publisher: University of Texas at Dallas
State: Published - 2015
Event: 25th Annual Workshop on Information Technologies and Systems, WITS 2015 - Dallas, United States
Duration: Dec 12 2015 to Dec 13 2015

ASJC Scopus subject areas

  • Information Systems

Cite this

Zhang, W., & Ram, S. (2015). A Comprehensive methodology for extracting signal from social media text using natural language processing and machine learning. In 25th Annual Workshop on Information Technologies and Systems, WITS 2015. University of Texas at Dallas.

@inproceedings{b6353bdf40a7403fa9a55a63c590e0d2,
title = "A Comprehensive methodology for extracting signal from social media text using natural language processing and machine learning",
author = "Wenli Zhang and Sudha Ram",
year = "2015",
language = "English (US)",
booktitle = "25th Annual Workshop on Information Technologies and Systems, WITS 2015",
publisher = "University of Texas at Dallas",
address = "United States",

}

TY - GEN

T1 - A Comprehensive methodology for extracting signal from social media text using natural language processing and machine learning

AU - Zhang, Wenli

AU - Ram, Sudha

PY - 2015

Y1 - 2015

UR - http://www.scopus.com/inward/record.url?scp=85006971926&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85006971926&partnerID=8YFLogxK

M3 - Conference contribution

BT - 25th Annual Workshop on Information Technologies and Systems, WITS 2015

PB - University of Texas at Dallas

ER -