Entity attribute discovery and clustering from online reviews

Qingliang Miao, Qiudan Li, Dajun Zeng, Yao Meng, Shu Zhang, Hao Yu

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The rapidly growing body of user-generated content (UGC) is a rich source for reputation management of entities, products, and services. In online product reviews, for example, customers usually express opinions on multiple product attributes, so the challenge is to automatically extract the attributes that are mentioned and to cluster them. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapping scheme that trains the extraction models from a small number of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a Clustering By Committee (CBC) approach that clusters attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
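The abstract names two extraction ideas: casting attribute extraction as sequence labeling, and bootstrapping a tagger from a few labeled reviews plus many unlabeled ones. This record does not say which sequence model or selection rule the paper uses, so the following is only a generic self-training sketch under stated assumptions. `LexiconTagger` is a hypothetical, context-free placeholder for a real sequence labeler such as a CRF. The `B-ATTR`/`I-ATTR`/`O` tag set, the confidence measure, and the 0.9 threshold are illustrative choices, not the authors'.

```python
from collections import Counter, defaultdict


def to_bio(tokens, spans):
    """Encode attribute mentions, given as token spans [start, end), as BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-ATTR"  # first token of an attribute mention
        for i in range(start + 1, end):
            tags[i] = "I-ATTR"  # continuation tokens of the same mention
    return tags


class LexiconTagger:
    """Hypothetical stand-in for a real sequence labeler (e.g. a CRF).

    It memorizes the most frequent tag of each lowercased token and ignores
    context, so it only illustrates the training loop, not a good model.
    """

    def fit(self, data):
        self.counts = defaultdict(Counter)
        for tokens, tags in data:
            for tok, tag in zip(tokens, tags):
                self.counts[tok.lower()][tag] += 1
        return self

    def predict(self, tokens):
        """Return (tags, confidence); confidence is the mean per-token tag frequency."""
        tags, confs = [], []
        for tok in tokens:
            seen = self.counts.get(tok.lower())
            if seen:
                tag, n = seen.most_common(1)[0]
                tags.append(tag)
                confs.append(n / sum(seen.values()))
            else:
                tags.append("O")   # unseen token: default to "outside"
                confs.append(0.5)  # ...with deliberately low confidence
        return tags, (sum(confs) / len(confs) if confs else 0.0)


def bootstrap(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: grow the training set with confidently tagged unlabeled sentences."""
    train = list(labeled)
    pool = list(unlabeled)
    tagger = LexiconTagger().fit(train)
    for _ in range(max_rounds):
        confident, rest = [], []
        for tokens in pool:
            tags, conf = tagger.predict(tokens)
            if conf >= threshold:
                confident.append((tokens, tags))
            else:
                rest.append(tokens)
        if not confident:  # nothing new to learn from: stop early
            break
        train.extend(confident)  # auto-labeled sentences join the training data
        pool = rest
        tagger = LexiconTagger().fit(train)  # retrain on the enlarged set
    return tagger, train
```

Replacing `LexiconTagger` with a context-aware model would change only `fit` and `predict`; the loop stays the same. The threshold trades coverage of the unlabeled pool against the risk of reinforcing early tagging errors.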

Original language: English (US)
Pages (from-to): 279-288
Number of pages: 10
Journal: Frontiers of Computer Science
Volume: 8
Issue number: 2
DOI: 10.1007/s11704-014-3043-8
State: Published - 2014
Externally published: Yes

Keywords

  • attribute clustering
  • attribute extraction
  • opinion mining

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Entity attribute discovery and clustering from online reviews. / Miao, Qingliang; Li, Qiudan; Zeng, Dajun; Meng, Yao; Zhang, Shu; Yu, Hao.

In: Frontiers of Computer Science, Vol. 8, No. 2, 2014, p. 279-288.

Miao, Qingliang ; Li, Qiudan ; Zeng, Dajun ; Meng, Yao ; Zhang, Shu ; Yu, Hao. / Entity attribute discovery and clustering from online reviews. In: Frontiers of Computer Science. 2014 ; Vol. 8, No. 2. pp. 279-288.
@article{069b455abd4649b7b86e7fe2f914fa7e,
title = "Entity attribute discovery and clustering from online reviews",
abstract = "The rapidly growing body of user-generated content (UGC) is a rich source for reputation management of entities, products, and services. In online product reviews, for example, customers usually express opinions on multiple product attributes, so the challenge is to automatically extract the attributes that are mentioned and to cluster them. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapping scheme that trains the extraction models from a small number of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a Clustering By Committee (CBC) approach that clusters attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.",
keywords = "attribute clustering, attribute extraction, opinion mining",
author = "Qingliang Miao and Qiudan Li and Dajun Zeng and Yao Meng and Shu Zhang and Hao Yu",
year = "2014",
doi = "10.1007/s11704-014-3043-8",
language = "English (US)",
volume = "8",
pages = "279--288",
journal = "Frontiers of Computer Science",
issn = "2095-2228",
publisher = "Springer Science + Business Media",
number = "2",

}

TY - JOUR

T1 - Entity attribute discovery and clustering from online reviews

AU - Miao, Qingliang

AU - Li, Qiudan

AU - Zeng, Dajun

AU - Meng, Yao

AU - Zhang, Shu

AU - Yu, Hao

PY - 2014

Y1 - 2014

AB - The rapidly growing body of user-generated content (UGC) is a rich source for reputation management of entities, products, and services. In online product reviews, for example, customers usually express opinions on multiple product attributes, so the challenge is to automatically extract the attributes that are mentioned and to cluster them. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapping scheme that trains the extraction models from a small number of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a Clustering By Committee (CBC) approach that clusters attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.

KW - attribute clustering

KW - attribute extraction

KW - opinion mining

UR - http://www.scopus.com/inward/record.url?scp=84897965983&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897965983&partnerID=8YFLogxK

U2 - 10.1007/s11704-014-3043-8

DO - 10.1007/s11704-014-3043-8

M3 - Article

AN - SCOPUS:84897965983

VL - 8

SP - 279

EP - 288

JO - Frontiers of Computer Science

JF - Frontiers of Computer Science

SN - 2095-2228

IS - 2

ER -