Scalable temporal clustering for massive multidimensional data streams

Gediminas Adomavicius, Jesse C Bockstedt, Vishnu Parimi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Today's organizations are continuously capturing extremely large amounts of data, which will only continue to increase. In this paper we present a new approach to discovering clusters in these massive amounts of complex (i.e., multidimensional) continuously-arriving data, which are much too large to be analyzed as one dataset. In order to guarantee acceptable scalability, our approach builds on existing data mining literature and uses sampling-based techniques, an advanced variation of hierarchical agglomerative clustering, and an approach for sample-based cluster reconstruction to provide an approximate clustering solution of very high accuracy. We test the proposed approach empirically and show that it provides excellent clustering performance and, at the same time, demonstrates significant computational savings.
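The abstract names three ingredients: sampling, hierarchical agglomerative clustering, and sample-based cluster reconstruction. The sketch below illustrates only that general pipeline and is not the authors' algorithm. It assumes a plain centroid-linkage agglomeration in place of their "advanced variation", uses nearest-centroid assignment as a stand-in for cluster reconstruction, and ignores the temporal aspect entirely. All function names and parameters (`cluster_by_sampling`, `sample_size`, `seed`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def agglomerate(sample, k):
    """Naive centroid-linkage agglomerative clustering (an assumed stand-in).

    Starts with one cluster per sampled point and repeatedly merges the two
    clusters whose centroids are closest, until k clusters remain.
    Returns the k cluster centroids.
    """
    clusters = [[p] for p in sample]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return [centroid(c) for c in clusters]


def cluster_by_sampling(data, k, sample_size, seed=0):
    """Cluster a random sample, then label every point in the full data set.

    Step 1: draw a uniform random sample (assumed sampling scheme).
    Step 2: run agglomerative clustering on the sample only.
    Step 3: assign each point to its nearest sample-derived centroid.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    centroids = agglomerate(sample, k)
    labels = [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in data
    ]
    return centroids, labels


if __name__ == "__main__":
    # Two well-separated synthetic groups of 2-D points (illustrative data only).
    data = [(i * 0.01, 0.0) for i in range(100)]
    data += [(100 + i * 0.01, 0.0) for i in range(100)]
    centroids, labels = cluster_by_sampling(data, k=2, sample_size=20)
    print(centroids)
```

The expensive pairwise merging runs only on the sample, while the full data set is touched once in the final assignment pass. That is the general source of the computational savings a sampling-based scheme can offer, at the cost of an approximate rather than exact clustering.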

Original language: English (US)
Title of host publication: 2008 Workshop on Information Technologies and Systems, WITS 2008
Publisher: Social Science Research Network
Pages: 121-126
Number of pages: 6
State: Published - 2008
Externally published: Yes
Event: 2008 Workshop on Information Technologies and Systems, WITS 2008 - Paris, France
Duration: Dec 13 2008 to Dec 14 2008

Fingerprint

Data mining
Scalability
Sampling

ASJC Scopus subject areas

  • Information Systems
  • Control and Systems Engineering

Cite this

Adomavicius, G., Bockstedt, J. C., & Parimi, V. (2008). Scalable temporal clustering for massive multidimensional data streams. In 2008 Workshop on Information Technologies and Systems, WITS 2008 (pp. 121-126). Social Science Research Network.

@inproceedings{ac0d0c2aa41a4d4e8da3729c1556087c,
title = "Scalable temporal clustering for massive multidimensional data streams",
author = "Gediminas Adomavicius and Bockstedt, {Jesse C} and Vishnu Parimi",
year = "2008",
language = "English (US)",
pages = "121--126",
booktitle = "2008 Workshop on Information Technologies and Systems, WITS 2008",
publisher = "Social Science Research Network",

}

TY - GEN
T1 - Scalable temporal clustering for massive multidimensional data streams
AU - Adomavicius, Gediminas
AU - Bockstedt, Jesse C
AU - Parimi, Vishnu
PY - 2008
Y1 - 2008
AB - Today's organizations are continuously capturing extremely large amounts of data, which will only continue to increase. In this paper we present a new approach to discovering clusters in these massive amounts of complex (i.e., multidimensional) continuously-arriving data, which are much too large to be analyzed as one dataset. In order to guarantee acceptable scalability, our approach builds on existing data mining literature and uses sampling-based techniques, an advanced variation of hierarchical agglomerative clustering, and an approach for sample-based cluster reconstruction to provide an approximate clustering solution of very high accuracy. We test the proposed approach empirically and show that it provides excellent clustering performance and, at the same time, demonstrates significant computational savings.
UR - http://www.scopus.com/inward/record.url?scp=84902166501&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84902166501&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84902166501
SP - 121
EP - 126
BT - 2008 Workshop on Information Technologies and Systems, WITS 2008
PB - Social Science Research Network
ER -