Moving from data-constrained to data-enabled research: Experiences and challenges in collecting, validating and analyzing large-scale e-commerce data

Ravi Bapna, Paulo B Goes, Ram Gopal, James R. Marsden

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the area of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing, often over extended periods of time, real-world agents participating in market and nonmarket activity on the Internet can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation. We cannot randomly assign participants to treatments or determine event orderings. Internet data gathering does offer potentially large data sets with repeated observation of individual choices and action. In addition, the automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques, integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints. Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce.

Original language: English (US)
Pages (from-to): 116-130
Number of pages: 15
Journal: Statistical Science
Volume: 21
Issue number: 2
DOIs: 10.1214/088342306000000231
State: Published - May 2006
Externally published: Yes

Fingerprint

Electronic Commerce
Experimental Economics
Economics
Online Auctions
Experience
World Wide Web
Music
Period of time
Large Data Sets
Experimentation
Pricing
Assign
Sharing
Testing
Costs
Demonstrate
Data collection

Keywords

  • Internet data
  • Large-scale
  • Music file sharing
  • Online auctions
  • Web crawling agents

ASJC Scopus subject areas

  • Mathematics(all)
  • Statistics and Probability

Cite this

Moving from data-constrained to data-enabled research: Experiences and challenges in collecting, validating and analyzing large-scale e-commerce data. / Bapna, Ravi; Goes, Paulo B; Gopal, Ram; Marsden, James R.

In: Statistical Science, Vol. 21, No. 2, 05.2006, p. 116-130.

@article{aad0123baa8040d0abe45a361563f9fd,
title = "Moving from data-constrained to data-enabled research: Experiences and challenges in collecting, validating and analyzing large-scale e-commerce data",
abstract = "Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the area of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing, often over extended periods of time, real-world agents participating in market and nonmarket activity on the Internet can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation. We cannot randomly assign participants to treatments or determine event orderings. Internet data gathering does offer potentially large data sets with repeated observation of individual choices and action. In addition, the automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints. Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce.",
keywords = "Internet data, Large-scale, Music file sharing, Online auctions, Web crawling agents",
author = "Bapna, Ravi and Goes, {Paulo B} and Gopal, Ram and Marsden, {James R.}",
year = "2006",
month = "5",
doi = "10.1214/088342306000000231",
language = "English (US)",
volume = "21",
pages = "116--130",
journal = "Statistical Science",
issn = "0883-4237",
publisher = "Institute of Mathematical Statistics",
number = "2",

}

TY - JOUR
T1 - Moving from data-constrained to data-enabled research
T2 - Experiences and challenges in collecting, validating and analyzing large-scale e-commerce data
AU - Bapna, Ravi
AU - Goes, Paulo B
AU - Gopal, Ram
AU - Marsden, James R.
PY - 2006/5
Y1 - 2006/5
AB - Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the area of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing, often over extended periods of time, real-world agents participating in market and nonmarket activity on the Internet can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation. We cannot randomly assign participants to treatments or determine event orderings. Internet data gathering does offer potentially large data sets with repeated observation of individual choices and action. In addition, the automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints. Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce.

KW - Internet data
KW - Large-scale
KW - Music file sharing
KW - Online auctions
KW - Web crawling agents
UR - http://www.scopus.com/inward/record.url?scp=33748568772&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33748568772&partnerID=8YFLogxK
U2 - 10.1214/088342306000000231
DO - 10.1214/088342306000000231
M3 - Article
AN - SCOPUS:33748568772
VL - 21
SP - 116
EP - 130
JO - Statistical Science
JF - Statistical Science
SN - 0883-4237
IS - 2
ER -