Identity matching and information acquisition: Estimation of optimal threshold parameters

Pantea Alirezazadeh, Fidan Boylu, Robert Garfinkel, Ram Gopal, Paulo Goes

Research output: Contribution to journal (Article)

Abstract

With the growing volume of data collected and stored from customer interactions, which have recently shifted toward online channels, an important challenge faced by today's businesses is dealing appropriately with data quality problems. A key step in the data cleaning process is matching and merging customer records to assess the identity of individuals. The practical importance of this research is exemplified by a large client firm that deals with private label credit cards. The firm needed to know whether new customers already had histories within the company in order to decide on the appropriate parameters of possible card offerings. The company incurs substantial costs if it incorrectly "matches" an incoming application with an existing customer (Type I error), and also if it falsely assumes that there is no match (Type II error). Although a good deal of generic identity matching software is available that provides a "strength" score for each potential match, the question of how to use these scores for new applications is of great interest and is addressed in this work. The academic significance lies in the analysis of the score thresholds that are typically used in decision making. That is, upper and lower thresholds are set: matches are accepted above the former, rejected below the latter, and more information is gathered between the two. We show, for the first time, that the optimal thresholds can be considered to be parameters of a matching distribution, and we develop and analyze a number of estimators of these parameters. Extensive computations then show the effects of various factors on the convergence rates of the estimates.
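To make the three-zone rule concrete, here is a minimal Python sketch under stated assumptions; it is not the estimation procedure developed in the paper. `decide` simply applies given lower and upper score thresholds. `bayes_action` and `implied_probability_thresholds` illustrate why minimizing expected cost can produce exactly two cut-offs, under the hypothetical assumptions that each score has already been calibrated to a match probability and that acquiring more information always reveals the true match status at a fixed cost. The names `Costs`, `decide`, `bayes_action`, and `implied_probability_thresholds`, as well as all cost values, are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-application costs (illustrative, not taken from the paper)."""
    false_match: float   # Type I: accept a match that is not the same person
    missed_match: float  # Type II: reject a true match
    acquisition: float   # cost of gathering more information on one application


def decide(score: float, lower: float, upper: float) -> str:
    """Three-zone rule: accept above `upper`, reject below `lower`, else acquire info."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score > upper:
        return "accept"
    if score < lower:
        return "reject"
    return "acquire"


def bayes_action(p_match: float, c: Costs) -> str:
    """Cheapest action for a calibrated match probability.

    Assumes (hypothetically) that acquiring information always reveals the
    true status, so its expected cost is just the acquisition cost.
    """
    expected = {
        "accept": (1.0 - p_match) * c.false_match,
        "reject": p_match * c.missed_match,
        "acquire": c.acquisition,
    }
    return min(expected, key=expected.get)


def implied_probability_thresholds(c: Costs) -> tuple[float, float]:
    """Probability cut-offs where acquiring ties with rejecting and with accepting.

    The middle band is non-empty only when p_low < p_high.
    """
    p_low = c.acquisition / c.missed_match         # reject vs. acquire tie
    p_high = 1.0 - c.acquisition / c.false_match   # accept vs. acquire tie
    return p_low, p_high
```

For example, with `Costs(false_match=100.0, missed_match=50.0, acquisition=5.0)` the implied probability cut-offs are 0.1 and 0.95: applications with an estimated match probability below 0.1 are rejected, those above 0.95 are accepted, and the rest are sent for more information. If acquisition is expensive enough that the lower cut-off exceeds the upper one, the middle band disappears and the rule collapses to a single accept/reject threshold.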

Original language: English (US)
Pages (from-to): 160-171
Number of pages: 12
Journal: Decision Support Systems
Volume: 57
Issue number: 1
State: Published (Jan 1 2014)

Keywords

  • Data quality
  • Information acquisition
  • Record matching
  • Sampling distributions
  • Statistical estimation
  • Type I and Type II errors

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Information Systems and Management
