Dependence of Bayesian Model Selection Criteria and Fisher Information Matrix on Sample Size

Dan Lu, Ming Ye, Shlomo P Neuman

Research output: Contribution to journal, Article

20 Citations (Scopus)

Abstract

Geostatistical analyses require an estimation of the covariance structure of a random field and its parameters jointly from noisy data. Whereas in some cases (as in that of a Matérn variogram) a range of structural models can be captured with one or a few parameters, in many other cases it is necessary to consider a discrete set of structural model alternatives, such as drifts and variograms. Ranking these alternatives and identifying the best among them has traditionally been done with the aid of information theoretic or Bayesian model selection criteria. There is an ongoing debate in the literature about the relative merits of these various criteria. We contribute to this discussion by using synthetic data to compare the abilities of two common Bayesian criteria, BIC and KIC, to discriminate between alternative models of drift as a function of sample size when drift and variogram parameters are unknown. Adopting the results of Markov Chain Monte Carlo simulations as reference we confirm that KIC reduces asymptotically to BIC and provides consistently more reliable indications of model quality than does BIC for samples of all sizes. Practical considerations often cause analysts to replace the observed Fisher information matrix entering into KIC with its expected value. Our results show that this causes the performance of KIC to deteriorate with diminishing sample size. These results are equally valid for one and multiple realizations of uncertain data entering into our analysis. Bayesian theory indicates that, in the case of statistically independent and identically distributed data, posterior model probabilities become asymptotically insensitive to prior probabilities as sample size increases. We do not find this to be the case when working with samples taken from an autocorrelated random field.

Original language: English (US)
Pages (from-to): 971-993
Number of pages: 23
Journal: Mathematical Geosciences
Volume: 43
Issue number: 8
DOI: 10.1007/s11004-011-9359-0
State: Published - Nov 2011

Keywords

  • Asymptotic analysis
  • Drift models
  • Model selection
  • Model uncertainty
  • Prior model probability
  • Variogram models

ASJC Scopus subject areas

  • Mathematics (miscellaneous)
  • Earth and Planetary Sciences (all)

Cite this

Dependence of Bayesian Model Selection Criteria and Fisher Information Matrix on Sample Size. / Lu, Dan; Ye, Ming; Neuman, Shlomo P.

In: Mathematical Geosciences, Vol. 43, No. 8, 11.2011, p. 971-993.

@article{614107b4e6164721ac3d04948d60671c,
title = "Dependence of Bayesian Model Selection Criteria and Fisher Information Matrix on Sample Size",
abstract = "Geostatistical analyses require an estimation of the covariance structure of a random field and its parameters jointly from noisy data. Whereas in some cases (as in that of a Mat{\'e}rn variogram) a range of structural models can be captured with one or a few parameters, in many other cases it is necessary to consider a discrete set of structural model alternatives, such as drifts and variograms. Ranking these alternatives and identifying the best among them has traditionally been done with the aid of information theoretic or Bayesian model selection criteria. There is an ongoing debate in the literature about the relative merits of these various criteria. We contribute to this discussion by using synthetic data to compare the abilities of two common Bayesian criteria, BIC and KIC, to discriminate between alternative models of drift as a function of sample size when drift and variogram parameters are unknown. Adopting the results of Markov Chain Monte Carlo simulations as reference we confirm that KIC reduces asymptotically to BIC and provides consistently more reliable indications of model quality than does BIC for samples of all sizes. Practical considerations often cause analysts to replace the observed Fisher information matrix entering into KIC with its expected value. Our results show that this causes the performance of KIC to deteriorate with diminishing sample size. These results are equally valid for one and multiple realizations of uncertain data entering into our analysis. Bayesian theory indicates that, in the case of statistically independent and identically distributed data, posterior model probabilities become asymptotically insensitive to prior probabilities as sample size increases. We do not find this to be the case when working with samples taken from an autocorrelated random field.",
keywords = "Asymptotic analysis, Drift models, Model selection, Model uncertainty, Prior model probability, Variogram models",
author = "Lu, Dan and Ye, Ming and Neuman, Shlomo P.",
year = "2011",
month = "11",
doi = "10.1007/s11004-011-9359-0",
language = "English (US)",
volume = "43",
pages = "971--993",
journal = "Mathematical Geosciences",
issn = "1874-8961",
publisher = "Springer Netherlands",
number = "8",
}

TY - JOUR

T1 - Dependence of Bayesian Model Selection Criteria and Fisher Information Matrix on Sample Size

AU - Lu, Dan

AU - Ye, Ming

AU - Neuman, Shlomo P

PY - 2011/11

Y1 - 2011/11

AB - Geostatistical analyses require an estimation of the covariance structure of a random field and its parameters jointly from noisy data. Whereas in some cases (as in that of a Matérn variogram) a range of structural models can be captured with one or a few parameters, in many other cases it is necessary to consider a discrete set of structural model alternatives, such as drifts and variograms. Ranking these alternatives and identifying the best among them has traditionally been done with the aid of information theoretic or Bayesian model selection criteria. There is an ongoing debate in the literature about the relative merits of these various criteria. We contribute to this discussion by using synthetic data to compare the abilities of two common Bayesian criteria, BIC and KIC, to discriminate between alternative models of drift as a function of sample size when drift and variogram parameters are unknown. Adopting the results of Markov Chain Monte Carlo simulations as reference we confirm that KIC reduces asymptotically to BIC and provides consistently more reliable indications of model quality than does BIC for samples of all sizes. Practical considerations often cause analysts to replace the observed Fisher information matrix entering into KIC with its expected value. Our results show that this causes the performance of KIC to deteriorate with diminishing sample size. These results are equally valid for one and multiple realizations of uncertain data entering into our analysis. Bayesian theory indicates that, in the case of statistically independent and identically distributed data, posterior model probabilities become asymptotically insensitive to prior probabilities as sample size increases. We do not find this to be the case when working with samples taken from an autocorrelated random field.

KW - Asymptotic analysis

KW - Drift models

KW - Model selection

KW - Model uncertainty

KW - Prior model probability

KW - Variogram models

UR - http://www.scopus.com/inward/record.url?scp=80855141656&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80855141656&partnerID=8YFLogxK

U2 - 10.1007/s11004-011-9359-0

DO - 10.1007/s11004-011-9359-0

M3 - Article

VL - 43

SP - 971

EP - 993

JO - Mathematical Geosciences

JF - Mathematical Geosciences

SN - 1874-8961

IS - 8

ER -