### Abstract

This paper defines a new scoring rule, the relative model score (RMS), for evaluating ensemble simulations of environmental models. RMS implicitly incorporates measures of ensemble-mean accuracy, prediction-interval precision, and prediction-interval reliability to evaluate overall model predictive performance. RMS is evaluated numerically from the probability density functions of ensemble simulations produced by individual models or by several models combined via model averaging. We demonstrate the advantages of RMS through an example of soil respiration modeling. The example considers two alternative models of different fidelity; for each model, Bayesian inverse modeling is conducted with two different likelihood functions, giving four single-model ensembles of simulations. For each likelihood function, Bayesian model averaging is applied to the ensemble simulations of the two models, resulting in two multi-model prediction ensembles. The predictive performance of these ensembles is evaluated using several scoring rules. Results show that RMS outperforms the commonly used log-score, pseudo Bayes factor based on Bayesian model evidence (BME), and continuous ranked probability score (CRPS). RMS avoids the rounding-error problem specific to log-score. Because it is applicable to any likelihood function, RMS has broader applicability than BME, which can compare multiple models only when they share the same likelihood function. By directly considering the relative score of candidate models at each cross-validation datum, RMS yields more plausible model rankings than CRPS. RMS is therefore considered a robust scoring rule for evaluating the predictive performance of single-model and multi-model prediction ensembles.
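The scoring rules named in the abstract can be illustrated with a short Python sketch. Log-score and the ensemble CRPS estimator below follow their standard definitions. Everything else is an assumption made for illustration: the Gaussian kernel density estimate (with Silverman's rule-of-thumb bandwidth) used to turn an ensemble into a predictive PDF, the BMA mixture helper, and all function names. In particular, `relative_density_shares` is a hypothetical per-datum relative score that only conveys the idea of comparing candidate models datum by datum; it is not the RMS formula defined in the paper.

```python
import math
from statistics import stdev


def kde_density(ensemble, y, bandwidth=None):
    """Gaussian kernel density estimate of an ensemble's predictive PDF at y."""
    n = len(ensemble)
    if bandwidth is None:
        # Assumed choice: Silverman's rule of thumb (needs >= 2 distinct members).
        bandwidth = 1.06 * stdev(ensemble) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((y - x) / bandwidth) ** 2) for x in ensemble)


def log_score(ensemble, y, bandwidth=None):
    """Log-score: log of the predictive density at the observation (higher is better).

    If y lies far in the tail, the density can underflow to 0.0 in floating
    point and math.log raises ValueError; this is one way a rounding problem
    with log-score can surface in practice.
    """
    return math.log(kde_density(ensemble, y, bandwidth))


def crps_ensemble(ensemble, y):
    """Empirical CRPS of an ensemble for observation y (lower is better):
    mean|x_i - y| - 0.5 * mean|x_i - x_j| over all member pairs."""
    m = len(ensemble)
    mean_abs_error = sum(abs(x - y) for x in ensemble) / m
    mean_pairwise = sum(abs(a - b) for a in ensemble for b in ensemble) / (m * m)
    return mean_abs_error - 0.5 * mean_pairwise


def bma_density(ensembles, weights, y, bandwidth=None):
    """BMA predictive density: weighted mixture of each model's ensemble PDF."""
    return sum(w * kde_density(e, y, bandwidth) for e, w in zip(ensembles, weights))


def relative_density_shares(ensembles, y, bandwidth=None):
    """HYPOTHETICAL per-datum relative score: each model's share of the summed
    predictive density at y. Illustrative only; not the paper's RMS formula."""
    dens = [kde_density(e, y, bandwidth) for e in ensembles]
    total = sum(dens)
    return [d / total for d in dens]
```

For example, with an observation of 2.0, an ensemble clustered around 2.0 (such as `[1.8, 2.1, 2.3, 1.9, 2.0]`) receives almost all of the relative density share and a lower CRPS than an ensemble clustered around 3.1 (such as `[3.0, 3.5, 2.8, 3.2, 3.1]`). In a cross-validation setting, such per-datum quantities would be aggregated over all held-out data.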

Original language | English (US) |
---|---|
Journal | Stochastic Environmental Research and Risk Assessment |
DOI | 10.1007/s00477-018-1592-3 |
State | Accepted/In press - Jan 1 2018 |

### Keywords

- Bayes factor
- Continuous ranked probability score
- Dispersion
- Log-score
- Reliability
- Scoring rule

### ASJC Scopus subject areas

- Environmental Engineering
- Environmental Chemistry
- Water Science and Technology
- Safety, Risk, Reliability and Quality
- Environmental Science(all)

### Cite this

Elshall, A. S., Ye, M., Pei, Y., Zhang, F., Niu, G.-Y., & Barron-Gafford, G. A. (Accepted/In press, 2018). Relative model score: a scoring rule for evaluating ensemble simulations with application to microbial soil respiration modeling. *Stochastic Environmental Research and Risk Assessment*. https://doi.org/10.1007/s00477-018-1592-3

TY - JOUR

T1 - Relative model score

T2 - a scoring rule for evaluating ensemble simulations with application to microbial soil respiration modeling

AU - Elshall, Ahmed S.

AU - Ye, Ming

AU - Pei, Yongzhen

AU - Zhang, Fan

AU - Niu, Guo-Yue

AU - Barron-Gafford, Greg A.

PY - 2018/1/1

Y1 - 2018/1/1

KW - Bayes factor

KW - Continuous ranked probability score

KW - Dispersion

KW - Log-score

KW - Reliability

KW - Scoring rule

UR - http://www.scopus.com/inward/record.url?scp=85051490221&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051490221&partnerID=8YFLogxK

U2 - 10.1007/s00477-018-1592-3

DO - 10.1007/s00477-018-1592-3

M3 - Article

AN - SCOPUS:85051490221

JO - Stochastic Environmental Research and Risk Assessment

JF - Stochastic Environmental Research and Risk Assessment

SN - 1436-3240

ER -