On the choice of calibration metrics for "high-flow" estimation using hydrologic models

Naoki Mizukami, Oldrich Rakovec, Andrew J. Newman, Martyn P. Clark, Andrew W. Wood, Hoshin Vijai Gupta, Rohini Kumar

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Calibration is an essential step for improving the accuracy of simulations generated using hydrologic models. A key modeling decision is selecting the performance metric to be optimized. It has been common to use squared-error performance metrics, or normalized variants such as Nash-Sutcliffe efficiency (NSE), based on the idea that their squared-error nature will emphasize the estimates of high flows. However, we conclude that NSE-based model calibrations actually result in poor reproduction of high-flow events, such as the annual peak flows that are used for flood frequency estimation. Using three different types of performance metrics, we calibrate two hydrological models at a daily time step, the Variable Infiltration Capacity (VIC) model and the mesoscale Hydrologic Model (mHM), and evaluate their ability to simulate high-flow events for 492 basins throughout the contiguous United States. The metrics investigated are (1) NSE, (2) Kling-Gupta efficiency (KGE) and its variants, and (3) annual peak flow bias (APFB), where the latter is an application-specific metric that focuses on annual peak flows. As expected, the APFB metric produces the best annual peak flow estimates; however, performance on other high-flow-related metrics is poor. In contrast, the use of NSE results in annual peak flow estimates that are more than 20% worse, primarily due to the tendency of NSE to underestimate observed flow variability. On the other hand, the use of KGE results in annual peak flow estimates that are better than those from NSE, owing to improved flow time series metrics (mean and variance), with only a slight degradation in performance with respect to other related metrics, particularly when a non-standard weighting of the components of KGE is used. Stochastically generated ensemble simulations based on model residuals can improve the high-flow metrics, regardless of the deterministic performance.
However, we emphasize that improving the fidelity of streamflow dynamics from deterministically calibrated models is still important, as it may improve high-flow metrics (for the right reasons). Overall, this work highlights the need for a deeper understanding of performance metric behavior and design in relation to the desired goals of model calibration.
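To make the three metric families in the abstract concrete, the following is a minimal, hedged sketch in plain Python; it is not the authors' code. NSE and standard KGE follow their usual definitions. Two parts are assumptions: the non-standard KGE weighting is represented here by hypothetical scaling factors `s_r`, `s_alpha`, `s_beta` on the correlation, variability, and bias terms, and APFB is assumed to be the absolute relative bias of the mean annual peak flow, with `years` a hypothetical per-time-step year label. The toy series are invented. The sketch also illustrates the abstract's point about variability: NSE decomposes exactly as NSE = 2αr − α² − β_n² (α = ratio of standard deviations, r = correlation, β_n = mean error divided by the observed standard deviation), so for fixed r < 1 and β_n it is maximized at α = r, i.e. with damped variability.

```python
import math
from collections import defaultdict


def _mean(x):
    return sum(x) / len(x)


def _pstd(x):
    # Population standard deviation (divide by n); this makes the NSE
    # decomposition 2*alpha*r - alpha**2 - beta_n**2 exact.
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def _pearson_r(sim, obs):
    ms, mo = _mean(sim), _mean(obs)
    cov = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / len(obs)
    return cov / (_pstd(sim) * _pstd(obs))


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared obs anomalies."""
    mo = _mean(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst


def kge(sim, obs, s_r=1.0, s_alpha=1.0, s_beta=1.0):
    """Kling-Gupta efficiency; scaling factors of 1 give the standard form.

    Other factor values are an ASSUMED way to express non-standard weighting.
    """
    r = _pearson_r(sim, obs)
    alpha = _pstd(sim) / _pstd(obs)  # variability ratio
    beta = _mean(sim) / _mean(obs)   # bias ratio
    ed = math.sqrt((s_r * (r - 1)) ** 2
                   + (s_alpha * (alpha - 1)) ** 2
                   + (s_beta * (beta - 1)) ** 2)
    return 1.0 - ed


def apfb(sim, obs, years):
    """Annual peak flow bias, ASSUMED form: |mean sim peak / mean obs peak - 1|.

    `years` is a hypothetical per-time-step label (e.g. water year).
    """
    peak_s = defaultdict(lambda: -math.inf)
    peak_o = defaultdict(lambda: -math.inf)
    for s, o, y in zip(sim, obs, years):
        peak_s[y] = max(peak_s[y], s)
        peak_o[y] = max(peak_o[y], o)
    ys = sorted(peak_o)
    return abs(_mean([peak_s[y] for y in ys]) / _mean([peak_o[y] for y in ys]) - 1.0)


def rescale(x, obs, alpha):
    """Hypothetical helper: give x the obs mean and alpha times the obs std.

    Correlation with obs is unchanged because this is a positive linear map.
    """
    mx, mo = _mean(x), _mean(obs)
    k = alpha * _pstd(obs) / _pstd(x)
    return [mo + k * (v - mx) for v in x]


# Toy daily series for two "years" (invented numbers, not data from the study).
obs = [1.0, 2.0, 8.0, 3.0, 1.0, 2.0, 10.0, 4.0]
x = [1.5, 3.0, 6.0, 4.0, 1.0, 3.0, 7.0, 5.0]  # an imperfect simulation
years = [2000] * 4 + [2001] * 4

r = _pearson_r(x, obs)
damped = rescale(x, obs, alpha=r)  # variability shrunk to alpha = r
full = rescale(x, obs, alpha=1.0)  # variability matches observations

for name, sim in (("damped", damped), ("full", full)):
    print(name, round(nse(sim, obs), 3), round(kge(sim, obs), 3),
          round(apfb(sim, obs, years), 3))
```

In this toy case, the damped series scores higher on NSE (r² versus 2r − 1), while KGE prefers the variance-matching series, and the damped series also shows the larger annual peak bias. This is only an illustration of the mechanism the abstract describes, not a reproduction of the study's results.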

Original language: English (US)
Pages (from-to): 2601-2614
Number of pages: 14
Journal: Hydrology and Earth System Sciences
Volume: 23
Issue number: 6
DOIs: 10.5194/hess-23-2601-2019
State: Published - Jun 17 2019

Fingerprint

peak flow
calibration
flood frequency
simulation
streamflow
infiltration
time series
basin
modeling

ASJC Scopus subject areas

  • Water Science and Technology
  • Earth and Planetary Sciences (miscellaneous)

Cite this

On the choice of calibration metrics for "high-flow" estimation using hydrologic models. / Mizukami, Naoki; Rakovec, Oldrich; Newman, Andrew J.; Clark, Martyn P.; Wood, Andrew W.; Gupta, Hoshin Vijai; Kumar, Rohini.

In: Hydrology and Earth System Sciences, Vol. 23, No. 6, 17.06.2019, p. 2601-2614.

Mizukami, Naoki ; Rakovec, Oldrich ; Newman, Andrew J. ; Clark, Martyn P. ; Wood, Andrew W. ; Gupta, Hoshin Vijai ; Kumar, Rohini. / On the choice of calibration metrics for "high-flow" estimation using hydrologic models. In: Hydrology and Earth System Sciences. 2019 ; Vol. 23, No. 6. pp. 2601-2614.
@article{2996abeab0d54d30b98c3c523bf81655,
title = "On the choice of calibration metrics for {"}high-flow{"} estimation using hydrologic models",
author = "Mizukami, Naoki and Rakovec, Oldrich and Newman, {Andrew J.} and Clark, {Martyn P.} and Wood, {Andrew W.} and Gupta, {Hoshin Vijai} and Kumar, Rohini",
year = "2019",
month = "6",
day = "17",
doi = "10.5194/hess-23-2601-2019",
language = "English (US)",
volume = "23",
pages = "2601--2614",
journal = "Hydrology and Earth System Sciences",
issn = "1027-5606",
publisher = "European Geosciences Union",
number = "6",

}

TY - JOUR

T1 - On the choice of calibration metrics for "high-flow" estimation using hydrologic models

AU - Mizukami, Naoki

AU - Rakovec, Oldrich

AU - Newman, Andrew J.

AU - Clark, Martyn P.

AU - Wood, Andrew W.

AU - Gupta, Hoshin Vijai

AU - Kumar, Rohini

PY - 2019/6/17

Y1 - 2019/6/17

UR - http://www.scopus.com/inward/record.url?scp=85067447861&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067447861&partnerID=8YFLogxK

U2 - 10.5194/hess-23-2601-2019

DO - 10.5194/hess-23-2601-2019

M3 - Article

AN - SCOPUS:85067447861

VL - 23

SP - 2601

EP - 2614

JO - Hydrology and Earth System Sciences

JF - Hydrology and Earth System Sciences

SN - 1027-5606

IS - 6

ER -