An Algorithm for Generating Individualized Treatment Decision Trees and Random Forests

Kevin Doubleday, Hua Zhou, Haoda Fu, Jin Zhou

Research output: Contribution to journal › Article

Abstract

With new treatments and novel technology available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery through identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect the interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that seek to directly estimate individualized treatment rules (ITR) via maximizing the expected clinical reward by using, for example, support vector machines (SVM) or decision trees. The latter enjoys great popularity in clinical practice due to its interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve a single tree decision rule by an ensemble decision tree algorithm, ITR random forests. Our final decision rule is an average over single decision trees and it is a soft probability rather than a hard choice. Depending on how strong the treatment recommendation is, physicians can make decisions based on our model along with their own judgment and experience. Performance of ITR forest and tree methods is assessed through simulations along with applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an EMR cohort of 5177 patients with diabetes. ITR forest and tree methods are implemented using statistical software R (https://github.com/kdoub5ha/ITR.Forest). Supplementary materials for this article are available online.
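The "soft probability" decision rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes each fitted tree returns a binary recommendation (1 = treat, 0 = do not treat) and that the forest reports the fraction of trees recommending treatment. The function name, the hand-written stumps, and the covariate values are hypothetical; they are not taken from the article or from the ITR.Forest package.

```python
from typing import Callable, Sequence

# Hypothetical type: a fitted tree maps a patient's covariates to 0 or 1.
TreeRule = Callable[[Sequence[float]], int]


def soft_recommendation(trees: Sequence[TreeRule], x: Sequence[float]) -> float:
    """Average the binary recommendations of all trees for one patient.

    Returns a value in [0, 1]; values near 1 or 0 indicate a strong
    recommendation, values near 0.5 indicate a weak one.
    """
    if not trees:
        raise ValueError("at least one tree is required")
    votes = [tree(x) for tree in trees]
    return sum(votes) / len(votes)


# Three hand-written stumps standing in for fitted trees (illustrative only).
trees = [
    lambda x: 1 if x[0] > 7.5 else 0,
    lambda x: 1 if x[1] > 30.0 else 0,
    lambda x: 1 if x[0] > 8.0 else 0,
]

patient = [7.8, 32.0]  # made-up covariate values
p = soft_recommendation(trees, patient)
print(f"Fraction of trees recommending treatment: {p:.2f}")
```

Here two of the three stumps recommend treatment, so the rule returns a moderate value rather than a hard yes or no, which is the kind of output a physician could weigh against their own judgment.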

Original language: English (US)
Pages (from-to): 849-860
Number of pages: 12
Journal: Journal of Computational and Graphical Statistics
Volume: 27
Issue number: 4
DOI: 10.1080/10618600.2018.1451337
State: Published - Oct 2, 2018

Fingerprint

Random Forest
Decision tree
Reward
Diabetes
Tree Algorithms
Decision Rules
Medicine
Randomized Controlled Trial
Statistical Software
Interpretability
Interaction
Statistical method
Healthcare
Dimensionality
Covariates
Recommendations
Support Vector Machine
Ensemble
High-dimensional
Maximise

Keywords

  • Optimization
  • Precision medicine
  • Recursive partitioning
  • Subgroup identification
  • Value function
  • Variable importance

ASJC Scopus subject areas

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty

Cite this

An Algorithm for Generating Individualized Treatment Decision Trees and Random Forests. / Doubleday, Kevin; Zhou, Hua; Fu, Haoda; Zhou, Jin.

In: Journal of Computational and Graphical Statistics, Vol. 27, No. 4, 02.10.2018, p. 849-860.

@article{54b94ca7023b49bc88b297c5af357d86,
title = "An Algorithm for Generating Individualized Treatment Decision Trees and Random Forests",
abstract = "With new treatments and novel technology available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery through identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect the interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that seek to directly estimate individualized treatment rules (ITR) via maximizing the expected clinical reward by using, for example, support vector machines (SVM) or decision trees. The latter enjoys great popularity in clinical practice due to its interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve a single tree decision rule by an ensemble decision tree algorithm, ITR random forests. Our final decision rule is an average over single decision trees and it is a soft probability rather than a hard choice. Depending on how strong the treatment recommendation is, physicians can make decisions based on our model along with their own judgment and experience. Performance of ITR forest and tree methods is assessed through simulations along with applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an EMR cohort of 5177 patients with diabetes. ITR forest and tree methods are implemented using statistical software R (https://github.com/kdoub5ha/ITR.Forest). Supplementary materials for this article are available online.",
keywords = "Optimization, Precision medicine, Recursive partitioning, Subgroup identification, Value function, Variable importance",
author = "Kevin Doubleday and Hua Zhou and Haoda Fu and Jin Zhou",
year = "2018",
month = "10",
day = "2",
doi = "10.1080/10618600.2018.1451337",
language = "English (US)",
volume = "27",
pages = "849--860",
journal = "Journal of Computational and Graphical Statistics",
issn = "1061-8600",
publisher = "American Statistical Association",
number = "4",
}

TY - JOUR
T1 - An Algorithm for Generating Individualized Treatment Decision Trees and Random Forests
AU - Doubleday, Kevin
AU - Zhou, Hua
AU - Fu, Haoda
AU - Zhou, Jin
PY - 2018/10/2
Y1 - 2018/10/2
AB - With new treatments and novel technology available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery through identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect the interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that seek to directly estimate individualized treatment rules (ITR) via maximizing the expected clinical reward by using, for example, support vector machines (SVM) or decision trees. The latter enjoys great popularity in clinical practice due to its interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve a single tree decision rule by an ensemble decision tree algorithm, ITR random forests. Our final decision rule is an average over single decision trees and it is a soft probability rather than a hard choice. Depending on how strong the treatment recommendation is, physicians can make decisions based on our model along with their own judgment and experience. Performance of ITR forest and tree methods is assessed through simulations along with applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an EMR cohort of 5177 patients with diabetes. ITR forest and tree methods are implemented using statistical software R (https://github.com/kdoub5ha/ITR.Forest). Supplementary materials for this article are available online.
KW - Optimization
KW - Precision medicine
KW - Recursive partitioning
KW - Subgroup identification
KW - Value function
KW - Variable importance
UR - http://www.scopus.com/inward/record.url?scp=85058647875&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058647875&partnerID=8YFLogxK
U2 - 10.1080/10618600.2018.1451337
DO - 10.1080/10618600.2018.1451337
M3 - Article
VL - 27
SP - 849
EP - 860
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
SN - 1061-8600
IS - 4
ER -