Computer algorithm for automated work group classification from free text: The DREAM technique

Philip I Harber, Lori Crawford, Amarpreet Cheema, Levanto Schacter

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

OBJECTIVE: This study developed and tested a computer method to automatically assign subjects to aggregate work groups based on their free text work descriptions.

METHODS: The Double Root Extended Automated Matcher (DREAM) algorithm classifies individuals based on pairs of subjects' free text word roots in common with those of standard classification systems and several explicitly defined linkages between term roots and aggregates.

RESULTS: DREAM effectively analyzed free text from 5887 participants in a multisite chronic obstructive pulmonary disease prevention study (Lung Health Study). For a test set of 533 cases, DREAM's classifications compared favorably with those of a four-human panel. The humans rated the accuracy of DREAM as good or better in 80% of the test cases.

CONCLUSIONS: Automated text interpretation is a promising tool for analyzing large data sets for applications in data mining, research, and surveillance. Work descriptive information is most useful when it can link an individual to aggregate entities that have occupational health relevance. Determining the appropriate group requires considerable expertise. This article describes a new method for making such assignments using a computer algorithm to reduce dependence on the limited number of occupational health experts. In addition, computer algorithms foster consistency of assignments.
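The matching idea summarized under METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: the suffix-stripping `root` function, the `REFERENCE` titles, the `EXPLICIT_LINKS` table, the group names, and the rule that explicit links act only as a fallback are all assumptions made for illustration.

```python
from __future__ import annotations

import re
from itertools import combinations


def root(word: str) -> str:
    """Crude stand-in for a word-root extractor (assumption, not DREAM's)."""
    word = word.lower()
    for suffix in ("ing", "ers", "er", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def root_pairs(text: str) -> set[frozenset[str]]:
    """Return every unordered pair of distinct word roots found in the text."""
    roots = {root(w) for w in re.findall(r"[A-Za-z]+", text)}
    return {frozenset(pair) for pair in combinations(sorted(roots), 2)}


# Hypothetical reference titles standing in for a standard classification system.
REFERENCE = {
    "Construction": ["carpenters and construction workers", "building painter"],
    "Health care": ["registered nurses in hospitals", "nursing aides"],
}

# Hypothetical explicit linkages from a single root to an aggregate group.
EXPLICIT_LINKS = {"weld": "Manufacturing"}


def classify(description: str) -> str | None:
    """Pick the group sharing the most root pairs with the description.

    Falling back to explicit links only when no pair matches is an assumption.
    """
    pairs = root_pairs(description)
    scores = {}
    for group, titles in REFERENCE.items():
        reference_pairs = set().union(*(root_pairs(t) for t in titles))
        shared = len(pairs & reference_pairs)
        if shared:
            scores[group] = shared
    if scores:
        return max(scores, key=scores.get)
    for word in re.findall(r"[A-Za-z]+", description):
        group = EXPLICIT_LINKS.get(root(word))
        if group:
            return group
    return None


print(classify("construction carpenter"))
print(classify("nursing in hospitals"))
print(classify("welder"))
```

Matching on pairs of roots rather than single roots is what makes a description such as "construction carpenter" specific: either root alone could appear in titles from several groups.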

Original language: English (US)
Pages (from-to): 41-49
Number of pages: 9
Journal: Journal of Occupational and Environmental Medicine / American College of Occupational and Environmental Medicine
Volume: 49
Issue number: 1
DOI: 10.1097/01.jom.0000251826.37828.2e
State: Published - Jan 2007
Externally published: Yes

Fingerprint

Occupational Health
Data Mining
Chronic Obstructive Pulmonary Disease
Lung
Health
Research
Datasets

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health
  • Health, Toxicology and Mutagenesis

Cite this

Computer algorithm for automated work group classification from free text: The DREAM technique. / Harber, Philip I; Crawford, Lori; Cheema, Amarpreet; Schacter, Levanto. In: Journal of Occupational and Environmental Medicine / American College of Occupational and Environmental Medicine, Vol. 49, No. 1, 01.2007, p. 41-49.

@article{df9b4ee302be4efcbde1147152d296aa,
title = "Computer algorithm for automated work group classification from free text: The DREAM technique",
abstract = "OBJECTIVE: This study developed and tested a computer method to automatically assign subjects to aggregate work groups based on their free text work descriptions. METHODS: The Double Root Extended Automated Matcher (DREAM) algorithm classifies individuals based on pairs of subjects' free text word roots in common with those of standard classification systems and several explicitly defined linkages between term roots and aggregates. RESULTS: DREAM effectively analyzed free text from 5887 participants in a multisite chronic obstructive pulmonary disease prevention study (Lung Health Study). For a test set of 533 cases, DREAM's classifications compared favorably with those of a four-human panel. The humans rated the accuracy of DREAM as good or better in 80{\%} of the test cases. CONCLUSIONS: Automated text interpretation is a promising tool for analyzing large data sets for applications in data mining, research, and surveillance. Work descriptive information is most useful when it can link an individual to aggregate entities that have occupational health relevance. Determining the appropriate group requires considerable expertise. This article describes a new method for making such assignments using a computer algorithm to reduce dependence on the limited number of occupational health experts. In addition, computer algorithms foster consistency of assignments.",
author = "Harber, {Philip I} and Crawford, Lori and Cheema, Amarpreet and Schacter, Levanto",
year = "2007",
month = "1",
doi = "10.1097/01.jom.0000251826.37828.2e",
language = "English (US)",
volume = "49",
pages = "41--49",
journal = "Journal of Occupational and Environmental Medicine",
issn = "1076-2752",
publisher = "Lippincott Williams and Wilkins",
number = "1",
}

TY - JOUR

T1 - Computer algorithm for automated work group classification from free text

T2 - The DREAM technique

AU - Harber, Philip I

AU - Crawford, Lori

AU - Cheema, Amarpreet

AU - Schacter, Levanto

PY - 2007/1

Y1 - 2007/1

AB - OBJECTIVE: This study developed and tested a computer method to automatically assign subjects to aggregate work groups based on their free text work descriptions. METHODS: The Double Root Extended Automated Matcher (DREAM) algorithm classifies individuals based on pairs of subjects' free text word roots in common with those of standard classification systems and several explicitly defined linkages between term roots and aggregates. RESULTS: DREAM effectively analyzed free text from 5887 participants in a multisite chronic obstructive pulmonary disease prevention study (Lung Health Study). For a test set of 533 cases, DREAM's classifications compared favorably with those of a four-human panel. The humans rated the accuracy of DREAM as good or better in 80% of the test cases. CONCLUSIONS: Automated text interpretation is a promising tool for analyzing large data sets for applications in data mining, research, and surveillance. Work descriptive information is most useful when it can link an individual to aggregate entities that have occupational health relevance. Determining the appropriate group requires considerable expertise. This article describes a new method for making such assignments using a computer algorithm to reduce dependence on the limited number of occupational health experts. In addition, computer algorithms foster consistency of assignments.

UR - http://www.scopus.com/inward/record.url?scp=33846258471&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846258471&partnerID=8YFLogxK

U2 - 10.1097/01.jom.0000251826.37828.2e

DO - 10.1097/01.jom.0000251826.37828.2e

M3 - Article

C2 - 17215712

AN - SCOPUS:33846258471

VL - 49

SP - 41

EP - 49

JO - Journal of Occupational and Environmental Medicine

JF - Journal of Occupational and Environmental Medicine

SN - 1076-2752

IS - 1

ER -