Term familiarity to indicate perceived and actual difficulty of text in medical digital libraries

Gondy Augusta Leroy, James E. Endicott

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, 'term familiarity', which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences for perceived and actual difficulty.
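
The abstract does not state how the term familiarity score is computed, only that it is based on term frequency in the Google corpus. As a minimal sketch of the general idea, the Python below assumes a score equal to the mean log10 reference-corpus frequency of a text's terms. The scoring formula, the function names, and the counts in `REFERENCE_COUNTS` are all hypothetical placeholders for illustration; they are not the authors' method and not real Google data.

```python
import math
import re

# Hypothetical reference counts standing in for corpus term frequencies.
# These numbers are invented for illustration; they are NOT Google data.
REFERENCE_COUNTS = {
    "the": 1_000_000,
    "heart": 50_000,
    "attack": 40_000,
    "myocardial": 500,
    "infarction": 400,
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def term_familiarity(text, counts, unseen_count=1):
    """Return the mean log10 reference frequency of the terms in `text`.

    Higher values mean the terms are more common in the reference corpus.
    Terms missing from `counts` are assigned `unseen_count` (an assumption).
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no terms")
    total = sum(math.log10(counts.get(tok, unseen_count)) for tok in tokens)
    return total / len(tokens)


lay_score = term_familiarity("heart attack", REFERENCE_COUNTS)
clinical_score = term_familiarity("myocardial infarction", REFERENCE_COUNTS)
print(f"lay: {lay_score:.2f}, clinical: {clinical_score:.2f}")
```

Under these toy counts the lay phrasing scores higher than the clinical phrasing, which is the kind of contrast a familiarity tag is meant to expose. A real implementation would replace the placeholder dictionary with actual corpus frequencies.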

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 307-310
Number of pages: 4
Volume: 7008 LNCS
DOIs: https://doi.org/10.1007/978-3-642-24826-9_38
State: Published - 2011
Externally published: Yes
Event: 13th International Conference on Asia-Pacific Digital Libraries, ICADL 2011 - Beijing, China
Duration: Oct 24 2011 to Oct 27 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7008 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th International Conference on Asia-Pacific Digital Libraries, ICADL 2011
Country: China
City: Beijing
Period: 10/24/11 to 10/27/11

Fingerprint

Blogs
Digital libraries
Analog to digital conversion
Labeling
Education
Term
Digitization
User Studies
Text
Corpus

Keywords

  • Actual Difficulty
  • Health Informatics
  • Lexical Tags
  • Meta Information
  • Natural Language Processing
  • Perceived Difficulty

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Leroy, G. A., & Endicott, J. E. (2011). Term familiarity to indicate perceived and actual difficulty of text in medical digital libraries. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7008 LNCS, pp. 307-310). https://doi.org/10.1007/978-3-642-24826-9_38

@inproceedings{304c1dde298e4abe93a15756d0b64fde,
title = "Term familiarity to indicate perceived and actual difficulty of text in medical digital libraries",
abstract = "With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, 'term familiarity', which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences for perceived and actual difficulty.",
keywords = "Actual Difficulty, Health Informatics, Lexical Tags, Meta Information, Natural Language Processing, Perceived Difficulty",
author = "Leroy, {Gondy Augusta} and Endicott, {James E.}",
year = "2011",
doi = "10.1007/978-3-642-24826-9_38",
language = "English (US)",
isbn = "9783642248252",
volume = "7008 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "307--310",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Term familiarity to indicate perceived and actual difficulty of text in medical digital libraries

AU - Leroy, Gondy Augusta

AU - Endicott, James E.

PY - 2011

Y1 - 2011

AB - With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, 'term familiarity', which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences for perceived and actual difficulty.

KW - Actual Difficulty

KW - Health Informatics

KW - Lexical Tags

KW - Meta Information

KW - Natural Language Processing

KW - Perceived Difficulty

UR - http://www.scopus.com/inward/record.url?scp=80455144469&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80455144469&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-24826-9_38

DO - 10.1007/978-3-642-24826-9_38

M3 - Conference contribution

AN - SCOPUS:80455144469

SN - 9783642248252

VL - 7008 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 307

EP - 310

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -