Multilingual web retrieval: An experiment in English-Chinese business intelligence

Jialun Qin, Yilu Zhou, Michael Chau, Hsinchun Chen

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted and combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.

Original language: English (US)
Pages (from-to): 671-683
Number of pages: 13
Journal: Journal of the American Society for Information Science and Technology
Volume: 57
Issue number: 5
DOI: 10.1002/asi.20329
State: Published - Mar 2006

Fingerprint

Competitive intelligence
Query languages
information retrieval
experiment
language
Experiments
Glossaries
Business intelligence
Experiment
World Wide Web
Cross-language information retrieval
dictionary
performance
news
expert
Language
Industry

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

Multilingual web retrieval: An experiment in English-Chinese business intelligence. / Qin, Jialun; Zhou, Yilu; Chau, Michael; Chen, Hsinchun.

In: Journal of the American Society for Information Science and Technology, Vol. 57, No. 5, 03.2006, p. 671-683.

@article{ccb75a8023864709aab73806001c4a35,
title = "Multilingual web retrieval: An experiment in English-Chinese business intelligence",
abstract = "As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted and combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6{\%} improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0{\%} improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.",
author = "Jialun Qin and Yilu Zhou and Michael Chau and Hsinchun Chen",
year = "2006",
month = "3",
doi = "10.1002/asi.20329",
language = "English (US)",
volume = "57",
pages = "671--683",
journal = "Journal of the American Society for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "5",

}

TY - JOUR

T1 - Multilingual web retrieval

T2 - An experiment in English-Chinese business intelligence

AU - Qin, Jialun

AU - Zhou, Yilu

AU - Chau, Michael

AU - Chen, Hsinchun

PY - 2006/3

Y1 - 2006/3

AB - As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted and combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.

UR - http://www.scopus.com/inward/record.url?scp=33645018819&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33645018819&partnerID=8YFLogxK

U2 - 10.1002/asi.20329

DO - 10.1002/asi.20329

M3 - Article

AN - SCOPUS:33645018819

VL - 57

SP - 671

EP - 683

JO - Journal of the American Society for Information Science and Technology

JF - Journal of the American Society for Information Science and Technology

SN - 2330-1635

IS - 5

ER -