SpidersRUs: Creating specialized search engines in multiple languages

Michael Chau, Jialun Qin, Yilu Zhou, Chunju Tseng, Hsinchun Chen

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we review related literature and suggest the criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, consisting of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a medical search engine in Chinese are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture is feasible in developing search engines easily in different languages such as Chinese, Spanish, Japanese, and Arabic.
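The module pipeline named in the abstract (Spider, Indexer, Index Structure, Search) can be illustrated with a minimal sketch. This is not the SpidersRUs implementation: the sample pages, the function names `tokenize`, `build_index`, and `search`, and the choice of single-character tokens for Chinese text are all assumptions made for illustration. The Spider step is replaced by a hard-coded dictionary so the example runs without network access.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the Spider module's output: URL -> page text.
PAGES = {
    "http://example.org/en": "Heart disease treatment and prevention",
    "http://example.org/zh": "心脏病的治疗与预防",
    "http://example.org/es": "Tratamiento de la diabetes",
}

# Assumption: each CJK ideograph is its own token; any other run of
# Unicode letters is one token. Real systems may use word segmentation.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\W\d_\u4e00-\u9fff]+")


def tokenize(text):
    """Indexer step: split text into lowercase, language-neutral tokens."""
    return TOKEN.findall(text.lower())


def build_index(pages):
    """Index Structure: an inverted index mapping token -> set of URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Search step: return URLs that contain every query token."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "治疗"))
print(search(index, "Treatment"))
```

The same index and search functions serve every language here because only the tokenizer is language-aware, which is one plausible way to keep the remaining modules unchanged across languages.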

Original language: English (US)
Pages (from-to): 621-640
Number of pages: 20
Journal: Decision Support Systems
Volume: 45
Issue number: 3
DOI: 10.1016/j.dss.2007.07.006
State: Published - Jun 2008

Keywords

  • Information retrieval
  • Multilingual search engines
  • Search engine development

ASJC Scopus subject areas

  • Management Information Systems
  • Information Systems
  • Information Systems and Management

Cite this

SpidersRUs: Creating specialized search engines in multiple languages. / Chau, Michael; Qin, Jialun; Zhou, Yilu; Tseng, Chunju; Chen, Hsinchun.

In: Decision Support Systems, Vol. 45, No. 3, 06.2008, p. 621-640.

@article{09ea9c9466b14b83a022521efd82a99a,
title = "SpidersRUs: Creating specialized search engines in multiple languages",
abstract = "While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we review related literature and suggest the criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, consisting of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a medical search engine in Chinese are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture is feasible in developing search engines easily in different languages such as Chinese, Spanish, Japanese, and Arabic.",
keywords = "Information retrieval, Multilingual search engines, Search engine development",
author = "Michael Chau and Jialun Qin and Yilu Zhou and Chunju Tseng and Hsinchun Chen",
year = "2008",
month = "6",
doi = "10.1016/j.dss.2007.07.006",
language = "English (US)",
volume = "45",
pages = "621--640",
journal = "Decision Support Systems",
issn = "0167-9236",
publisher = "Elsevier",
number = "3",
}

TY - JOUR
T1 - SpidersRUs
T2 - Creating specialized search engines in multiple languages
AU - Chau, Michael
AU - Qin, Jialun
AU - Zhou, Yilu
AU - Tseng, Chunju
AU - Chen, Hsinchun
PY - 2008/6
Y1 - 2008/6
AB - While small-scale search engines in specific domains and languages are increasingly used by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we review related literature and suggest the criteria for an ideal search tool. We present the design of a toolkit, called SpidersRUs, developed for multilingual search engine creation. The design and implementation of the tool, consisting of a Spider module, an Indexer module, an Index Structure, a Search module, and a Graphical User Interface module, are discussed in detail. A sample user session and a case study on using the tool to develop a medical search engine in Chinese are also presented. The technical issues involved and the lessons learned in the project are then discussed. This study demonstrates that the proposed architecture is feasible in developing search engines easily in different languages such as Chinese, Spanish, Japanese, and Arabic.
KW - Information retrieval
KW - Multilingual search engines
KW - Search engine development
UR - http://www.scopus.com/inward/record.url?scp=44849099980&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=44849099980&partnerID=8YFLogxK
U2 - 10.1016/j.dss.2007.07.006
DO - 10.1016/j.dss.2007.07.006
M3 - Article
AN - SCOPUS:44849099980
VL - 45
SP - 621
EP - 640
JO - Decision Support Systems
JF - Decision Support Systems
SN - 0167-9236
IS - 3
ER -