Using genetic algorithm in building domain-specific collections

An experiment in the nanotechnology domain

Jialun Qin, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

As the key technique for building domain-specific search engines, focused crawling has drawn considerable attention from researchers over the past decade. However, as Web structure analysis techniques have advanced, several problems in traditional focused crawler design have been revealed, and these could result in low-quality domain-specific collections. In this work, we studied the problems of focused crawling that are caused by the use of local search algorithms. We also proposed using a global search algorithm, the Genetic Algorithm, in focused crawling to address these problems. We conducted evaluation experiments to examine the effectiveness of our approach. The results showed that our approach could build domain-specific collections of higher quality than traditional focused crawling techniques. Furthermore, we used the concept of Web communities to evaluate how comprehensively the focused crawlers could traverse the Web search space, which could be a good complement to traditional focused crawler evaluation methods.
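The contrast the abstract draws between local search and a global, Genetic Algorithm style search can be illustrated with a toy sketch. This is not the authors' implementation: the in-memory `TOY_WEB` graph, the keyword-ratio `fitness` function, the `ga_crawl` name, and all parameter values are hypothetical and chosen only for illustration. Selection keeps relevant pages, following outlinks stands in loosely for crossover, and a random jump stands in for mutation.

```python
import random

# Hypothetical toy Web: url -> (page text, outlinks). Not data from the paper.
# Pages "e" and "f" form a second relevant community that "a" and "b" never link to.
TOY_WEB = {
    "a": ("nanotechnology research lab", ["b", "c"]),
    "b": ("nanotube synthesis nanotechnology", ["a", "d"]),
    "c": ("cooking recipes", ["d"]),
    "d": ("sports news", ["c"]),
    "e": ("nanoscale materials nanotechnology", ["f"]),
    "f": ("nanotechnology quantum dots", ["e"]),
}
KEYWORDS = {"nanotechnology", "nanotube", "nanoscale"}  # assumed domain lexicon


def fitness(url):
    """Fraction of a page's words that are domain keywords (assumed scoring)."""
    words = TOY_WEB[url][0].split()
    return sum(w in KEYWORDS for w in words) / len(words)


def ga_crawl(seeds, generations=6, pop_size=4, mutation_rate=0.5,
             threshold=0.2, seed=0):
    """Collect relevant pages using selection, link expansion, and random jumps."""
    rng = random.Random(seed)
    all_urls = sorted(TOY_WEB)
    population = list(seeds)
    collection = set()
    for _ in range(generations):
        # Selection: only pages at or above the relevance threshold survive.
        survivors = [u for u in population if fitness(u) >= threshold]
        collection.update(survivors)
        # Crossover analogue: candidates are uncollected outlinks of survivors.
        candidates = {v for u in survivors for v in TOY_WEB[u][1]} - collection
        # Mutation: occasionally jump to a random page anywhere in the toy Web,
        # which is what lets the crawl leave its starting community.
        if rng.random() < mutation_rate:
            candidates.add(rng.choice(all_urls))
        # Next generation: the fittest candidates, capped at pop_size.
        population = sorted(candidates, key=fitness, reverse=True)[:pop_size]
    return collection


local_only = ga_crawl(["a"], mutation_rate=0.0)   # pure link-following
with_jumps = ga_crawl(["a"], generations=200, mutation_rate=1.0)
print(sorted(local_only), sorted(with_jumps))
```

With `mutation_rate=0.0` the sketch degenerates into plain link-following and stays inside the community reachable from the seed. With random jumps enabled it can also reach relevant pages in communities that the seed's community does not link to, which is the kind of coverage the Web-community evaluation in the abstract is meant to measure.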

Original language: English (US)
Title of host publication: Proceedings of the Annual Hawaii International Conference on System Sciences
Editors: R.H. Spraque, Jr.
Pages: 102
Number of pages: 1
State: Published - 2005
Externally published: Yes
Event: 38th Annual Hawaii International Conference on System Sciences - Big Island, HI, United States
Duration: Jan 3 2005 to Jan 6 2005

Fingerprint

Nanotechnology
Genetic algorithms
Search engines
Experiments

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Qin, J., & Chen, H. (2005). Using genetic algorithm in building domain-specific collections: An experiment in the nanotechnology domain. In R. H. Spraque, Jr. (Ed.), Proceedings of the Annual Hawaii International Conference on System Sciences (pp. 102)

@inproceedings{61876e355ae34f63a7ba65279fbbccee,
title = "Using genetic algorithm in building domain-specific collections: An experiment in the nanotechnology domain",
author = "Jialun Qin and Hsinchun Chen",
year = "2005",
language = "English (US)",
pages = "102",
editor = "{Spraque, Jr.}, R.H.",
booktitle = "Proceedings of the Annual Hawaii International Conference on System Sciences",

}

TY - GEN

T1 - Using genetic algorithm in building domain-specific collections

T2 - An experiment in the nanotechnology domain

AU - Qin, Jialun

AU - Chen, Hsinchun

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=27544485054&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27544485054&partnerID=8YFLogxK

M3 - Conference contribution

SP - 102

BT - Proceedings of the Annual Hawaii International Conference on System Sciences

A2 - Spraque, Jr., R.H.

ER -