Using content-based and link-based analysis in building vertical search engines

Michael Chau, Hsinchun Chen

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

This paper reports our research in the Web page filtering process in specialized search engine development. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Instead of a bag of words, each Web page is represented by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. An evaluation study was conducted and showed that the proposed approaches performed better than the benchmark approaches.

Original language: English (US)
Pages (from-to): 515-518
Number of pages: 4
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3334
State: Published - 2004

Fingerprint

Search Engine
Search engines
Learning systems
Websites
Vertical
Benchmarking
Backpropagation
Learning algorithms
Support vector machines
Machine Learning
Neural networks
Content Analysis
Back-propagation Neural Network
Feedforward
Research
Learning Algorithm
Support Vector Machine
Filtering
Benchmark
Evaluation

ASJC Scopus subject areas

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

@article{204ebf8b1eb34f708cdbe05f3252fa1b,
title = "Using content-based and link-based analysis in building vertical search engines",
abstract = "This paper reports our research in the Web page filtering process in specialized search engine development. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Instead of a bag of words, each Web page is represented by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. An evaluation study was conducted and showed that the proposed approaches performed better than the benchmark approaches.",
author = "Michael Chau and Hsinchun Chen",
year = "2004",
language = "English (US)",
volume = "3334",
pages = "515--518",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY - JOUR
T1 - Using content-based and link-based analysis in building vertical search engines
AU - Chau, Michael
AU - Chen, Hsinchun
PY - 2004
Y1 - 2004
N2 - This paper reports our research in the Web page filtering process in specialized search engine development. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. Instead of a bag of words, each Web page is represented by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. An evaluation study was conducted and showed that the proposed approaches performed better than the benchmark approaches.

UR - http://www.scopus.com/inward/record.url?scp=35048817047&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35048817047&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:35048817047
VL - 3334
SP - 515
EP - 518
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
SN - 0302-9743
ER -