Internet Categorization and Search: A Self-Organizing Approach

Hsinchun Chen, Chris Schuffels, Richard Orwig

Research output: Contribution to journal (Article)

187 Citations (Scopus)

Abstract

The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded on automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and improve Internet keyword searching and/or browsing.

Original language: English (US)
Pages (from-to): 88-102
Number of pages: 15
Journal: Journal of Visual Communication and Image Representation
Volume: 7
Issue number: 1
DOI: 10.1006/jvci.1996.0008
State: Published - Mar 1996

Fingerprint

Internet
Servers
World Wide Web
Self organizing maps
Information retrieval
Clustering algorithms
Learning algorithms
Learning systems
Neural networks
Testing

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Internet Categorization and Search: A Self-Organizing Approach. / Chen, Hsinchun; Schuffels, Chris; Orwig, Richard.

In: Journal of Visual Communication and Image Representation, Vol. 7, No. 1, 03.1996, p. 88-102.

@article{de60a3bbfdd44c9796974c85ded9a85f,
title = "Internet Categorization and Search: A Self-Organizing Approach",
abstract = "The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded on automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and improve Internet keyword searching and/or browsing.",
author = "Hsinchun Chen and Chris Schuffels and Richard Orwig",
year = "1996",
month = "3",
doi = "10.1006/jvci.1996.0008",
language = "English (US)",
volume = "7",
pages = "88--102",
journal = "Journal of Visual Communication and Image Representation",
issn = "1047-3203",
publisher = "Academic Press Inc.",
number = "1",
}

TY - JOUR

T1 - Internet Categorization and Search

T2 - A Self-Organizing Approach

AU - Chen, Hsinchun

AU - Schuffels, Chris

AU - Orwig, Richard

PY - 1996/3

Y1 - 1996/3

AB - The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded on automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and improve Internet keyword searching and/or browsing.

UR - http://www.scopus.com/inward/record.url?scp=0030104572&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030104572&partnerID=8YFLogxK

U2 - 10.1006/jvci.1996.0008

DO - 10.1006/jvci.1996.0008

M3 - Article

AN - SCOPUS:0030104572

VL - 7

SP - 88

EP - 102

JO - Journal of Visual Communication and Image Representation

JF - Journal of Visual Communication and Image Representation

SN - 1047-3203

IS - 1

ER -