A parallel computing approach to creating engineering concept spaces for semantic retrieval: The Illinois digital library initiative project

Hsinchun Chen, B. Schatz, T. Ng, J. Martinez, A. Kirchhoff, Lin Chienting

Research output: Contribution to journal › Article

74 Scopus citations

Abstract

This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of the automatic thesaurus generation techniques, to which we refer as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. We have experimented previously with such a technique for a smaller molecular biology domain (Worm Community System, with 10+ MBs of document collection) with encouraging results. In order to address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we recently conducted experiments using the concept space approach on parallel supercomputers. Our test collection included 2+ GBs of computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising. The Power Challenge was later selected to create a comprehensive computer engineering concept space of about 270,000 terms and 4,000,000+ links using 24.5 hours of CPU time. Our system evaluation involving 12 knowledgeable subjects revealed that the automatically created computer engineering concept space generated significantly higher concept recall than the human-generated INSPEC computer engineering thesaurus. However, the INSPEC thesaurus was more precise than the automatic concept space.
Our current work mainly involves creating concept spaces for other major engineering domains and developing robust graph matching and traversal algorithms for cross-domain, concept-based retrieval. Future work also will include generating individualized concept spaces for assisting user-specific concept-based information retrieval.
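
As a rough illustration of the idea described in the abstract (terms as graph nodes, weighted co-occurrence relationships as links), the following minimal Python sketch builds a tiny concept space from documents that have already been reduced to index terms. The function names, the toy documents, and the weighting rule (the share of documents containing term a that also contain term b) are assumptions made for illustration only; they are not the indexing or weighting algorithms used in the paper.

```python
from collections import Counter
from itertools import permutations


def build_concept_space(docs):
    """Return {term: {neighbor: weight}} from an iterable of term lists.

    Hypothetical weighting, assumed for this sketch only:
    weight(a -> b) = (# docs containing both a and b) / (# docs containing a).
    The ratio is asymmetric, so weight(a -> b) may differ from weight(b -> a).
    """
    doc_freq = Counter()   # number of documents in which each term appears
    pair_freq = Counter()  # number of documents in which an ordered pair co-occurs
    for terms in docs:
        unique = set(terms)
        doc_freq.update(unique)
        pair_freq.update(permutations(sorted(unique), 2))

    space = {}
    for (a, b), n_ab in pair_freq.items():
        space.setdefault(a, {})[b] = n_ab / doc_freq[a]
    return space


def related_terms(space, term, top_n=3):
    """List the strongest neighbors of a term, ties broken alphabetically."""
    neighbors = space.get(term, {})
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy documents, already reduced to index terms (made-up data).
docs = [
    ["parallel computing", "concept space", "semantic retrieval"],
    ["parallel computing", "supercomputer"],
    ["concept space", "semantic retrieval", "thesaurus"],
]
space = build_concept_space(docs)
```

Calling `related_terms(space, "concept space")` ranks the neighbors of a term by weight, which is the kind of lookup a concept-based retrieval interface might use to suggest alternative vocabulary. The pair counting here is quadratic in the number of terms per document, which hints at why a collection of the size reported in the abstract called for parallel hardware.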

Original language: English (US)
Pages (from-to): 771-782
Number of pages: 12
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 18
Issue number: 8
State: Published - 1996

Keywords

  • Concept association
  • Concept space
  • Digital library
  • Parallel computing
  • Semantic retrieval

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
