Search the audio, browse the video - A generic paradigm for video collections

Arnon Amir, Savitha Srinivasan, Alon Efrat

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

The amount of digital video being shot, captured, and stored is growing at a rate faster than ever before. The large amount of stored video is not penetrable without efficient video indexing, retrieval, and browsing technology. Most prior work in the field can be roughly categorized into two classes. One class is based on image processing techniques, often called content-based image and video retrieval, in which video frames are indexed and searched for visual content. The other class is based on spoken document retrieval, which relies on automatic speech recognition and text queries. Both approaches have major limitations. In the first approach, semantic queries pose a great challenge, while the second, speech-based approach, does not support efficient video browsing. This paper describes a system where speech is used for efficient searching and visual data for efficient browsing, a combination that takes advantage of both approaches. A fully automatic indexing and retrieval system has been developed and tested. Automated speech recognition and phonetic speech indexing support text-to-speech queries. New browsable views are generated from the original video. A special synchronized browser allows instantaneous, context-preserving switching from one view to another. The system was successfully used to produce searchable-browsable video proceedings for three local conferences.
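
The abstract describes the paradigm only at a high level. As an illustration of the core idea, and not the authors' implementation, the following minimal Python sketch (all names and data hypothetical) shows how a time-stamped speech transcript can be indexed for text search, with query hits mapped back to video time offsets so a browser can open any synchronized view at the matching moment.

```python
from collections import defaultdict

# Hypothetical time-stamped transcript, e.g. produced by an ASR engine:
# each entry is (recognized word, start time in seconds).
transcript = [
    ("video", 12.4), ("indexing", 12.9), ("and", 13.3),
    ("retrieval", 13.5), ("phonetic", 47.0), ("indexing", 47.6),
]

def build_index(words):
    """Map each spoken word to the list of times it was uttered."""
    index = defaultdict(list)
    for word, start in words:
        index[word.lower()].append(start)
    return index

def search(index, query):
    """Search the audio: return time offsets of all query terms found in the speech."""
    hits = []
    for term in query.lower().split():
        hits.extend(index.get(term, []))
    return sorted(hits)

def browse_from(hit_time, views):
    """Browse the video: open every view at the same time offset, so switching
    between views (storyboard, slides, full video) preserves context."""
    return {name: f"{name} view at t={hit_time:.1f}s" for name in views}

index = build_index(transcript)
for t in search(index, "phonetic indexing"):
    print(browse_from(t, ["storyboard", "slides", "full-video"]))
```

In the system the abstract describes, the search side also uses phonetic indexing so that query terms missing from the recognizer vocabulary can still be matched, and the browse side switches instantaneously among pre-generated views while preserving the current position in the video.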

Original language: English (US)
Pages (from-to): 209-222
Number of pages: 14
Journal: EURASIP Journal on Applied Signal Processing
Volume: 2003
Issue number: 2
DOI: 10.1155/S111086570321012X
State: Published - Feb 1 2003


Keywords

  • Automatic video indexing
  • Phonetic speech retrieval
  • Video and speech retrieval
  • Video browsing

ASJC Scopus subject areas

  • Hardware and Architecture
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Amir, Arnon; Srinivasan, Savitha; Efrat, Alon. Search the audio, browse the video - A generic paradigm for video collections. In: EURASIP Journal on Applied Signal Processing, Vol. 2003, No. 2, 01.02.2003, p. 209-222.
