Robust spatiotemporal matching of electronic slides to presentation videos

Quanfu Fan, Jacobus J Barnard, Arnon Amir, Alon Efrat

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

We describe a robust and efficient method for automatically matching and time-aligning electronic slides to videos of the corresponding presentations. Matching electronic slides to videos enables new methods for indexing, searching, and browsing videos in distance-learning applications. However, robust automatic matching is challenging due to varied frame composition, slide distortion, camera movement, low-quality video capture, and arbitrary slide sequences. Our fully automatic approach combines image-based matching of slides to video frames with a temporal model for slide changes and camera events. To address these challenges, we begin by extracting scale-invariant feature transform (SIFT) keypoints from both slides and video frames and matching them subject to a consistent projective transformation (homography) using random sample consensus (RANSAC). We use the initial set of matches to construct a background model and a binary classifier that separates video frames showing slides from those that do not. We then introduce a new matching scheme that exploits less distinctive SIFT keypoints, enabling us to tackle more difficult images. Finally, we improve upon the purely visual matching by using the estimated matching probabilities as part of a hidden Markov model (HMM) that integrates temporal information and detected camera operations. Detailed quantitative experiments characterize each part of our approach and demonstrate an average accuracy of over 95% across 13 presentation videos.
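
To make the matching stage concrete, here is a minimal Python sketch of the general SIFT-plus-RANSAC homography technique the abstract describes, written with stock OpenCV calls. It illustrates the standard technique, not the authors' implementation; the function name match_slide_to_frame, the ratio-test threshold, and the inlier cutoff are all illustrative assumptions.

import cv2
import numpy as np

def match_slide_to_frame(slide_path, frame_path, ratio=0.75, min_inliers=15):
    # Load the slide image and the video frame as grayscale.
    slide = cv2.imread(slide_path, cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)

    # 1. Extract SIFT keypoints and descriptors from both images.
    sift = cv2.SIFT_create()
    kp_s, des_s = sift.detectAndCompute(slide, None)
    kp_f, des_f = sift.detectAndCompute(frame, None)
    if des_s is None or des_f is None:
        return None, 0

    # 2. Match descriptors, keeping only distinctive matches (Lowe's ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(des_s, des_f, k=2)
    good = [m for m, n in (p for p in pairs if len(p) == 2)
            if m.distance < ratio * n.distance]
    if len(good) < 4:  # a homography needs at least 4 point correspondences
        return None, len(good)

    # 3. Require the matches to be consistent with a single projective
    #    transformation (homography), estimated robustly with RANSAC.
    src = np.float32([kp_s[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    return (H, inliers) if inliers >= min_inliers else (None, inliers)

Scoring every candidate slide against a frame by its RANSAC inlier count gives a per-frame matching score; the paper goes further, also building a background model, a slide/no-slide classifier, and a scheme for exploiting less distinctive keypoints.

The temporal stage can be sketched in the same spirit: treat the slide shown in each frame as the hidden state of an HMM, use per-frame matching scores as emission log-probabilities, and decode the most likely slide sequence with the Viterbi algorithm. The simple stay-or-switch transition model below is an assumed stand-in for the paper's richer model, which also incorporates detected camera operations.

import numpy as np

def viterbi_decode(emission_logp, stay_logp=np.log(0.95)):
    """Decode the most likely slide index for each frame.

    emission_logp: (T, S) array of log P(frame_t | slide_s), e.g. derived
    from the matching scores above. Transition model: stay on the current
    slide with high probability, otherwise switch uniformly at random.
    """
    T, S = emission_logp.shape
    switch_logp = np.log((1.0 - np.exp(stay_logp)) / max(S - 1, 1))
    trans = np.full((S, S), switch_logp)
    np.fill_diagonal(trans, stay_logp)

    score = emission_logp[0].copy()          # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers for path recovery
    for t in range(1, T):
        cand = score[:, None] + trans        # (S, S): from-state x to-state
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emission_logp[t]

    # Backtrack from the best final state to recover the slide sequence.
    path = np.zeros(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path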

Original language: English (US)
Article number: 5705574
Pages (from-to): 2315-2328
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 20
Issue number: 8
DOI: 10.1109/TIP.2011.2109727
State: Published - August 2011

Keywords

  • Distance learning
  • homography constraint
  • matching slides to video frames
  • scale-invariant feature transform (SIFT) keypoints
  • video indexing and browsing

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Fan, Q., Barnard, J. J., Amir, A., & Efrat, A. (2011). Robust spatiotemporal matching of electronic slides to presentation videos. IEEE Transactions on Image Processing, 20(8), 2315-2328. https://doi.org/10.1109/TIP.2011.2109727
