Extracting Semantics from Multimedia Content: Challenges and Solutions

Lexing Xie, Rong Yan

L. Xie and R. Yan are with IBM T J Watson Research Center, Hawthorne, NY, e-mail: {xlx,yanr}@us.ibm.com
This material is based upon work funded in part by the U.S. Government. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the U.S. Government.

Abstract

Multimedia content accounts for over 60% of traffic in the current Internet [74]. With many users willing to spend their leisure time watching videos on YouTube or browsing photos through Flickr, sifting through large multimedia collections for useful information, especially those outside of the open web, is still an open problem. The lack of effective indexes to describe the content of multimedia data is a main hurdle to multimedia search, and extracting semantics from multimedia content is the bottleneck for multimedia indexing. In this chapter, we present a review on extracting semantics from a large amount of multimedia data as a statistical learning problem. Our goal is to present the current challenges and solutions from a few different perspectives and cover a sample of related work. We start with a system overview of the five major components that extract and use semantic metadata: data annotation, multimedia ontology, feature representation, model learning and retrieval systems. We then present challenges for each of the five components along with their existing solutions: designing multimedia lexicons and using them for concept detection, handling multiple media sources and resolving correspondence across modalities, learning structured (generative) models to account for natural data dependency or model hidden topics, handling rare classes, leveraging unlabeled data, scaling to large amounts of training data, and finally leveraging media semantics in retrieval systems.

1 Introduction

Multimedia data are being captured, stored and shared at an unprecedented scale, yet the technology that helps people search, use, and express themselves with these media is lagging behind. While no statistics are available on the total amount of multimedia content being produced, the following two figures give an intuition about its scale: about 83 million digital still cameras were sold in 2006 [37], and video already accounts for more than half of Internet traffic, with YouTube alone taking 10% [2, 74, 30]. A typical Internet user actively gleans information from the web with several searches per day, yet their consumption of video content mostly remains passive and sequential, because current practices for indexing into video content are ineffective. As an article in Wired put it plainly: “Search engines cannot index video files as easily as text. That is tripping up the Web’s next great leap forward.” [3] The key to indexing into image and video files lies in the ability to describe and compare the media content in a way meaningful to humans, i.e. the grand challenge of closing the semantic gap [80] from the perceived light and sound to users’ interpretations.

One crucial step that directly addresses the semantic indexing challenge is to extract semantics from multimedia data. Advances in storage and computation power in recent years have made it possible to collect and process large amounts of image/video data, and have thus shifted the solutions to semantic extraction from knowledge-driven to data-driven, similar to what has been practiced in speech recognition for several decades [72]. Algorithms and systems for data-driven semantics extraction are embodiments of statistical pattern recognition systems specialized in multimedia data. They learn a computational representation from a training data corpus labeled with one or more known semantic interpretations (such as face, human, outdoors).
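As a rough illustration of what learning from a labeled corpus means here, the sketch below trains a nearest-centroid detector on a handful of labeled feature vectors and then assigns a concept to an unseen vector. It is a toy under stated assumptions, not a method from this chapter: the two-dimensional features, their values, the single label per example, and the function names `train_centroids` and `detect_concept` are all hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def train_centroids(examples):
    """Average the feature vectors of each concept label.

    `examples` is a list of (feature_vector, label) pairs; the mean
    vector per label is the learned "computational representation".
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def detect_concept(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training corpus: each vector might stand for
# [fraction of sky-like pixels, fraction of skin-tone pixels].
TRAINING = [
    ([0.9, 0.1], "outdoors"),
    ([0.8, 0.2], "outdoors"),
    ([0.2, 0.9], "face"),
    ([0.1, 0.8], "face"),
]

model = train_centroids(TRAINING)
print(detect_concept(model, [0.85, 0.15]))
print(detect_concept(model, [0.20, 0.70]))
```

Real systems replace the hand-made vectors with features extracted from the media and the centroid rule with stronger statistical models, but the train-then-detect shape stays the same.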
Statistical learning of multimedia semantics has significantly advanced performance and real-world practice in recent years, making possible, for example, real-time face detectors [95].

This chapter is intended to survey and discuss existing approaches to extracting multimedia semantics in a statistical learning framework. Our goal is to present the current challenges and solutions from a few different perspectives and cover a sample of related work. The scope of this chapter has two implications: (1) since the set of target semantics from media is usually very large (e.g., objects, scene, people, events, ...), we put more emphasis on algorithms and systems designed for generic semantics than on those specialized in one or a few particular ones (e.g., faces); (2) we focus more on the new challenges for model design created by the scale of real-world multimedia data and the characteristics of learning tasks (such as rare classes, unlabeled data, structured input/output, etc.). Within this scope, the semantic extraction problem can be decomposed into several subproblems: the general processing steps of going from media data to features and then to semantic metadata; the semantic concept ontology, and how to leverage it for better detection; the challenge of dealing with multi-media, i.e. how to use a plurality of input types; dealing with real-world annotated training datasets: rare semantics, sparseness of labels in an abundance of unlabeled data, scaling to large datasets and large sets of semantics; accounting for the natural dependencies in data with structured input and output; and using semantics in search and retrieval systems.

That said, learning to extract semantics from multimedia should be of interest well beyond the multimedia analysis community, because (1) the abstract learning problems are very similar to those seen in many other domains: stream data mining, network measurement and diagnosis, bioinformatics, business process mining, and so on; and (2) multimedia semantics can in turn enable better user experiences and improve system design in closely related areas such as computer-human interaction, multimedia communication and transmission, multimedia authoring, etc. This work is also distinct from several other surveys on multimedia indexing [80, 84, 76, 110, 11] in that we present an in-depth discussion on semantic extraction, an important component in an entire indexing system, from an algorithmic perspective. For completeness,

