UMD LBSC 796 - Natural Language Processing & Information Retrieval

NLP & IR ... a tutorial presented at ESSIR’95, Glasgow.  Alan F. Smeaton, 1995

Natural Language Processing & Information Retrieval

Alan F. Smeaton
School of Computer Applications
Dublin City University
Glasnevin, Dublin
[email protected]
http://www.compapp.dcu.ie/~asmeaton/asmeaton.html

... a tutorial presented at the Second European Summer School in Information Retrieval (ESSIR’95), Glasgow, Scotland, September 1995

1. Introduction

IR is an old, mature area of research in computing / information science / library science ... it is not massively popular like graphics or databases (based on counts at conferences) ... a homely bunch of individuals!

It is based around a technology which delivers solutions to a market which has been in place for decades ... not great solutions, but ones which work. This is primarily boolean queries, with operators like ADJ and word distances as enhancements, though as this summer school shows there are alternatives which are much more attractive.

Originally and for a long time, the IR market was
• libraries on dial-up lines
• patent application offices
• legal and para-legal offices

Boolean IR was attractive because of its efficient implementation using inverted files but
• the difficulties of manipulating boolean logic,
• the comparative complexity of search strategies for the untrained,
• the monetary costs associated with using computers in the early days
... led to the emergence of the trained intermediary / librarian as a go-between bridging the user and the IR system.

Naturally, this was/is expensive and time-consuming.

Then the following developments happened:
• The PC and networking came, bringing distributed processing to the desktop ... users used tools themselves, directly, users got access to data themselves and started/wanted to do IR, users got comfortable with direct access to powerful tools and dispensed with intermediaries, and now demanded more from IR
• The volume of data, machine-readable text information, has increased staggeringly ... every newspaper, book, technical document, office letter and memo, and newswire.

The combination of these two means many users are looking at IR as a basic technology for underlying applications ... the numbers at our conferences are starting to grow ... SIGIR and TREC and SDAIR and HIM, and IR is a component in Hypermedia, DL, others ...

... funding in our area is starting to flow ...
• US Digital Libraries includes IR
• DARPA TREC and to a lesser extent MUC, TIPSTER
• CEC 4FP has Information Engineering and Language Engineering as well as LIBRARIES in the Telematics Programme ... in the 3FP there was LRE ... prior to that IR was banished to ESPRIT to compete with everyone else in the “leftovers” bracket

NLP has, like IR, had a long history but whereas IR has always been smaller but constant, NLP has had many more ups and downs.

The ups started with the hype of being able to do machine translation and intelligent IR in the 1960s ... remember the computing power available in those days? First attempts, and all that was computable for volumes of text at that time, were simple dictionary lookup and even simpler rules.

Translation by literal word transformation is ... bad ... time flies like an arrow etc ...

... the initial up was hammered by the US ALPAC report in 1966, which stated MT impossible, and NLP and AI in general received massive cuts in research funding which continued for many years.

Slowly, AI, or aspects of AI, pulled out of these doldrums and AI as a single field split in all kinds of directions.
... we have seen the rise and ‘fall’ of expert systems or rule-based systems
... we are seeing the rise of neural nets / connectionism
... etc

The history of NLP is tied very much to the history of AI as NLP was seen as the earliest AI application.

After ALPAC, NLP went into decline in terms of funding, but there was still interest and as computing moved from processing numeric data to processing more and more text in applications like WP, NLP became fashionable again.

Now, NLP is a very large and strong field bridging computer science, linguistics, philosophy, psychology, metaphysics and software engineering.

In February 1992 NSF organised a workshop of 23 invited specialists (IEEE Trans KDE, Feb’93) to identify near-term (5- years) prospects and needs in Speech and Natural Language Processing ... top of the list was the Electronic Library and Librarian which would use IR technology
... by 2000 technology will allow access to US Library of Congress sized volumes of data, though the WWW has accelerated this even more so
... how can we retrieve effectively from that scale ... it is going to need to go beyond the current full-text retrieval systems and handle heterogeneous collections, multimedia, etc and statistical approaches alone may be inadequate for this.

An Overview ...

1. Introduction ... this is it!
2. Overview of IR and IR processes ... yeah, you’ve heard this in other tutorials but not my version ... this is about users and authors and information needs and where an IR process fits into the scheme ... the nature of text ... the inexact and imprecise nature of information retrieval ... string searching vs using surrogates ... standard indexing by a bag of words ... desirable features of retrieval ... overview of standard matching techniques
3. Overview of NLP ... what is NLP ... stages of NLP, lexical, syntactic, semantic, pragmatic and discourse levels ... NLP applications
4. Applications of NLP in IR ... indexing by base forms of words, by word senses and word sense disambiguation ... indexing by phrases or coordinated terms ... handling ambiguity in noun phrases ... query expansion via linguistic structures ... knowledge representation formalisms like frames and conceptual information retrieval
5. Role of NLP in IR ... a generalisation of what NLP techniques can offer IR and what they cannot and an almost philosophical discussion of the limitations of current NLP
6. Prospects ... for future development

In 3 hours we won’t get much done, certainly we won’t cover all of the significant efforts in the field but only a

