Misc Topics 2
Amol Deshpande
CMSC424

Topics
  - OLAP
  - Data Warehouses
  - Data Mining
  - Information Retrieval

OLAP
  - On-line Analytical Processing
  - Why?
      - Exploratory analysis
      - Interactive
      - Different queries than typical SPJ SQL queries
  - Data CUBE
      - A summary structure used for this purpose
          - E.g. give me total sales by zipcode; now show me total sales by customer employment category
      - Much, much faster than using SQL queries against the raw data
          - The tables are huge
  - Applications:
      - Sales reporting, marketing, forecasting, etc.

Data Warehouses
  - A repository of integrated information for querying and analysis purposes
  - Tend to be very, very large
  - Typically not kept up-to-date with the real data
  - Specialized query processing and indexing techniques are used
  - Very widely used

Data Mining
  - Searching for patterns in data
      - Typically done in data warehouses
  - Association Rules:
      - When a customer buys X, she also typically buys Y
          - Use? Move X and Y together in supermarkets
      - A customer buys a lot of shirts
          - Send him a catalogue of shirts
      - Patterns are not always obvious
          - Classic example: It was observed that men tend to buy beer and diapers together (may be an urban legend)
  - Other types of mining
      - Classification
      - Decision Trees

Information Retrieval
  - Relational DB == Structured data
  - Information Retrieval == Unstructured data
  - Evolved independently of each other
      - Still very little interaction between the two
  - Goal: Searching within documents
  - Queries are different; typically a list of words, not SQL
  - E.g. Web searching
      - If you just look for documents containing the words, millions of them
          - Mostly useless
  - Ranking:
      - This is the key in IR
      - Many different ways to do it
      - E.g. something that takes into account term frequencies
      - Pagerank (from Google) seems to work best for
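
The Ranking slide mentions scoring "something that takes into account term frequencies". As a rough illustration only, the sketch below scores each document by how often the query words occur in it, relative to the document's length, and sorts by that score. The function names (`tf_score`, `rank`) and the three toy documents are made up for this example; real IR systems use more refined weighting, and this is not PageRank.

```python
from collections import Counter

def tf_score(query_terms, doc_text):
    """Sum of the query words' counts in the document, divided by its length."""
    words = doc_text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)  # missing words count as 0
    return sum(counts[t.lower()] for t in query_terms) / len(words)

def rank(query, docs):
    """Return document ids ordered from highest to lowest score."""
    terms = query.split()
    scores = {doc_id: tf_score(terms, text) for doc_id, text in docs.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical toy collection, not taken from the slides.
docs = {
    "d1": "database systems store structured data",
    "d2": "web search ranks web pages for a web query",
    "d3": "search engines index documents",
}
ranking = rank("web search", docs)
print(ranking)
```

Note that the query is just a list of words, as the slide says, and that a document containing none of them still appears in the result, only at the bottom with a score of zero.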
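
Returning to the OLAP slide's example (total sales by zipcode, then total sales by customer employment category): one way to picture a data cube is as a set of totals precomputed for every combination of dimensions, so that each new question becomes a lookup instead of a scan of the raw table. The sketch below is a minimal in-memory version under that assumption. The sales rows, the dimension names in `DIMS`, and the `build_cube` helper are all hypothetical, and real OLAP engines store and index cubes very differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw sales rows: (zipcode, employment_category, amount)
sales = [
    ("20740", "student", 10.0),
    ("20740", "salaried", 25.0),
    ("20742", "student", 5.0),
    ("20742", "salaried", 40.0),
    ("20740", "student", 15.0),
]
DIMS = ("zipcode", "employment")  # names for the first two columns

def build_cube(rows):
    """Precompute total sales for every subset of the dimensions."""
    cube = {}
    for k in range(len(DIMS) + 1):
        for subset in combinations(range(len(DIMS)), k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in subset)
                totals[key] += row[-1]  # last column is the amount
            cube[tuple(DIMS[i] for i in subset)] = dict(totals)
    return cube

cube = build_cube(sales)
by_zip = cube[("zipcode",)]      # total sales by zipcode
by_emp = cube[("employment",)]   # total sales by employment category
grand_total = cube[()][()]       # no grouping at all
print(by_zip, by_emp, grand_total)
```

Switching from the zipcode view to the employment view touches only the precomputed summaries, which is the reason the slide gives for the cube being much faster than querying huge raw tables.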