Topic 20 Data Structure Potpourri:Data Structure Potpourri: Hash Tables and Mapsp"hash collision n. [from the techspeak] (var. `hash clash') When used of people, signifies a confusion in associative memory or imagination, especially a persistent one (see thinko). True story: One of us was once on the phone with a friend about to move out to Berkeley. yWhen asked what he expected Berkeley to be like, the friend replied: "Well, I have this mental picture of naked women throwing Molotov cocktails, but I think that's just a collision in my hash ,j ytables." -The Hacker's DictionaryCS307 Hash Tables and Maps1The Hacker s Dictionary Programming Pearls by Jon BentleyJon was senior programmer on a large programming project. Senior programmer spend a lot of time helping junior programmers.Junior programmer to Jon: "I need help writing a sorting algorithm."CS307 Hash Tables and Maps2A ProblemFrom Programming Pearls (Jon in Italics)gg(J )Why do you want to write your own sort at all? Why not use a sort provided by your system? I need the sort in the middle of a large system and for obscureI need the sort in the middle of a large system, and for obscure technical reasons, I can't use the system file-sorting program. What exactly are you sorting? How many records are in the file? What is the format of each record?What is the format of each record? The file contains at most ten million records; each record is a seven-digit integer. Wait a minute. If the file is that small, why bother going to disk at yggall? Why not just sort it in main memory? Although the machine has many megabytes of main memory, this function is part of a big system. I expect that I'll have only about a megabyte free at that point. Is there anything else you can tell me about the records? Each one is a seven-digit positive integer with no other associated data and no integer can appear more than onceCS307 Hash Tables and Maps3data, and no integer can appear more than once. QuestionsWhen did this conversation take place?What were they sorting?How do you sort data when it won't all fit into main memory?aeoySpeed of file i/o?CS307 Hash Tables and Maps4A Solution SKDVH LQLWLDOL]H VHW WR HPSW\ IRU L > Q ELW>L@ ELW>L@ SKDVH LQVHUW SUHVHQW HOHPHQWV LQWR WKH VHW I K L L WK L W ILOIRU HDFK L LQ WKH LQSXW ILOH ELW>L@ SKDVH ZULWH VRUWHG RXWSXW IRU L > Q LI ELW>L@ LW L WK W W ILOLI ELW>L@ ZULWH L RQ WKH RXWSXW ILOH CS307 Hash Tables and Maps5Some Structures so FarALitArrayLists– O(1) access–O(N) insertion (average case) better at endO(N) insertion (average case), better at end– O(N) deletion (average case)LinkedLists– O(N) access– O(N) insertion (average case), better at front and backO(N) d l ti ( ) b tt t f t d b k–O(N) deletion (average case), better at front and backBinary Search Trees–O(log N) access if balancedO(log N) access if balanced– O(log N) insertion if balanced– O(log N) deletion if balancedCS307 Hash Tables and Maps6Why are Binary Trees Better?Divide and Conquer– reducing work by a factor of 2 each timeCan we reduce the work by a bigger factor? 10? 1000?An ArrayList does this in a way when accessingelementsaccessingelements– but must use an integer value–each position holds a single elementeach position holds a single elementCS307 Hash Tables and Maps7Hash TablesHash Tables overcome the problems of ArrayList while maintaining the fast access, insertion, and deletion in terms of N (number of elements already in the structure.)CS307 Hash Tables and Maps8Hash FunctionsHash: "From the French hatcher, which means 'to chop'. "to hash to mix randomly or shuffle (To cut up, to slash or hack about; to mangle)Hash Function: Take a large piece of data and reduce it to a smaller piece of data, a d educe o a s a e p ece o da a,usually a single integer. –A function or algorithmA function or algorithm– The input need not be integers!CS307 Hash Tables and Maps9Hash Function5/5/1967555389085 "Mike Scott" [email protected]"Isabelle"CS307 Hash Tables and Maps10Simple ExampleAssume we are using names as our key– take 3rd letter of name, take int value of letter (a = 0, b = 1, ...), divide by 6 and take remainderWhat does "Bellers" hash to?L -> 11 -> 11 % 6 = 5CS307 Hash Tables and Maps11Result of Hash FunctionMik (10 % 6) 4Mike = (10 % 6) = 4Kelly = (11 % 6) = 5Olivia = (8 % 6) = 2Olivia = (8 % 6) = 2Isabelle = (0 % 6) = 0David=(21%6)=3David = (21 % 6) = 3 Margaret = (17 % 6) = 5 (uh oh)Wendy = (13 % 6) = 1Wendy = (13 % 6) = 1This is an imperfect hash function. A perfect hash function yields a one to one mapping from the keys yppgyto the hash values.What is the maximum number of values this fti hhftl?CS307 Hash Tables and Maps12function can hash perfectly?More on Hash FunctionsNormally a two step process– transform the key (which may not be an integer) into an integer value– Map the resulting integer into a valid index for th h h t bl ( h ll th l tthe hash table (where all the elements are stored)Th t f ti f fThe transformation can use one of four techniques–mapping, folding, shifting, castingCS307 Hash Tables and Maps13Hashing TechniquesMapping– As seen in the example– integer values or things that can be easily converted to integer values in keyFolding– partition key into several parts and the integer values for the various parts are combined– the parts may be hashed first– combine using addition, multiplication, shifting, logical exclusive ORCS307 Hash Tables and Maps14More TechniquesShifting– an alternative to folding– A fold functionint hashVal = 0;int i=str length()-1;int i str.length() 1;while(i > 0){hashVal += (int) str.charAt(i);i--;;}results for "dog" and "god" ?CS307 Hash Tables and Maps15Shifting and CastingM li t d ith hiftiMore complicated with shiftingint hashVal = 0;int i = str.length() - 1;while(i > 0){ hashVal = (hashVal << 1) + (int) str.charAt(i);i--;}}different answers for "dog" and "god"Shifting may give a better range of hash values when compared to just foldingCasts Very simple– essentially casting as part of fold and shift when working with chars.CS307 Hash Tables and Maps16with chars.The Java String class hhCd thdhashCode methodpublic
View Full Document