DOC PREVIEW
UT CS 307 - Data Structure Potpourri

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Topic 20 Data Structure Potpourri:Data Structure Potpourri: Hash Tables and Mapsp"hash collision n. [from the techspeak] (var. `hash clash') When used of people, signifies a confusion in associative memory or imagination, especially a persistent one (see thinko). True story: One of us was once on the phone with a friend about to move out to Berkeley. yWhen asked what he expected Berkeley to be like, the friend replied: "Well, I have this mental picture of naked women throwing Molotov cocktails, but I think that's just a collision in my hash ,j ytables." -The Hacker's DictionaryCS307 Hash Tables and Maps1The Hacker s Dictionary Programming Pearls by Jon BentleyJon was senior programmer on a large programming project. Senior programmer spend a lot of time helping junior programmers.Junior programmer to Jon: "I need help writing a sorting algorithm."CS307 Hash Tables and Maps2A ProblemFrom Programming Pearls (Jon in Italics)gg(J )Why do you want to write your own sort at all? Why not use a sort provided by your system? I need the sort in the middle of a large system and for obscureI need the sort in the middle of a large system, and for obscure technical reasons, I can't use the system file-sorting program. What exactly are you sorting? How many records are in the file? What is the format of each record?What is the format of each record? The file contains at most ten million records; each record is a seven-digit integer. Wait a minute. If the file is that small, why bother going to disk at yggall? Why not just sort it in main memory? Although the machine has many megabytes of main memory, this function is part of a big system. I expect that I'll have only about a megabyte free at that point. Is there anything else you can tell me about the records? Each one is a seven-digit positive integer with no other associated data and no integer can appear more than onceCS307 Hash Tables and Maps3data, and no integer can appear more than once. QuestionsWhen did this conversation take place?What were they sorting?How do you sort data when it won't all fit into main memory?aeoySpeed of file i/o?CS307 Hash Tables and Maps4A Solution SKDVH  LQLWLDOL]H VHW WR HPSW\  IRU L > Q ELW>L@ ELW>L@   SKDVH  LQVHUW SUHVHQW HOHPHQWV LQWR WKH VHW I K L L WK L W ILOIRU HDFK L LQ WKH LQSXW ILOH ELW>L@   SKDVH  ZULWH VRUWHG RXWSXW  IRU L > Q LI ELW>L@  LW L WK W W ILOLI ELW>L@  ZULWH L RQ WKH RXWSXW ILOH CS307 Hash Tables and Maps5Some Structures so FarALitArrayLists– O(1) access–O(N) insertion (average case) better at endO(N) insertion (average case), better at end– O(N) deletion (average case)LinkedLists– O(N) access– O(N) insertion (average case), better at front and backO(N) d l ti ( ) b tt t f t d b k–O(N) deletion (average case), better at front and backBinary Search Trees–O(log N) access if balancedO(log N) access if balanced– O(log N) insertion if balanced– O(log N) deletion if balancedCS307 Hash Tables and Maps6Why are Binary Trees Better?Divide and Conquer– reducing work by a factor of 2 each timeCan we reduce the work by a bigger factor? 10? 1000?An ArrayList does this in a way when accessingelementsaccessingelements– but must use an integer value–each position holds a single elementeach position holds a single elementCS307 Hash Tables and Maps7Hash TablesHash Tables overcome the problems of ArrayList while maintaining the fast access, insertion, and deletion in terms of N (number of elements already in the structure.)CS307 Hash Tables and Maps8Hash FunctionsHash: "From the French hatcher, which means 'to chop'. "to hash to mix randomly or shuffle (To cut up, to slash or hack about; to mangle)Hash Function: Take a large piece of data and reduce it to a smaller piece of data, a d educe o a s a e p ece o da a,usually a single integer. –A function or algorithmA function or algorithm– The input need not be integers!CS307 Hash Tables and Maps9Hash Function5/5/1967555389085 "Mike Scott" [email protected]"Isabelle"CS307 Hash Tables and Maps10Simple ExampleAssume we are using names as our key– take 3rd letter of name, take int value of letter (a = 0, b = 1, ...), divide by 6 and take remainderWhat does "Bellers" hash to?L -> 11 -> 11 % 6 = 5CS307 Hash Tables and Maps11Result of Hash FunctionMik (10 % 6) 4Mike = (10 % 6) = 4Kelly = (11 % 6) = 5Olivia = (8 % 6) = 2Olivia = (8 % 6) = 2Isabelle = (0 % 6) = 0David=(21%6)=3David = (21 % 6) = 3 Margaret = (17 % 6) = 5 (uh oh)Wendy = (13 % 6) = 1Wendy = (13 % 6) = 1This is an imperfect hash function. A perfect hash function yields a one to one mapping from the keys yppgyto the hash values.What is the maximum number of values this fti hhftl?CS307 Hash Tables and Maps12function can hash perfectly?More on Hash FunctionsNormally a two step process– transform the key (which may not be an integer) into an integer value– Map the resulting integer into a valid index for th h h t bl ( h ll th l tthe hash table (where all the elements are stored)Th t f ti f fThe transformation can use one of four techniques–mapping, folding, shifting, castingCS307 Hash Tables and Maps13Hashing TechniquesMapping– As seen in the example– integer values or things that can be easily converted to integer values in keyFolding– partition key into several parts and the integer values for the various parts are combined– the parts may be hashed first– combine using addition, multiplication, shifting, logical exclusive ORCS307 Hash Tables and Maps14More TechniquesShifting– an alternative to folding– A fold functionint hashVal = 0;int i=str length()-1;int i str.length() 1;while(i > 0){hashVal += (int) str.charAt(i);i--;;}results for "dog" and "god" ?CS307 Hash Tables and Maps15Shifting and CastingM li t d ith hiftiMore complicated with shiftingint hashVal = 0;int i = str.length() - 1;while(i > 0){ hashVal = (hashVal << 1) + (int) str.charAt(i);i--;}}different answers for "dog" and "god"Shifting may give a better range of hash values when compared to just foldingCasts Very simple– essentially casting as part of fold and shift when working with chars.CS307 Hash Tables and Maps16with chars.The Java String class hhCd thdhashCode methodpublic


View Full Document

UT CS 307 - Data Structure Potpourri

Documents in this Course
Midterm 2

Midterm 2

15 pages

Midterm 1

Midterm 1

15 pages

Syllabus

Syllabus

24 pages

s

s

8 pages

Midterm 1

Midterm 1

14 pages

Midterm 2

Midterm 2

14 pages

Recursion

Recursion

14 pages

Midterm 1

Midterm 1

16 pages

Load more
Download Data Structure Potpourri
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Data Structure Potpourri and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Data Structure Potpourri 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?