CompSci 100e 6.1 Hashing: Log (10100) is a big number ! Comparison based searches are too slow for lots of data ! How many comparisons needed for a billion elements? ! What if one billion web-pages indexed? ! Hashing is a search method: average case O(1) search ! Worst case is very bad, but in practice hashing is good ! Associate a number with every key, use the number to store the key • Like catalog in library, given book title, find the book ! A hash function generates the number from the key ! Goal: Efficient to calculate ! Goal: Distributes keys evenly in hash table CompSci 100e 6.2 Hashing details ! There will be collisions, two keys will hash to the same value ! We must handle collisions, still have efficient search ! What about birthday “paradox”: using birthday as hash function, will there be collisions in a room of 25 people? ! Several ways to handle collisions, in general array/vector used ! Linear probing, look in next spot if not found • Hash to index h, try h+1, h+2, …, wrap at end • Clustering problems, deletion problems, growing problems ! Quadratic probing • Hash to index h, try h+12, h+22 , h+32 , …, wrap at end • Fewer clustering problems ! Double hashing • Hash to index h, with another hash function to j • Try h, h+j, h+2j, … 0 1 2 3 n-1 CompSci 100e 6.3 Chaining with hashing ! With n buckets each bucket stores linked list ! Compute hash value h, look up key in linked list table[h] ! Hopefully linked lists are short, searching is fast ! Unsuccessful searches often faster than successful • Empty linked lists searched more quickly than non-empty ! Potential problems? ! Hash table details ! Size of hash table should be a prime number ! Keep load factor small: number of keys/size of table ! On average, with reasonable load factor, search is O(1) ! What if load factor gets too high? Rehash or other method CompSci 100e 6.4 Hashing problems ! Linear probing, hash(x) = x, (mod tablesize) ! Insert 24, 12, 45, 14, delete 24, insert 23 (where?) ! Same numbers, use quadratic probing (clustering better?) ! What about chaining, what happens? 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 24 12 45 14 24 12 45 14CompSci 100e 6.5 What about hash functions ! Hashing often done on strings, consider two alternatives public static int hash(String s) { int k, total = 0; for(k=0; k < s.length(); k++){ total += s.charAt(k); } return total; } ! Consider total += (k+1)*s.charAt(k), why might this be better? ! Other functions used, always mod result by table size ! What about hashing other objects? ! Need conversion of key to index, not always simple ! Ever object has method hashCode()! CompSci 100e 6.6 Tools: Solving Computational Problems ! Algorithmic techniques and paradigms ! Brute-force/exhaustive, greedy algorithms, dynamic programming, divide-and-conquer, … ! Transcend a particular language ! Designing algorithms, may change when turned into code ! Programming techniques and paradigms ! Recursion, memo-izing, compute-once/lookup, tables, … ! Transcend a particular language ! Help in making code work • Avoid software problems (propagating changes, etc.) • Avoid performance problems CompSci 100e 6.7 Quota Exceeded ! You’re running out of disk space ! Buy more ! Compress files ! Delete files ! How do you find your “big” files? ! What’s big? ! How do you do this? CompSci 100e 6.8 Recursive structure matches code public static long THRESHOLD = 1000000L; // one million bytes! public static void findBig(File dir, String tab) {! File[] dirContents = dir.listFiles();! System.out.println(tab+"**:"+dir.getPath());! for(File f : dirContents){! if (f.isDirectory()) {! findBig(f,tab+"\t");! }! else {! if (f.length() > THRESHOLD){! System.out.printf("%s%s%8d\n",tab,f.getName(), f.length());! }! }! }! }!Does findBig call itself?CompSci 100e 6.9 Solving Problems Recursively ! Recursion is an indispensable tool in a programmer’s toolkit ! Allows many complex problems to be solved simply ! Elegance and understanding in code often leads to better programs: easier to modify, extend, verify (and sometimes more efficient!!) ! Sometimes recursion isn’t appropriate, when it’s bad it can be very bad---every tool requires knowledge and experience in how to use it ! The basic idea is to get help solving a problem from coworkers (clones) who work and act like you do ! Ask clone to solve a simpler but similar problem ! Use clone’s result to put together your answer ! Need both concepts: call on the clone and use the result CompSci 100e 6.10 Print words read, but print backwards ! Could store all the words and print in reverse order, but … ! Probably the best approach, recursion works too public void printReversed(Scanner s){ if (s.hasNext()){ // reading succeeded? String word = s.next(); // store word printReversed(s); // print rest System.out.println(word); // print the word } } ! The function printReversed reads a word, prints the word only after the clones finish printing in reverse order ! Each clone has own version of the code, own word variable ! Who keeps track of the clones? ! How many words are created when reading N words? • Can we do better? CompSci 100e 6.11 Exponentiation ! Computing xn means multiplying n numbers (or does it?) ! What’s the simplest value of n when computing xn? ! If you want to multiply only once, what can you ask a clone? public static double power(double x, int n){ if (n == 0){ return 1.0; } return x * power(x, n-1); } ! Number of multiplications? ! Note base case: no recursion, no clones ! Note recursive call: moves toward base case (unless …) CompSci 100e 6.12 Faster exponentiation ! How many recursive calls are made to computer 21024? ! How many multiplies on each call? Is this better? public static double power(double x, int n){ if (n == 0) { return 1.0; } double semi = power(x, n/2); if (n % 2 == 0) { return semi*semi; } return x * semi * semi; } ! What about an iterative version of this function?CompSci 100e 6.13 Back to Recursion ! Recursive
View Full Document