1Program Design&Hash TablesCS 2172Program design1. Problem statement and requirementsWhat is the problem?2. SpecificationDetailed description of what the system should do, not how3. DesignExplore design space, identify algorithms and key interfaces4. ProgrammingImplement it in the simplest possible way; use libraries5. TestingDebug and test until the implementation is correct and efficient enough6. IterateDo the design and implementation conform to the specification?3Design methodologies• Two important design methodologies top-down design, or stepwise refinement bottom-up design• Reality: use both top-down: what functionality do I need?Avoids designing and building useless functionality bottom-up: what functionality do I know how to provide?Avoids requiring impossible functionality• Iterate up and down over the design until everything is both useful and feasible sometimes overlaps with implementation phase4Stepwise refinement• Top-down designstarts with a high-level abstract solutionrefines it by successive transformations to lower-level solutionsrefinement ends at programming-language statements• Key idea: each refinement or elaborationmust be small and correctmust move toward final solution• Accompany refinements with assertions• Refinements use English & pseudocode, but ultimately result in code5Example: library books1. Problem statement:The circulation file has a line of author,title for each checked out bookNeed a program to find books checked out frequently2. SpecificationRead a text file; print out one copy of any line that appears 10 or more times3. Design: how many lines are in a typical circulation file?<findfreq> ≡<for each line of input><look up the line in the table (add it if not already there)><increment this line’s count><for each member of the table><if that member’s count ≥ 10><print the line>4. Programming: make forward progress by elaborating chunks6What modules?• ADT: string table• Modules: main.c handle command-line arguments (if any) and top-level loops<findfreq> ≡<includes><defines>int main(int argc, char *argv[]) {<locals><for each line of input><look up the line in the table (add it if not already there)><increment this line’s count><for each member of the table><if that member’s count ≥ 10><print the line>return EXIT_SUCCESS;} symtable.h interface for string table symtable.c implementation for string table7Elaboration• Some elaborations can be done without defining the ADTs<for each line of input> ≡while (fgets(line, MAXLINE, stdin))<defines> ≡#define MAXLINE 512<locals> ≡char line[MAXLINE];8ADT: string tablesymtable.h describes abstractoperations, not implementation; what, not howtypedef struct SymTable *SymTable_T;SymTable_T SymTable_new(void); /* create a new, empty table. */int SymTable_put(SymTable_Ttable, char *key, void *value);/* enter (key,value) binding in the table; else return 0 if already there */void *SymTable_get(SymTable_T table, char *key);/* look up key in the table, return value (if present) or else NULL */void SymTable_map(SymTable_T table, void (*f)(char *key, void *value, void *extra),void *extra);/* apply f to every key in the table ... */This was top-down design: specify just those operations necessary for client program9Next step: re-use, if possible• Avoid some work by searching for an existing module or library that can do the work of SymTable module• If found, then throw away symtable.h• Let’s pretend we didn’t find one10A bit of bottom-up design• Now that we’ve committed to create SymTable ADT, add more operations that make it useful in other applications.• Don’t get carried away! You’ll end up doing useless work• This step is optional: you can always do it later as needed.11More of symtable interfacevoid SymTable_free(SymTable_T table);/* Free table */int SymTable_getLength(SymTable_T table);/* Return the number of bindings in table.It is a checked runtime error for table to be NULL. */int SymTable_remove(SymTable_T table, char *key);/* Remove from table the binding whose key is key. Return 1 if successful, 0 otherwise. It is a checked runtime error for table or key to be NULL. */12Back to the client• ADT interface gives enough information to finish the client, main.c<locals> +≡SymTable_T table = SymTable_new();struct stats *v;<includes> +≡#include “symtable.h”;<global-defs> ≡struct stats {int count;}; (also must define makeStats...)<look up the line in the table (add it if not already there)> ≡v = (struct stats *)SymTable_get(table, line);if (!v) {v = makeStats(0);SymTable_put(table, line, (void *)v);}13Finishing the client<for each member of the table> ≡SymTable_map(table, maybeprint, NULL);<if that member’s count ≥ 10, print the line> ≡void maybeprint(char *key, void *stats, void *extra){if (((struct stats*)stats)->count >= 10)fputs(key, stdout);}14What the client main looks likeint main(int argc, char *argv[]) {char line[MAXLINE]; SymTable_T table = SymTable_new();struct stats *v; while (fgets(line, MAXLINE, stdin)) {v = (struct stats *)SymTable_get(table, line);if (!v) {v = makeStats(0);SymTable_put(table, line, (void *)v);}incrementStats(v,1);}SymTable_map(table, maybeprint, NULL);return EXIT_SUCCESS;}15ADT implementation• Now, begin to design the ADT implementation• Start with a simple algorithm / data structure It’s good for debugging and testing the interface Maybe it’s good enough for the production system -- that would save the work of implementing a clever algorithmtable “header” nodea long string\0value1 (belongs to client)value2 (belongs to client)another string\016Testing• 5. Testing: findfreq works, but runs too slowly on large inputs. Why? Improve symtable’s implementation; don’t change its interface• Solution: use a hash tableA symtable will be a pointer to an array of TABLESIZE linked lists“Hash” the string into an integer hlet i = h % TABLESIZEsearch the ith linked list for the string, oradd the string to the head of the ith list0TABLESIZE-117How large an array?Array should be long enough that “bucket” size is 1.If the buckets are short, then lookup is fast.If there are some very long buckets, then average lookup is slow.This is OK:0TABLESIZE-118The need for a good hash functionArray should be long enough that average “bucket” size is 1.If the buckets are short, then lookup is fast.If there are some very long buckets, then average lookup
View Full Document