Unformatted text preview:

Design and Implementation Objective To design and implement a program for a relatively small yet reasonably complicated problem To introduce and review a variety of implementation languages and to have students review the pros and cons of different implementation choices and languages Show me your flowcharts and conceal your tables and I shall continue to be mystified Show me your tables and I won t usually need your flowcharts they ll be obvious Frederick P Brooks The Mythical Man Month The examples in these slides come from Brian W Kernighan and Rob Pike The Practice of Programming Addison Wesley 1999 Themes Once the data structures are laid out the algorithms tend to fall into place and the coding is comparatively easy The choice of programming language is relatively unimportant to the overall design Comparing implementations demonstrates how languages can help or hinder and ways in which they are unimportant Topics Problem Generate random English text that reads well Program some data comes in some data goes out and the processing depends on a little ingenuity Implementations C C Java Perl Failed Attempts Generate random letters even with proper frequency Generate random words chosen from a dictionary Need a statistical model with more structure frequency of phrases The Markov Algorithm Learn Learn from input Look at all n word phrases prefixes Keep a list of words that follow one word suffices for each prefix Store the prefix suffix list in a dictionary The key entry is the prefix The satellite data associated w each prefix is the list of suffices Note we re faking a multi map Each prefix has potentially several possible suffices Sample The following example uses a subset of Dr Brooks quote Uses a prefix length of 2 words We won t bother w stripping out punctuation We won t worry about capitalisation so We and we are different strings We will use a special Null Null prefix to indicate the start Sample Markov Algorithm partial list Input prefix Suffix words that follow null null show null show me show me your me your flowcharts tables your flowcharts and flowcharts and conceal your tables and and will be mystified obvious be obvious null Sample notes We don t filter out duplicates E g and appears twice for the prefix your tables This is fine and is from the input a nice word to follow It should be a more probable choice We also use the suffix null to say Here s a decent place to end our story Markov Algorithm generate text set w1 and w2 to the first two words in the text print w1 and w2 loop randomly choose w3 one of the successors of w1 and w2 in sample text print w3 replace w1 and w2 by w2 and w3 Sample 1 2 3 4 Say the current prefix is me your Go to our dictionary find the entry Choose a random word from the suffix list say tables Write that word out Make a new prefix your tables Repeat until we choose the suffix null or decide we ve output enough words Implementation See lecture outline for links to 4 different implementations C see Makefile C Java Perl Python What are the pros and cons of the different implementations The Data Structures Python and Perl have everything we need built in Java and C provide appropriate containers in their standard libraries In C we ll need to roll these things ourselves The Data Structures Python The dictionary dict is given to us The prefixes the keys in the dictionary will be 2 element tuples immutable The satellite data will be stored in a list an array If a prefix doesn t already exist in the table we insert it along w an empty list We just append the new suffix onto the end of this list Hash Tables We need a dictionary to map a prefix to its list of suffices given a prefix what are the possible suffices In C we ll roll our own hash table The prefix key is hashed returning a value on 0 N 1 where N is the table size Each element in the table array is a bucket a linked list of prefixes Each prefix is associated w its own list of suffices The prefix in C Prefix stored in an array of strings ptrs to char char pref NPREF NPREF the of words in a prefix a global constant Note each string exists once in memory after call to strdup various objects just store pointers to these strings Hash table entries in C State Use chains linked list of entries to handle collisions Each entry or State is a node in a linked list It contains The key the prefix The satellite data pointer to a list of suffix pref 0 A pointer to the next State pref 1 suf next Satellite Data C More than one suffix may be associated w a given entry State Stored in a linked list List made of structs of type Suffix char word the suffix word itself pointer to another Suffix word next word next The hash table in C statetab the table itself Use chains linked list of States to handle collisions statetab is an array of pointers to States to the beginning of a linked list The hash table C a state pref 0 pref 1 suf next another state same hash key me your a suffix flowcharts word next pref 0 pref 1 suf next another suffix word next tables C eprintf These are simply some user defined functions mostly wrappers to help w the coding void eprintf char prints an error msg to stderr then exits void weprintf char prints a warning to stderr doesn t exit char estrdup char s calls strdup s to dynamically allocate memory and copy s exits if malloc fails C eprintf cont void emalloc size t n Calls malloc n Exits upon failure void erealloc void p size t n Calls realloc p n Exits upon failure void setprogname char If set uses program name in messages char progname void memmove Moves low level a block of memory memmove d s n Moves n bytes starting at s source to t target Given prefix w1 wn 1 with suffix wn memove prefix prefix 1 NPREF 1 sizeof prefix 0 prefix NPREF 1 suffix prefix is now w2 wn Just slides everybody down 1 to make get next prefix Choosing a suffix Consider this line in generate if rand nmatch 0 We walk the list of suffices For the first one we mod by 1 100 chance of choosing this one At the next one we mod by 2 50 chance of choosing this one At the 3rd 1 in 3 At the 4th 1 in 4 Convince yourself that each suffix has equal probability of being chosen


View Full Document

DREXEL CS 265 - markov

Loading Unlocking...
Login

Join to view markov and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view markov and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?