DOC PREVIEW
DREXEL CS 265 - markov

This preview shows page 1-2-22-23 out of 23 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 23 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Design and Implementation*ThemesTopicsFailed AttemptsThe Markov Algorithm - LearnSampleSample Markov Algorithm (partial list)Sample (notes)Markov Algorithm – generate textSlide 10ImplementationThe Data StructuresThe Data Structures - PythonHash TablesThe prefix in CHash table entries in C - StateSatellite Data – CThe hash table in CThe hash table – CC - eprintfC – eprintf (cont.)memmoveChoosing a suffixDesign and Implementation*Objective: To design and implement a program for a relatively small yet reasonably complicated problem. To introduce and review a variety of implementation languages and to have students review the pros and cons of different implementation choices and languages.“Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious” – Frederick P. Brooks, The Mythical Man Month*The examples in these slides come fromBrian W. Kernighan and Rob Pike,The Practice of Programming, Addison-Wesley, 1999.Themes“Once the data structures are laid out, the algorithms tend to fall into place, and the coding is comparatively easy”The choice of programming language is relatively unimportant to the overall design.Comparing implementations demonstrates how languages can help or hinder, and ways in which they are unimportant.TopicsProblem: Generate random English text that reads well.Program: some data comes in, some data goes out, and the processing depends on a little ingenuity.Implementations: C, C++, Java, PerlFailed AttemptsGenerate random letters (even with proper frequency).Generate random words chosen from a dictionaryNeed a statistical model with more structure (frequency of phrases)The Markov Algorithm - Learn"Learn" from input:Look at all n-word phrases (prefixes);Keep a list of words that follow (one-word suffices) for each prefixStore the prefix/suffix-list in a dictionaryThe key (entry) is the prefixThe satellite data, associated w/each prefix, is the list of sufficesNote, we're "faking" a multi-map. Each prefix has, potentially, several possible suffices.SampleThe following example uses a subset of Dr. Brooks' quoteUses a prefix length of 2 wordsWe won't bother w/stripping out punctuationWe won't worry about capitalisation (so, "We" and "we" are different strings)We will use a special (Null, Null) prefix to indicate the start.Sample Markov Algorithm (partial list)Input prefix Suffix words that follow(null) (null) show(null) show meshow me yourme your flowcharts tablesyour flowcharts andflowcharts and concealyour tables and andwill be mystified. obvious.be obvious (null)Sample (notes)We don't filter out duplicatesE.g., "and" appears twice for the prefix ("your", "tables")This is fine. "and" is, from the input, a nice word to follow. It should be a more probable choiceWe also use the suffix (null) to say "Here's a decent place to end our story."Markov Algorithm – generate textset w1 and w2 to the first two words in the textprint w1 and w2loop:randomly choose w3, one of the successors of w1 and w2 (in sample text)print w3replace w1 and w2 by w2 and w3SampleSay the current prefix is (me, your)1. Go to our dictionary, find the entry2. Choose a random word from the suffix list (say, "tables")3. Write that word out4. Make a new prefix: ("your", "tables")Repeat, until we choose the suffix (null), or decide we've output enough wordsImplementationSee lecture outline for links to 4 different implementationsC (see Makefile)C++JavaPerlPythonWhat are the pros and cons of the different implementations?The Data StructuresPython and Perl have everything we need built inJava and C++ provide appropriate containers in their standard librariesIn C we'll need to roll these things ourselvesThe Data Structures - PythonThe dictionary (dict) is given to usThe prefixes, the keys in the dictionary, will be 2-element tuples (immutable)The satellite data will be stored in a list (an array)If a prefix doesn't already exist in the table, we insert it, along w/an empty list, []We just append the new suffix onto the end of this listHash TablesWe need a dictionary to map a prefix to its list of suffices (given a prefix, what are the possible suffices?)In C we'll roll our own hash tableThe prefix (key) is hashed, returning a value on [0, N-1], where N is the table sizeEach element in the table (array) is a bucket, a linked-list of prefixesEach prefix is associated w/its own list of sufficesThe prefix in CPrefix – stored in an array of strings (ptrs to char):char* pref[ NPREF ];NPREF – the # of words in a prefix, a global constantNote, each string exists once in memory (after call to strdup); various objects just store pointers to these strings.Hash table entries in C - StateUse chains (linked list of entries) to handle collisionsEach entry (or State) is a node in a linked-list. It contains:The key (the prefix)The satellite data (pointer to a list of suffix)A pointer to the next Statepref[0]sufnextpref[1]Satellite Data – CMore than one suffix may be associated w/a given entry (State)Stored in a linked listList made of structs, of type Suffix:char* word, the suffix (word) itselfpointer to another SuffixwordnextwordnextThe hash table in Cstatetab – the table itselfUse chains (linked list of States) to handle collisionsstatetab is an array of pointers to States (to the beginning of a linked list)The hash table – C“me”“your”pref[0]sufnextpref[1]a state:pref[0]sufnextpref[1]another state(same hashkey):wordnextwordnexta suffix:another suffix:“flowcharts”“tables”C - eprintfThese are simply some user-defined functions (mostly wrappers) to help w/the coding:void eprintf( char *, ... );prints an error msg to stderr, then exitsvoid weprintf( char *, ... );prints a warning to stderr; doesn't exitchar* estrdup( char *s );calls strdup( s ) to dynamically allocate memory and copy s; exits if malloc failsC – eprintf (cont.)void* emalloc(size_t n);Calls malloc(n). Exits upon failurevoid* erealloc(void *p, size_t n);Calls realloc( p, n ). Exits upon failurevoid setprogname(char *);If set, uses program name in messageschar* progname(void);memmoveMoves (low-level) a block of memorymemmove( d, s, n )Moves n-bytes, starting at s (source) to t (target)Given prefix (w1, …, wn-1), with suffix wn:memove(prefix,


View Full Document

DREXEL CS 265 - markov

Download markov
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view markov and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view markov 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?