CMSC 341 Skip Lists Looking Back at Sorted Lists Sorted Linked List What is the worst case performance of find insert Sorted Array What is the worst case performance of find insert 8 3 2007 UMBC CMSC 341 SkipList 2 An Alternative Sorted Linked List What if you skip every other node Find follow skip pointer until target this skip element Resources Every other node has a pointer to the next and the one after that Additional storage Performance of find 8 3 2007 UMBC CMSC 341 SkipList 3 Skipping Every nd 2 Node The value stored in each node is shown below the node and corresponds to the the position of the node in the list It s clear that find does not need to examine every node It can skip over every other node then do a final examination at the end The number of nodes examined is no more than n 2 1 For example the nodes examined finding the value 15 would be 2 4 6 8 10 12 14 16 15 a total of 16 2 1 9 8 3 2007 UMBC CMSC 341 SkipList 4 Skipping Every nd 2 and th 4 Node The find operation can now make bigger skips than the previous example Every 4th node is skipped until the search is confined between two nodes of size 3 At this point as many as three nodes may need to be scanned It s also possible that some nodes may be examined more than once The number of nodes examined is no more than n 4 3 Again look at the nodes examined when searching for 15 8 3 2007 UMBC CMSC 341 SkipList 5 New and Improved Alternative Add hierarchy of skip pointers 8 3 2007 every 2i th node points 2i nodes ahead For example every 2nd node has a reference 2 nodes ahead every 8th node has a reference 8 nodes ahead UMBC CMSC 341 SkipList 6 Skipping Every 2i th node Suppose this list contained 32 nodes and we want to search for some value in it Working down from the top we first look at node 16 and have cut the search in half When we look again one level down in either the right or left half we have cut the search in half again We continue in this manner until we find the node being sought or not This is just like binary search in an array Intuitively we can understand why the max number of nodes examined is O lg N 8 3 2007 UMBC CMSC 341 SkipList 7 Some Serious Problems This structure looks pretty good but what happens when we insert or remove a value from the list Reorganizing the the list is O N For example suppose the first element of the list was removed Since it s necessary to maintain the strict pattern of node sizes it s easiest to move all the values toward the head and remove the end node A similar situation occurs when a new node is added 8 3 2007 UMBC CMSC 341 SkipList 8 Skip Lists Concept A skip list maintains the same distribution of nodes but without the requirement for the rigid pattern of node sizes 1 2 have 1 pointer 1 4 have 2 pointers 1 8 have 3 pointers 1 2i have i pointers It s no longer necessary to maintain the rigid pattern by moving values around for insert and remove This gives us a high probability of still having O lg N performance The probability that a skip list will behave badly is very small 8 3 2007 UMBC CMSC 341 SkipList 9 A Probabilistic Skip List The number of forward reference pointers a node has is its size The distribution of node sizes is exactly the same as the previous figure the nodes just occur in a different pattern 8 3 2007 UMBC CMSC 341 SkipList 10 Inserting a Node When inserting a new node we choose the size of the node probabilistically Every skip list has an associated and fixed probability p that determines the distribution of node sizes A fraction p of the nodes that have at least r forward references also have r 1 forward references 8 3 2007 UMBC CMSC 341 SkipList 11 Skip List Insert To insert node Create new node with random size For each pointer i connect to next node with at least i pointers int generateNodeSize double p int maxSize int size 1 while drand48 p size return size maxSize maxSize size 8 3 2007 UMBC CMSC 341 SkipList 12 An Aside on Node Distribution Given an infinitely long skip list with associated probability p it can be shown that 1 p nodes will have just one forward reference This means that p 1 p nodes will have exactly two forward references and in general pk 1 p nodes will have k 1 forward reference pointers For example with p 0 5 0 5 1 2 of the nodes will have exactly one forward reference 0 5 1 0 5 0 25 1 4 of the nodes will have 2 references 0 52 1 0 5 0 125 1 8 of the nodes will have 3 references 0 53 1 0 5 0 0625 1 16 of the nodes will have 4 references Work out the distribution for p 0 25 1 4 for yourself 8 3 2007 UMBC CMSC 341 SkipList 13 Determining the Size of the Header Node The size of the header node the number of forward references it has is the maximum size of any node in the skip list and is chosen when the empty skip list is constructed i e it must be predetermined Dr Pugh has shown that the maximum size should be chosen as log 1 p N For p the maximum size for a skip list with 65 536 elements should be no smaller than log 2 65536 16 8 3 2007 UMBC CMSC 341 SkipList 14 Performance Considerations The expected time to find an element and therefore to insert or remove is O lg N It is possible for the time to be substantially longer if the configuration of nodes is unfavorable for a particular operation Since the node sizes are chosen randomly it is possible to get a bad run of sizes For example it is possible that each node will be generated with the same size producing the equivalent of an ordinary linked list A bad run of sizes will be less important in a long skip list than in a short one The probability of poor performance decreases rapidly as the number of nodes increases 8 3 2007 UMBC CMSC 341 SkipList 15 More performance The probability that an operation takes longer than expected is function of the associated probability p Dr Pugh calculated that with p 0 5 and 4096 elements the probability that the actual time will exceed the expected time by more than a factor of 3 is less than one in 200 million The relative time and space performance depends on p Dr Pugh suggests p 0 25 for most cases If the …
View Full Document