DOC PREVIEW
CMU ISR 08732 - gnusort

This preview shows page 1-2 out of 5 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 5 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Abstraction and Filtration of GNU Sortby Jin S. Choi and Philip Greenspun Site Home : Software : Abstraction, Filtration, Comparison : Sort Example This is an example of Abstraction and Filtration (in preparation for Comparison) of a program in the C language: GNU sort. The function of the sort program is to take lines of data from one or more files, specified on the Unix command line, and output the sorted concatenation of the lines. The program also accepts options, e.g., which character to use as the delimeter between lines. Algorithmically, the program makes use of the External R-Way Merge, which divides up the input into chunks that will fit into memory, sorts each chunk in memory using a merge sort and writes it to a temporary file, then merges the resulting chunks together. Merge sort is advantageous in this program over, say, Quicksort, because it uses fewer comparisons, it is stable, and it is more efficient when the input is already sorted. Examples of using sort:  sort file1 file2 > file3 Lexicographically sort the concatenation of the lines in file1 and file2 and write it into file3. Lexicographic sort is a sort in "alphabetic order", with shorter strings coming before longer strings. "10" sorts before "2", for example.  sort -nr inputfile Sort the input file in reverse numeric order, treating any leading digits as numbers, not lexicographic strings. Output the result to the console. "10" will be considered to come after "2", although in this case we are asking for reverse order, so it will appear first. Abstraction Level 0 The 4626 lines of source code in the C programming language. Abstraction Level 1 We take the raw source code, including comments, and break it down into functional pieces. For this example, we look at the sequential_sort function which does the actual work of sorting lines. In this case, the comments specify all the requirements for the inputs, give a reference to the algorithm being employed, with a helpful notation indicating that a certain optimization was employed. The basic algorithm uses the technique of mathematical induction. /* Sort the array LINES with NLINES members, using TEMP for temporary space. Do this all within one thread. NLINES must be at least 2. If TO_TEMP, put the sorted output into TEMP, and TEMP is as large as LINES. Otherwise the sort is in-place and TEMP is half-sized. The input and output arrays are in reverse order, and LINES and TEMP point just past the end of their respective arrays. Use a recursive divide-and-conquer algorithm, in the style suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23. Use the optimization suggested by exercise 5.2.4-10; this requires room for only 1.5*N lines, rather than the usual 2*N lines. Knuth writes that this memory optimization was originally published by D. A. Bell, Comp J. 1 (1958), 75. */ static void sequential_sort (struct line *restrict lines, size_t nlines, struct line *restrict temp, bool to_temp) { if (nlines == 2) { Base case. /* Declare `swap' as int, not bool, to work around a bug <http://lists.gnu.org/archive/html/bug-coreutils/2005-10/msg00086.html> in the IBM xlc 6.0.0.0 compiler in 64-bit mode. */ swap is 1 if the first line is greater than the second line, 0 otherwise. Note from the first comment that the lines array is structured backwards from what we would expect, so negative indexes are used. int swap = (0 < compare (&lines[-1], &lines[-2])); Swap the lines if called for, into the temp space if requested, or in place otherwise. if (to_temp) { Page 1 of 5Abstraction and Filtration of GNU Sort10/1/2011http://philip.greenspun.com/software/abstraction-filtration-comparison/sort/swap is 0 or 1, so -1 - 0 = -1, -2 + 0 = -2, and -1 - 1 = -2, -1 + 1 = -1 temp[-1] = lines[-1 - swap]; temp[-2] = lines[-2 + swap]; } else if (swap) { temp[-1] = lines[-1]; lines[-1] = lines[-2]; lines[-2] = temp[-1]; } } else { Inductive step. Split the lines in half. size_t nlo = nlines / 2; size_t nhi = nlines - nlo; struct line *lo = lines; struct line *hi = lines - nlo; Sort the high half, which is guaranteed to contain at least two lines, as required (if there were two lines, we wouldn't be here. If there are 3 lines, because of the way we split in half, two lines will go in the high array and one in the low. If there are more than 3 lines, then obviously nhi will be more than 2). sequential_sort (hi, nhi, temp - (to_temp ? nlo : 0), to_temp); Sort the low half if necessary. if (1 < nlo) sequential_sort (lo, nlo, temp, !to_temp); else if (!to_temp) temp[-1] = lo[-1]; Merge the high and low buffers to the destination. struct line *dest; struct line const *sorted_lo; if (to_temp) { dest = temp; sorted_lo = lines; } else { dest = lines; sorted_lo = temp; } mergelines (dest, nlines, sorted_lo); } } Abstraction Level 2 For a higher level abstraction, we divide the single file of source code into two sections: (1) headers and definitions; (2) functions. Within each section, we summarize the purpose of each block of code. Headers and Definitions  copyright  inclusion of standard system headers and references to files that define data structures and functions outside of this file  definitions of constants and macros to account for variations in the target systems  definitions of constants to assign symbolic names to various parameters used by the program  definition of data structures that will be used by the algorithm  lines  buffers  key fields  months  merge tree nodes  merge node priority queues  various lookup tables  character-type tables Page 2 of 5Abstraction and Filtration of GNU Sort10/1/2011http://philip.greenspun.com/software/abstraction-filtration-comparison/sort/ months 


View Full Document

CMU ISR 08732 - gnusort

Documents in this Course
Notes

Notes

24 pages

Citron

Citron

63 pages

Load more
Download gnusort
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view gnusort and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view gnusort 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?