UH COSC 6385 - COSC 6385 Homework

COSC 6385 Computer Architecture - Homework
Edgar Gabriel
Spring 2011

Hardware performance counters
• Set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities within computer systems
• Low overhead compared to software-based methods
• Types and meanings of hardware counters vary from one kind of architecture to another due to the variation in hardware organizations

Overflow handling
• Generate an overflow signal every time threshold events have been counted
  – each counter has to be registered separately
  – the value of each registered hardware counter is maintained separately
  – (LONG_)LONG_MAX:
    • 32 bit: 2,147,483,647
    • 64 bit: 9,223,372,036,854,775,807
• overflow_handler(): user-defined function to process overflow events
  – the function will be called by the PAPI library every time the threshold is reached
• overflow_vector: a bit array that can be processed to determine which event(s) caused the overflow
  – e.g. using PAPI_get_overflow_event_index()
• Software vs. hardware overflow:
  – if the processor does not support hardware overflow, software emulates it by periodically checking the counter values
  – software overflow handling is less accurate and more expensive than hardware handling
  – often implemented using a zero-crossing algorithm
    • the value of the counter is set to –threshold and increased accordingly

1st Assignment
• Rules
  – Each team should deliver
    • Source code (.cpp and .h files and Makefiles)
      – Please: no .o files and no executables!
    • Documentation (.pdf, .doc, .tex or .txt file)
  – Deliver electronically to [email protected]
  – Expected by Friday, March 11, 11.59pm
  – In case of questions:
    • ask the TA first; if he doesn't know the answer, he will ask me
    • ask early, not the day before the submission is due

About the Project
• Given: the source code for a sequential image segmentation code (file jpeg-d.tar.gz)
  – You can open the archive with tar -xzvf jpeg-d.tar.gz
  – The archive contains, among others, the following files
    • Makefile /* To compile everything on Linux/Unix */
    • jpeg.c /* The main file */
    • huffman.c
    • parse.c

About the Project
• The JPEG format is a method devised by the Joint Photographic Experts Group to reduce storage and transmission requirements.
• A JPEG image is composed of 8 × 8 pixel blocks known as Minimum Coded Units (MCUs).
• In encoding,
  – Each unit is converted to a frequency domain representation using the discrete cosine transform.
  – The data is then coded with Huffman codes to allow more frequent values to be stored as shorter codes.
• Decoding does the inverse,
  – The coded data is decoded using Huffman's algorithm with the data from the Huffman tables.
  – Frequency components are extracted and the inverse discrete cosine transform (IDCT) is applied to recover the 8 × 8 MCU blocks.

About the Project
• Start the application by
  – Compiling: just type make
  – Run:
    • allocate a node (see later in the lecture)
    • type: ./jpeg <name of image>
• There are two images to test: image1 is smaller (around 3MB) and image2 is larger (around 7MB)

• Part 1: Instrument the code in order to use hardware performance counters to determine the behavior of different portions of the code separately
  • unpack_block (parse.c:314)
  • IDCT (parse.c:319)
  • get_symbol (huffman.c:53)
  • RGB_save (jpeg.c:268)

• The hardware performance counters should be based on the PAPI library, and you could monitor the following values:
  – Level 1 and Level 2 cache hits and cache misses
  – Translation lookaside buffer (TLB) misses
  – stall cycles waiting for various events
  – conditional branch instructions mispredicted
• Whether you can access these values will depend on the processor you are really using!
• You will have to add code to handle counter overflow, or convince me that overflow does not occur. If you just ignore this item, you will lose points.

• Part 2: Run the modified code on the shark cluster. Generate graphs for at least 3-5 PAPI hardware counters showing the values for each part of the code identified in Part 1 separately for both images. Please document (you can use PAPI to figure many of these things out!):
  – Processor type, frequency
  – Operating System (as precisely as possible)
  – Cache sizes
  – Each team has a single account

• Part 3 (only for the two-person teams!)
  – Generate an estimate of the cache usage of the original code (without PAPI calls in it) using the valgrind toolkit with cachegrind, e.g.
    valgrind --tool=cachegrind ./jpeg <problemsize>
  – If possible, compare the data produced by valgrind to the data obtained with PAPI
  – Note: the execution of the application using valgrind/cachegrind will be significantly slower than without it!

Notes
• The PAPI version installed on shark is 3.7.2
• On the front-end node you can find tons of examples in C and Fortran on how to use PAPI in /opt/papi-3.7.2/share/examples/ctests. E.g.
  – all_events.c -> how to check on a processor whether a counter is available
  – low-level.c -> how to use the low-level API of PAPI
  – memory.c -> how to extract information of the memory subsystem (e.g. cache sizes)
  – overflow_index.c -> how to handle overflow correctly

1st Assignment
• The documentation should contain
  – (Brief) Problem description
  – Solution strategy
  – Results section
    • Description of resources used
    • Description of measurements performed
    • Results (graphs + findings)

1st Assignment
• The document should not contain
  – Replication of the entire source code (that's why you have to deliver the sources)
  – Screen shots of every single measurement you made
    • Actually, no screen shots at all.
  – The slurm output files

How to use a cluster
• A cluster usually consists

