WUSTL CSE 362M - Homework #1

CSE-362 Homework 1
September 2006

READ:
• "Statement on Academic Integrity".
• Heuring & Jordan (H & J), Chapters 1 & 2.0 - 2.3.
• Review: Appendix A: A.1 - A.9.5 (Logic Design Review)
• VHDL reading (if needed): Yalamanchili, Chapters 1 through 3.3.

DUE: Wednesday, Sept. 13, 2006 (hand in at class meeting)

CLASS WEB PAGE: http://www.cse.wustl.edu/~jbf/cse362.d/cse362.html

1. Clock Rates, Feature Sizes and Chip Component Counts

Recently announced high performance desktop processors have clock rates in the GHz range.
1.1 What is the clock period associated with a processor that has a 3.8GHz clock?
1.2 If the die size for such a processor is 1.6cm × 1.6cm (i.e., 1.6cm on a side), how much time does it take for a signal to traverse the diagonal of the die if it is limited only by the speed of light?
1.3 Say that due to electrical considerations (i.e., resistance and capacitance of conductors) signals on plain metal wires travel on average at about 1.2cm/ns (this calculation is actually much more complicated, with delay varying approximately as the square of the distance when no repeaters are utilized). Approximately how long does it take to traverse the diagonal of the die, and how many clock periods does this represent?
1.4 The feature size, z, associated with a given VLSI (Very Large Scale Integration) technology designates the width of the smallest feature that can be "written" on a silicon die. This is directly related to the smallest transistor that can be fabricated, and is a key factor in determining the transistor speed and gate densities. Say that a (1.6cm)² chip has 100 million transistors on it, and that 40% of the chip die is taken up with wires, input/output pads, and empty space. If a typical transistor takes an area of about (2 * z)², what is the value of z?
1.5 If a simple logic component (e.g., a 2-input NAND gate) requires an area of about (12 * z)², how many logic components would we have on this chip?
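The arithmetic in 1.1 through 1.5 can be sanity-checked with a short script. The following is only a sketch under stated assumptions, not an official solution: it takes the speed of light in vacuum as 29.9792458 cm/ns, assumes the 60% of the die not used by wires, pads, and empty space is completely filled by transistors (1.4) or by logic components (1.5), and uses function names invented here for illustration.

```python
import math

# Assumption: speed of light in vacuum, expressed in cm per nanosecond.
SPEED_OF_LIGHT_CM_PER_NS = 29.9792458


def clock_period_ns(freq_ghz):
    """Clock period in nanoseconds for a frequency given in GHz."""
    return 1.0 / freq_ghz


def die_diagonal_cm(side_cm):
    """Diagonal of a square die with the given side length."""
    return side_cm * math.sqrt(2.0)


def traversal_time_ns(distance_cm, speed_cm_per_ns):
    """Time for a signal to cover a distance at a constant speed."""
    return distance_cm / speed_cm_per_ns


def usable_area_cm2(side_cm, overhead_fraction):
    """Die area left after removing wires, pads, and empty space."""
    return side_cm ** 2 * (1.0 - overhead_fraction)


def feature_size_cm(side_cm, transistors, overhead_fraction):
    """Solve (2z)^2 = usable_area / transistors for z.

    Assumption: the usable area is completely tiled by transistors.
    """
    area_per_transistor = usable_area_cm2(side_cm, overhead_fraction) / transistors
    return math.sqrt(area_per_transistor) / 2.0


def logic_component_count(side_cm, overhead_fraction, z_cm):
    """Number of (12z)^2 components that fit in the same usable area."""
    return usable_area_cm2(side_cm, overhead_fraction) / (12.0 * z_cm) ** 2


if __name__ == "__main__":
    period = clock_period_ns(3.8)
    diagonal = die_diagonal_cm(1.6)
    light_time = traversal_time_ns(diagonal, SPEED_OF_LIGHT_CM_PER_NS)
    wire_time = traversal_time_ns(diagonal, 1.2)
    z = feature_size_cm(1.6, 100e6, 0.40)

    print(f"1.1 clock period: {period:.4f} ns")
    print(f"1.2 diagonal at light speed: {light_time:.4f} ns")
    print(f"1.3 diagonal on wire: {wire_time:.4f} ns = {wire_time / period:.2f} periods")
    print(f"1.4 feature size z: {z * 1e4:.4f} microns")  # 1 cm = 10^4 microns
    print(f"1.5 logic components: {logic_component_count(1.6, 0.40, z):,.0f}")
```

Changing the overhead fraction or the per-component area in the calls shows how sensitive the answers to 1.4 and 1.5 are to those modeling assumptions.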
Note: The original INTEL processors had feature sizes of about 5 microns.

2.0 Logic Delays in Multiplexors

Consider a 4-to-1 Multiplexor implemented out of basic 2-input gates.
2.1 Draw the logic diagram for the 4-to-1 multiplexor. If the delays through the inverter, AND, and OR gates are each 1 time unit, determine the maximum delay (in terms of time units) associated with the multiplexor (ignore wire delays).
2.2 Say the time unit equals 0.8ns (nanoseconds). If a design requires that the clock period be at least as large as the delay through such a multiplexor, what is the maximum clock frequency that can be used?
2.3 Say that the multiplexor design can utilize 2- or 4-input gates. How does that affect the design? Draw the logic diagrams.
2.4 Design a 16-to-1 multiplexor using the 4-to-1 multiplexor as a component.
2.5 (Preliminary VHDL work) Write a VHDL (structural) program for the 4-to-1 multiplexor using 2-input gate components (you can use VHDL AND, OR, etc. primitives).
2.6 (Preliminary VHDL work) Write a VHDL (structural) program for the 16-to-1 multiplexor using the 4-to-1 multiplexor of 2.4 as a component.
NOTE: 2.5 & 2.6 will be implemented and tested as part of the next assignment. If you wish to do it now as part of this HW that’s fine.

3.0 Heuring & Jordan: Problem 1.14a & 1.14b

4.0 Performance:

A computer lacking floating point (FP) hardware instructions is being used in a real-time application. FP operations are implemented by calling a procedure (using available instructions) that simulates FP operations. Suppose that these FP procedure calls are responsible for 30% of program execution time. To obtain better performance it has been proposed that the computer be redesigned to include hardware floating point instructions, and it has been estimated that the presence of such hardware would speed up the FP operations by a factor of 20.
Another suggested approach is to redesign the entire computer so that all instructions run twice as fast, but not to include any FP hardware instructions (FP procedure calls would still be utilized). Determine the speedup associated with each alternative and which is best. Assume the costs are equivalent.

5.0 3-, 2-, 1- & 0-Address Machines:

5.1 Write a program to implement the expression: A = B×C + (B-C)×D on 3-, 2-, 1- and 0-address machines. Do not rearrange the expression or change the values of any of the operands. Note the number of instructions required for each machine type.
5.2 Compute the total memory traffic in bytes for both instruction fetch and instruction execution for the code that implements the above expression for all four machines. Assume that opcodes occupy one byte, addresses two bytes, and data values two bytes.
5.3 Based on the results of 5.1 and 5.2, are there any conclusions one can draw from the comparison of the four machine styles?

6.0 Sorting on Chip Multiprocessors (CMP):

Say that a four-processor CMP was available as shown below. The processors are identical. Say we have a random array of 1024 positive integers that we would like to sort in ascending order. Assume that, from the memory input/output port, we have a way of loading sequential sub-blocks of the 1024-integer array into whatever sections of memory we choose. Develop an algorithm for utilizing the four processors to sort the array. The memory is 4MB in size and is divided into four equal sections, with each processor assigned to one unique section. Within its section, the processor can both read and write to memory. Processors can also read from other memory sections.
6.1 Describe in words the manner in which the array is initially loaded into the memory.
6.2 Clearly describe in words and/or using a high-level-language-oriented pseudo-code how your algorithm operates.
6.3 Ideally, with four processors, one would expect to obtain an execution time speedup of 4 over a single processor system. What speedup does your algorithm achieve (with what assumptions)? What other factors need to be known to determine this speedup more precisely?
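As an illustration of the kind of algorithm problem 6.0 asks for, the sketch below shows one possible approach, not the expected answer: load one quarter of the array into each memory section, let each processor sort its own quarter, then merge pairwise in two rounds. The processors are simulated sequentially in ordinary Python, the function names are hypothetical, and the speedup estimate rests on a crude step-count model (n·log2 n steps per sort, one step per merged element, no memory contention or synchronization cost).

```python
import heapq
import math
import random

NUM_PROCESSORS = 4  # fixed by the problem statement


def split_into_sections(values):
    """Split the array into four equal sequential sub-blocks, one per section.

    Assumption: len(values) is a multiple of four (true for 1024).
    """
    size = len(values) // NUM_PROCESSORS
    return [values[i * size:(i + 1) * size] for i in range(NUM_PROCESSORS)]


def merge_two(left, right):
    """Merge two sorted lists: reads both inputs, writes one new sorted list."""
    return list(heapq.merge(left, right))


def four_way_sort(values):
    """Sort by local sorting plus two rounds of pairwise merging."""
    # Phase 1: each processor sorts the block in its own section
    # (simulated one after another here).
    sections = [sorted(block) for block in split_into_sections(values)]
    # Phase 2: two processors each merge a pair of sorted blocks, reading
    # the neighbouring section and writing into their own.
    lower = merge_two(sections[0], sections[1])
    upper = merge_two(sections[2], sections[3])
    # Phase 3: a single processor merges the two halves.
    return merge_two(lower, upper)


def estimated_speedup(n):
    """Speedup over one processor under the crude step-count model above."""
    serial_steps = n * math.log2(n)
    block = n // NUM_PROCESSORS
    parallel_steps = block * math.log2(block)  # phase 1, all four in parallel
    parallel_steps += 2 * block                # phase 2, two merges in parallel
    parallel_steps += n                        # phase 3, one final merge
    return serial_steps / parallel_steps


if __name__ == "__main__":
    data = [random.randint(1, 10**6) for _ in range(1024)]
    assert four_way_sort(data) == sorted(data)
    print(f"estimated speedup for n=1024: {estimated_speedup(1024):.2f}")
```

In this model the merge phases leave processors idle, which is one reason the estimate falls short of 4. Real figures would also depend on memory access costs across sections and on how the processors synchronize between phases.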

