Deep3D
Alice Reyzin and Elliott Prechter
6.111 Final Project
12/9/2004

Abstract

The Deep3D project was an attempt to create a hardware transformation and rasterization 3D rendering system. It would read a scene of triangles from a ROM and allow the user to navigate through the scene by changing their position and orientation with a Nintendo controller. The triangles would be transformed and drawn in wireframe to a VGA monitor in real time. The entire pipeline would be run twice with different eye coordinates, rendering once in red and once in blue (one pass per eye), thus giving a stereoscopic view of the scene.

The final project ran into many issues during testing and implementation, especially regarding timing. The rasterization system and the transformation system would not fit together on a single 10K70 FPGA, and thus only ran separately. Also, the final project did not implement the two rendering passes, one per eye, although this was mostly due to time constraints imposed by the project deadline.

Introduction

System Brief

Deep3D is a digital circuit designed to transform and render a 3D world in real time. The user controls a floating camera with three degrees of freedom: two angles and forward/backward movement. The environment is pre-specified and loaded into ROM. Scene data is stored as a list of triangles that all exist in world space. Each triangle consists of 3 vertices with x, y, z coordinates stored in 17-bit sign-magnitude fixed-point format with 8 fractional bits.

General control flow consists of first clearing a back-buffer. Upon completion of this step, the system takes input from the Nintendo controller and computes an inverse transform for the current position and orientation of the user. The system then sequentially looks up each triangle and loads the associated vertices into a vertex buffer. A world-to-camera transform is performed on the vertices in the vertex buffer, and then they are projected into screen coordinates.
At this point the clipping unit clips the triangle's three edges against the screen, and finally the rasterizer draws to the back-buffer. Once all the triangles are drawn, the video buffers are swapped, and the process is repeated. The front-buffer is continuously output to a VGA monitor.

High-Level Design

Module Description and Implementation

Block Diagram

Figure 2: High Level Block Diagram

Control FSMs

High Level Control Flow

Figure 1 is a view of the system at the highest level of abstraction. The system is broken down into nine main units. The Major FSM is responsible for sending a start signal to the video output module so that it clears the back buffer, and then sending a start signal to the Draw FSM to begin drawing the scene. The Draw FSM first tells the transformation system to update the camera’s position and orientation by reading input from the Input Processing module. Then, it sequentially reads in the triangles from Data. Each triangle is sent to the transformation system, which converts it into three final x and y coordinates for rasterization, and the triangle is drawn in wireframe by the rasterization module. The Draw FSM cycles through all triangles and then sends a done signal, telling the Major FSM to flip the page_sel signal so that the front and back video buffers are swapped. The Major FSM then repeats this process continuously.

Breakdown by Partner Responsibility

Section 1: Modules Designed/Implemented by Elliott

Math Module

In order to perform 3D transformation, Deep3D needed the following capabilities: a fixed-point number representation, transform generation, transform application, transform concatenation, sine and cosine calculation, multiply and accumulate, and a way to compute multiplicative inverses. The math unit performs all of these functions.
The most notable input to the math unit is op, which can be one of the following values:

OP_INV             0
OP_MUL             1
OP_MUL_AC          2
OP_GEN_TRANS       3
OP_GEN_ROT_NOD     4
OP_GEN_ROT_SHAKE   5
OP_MAT_MUL         6
OP_MAT_VEC_MUL     7

These values are mostly self-explanatory. OP_MUL_AC stands for multiply and accumulate, which tells the math unit to multiply the inputs val_a and val_b and add the product to the last value that the MAC computed. This is useful for operations such as dot products. OP_GEN_ROT_NOD tells the math unit to generate a rotation matrix about the x axis for the angle specified in val_a (see Trig section for details on angle format), and to store it in the matrix selected by mat_sel_a. OP_MAT_VEC_MUL will be explained in the section describing the Matrix/Vector multiplier.

The main job of the Math unit is to provide control signals to its sub-modules, and to handle muxing of various signals between the sub-modules depending on the operation being performed. For example, the MAC’s control signals are driven externally when performing OP_MUL or OP_MUL_AC, but when performing OP_MAT_MUL or OP_MAT_VEC_MUL, the math unit itself needs to drive the MAC’s control signals.

MAC

The heart of the math unit is the Multiply and Accumulate (MAC) unit. The number format is 17 bits: 1 sign bit followed by an 8.8 fixed-point magnitude. The inputs are the 17-bit operands a and b and the start and clear signals; the outputs are the 17-bit result z and the 1-bit overflow and done signals. The MAC unit can be further broken down into an 8-bit by 8-bit multiplier, a 17-bit converter to and from 2’s complement, a mux to select the appropriate bits from each 8-bit multiply to add to the sum, and an FSM for control flow. When start pulses high, clear is checked: if it is one, the operation is a plain multiply; if it is zero, a multiply is performed and the product is added to the previous total.
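The sign-magnitude 8.8 format and the start/clear behaviour of the MAC can be modeled in software. The following Python sketch is illustrative only and is not the project's hardware description: the placement of the sign in bit 16, truncation of the product toward zero, and the names encode, decode, Mac, and dot are assumptions made for the example.

```python
FRAC_BITS = 8                      # 8 fractional bits, as stated in the report
MAG_BITS = 16                      # 8 integer + 8 fractional magnitude bits
MAG_MASK = (1 << MAG_BITS) - 1
SIGN_BIT = 1 << MAG_BITS           # assumption: the sign occupies bit 16


def encode(value: float) -> int:
    """Pack a real number into a 17-bit sign-magnitude 8.8 word."""
    magnitude = int(round(abs(value) * (1 << FRAC_BITS)))
    if magnitude > MAG_MASK:
        raise OverflowError("magnitude does not fit in 8.8")
    sign = SIGN_BIT if value < 0 and magnitude != 0 else 0
    return sign | magnitude


def decode(word: int) -> float:
    """Unpack a 17-bit sign-magnitude 8.8 word into a real number."""
    value = (word & MAG_MASK) / (1 << FRAC_BITS)
    return -value if word & SIGN_BIT else value


class Mac:
    """Behavioural model: clear=True multiplies, clear=False accumulates."""

    def __init__(self) -> None:
        self.total = 0             # signed running total in 8.8 units
        self.overflow = False

    def start(self, a: int, b: int, clear: bool) -> int:
        # Multiply the magnitudes and drop the extra 8 fractional bits
        # (assumption: truncation rather than rounding).
        magnitude = ((a & MAG_MASK) * (b & MAG_MASK)) >> FRAC_BITS
        negative = bool((a ^ b) & SIGN_BIT)
        term = -magnitude if negative else magnitude
        self.total = term if clear else self.total + term
        self.overflow = abs(self.total) > MAG_MASK
        # Convert the signed total back to a sign-magnitude word.
        out_mag = abs(self.total) & MAG_MASK
        sign = SIGN_BIT if self.total < 0 and out_mag != 0 else 0
        return sign | out_mag


def dot(xs, ys) -> float:
    """Dot product: one multiply with clear set, then accumulates."""
    mac = Mac()
    z = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        z = mac.start(encode(x), encode(y), clear=(i == 0))
    return decode(z)
```

In this model a three-element dot product issues one multiply with clear set followed by two accumulates with clear unset, which mirrors how OP_MUL and OP_MUL_AC would be sequenced.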
At the next stage, a series of four multiplies is performed, one for each pairing of 8-bit operand halves: upper/upper, upper/lower, lower/upper, and lower/lower. After each multiply, the product is shifted as necessary and added to or subtracted from an internal total register. For the upper/upper multiplication, the lower 8 bits of the product are selected and added as the upper 8 bits of a 17-bit fixed-point value. For the upper/lower and lower/upper multiplications, product bits 12 through 0 are selected and shifted down by four to be added to total. Finally, for the lower/lower multiplication, the high 8 bits of the product are selected and shifted down by 8 before being added to total. The addition and multiplication stages are done in a pipelined fashion, so that the addition of the previous multiply is
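The four-stage multiply described above can be sketched in software. This is a minimal Python model under stated assumptions, not the project's implementation: it operates on the 16-bit magnitudes only (sign handling is omitted), and it adds the two cross products to the total unshifted, which is the textbook alignment for 8.8 operands and may differ from the exact bit selection used in the hardware. The function name mul_8_8 is hypothetical.

```python
def mul_8_8(a_mag: int, b_mag: int):
    """Multiply two unsigned 8.8 magnitudes using only 8x8-bit products.

    Returns (result, overflow), where result is the low 16 bits of the
    8.8 product and overflow is True if the product needs more than
    8 integer bits.
    """
    a_hi, a_lo = (a_mag >> 8) & 0xFF, a_mag & 0xFF
    b_hi, b_lo = (b_mag >> 8) & 0xFF, b_mag & 0xFF

    total = 0
    # upper/upper: integer x integer, lands in the integer byte
    total += (a_hi * b_hi) << 8
    # upper/lower and lower/upper: integer x fraction, already in 8.8 units
    total += a_hi * b_lo
    total += a_lo * b_hi
    # lower/lower: fraction x fraction, only the high 8 bits survive
    total += (a_lo * b_lo) >> 8

    overflow = total > 0xFFFF
    return total & 0xFFFF, overflow
```

For example, mul_8_8(0x0180, 0x0200) multiplies 1.5 by 2.0 and returns (0x0300, False), which is 3.0 in 8.8 format. Because the first three partial products are exact multiples of one 8.8 unit, the only precision lost is the truncated low byte of the lower/lower product.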