Rice ELEC 525 - A Simulation: Improving Throughput and Reducing PCI Bus Traffic

Shawn Koch and Mark Doughty
ELEC 525
4/23/02

A Simulation: Improving Throughput and Reducing PCI Bus Traffic by Caching Server Requests Using a Network Processor with Memory

1 Motivation and Concept

The goal of this project was to show how using a network processor with external memory for caching server requests could provide a significant increase in server throughput and PCI bus bandwidth. Server applications are becoming more important as the internet continues to expand, and we expect the future to bring more web users each year. These users will turn to the internet for the latest news updates, for online shopping, for research, and for countless other areas of interest. As the number of users a web server must handle grows, server throughput becomes increasingly important, especially for servers handling a large number of clients. Therefore, we believe that by replacing the standard NIC with a network processor connected to an external memory used for caching server requests, we can significantly increase throughput while also benefiting PCI bus bandwidth.

While this idea would benefit a large number of servers, some would see smaller gains. Servers that primarily handle streaming-media requests or dynamic web pages are unlikely to see the maximum benefit our expected results show; we believe that servers dealing with static web pages can expect the greatest improvement in throughput. The idea may still be applicable to servers handling non-static requests with some modifications, but our simulator model is based on a static-request-handling server. Also of note, from a single user's or client's perspective, there may not be any discernible improvement in server performance. The primary benefit comes from being able to handle more requests in a shorter amount of time, thus allowing the server to increase its overall performance.
So while a particular user may not be excited about the concept, websites that consistently see periods of high traffic on any given day would likely be interested in the results presented here.

The idea is to use a network processor connected to a DRAM to cache frequently requested data packets coming from a server. When a request is made, the host machine fetches the requested data from its memory and sends the data out to the network processor, also signaling the network processor to cache the data in its own memory. The next time the same data is requested, the host CPU does not need to send the data again. Rather, it simply sends out header information for the packet, along with the location of the data in the network processor's memory and the length of the data. The network processor then concatenates the header with the cached packet data and completes the response by sending it out to the Ethernet. This has two benefits. First, the CPU can handle the next request sooner, rather than having to prepare the packet again for transmission. Second, since the entire packet does not need to be sent to the network processor again, traffic over the PCI bus is reduced by the size of the requested packet, which can be considerable over enough requests. Thus, we also improve bandwidth over the PCI bus.

2 Simulator Architecture

To test this idea, we have created a simulator for this system in C++. We have modeled the host CPU, host main memory (DRAM), host disk, network processor, network processor external memory (DRAM), PCI bus, packets, and Ethernet. We are particularly interested in measuring the average response time per request for the system and the total amount of traffic over the PCI bus. These metrics give us the percentage improvement of our network-processor-based simulations over our baseline simulation, which lacks a network processor. The CPU and NP are not modeled in detail.
Rather, we have modeled a processing time for each component, essentially a constant amount of time added to each request/response pair, along with each component's clock speed. The processing time is the expected overhead of handling the request. The processing times and clock speeds can be changed in the configuration file (see the attached sheet listing all parameters; the listed values are the default baseline parameters).

The Ethernet is modeled in terms of a request frequency and a number of requests, both of which may be varied. The number of requests is simply the total number of requests that will be sent through the system. The request frequency is the rate at which requests are made on the system, in requests per second. Throughput is measured as the amount of time from the initiation of a request until its response is completed. Each request is given a unique identification number in sequential order so that the total number of requests can be monitored. Each request is also given a second, non-unique ID number; if this number matches that of another request, the two requests are for the same data. This is how we determine whether the network processor's memory contains the requested data.

The PCI bus between the network processor and the host computer is modeled as a 64-bit bus. The transfer rate is set up as a parameter, but is held at 8 bytes per cycle for our simulations. We monitor the total amount of data (in bytes) transferred over the PCI bus in either direction. We have not modeled any contention for the bus: data can travel in both directions at the same time in our simulator. We would look to modeling the bus more accurately in the future.

Packets are modeled for requests and responses. Request packets are modeled as having a default request size of 1024 bytes.
This means all requests require that 1024 bytes be transferred over the PCI bus from the network processor to the host machine. The other major parameter for request packets is the number of different requests, which controls the number of different IDs that are available to requests. These IDs are attached to requests at random. As an example, if the number of different requests is set to 1, every request after the initial miss will be processed from the network processor's memory. The response packets, on the other hand, are modeled

