UT EE 382C - Optimal Architectures for Massively Parallel Implementation of Hard Real-time Beamformers


Optimal Architectures for Massively Parallel Implementation of Hard Real-time Beamformers

Final Report
Thomas Holme and Karen P. Watkins
8 May 1998
EE 382C Embedded Software Systems
Prof. Brian Evans

Abstract

This paper reports an experimental analysis of real-time computational architectures applied to a digital time delay beamformer implementation. The goal of this research has been to identify the most efficient multiprocessor utilization for a prototypical beamformer by modeling the signal processing and applying selected multiprocessor scheduling algorithms. The analytic method centered on modeling the most computationally intense core of the beamformer signal processing in Ptolemy. First, the Synchronous Dataflow (SDF) domain model used to implement the beamformer core is presented. By exercising the model, the computational results were compared to an existing equation-based model for validation. Second, the SDF model was translated into a Code Generation in C (CGC) domain model in order to evaluate multiprocessor scheduling performance. Four well-known automated scheduling strategies were experimentally applied: declustering, a classical list scheduler, a dynamic-level scheduler, and a hierarchical scheduler. A manual heuristic schedule was also postulated and evaluated. Several key metrics were applied in judging optimality. For this application, it is demonstrated that the hierarchical scheduler holds a measurable advantage over the other algorithms considered, including the manual method. Resulting metrics are reported along with an analysis of the underlying performance drivers.

I. Introduction.

Computationally intense spatial filtering called beamforming is central to acoustic imaging. Typically, in order to achieve an update rate similar to slow-scan television, beamforming must operate in real time.
The real-time nature of beamforming leads to parallel digital signal processing for state-of-the-art applications in diverse fields such as seismic exploration, aeroacoustic measurements, ultrasonic medical imaging, and underwater acoustics [1]. These applications often require a large number of processors running algorithms in parallel. Efficiency in memory size, processor count, and execution time is critical to achieving an optimal system solution. Digital signal processor (DSP) attributes such as on-chip memory, instruction cache, and direct memory access (DMA) can be exploited to achieve optimization. Modeling and synthesis of the system using dataflow graphs can actualize these efficiencies. In particular, Synchronous Dataflow (SDF) is appropriate for systems exhibiting hard real-time, data-independent control flow, such as beamformers. The primary goal of this research has been to identify the most efficient multiprocessor utilization for a prototypical beamformer by modeling the signal processing and applying selected multiprocessor scheduling algorithms. Here, we determine the optimal scheduling for the process, where the criterion for optimality is minimization of the required cycle counts with a constrained number of processors.

In this paper, we first review SDF theory and multiprocessor scheduling algorithms. A prototypical beamformer problem is then defined along with its corresponding SDF model. We invoke selected multiprocessor scheduling algorithms in order to find an optimized solution. The results are presented along with an analysis of the performance of the algorithms.

II. SDF Modeling Approach for Optimization.

SDF is a natural representation for signal processing algorithms, such as beamforming, that are amenable to compile-time scheduling techniques [2][3]. In SDF, when an actor is fired, it consumes and produces a fixed number of tokens. Since these parameters are known at compile time, algorithms represented by SDF can be statically scheduled.
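As a brief illustration (a sketch added here, not code from the report), static schedulability follows from the SDF balance equations: the repetitions vector q must satisfy q[src]·produced = q[dst]·consumed on every edge. For the chain graph of Fig. 2, where A produces 20 tokens per firing, B consumes 10 and produces 20, and C consumes 10, the smallest integer solution can be computed as follows:

```python
from fractions import Fraction
from math import lcm

# Edges of the SDF graph in Fig. 2: (source, tokens produced, sink, tokens consumed).
edges = [("A", 20, "B", 10), ("B", 20, "C", 10)]

def repetitions(edges):
    """Solve the balance equations q[src]*prod == q[dst]*cons for the
    smallest positive integer firing counts (the repetitions vector).
    This simple propagation assumes a connected chain of edges."""
    q = {}
    for src, prod, dst, cons in edges:
        q.setdefault(src, Fraction(1))
        q[dst] = q[src] * Fraction(prod, cons)
    # Scale the rational solution to the smallest integer vector.
    scale = lcm(*(r.denominator for r in q.values()))
    return {actor: int(r * scale) for actor, r in q.items()}

print(repetitions(edges))  # {'A': 1, 'B': 2, 'C': 4}
```

These firing counts match the periodic schedules listed in Fig. 2, e.g. A(2B)(4C), in which A fires once, B twice, and C four times per iteration.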
By scheduling, we mean the task of assigning actors in the dataflow graph to processors, ordering the execution of these actors, and determining when an actor fires so that all data precedence constraints are met [3]. A valid schedule is finite, fires each actor at least once, does not deadlock, and produces no net change in the number of tokens queued on each edge [4].

The schedule has a great impact on code size. Bhattacharyya et al. illustrate this concept as shown in Figure 1 [4]. This figure lists several valid schedules, including schedules with single and multiple appearances of the actors. If each actor appears only once in the schedule, no code duplication occurs, which minimizes the generated code size.

When more than a single processor is used, additional optimization can be realized while mapping the nodes onto multiple processors. Prior research into optimal multiprocessor schedulers has produced several algorithms, including declustering (DC), a classical list scheduler based on Hu's work (HU) [5], dynamic-level scheduling (DLS), and the hierarchical scheduling algorithm. The DC algorithm targets systems using homogeneous architectures that communicate through global shared memory. It is a multi-pass algorithm that attempts to minimize interprocessor communication (IPC) cost by scheduling all communications as well as all computations [6]. The HU algorithm assigns each node a priority and places the nodes in a list, sorted in order of decreasing priority. A global time clock regulates the scheduling process by assigning nodes onto processors as they become available. This algorithm does not consider IPC costs [7]. The DLS algorithm modifies the HU algorithm by eliminating the global clock in an effort to reduce IPC costs. This allows the algorithm to consider all processors, including processors still "busy", as candidates for scheduling at each step [7].

The hierarchical scheduling algorithm clusters nodes in a manual or automated process to optimize the schedule [2].
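The HU-style list scheduling described above can be sketched as follows (a hypothetical illustration with an invented four-node task graph, not code or data from the report): each node gets a static-level priority, and ready nodes are greedily assigned to the earliest-available processor, ignoring IPC costs.

```python
# Hypothetical task graph: node -> (execution time, list of predecessors).
tasks = {
    "A": (2, []),
    "B": (3, ["A"]),
    "C": (1, ["A"]),
    "D": (2, ["B", "C"]),
}

def hu_list_schedule(tasks, num_procs):
    """Greedy list scheduling: rank nodes by decreasing static level
    (longest remaining path to a sink) and assign them to processors
    as the processors become free. IPC costs are ignored, as in HU."""
    succs = {n: [] for n in tasks}
    for n, (_, preds) in tasks.items():
        for p in preds:
            succs[p].append(n)
    level = {}
    def lvl(n):
        if n not in level:
            level[n] = tasks[n][0] + max((lvl(s) for s in succs[n]), default=0)
        return level[n]
    for n in tasks:
        lvl(n)

    finish = {}                      # node -> finish time
    proc_free = [0.0] * num_procs    # next free time of each processor
    schedule = {}                    # node -> (processor index, start time)
    remaining = dict(tasks)
    while remaining:
        # Nodes whose predecessors have all been scheduled.
        ready = [n for n, (_, preds) in remaining.items()
                 if all(p in finish for p in preds)]
        n = max(ready, key=lambda n: level[n])   # highest priority first
        t, preds = remaining.pop(n)
        proc = min(range(num_procs), key=lambda i: proc_free[i])
        start = max([proc_free[proc]] + [finish[p] for p in preds])
        schedule[n] = (proc, start)
        finish[n] = start + t
        proc_free[proc] = finish[n]
    return schedule, max(finish.values())

sched, makespan = hu_list_schedule(tasks, 2)
print(makespan)  # 7
```

On two processors this toy graph schedules B and C in parallel after A, with D serialized behind B, giving a makespan of 7 time units.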
The clustered subgraphs are then scheduled onto single processors using uniprocessor SDF schedulers. Clustering mitigates the problem of fully expanded graphs exploding in size, but can introduce deadlock during this process. The algorithm guarantees that the clustered graph will not deadlock by applying the SDF composition theorem.

[Fig. 2. Periodic schedule listing for an SDF graph. (a) SDF graph: A --20/10--> B --20/10--> C, with each edge labeled by tokens produced/consumed. (b) Periodic schedules: 1. ABCBCCC, 2. A(2B(2C)), 3. A(2B)(4C), 4. A(2BC)(2C). (c) Corresponding buffer sizes: 1. 50, 2. 40, 3. 60, 4. 50.]

III. Time Delay Interpolation Beamforming

The beamformer function studied here is a

