Integrated Systems Group
Massachusetts Institute of Technology

H.264 Luma Predictor
Maxine Lee, Alex Moore
May 17, 2006


Why H.264?
  - End-to-end protocol
  - Better compression
  - Designed for efficient encoding
  - ITU standard
  - It's on your iPod


Project Scope
  - Prediction module of H.264 encoder
      - Intraframe prediction
      - Interframe prediction
      - Transforms
  - Luma only (no color information!)
  - Why?
      - 85%+ of encoder computation time
      - Rich problem with lots of exploration


Intraframe Prediction: Motivation


Intraframe Prediction: Block Diagram


Interframe Prediction


Intra-Frame Prediction
  - Use spatial similarities to compress each frame
      - Use neighboring pixels to make a prediction on a block
      - Transmit the difference between actual and predicted
  - Tradeoff: prediction accuracy vs. number of control bits
  - H.264 answer: 4x4 and 16x16 prediction!
  [Figure labels: homogeneous; huge gradient]


Intra 4x4 Prediction
  - 9 prediction modes
  - Prediction proceeds left to right, top to bottom
  - When not all boundary pixels are available (i.e. we're at the border of the picture), can't predict with all the modes
  [Figure labels: current pixels; previously predicted and reconstructed blocks]


Intra 16x16 Prediction
  - Mode 0: Vertical
  - Mode 1: Horizontal
  - Mode 2: DC (average)
  - Mode 3: Plane


Advantages/Disadvantages
  - Encoder's job to compare options and pick the best
      - Exhaustive search …
      - Uses a cost function to compare different modes
  - Intra 4x4
      - Lots of options
      - Good for detailed areas
      - 9 modes = 4 bits for every 16 pixels (!)
  - Intra 16x16
      - Good for smooth areas
      - 4 modes = 2 bits


Block Diagram (Baseline)
  [Block diagram. Labeled blocks:
      Input video; Config; Picture Parsing; Initialize prediction variables;
      Get 4x4 Prediction Residual (try all 9 modes; loop through 16 4x4 blocks); Compute 4x4 Cost (QP);
      Get 16x16 Prediction Residual (try all 4 modes); Compute 16x16 Cost (QP);
      Choose Prediction Mode (16x16 / 4x4); DCT; Quant; IQuant; IDCT;
      Get best mode, send to output; Output (to entropy encoder)]


Intra 16x16 Considerations
  - Process
      - Loop through the available*** modes
      - Generate the prediction
      - Compute cost of residual
      - Cost ~ SAD (sum of absolute differences)
  - ***What's available? Depends on location in the frame!
  [Diagram blocks: Get 16x16 Prediction Residual (try all 4 modes); Compute 16x16 Cost]
  [Figure labels: all modes possible; only DC possible]


Intra 4x4 Considerations
  - Process
      - Loop through all 16 blocks
      - For each block, loop through available modes
      - Get ***cost = SAD + 4*P*λ(QP)
      - Pick best mode, send to DCT
      - Save reconstructed 4x4 block, so you can use it to predict the next 4x4 block
  - ***Cost: f(QP), since overhead bits hurt more with higher compression
  - P: most probable mode
  [Diagram blocks: Get 4x4 Prediction Residual (try all 9 modes; loop through 16 4x4 blocks); Compute 4x4 Cost (QP)]
  [Figure labels: Overhead!!!; A; B]


Extra Concerns with Intra 4x4
  - Which boundary pixels do you use?
  - Boundary depends on where in the picture you are AND which 4x4 block you're working on
  [Figure labels: only left boundary available, and in another macroblock; upper right pixels not available (can extrapolate)]


Storing Boundary Pixels
  - To predict the current macroblock, need pixels from FOUR neighbors (A-D)
  - D can be stored in a register, since it is immediately used
  - Pixels for the previous row (A-C) have to be stored in a register file
  - Also save A in a register to limit regfile reads to 2
  [Figure labels: neighbors A, B, C, D]


Synthesis Numbers
  - Note: not P+R; not enough RAM / hard disk (ask us tomorrow if you're really curious about P+R numbers)
  - Total Area = 609,940 um^2
  - Clock Cycle = 7.27 ns (quant multiplications)
  - Breakdown (pie chart):
      Predictor                        66%
      Quant (with QP lookup tables)    15%
      DCT/IDCT                         10%
      Misc.                             9%


Only Three Regions of Change
  [Figure slide]


Interframe Prediction
  - Use previous frame(s) to predict macroblocks of the current frame
  - Most of the time, the majority of the frame isn't moving
  - If the change within a macroblock is sufficiently small, just reproduce it exactly!
  [Followed by two figure slides with the same title]


Interprediction Algorithm
  - Use a motion vector to predict the current macroblock
  - Start at (0,0), the same block, and calculate the error for each motion vector
  - Full-Search (FS) algorithm: try all possible motion vectors within a window
  - Final prediction will be the block given by the motion vector with minimum error
  [Followed by four figure slides with the same title]


Problem…
  - Assume a window size of 16 (conservative)
  - 1024 possible motion vectors to check per macroblock (vs. 9 for intra)
  - 307200 possible motion vectors per frame!


Solution
  - A better algorithm!
  - Assume motion estimation gets better as we get closer to the ideal motion vector
  - Diamond-shaped algorithm reduces points checked by ~80%, with mean error per pixel about 3 (vs. about 2 for FS)
  - Hexagonal algorithm reduces by another ~35% (3.2 mean error vs. 3.0)


Hexagonal Algorithm
  [Figure slide]


Circuit Implementation
  [Block diagram. Labeled blocks: Residual and Cost; Frame Buffer; Predict Control; Transforms; Network Layer]


Results… Results? What Results?
  - H.264 predictor ~40x size of SMIPS processor
  - Frame buffer adds ~18000 area (+4%)
      - But we're cheating (64x48 video size)
  - Interprediction block adds ~35000 area (+7%)
  - Performance evaluation TBA
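
The full-search procedure from the Interprediction Algorithm slide can be sketched in software as below. This is only an illustrative sketch, not the hardware design presented in the slides: the function names (`sad`, `full_search`, `vectors_per_macroblock`), the list-of-lists frame layout, and the choice of the half-open vector range [-window, window) are all assumptions. That range was picked because it gives (2*16)^2 = 1024 vectors for a window of 16, matching the Problem slide. The per-frame figure of 307200 then corresponds to 300 macroblocks per frame (for example a 320x240 frame), which is an inference from the arithmetic rather than something the slides state.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by the vector (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def vectors_per_macroblock(window):
    """Number of candidate vectors, assuming the range [-window, window)
    in each dimension (an assumption; the slides only give the totals)."""
    return (2 * window) ** 2


def full_search(cur, ref, bx, by, n=16, window=16):
    """Try every motion vector in the window and keep the one with minimum
    SAD. Vectors that would read outside the reference frame are skipped.
    Returns (best_vector, best_cost, vectors_checked)."""
    height, width = len(ref), len(ref[0])

    # Start at (0, 0): the co-located block in the previous frame.
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    checked = 1

    for dy in range(-window, window):
        for dx in range(-window, window):
            if (dx, dy) == (0, 0):
                continue  # already evaluated above
            inside = (
                0 <= bx + dx and bx + dx + n <= width
                and 0 <= by + dy and by + dy + n <= height
            )
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            checked += 1
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost

    return best_mv, best_cost, checked
```

Calling `full_search(cur, ref, bx, by)` with two frames stored as lists of pixel rows returns the minimum-SAD vector for the macroblock whose top-left corner is at (bx, by). The diamond and hexagonal searches on the Solution slide would replace the two nested loops with a much smaller set of candidate points, trading a slightly higher mean error for far fewer SAD evaluations.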