Abstract

Recent developments in real-time image and video processing for multimedia and communication systems require fast image and video compression techniques. The two-dimensional discrete cosine transform (2D-DCT) is widely used in modern compression standards for its high power compaction. Multiplier-free approximate DCTs have been developed to run faster than the exact DCT while maintaining comparable levels of power compaction. These approximate transforms can be realized efficiently in digital very-large-scale integration (VLSI) hardware using only addition, subtraction, and shift operations. They can also be implemented efficiently on parallel processors to achieve high speedups. The graphics processing unit (GPU) is among the most powerful parallel processing tools, especially when programmed with the dedicated Compute Unified Device Architecture (CUDA). GPUs and CUDA are employed in many modern image- and video-based systems and computer applications. The GPU is much faster than the CPU when dealing with large data sets, as in image and video processing.

The work presented in this thesis is twofold. The first part introduces an efficient, low-complexity, multiplierless 8-point approximate DCT. The proposed transform is derived by applying the signum function to a transform with high power compaction capabilities. The signum function cancels the shift operations required by that transform, which reduces hardware and software errors and speeds up the implementation. A fast implementation of the proposed transform is provided. Only 17 additions are required for both the forward and backward 8-point transformations. The compaction and compression properties of the proposed transform are demonstrated through simulations on databases of grayscale images of different sizes.
It is shown that the proposed algorithm outperforms the most recent competing transforms. In the second part of the thesis, a fast and efficient GPU implementation of the proposed transform is provided using CUDA programming. The details of the proposed GPU architecture and the CUDA modules employed are presented. Performance evaluations show that this implementation of the transform from the first part is much faster than other approximate DCTs in real-time Joint Photographic Experts Group (JPEG)-like compression. The proposed GPU implementation achieves high speedups over a conventional CPU implementation, ranging from 151x to 202x depending on image size.
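
The idea of deriving a multiplierless transform with the signum function can be illustrated with a small sketch. The abstract does not say which base transform the thesis starts from, so the sketch below assumes the exact 8-point DCT-II matrix as a stand-in. It is not the proposed transform, and it uses a direct matrix-vector product rather than the 17-addition fast algorithm. All function names are hypothetical.

```python
import math

N = 8  # block length used throughout the abstract


def dct_matrix(n=N):
    """Exact orthonormal DCT-II matrix (row k, column i). Assumed base transform."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def signum(x, eps=1e-12):
    """Return +1, -1, or 0 according to the sign of x."""
    if x > eps:
        return 1
    if x < -eps:
        return -1
    return 0


def signed_matrix(m):
    """Apply signum element-wise, leaving only entries in {-1, 0, +1}."""
    return [[signum(v) for v in row] for row in m]


def transform_1d(t, x):
    """Multiply a signed matrix by a vector using only additions and subtractions."""
    out = []
    for row in t:
        acc = 0
        for coeff, sample in zip(row, x):
            if coeff > 0:
                acc += sample
            elif coeff < 0:
                acc -= sample
        out.append(acc)
    return out


def transform_2d(t, block):
    """Separable 2D transform T * X * T^T: rows first, then columns."""
    rows = [transform_1d(t, r) for r in block]
    cols = [transform_1d(t, list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    T = signed_matrix(dct_matrix())
    flat_block = [[1] * N for _ in range(N)]
    for row in transform_2d(T, flat_block):
        print(row)
```

For a constant 8x8 block of ones, this sketch puts the whole signal into the single top-left (DC) coefficient, with value 64, and leaves every other coefficient at zero. Because each 8x8 block is transformed independently, a GPU implementation can map blocks to parallel threads or thread blocks, although the thesis's actual CUDA design is not described in the abstract.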