Abstract

Deep neural networks are currently being explored as a solution to complex problems in both industry and research. Image classification is one of the core domains of deep learning and has seen significant improvements in classification accuracy. However, these improvements result from deeper network designs that incorporate more layers, which adds to the computational load of the network. The growing size of today’s networks therefore creates a major bottleneck in both training time and inference time. Hardware acceleration reduces this computational overhead by moving the tasks required during training and inference from CPUs to a hardware platform. Such platforms include domain-specific architectures that are specially designed for these network workloads. The softmax layer is a well-known type of non-linear activation layer. It is a key layer not only in most image classification networks but also in other classification domains more generally. Because the softmax layer is built from exponentials and includes multiple division operations, accelerating it efficiently is a challenging task. The purpose of this thesis is to propose an optimized architecture for the softmax layer, to be used in hardware acceleration of any multi-category image classification task. The hardware model aims to preserve classification accuracy while balancing the trade-off between design performance and the resulting cost. The design uses several schemes to reduce the computational load by observing the patterns of the inputs to the softmax layer. Those patterns are the basis for selecting a suitable input-downscaling method.
The area overhead is also reduced by lowering the accuracy of some arithmetic operations, according to how much each contributes to the classification accuracy in the mathematical representation. The architecture of the proposed model is implemented in Verilog HDL. An assessment setup is also implemented to provide a sensible estimate of performance. From the available open standard benchmarks, a dataset is selected and used to assess the design against a standard reference and previous implementations of the layer. With the methods used in this work, the classifier using the hardware layer achieves 99.13% accuracy, which matches the predictive accuracy obtained with the reference layer.
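The abstract does not say which input-downscaling method the thesis selects. As a minimal sketch only, the snippet below assumes the common choice of subtracting the maximum input before exponentiation, which leaves the softmax output mathematically unchanged while keeping every exponential in the range (0, 1]. The function names and the example inputs are hypothetical and are not taken from the thesis.

```python
import math

def softmax(logits):
    """Reference softmax: each exponential divided by the sum of exponentials."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_downscaled(logits):
    """Softmax with inputs shifted by their maximum (assumed downscaling method)."""
    peak = max(logits)
    # Every shifted input is <= 0, so each exponential lies in (0, 1].
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical inputs to the softmax layer for a four-class problem.
logits = [2.0, -1.0, 5.5, 0.3]
ref = softmax(logits)
scaled = softmax_downscaled(logits)
```

In this sketch both versions select the same class, and that class is simply the index of the largest input. This is one way to see why some arithmetic operations in the layer could tolerate reduced accuracy without changing the classification result. The shifted form also accepts large inputs such as `[1000.0, 999.0]`, for which the unshifted `math.exp` call would overflow.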