Abstract

Deep neural networks are currently among the most prominent machine learning approaches, owing to their ability to learn many features from hierarchical data representations in complex classification applications. In such applications, the learning process is performed on sufficiently large datasets by going deeper in the network configuration, consolidating more layers to obtain highly accurate results. Energy consumption, area, and latency consequently increase, so hardware acceleration is employed to minimize the computational overhead by relocating the training and inference tasks from CPUs to dedicated hardware platforms whose specialized architectures are tailored to such network workloads. The softmax layer is a widely used non-linear activation layer in most deep neural networks, playing a crucial role across many classification domains. The softmax function comprises expensive exponential and division units, which cause overflow problems, low accuracy, large area, and low throughput. It is therefore a challenge to implement the softmax layer efficiently in hardware with both high accuracy and low cost. The purpose of this thesis is to present a high-accuracy, resource-efficient hardware implementation of the softmax layer for image classification tasks involving many categories. The key feature of this implementation is changing the exponential base of the softmax function, so that the complex operations of the traditional softmax are replaced by simple shift and addition operations, with simpler look-up tables and higher accuracy. The hardware model is described in Verilog Hardware Description Language and relies on single-precision floating-point arithmetic cores. Additionally, an evaluation setup for the model is established to offer a meaningful performance estimate.
To assess the model, a dataset is chosen from open standard benchmarks, allowing comparison with a standard reference and with prior implementations of the layer. The model achieves 100% classification accuracy relative to the reference model, occupies an area of 0.0802 mm², and consumes 8.93 mW when synthesized in TSMC 28 nm CMOS technology at a frequency of 1 GHz.
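The base-change idea described above can be sketched numerically. The following is a minimal software illustration, not the thesis's hardware design: it only shows why replacing the natural base e with base 2 leaves the softmax output mathematically unchanged (since 2^(x·log₂e) = e^x), while 2^y is far cheaper in hardware. The function names are illustrative, and the fixed-point LUT-plus-shift decomposition mentioned in the comments is the standard rationale for base-2 exponentials, stated here as an assumption about the implementation style.

```python
import math

def softmax_e(x):
    # Standard softmax with natural base e.
    # The running maximum is subtracted for numerical stability (avoids overflow).
    m = max(x)
    z = [math.exp(v - m) for v in x]
    s = sum(z)
    return [v / s for v in z]

def softmax_base2(x):
    # Base-2 softmax: scale the inputs by log2(e), then exponentiate with base 2.
    # Because 2^(v * log2(e)) = e^v, the normalized result is identical to
    # the base-e softmax. In hardware, 2^y for y = k + f (integer part k,
    # fractional part f) reduces to a small LUT for 2^f followed by a left
    # shift by k bits, replacing the costly exponential unit.
    s = [v * math.log2(math.e) for v in x]
    m = max(s)
    z = [2.0 ** (v - m) for v in s]
    t = sum(z)
    return [v / t for v in z]

logits = [2.0, 1.0, 0.1]
print(softmax_e(logits))
print(softmax_base2(logits))  # matches the base-e result
```

Both functions produce the same probability vector; only the cost of the exponentiation step differs, which is what makes the base-2 formulation attractive for dedicated hardware.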