Title
High Accuracy Database Segmentation for
Automatic Speech Recognition /
Author
Gbaily, Manar Othman Mohammed El-Sayed.
Preparation committee
Researcher / منار عثمان محمد السيد جبيلي
Supervisor / عمرو محمد رفعت
Supervisor / جمال احمد السيد الشيخ
Examiner / أحمد على نشأت اسماعيل
Subject
qrmak
Publication date
2021
Number of pages
102 p. :
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Approval date
8/2/2021
Place of approval
Fayoum University - Faculty of Engineering - Electrical Engineering
Index
Only 14 of 102 pages are available for public view

Abstract

Automated segmentation of speech signals has attracted many researchers worldwide, motivated by major advances in computational facilities and mathematical tools. Many speech processing systems require the speech waveform to be segmented into principal acoustic units. Segmentation is the process of breaking a speech signal down into smaller units, and it is a primary step in voice-activated systems such as speech recognition and in the training of speech synthesis. Manual phonetic segmentation is time-consuming and expensive; labeling a single phone can take many times its real-time duration. Automatic segmentation of speech is concerned with identifying the boundaries of phonemes in a given utterance. This process requires an appropriate identification strategy in order to obtain more accurate and precise results when extracting the intended features. In this research, the TIMIT database (DB) is used to carry out this process and to validate its results. This thesis presents a novel method for segmenting speech phonemes, in which the proposed strategy helps in selecting an appropriate feature extraction technique for speech segmentation. Three main feature extraction techniques are used in this research: the first is the Mel Frequency Cepstral Coefficient (MFCC), the second is Best Tree Encoding (BTE), and the third is the Image Normalized Encoder (INE), a hybrid of the Best Tree Image (BTI) and the ResNet-50 Convolutional Neural Network (CNN). The data are then used to train a hybrid model consisting of a Hidden Markov Model (HMM) and a Gaussian Mixture Model (GMM) to improve the performance of automatic speech recognition.
The proposed model is tested and verified against the most widely used feature set, MFCC plus delta and delta-delta coefficients (39 parameters), to evaluate its performance. This approach has the potential to be used in applications such as automatic speech recognition and automatic language identification. The experimental results show that the BTE technique achieved the highest success rate (𝜂), 92.64%, outperforming the INE technique. However, the INE technique gives confusion success rates of 97.1% and 99.1% for Transition (Tr) and Non-Transition (NTr), respectively.
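
As an illustration of the 39-parameter baseline mentioned in the abstract, the sketch below shows one common way such a vector is assembled: static coefficients are stacked with their delta and delta-delta (first and second time-derivative) estimates. It is a minimal sketch, not the thesis's implementation. The split into 13 static coefficients per frame, the regression window of 2 frames, and the names `delta` and `stack_39` are assumptions made here for illustration, and random numbers stand in for real MFCCs.

```python
import numpy as np

def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis.

    features: array of shape (num_frames, num_coeffs).
    width: number of frames on each side used in the regression (assumed: 2).
    """
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

def stack_39(static: np.ndarray) -> np.ndarray:
    """Stack static, delta, and delta-delta coefficients frame by frame."""
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])

# Placeholder for real MFCCs: 100 frames x 13 static coefficients (assumed split).
rng = np.random.default_rng(0)
static = rng.standard_normal((100, 13))
features = stack_39(static)
print(features.shape)  # (100, 39)
```

In practice the static coefficients would come from an MFCC front end applied to the TIMIT waveforms rather than from random numbers; only the stacking step is shown here.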