Abstract This thesis is mainly concerned with text-independent Speaker Recognition (SR). Automatic Speaker Recognition (ASR) systems fall into two main categories: text-dependent SR and text-independent SR. In text-dependent SR, all speakers are required to use the same sentence in both the training and testing phases. In text-independent SR, by contrast, speakers are free to use any sentences in the training and testing phases. The SR process in general depends on the extraction of features from the speech signals. The text-independent SR task is harder to implement than the text-dependent one. Two approaches are proposed in this thesis for text-independent SR. The first extracts features and uses a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to identify the speakers. The features used are Mel Frequency Cepstral Coefficients (MFCCs), spectrum magnitude bins, and log spectrum magnitude bins. The second generates spectrogram images from patches of the speech signal and classifies these images with a Convolutional Neural Network (CNN). Reverberation is a severe effect in closed rooms. A speech classification system is therefore proposed to classify speech signals as reverberant or non-reverberant using the LSTM-RNN and the CNN. The effects of noise, reverberation, and interference are considered in this study. Moreover, speech enhancement techniques such as spectral subtraction and wavelet denoising are considered to improve the performance of the SR process. These enhancement methods are used as pre-processing steps prior to the ASR system. In addition, the Radon Transform (RT) is used for better representation of speech signals in the presence of noise, as it is robust to noise.
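The spectrum magnitude bins and log spectrum magnitude bins mentioned above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the Hamming window, the frame length of 64 samples, the hop of 32 samples, and the small constant `eps` are all assumptions chosen for illustration, and a naive DFT is used so that the example needs only the Python standard library.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_bins(frame):
    """Magnitudes of the one-sided DFT of a Hamming-windowed frame."""
    n = len(frame)
    # Hamming window (an assumed choice; the abstract does not name one).
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    bins = []
    for k in range(n // 2 + 1):
        # Naive DFT sum for bin k; an FFT would be used in practice.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        bins.append(abs(acc))
    return bins


def log_magnitude_bins(frame, eps=1e-10):
    """Natural log of the magnitude bins; eps avoids log(0)."""
    return [math.log(m + eps) for m in magnitude_bins(frame)]


def spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram: one row of bins per frame."""
    return [log_magnitude_bins(f)
            for f in frame_signal(signal, frame_len, hop)]
```

Each row of the result is one feature vector per frame, which could be fed to a sequence model, while a block of consecutive rows forms a two-dimensional patch of the kind an image classifier would take as input.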
The Radon projection of the spectrogram of the speech signal is obtained at different orientation angles, and a Discrete Cosine Transform (DCT) is then applied to the projection. The performance of the ASR system with Radon features is compared to that with MFCCs and spectrum features. The effect of interference on the ASR system is also studied. Interference is cancelled with a signal separation algorithm used as a pre-processing step prior to the ASR system to boost its performance. For pattern security in the SR system, cancellable SR is presented with an approach that depends on spectrogram patch selection based on a user-specific key. The cancellable pattern is used to protect user privacy and increase its security. Simulation results demonstrate the high efficiency of the proposed approaches for text-independent SR with the enhancement methods, Radon-based features, and blind signal separation. The results also reveal that the suggested cancellable approach is practical and satisfies the desired criteria of renewability and security (the template can be changed if it is compromised) and high performance (close to that of the system with the original template).
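The idea of key-dependent spectrogram patch selection can be sketched as follows. This is only an illustration under stated assumptions, not the scheme used in the thesis: the function `select_patches`, the use of SHA-256 to turn the key into a seed, and the use of Python's `random.Random` (which is not a cryptographically secure generator) are all hypothetical choices.

```python
import hashlib
import random


def select_patches(spectrogram, key, patch_h, patch_w, n_patches):
    """Pick n_patches rectangular patches at key-dependent positions.

    spectrogram is a list of equal-length rows. The same key always
    yields the same patches; a new key yields a new template.
    """
    rows, cols = len(spectrogram), len(spectrogram[0])
    # Derive a reproducible seed from the user-specific key (assumed design).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    patches = []
    for _ in range(n_patches):
        r = rng.randrange(rows - patch_h + 1)  # top row of the patch
        c = rng.randrange(cols - patch_w + 1)  # left column of the patch
        patches.append([row[c:c + patch_w]
                        for row in spectrogram[r:r + patch_h]])
    return patches
```

Under these assumptions, renewability follows from the key: if a stored template is compromised, issuing a new key changes which patches are selected, so a fresh template can be enrolled from the same speech.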