Author: Elgbbas, Enas Mahmoud Mahmoud Mohamed / Title: An Adaptive Hybrid Algorithm for Document Images Binarization subject to Complex Background


Title

An Adaptive Hybrid Algorithm for Document Images Binarization subject to Complex Background

Author

Elgbbas, Enas Mahmoud Mahmoud Mohamed

Committee

Researcher / Enas Mahmoud Mahmoud Mohamed Elgbbas

Supervisor / حازم محمود عباس

Supervisor / محمود إبراهيم خليل

Examiner / السيد عيسى عبده حميد

Publication date

2019

Number of pages

139 p.

Language

English

Degree

Doctorate

Specialization

Electrical and Electronic Engineering

Approval date

1/1/2019

Place of approval

Ain Shams University - Faculty of Engineering - Department of Computer and Systems Engineering


Abstract

In this thesis, we propose two adaptive and hybrid methods for binarization
of historical document images. The first method uses the multilevel
Otsu algorithm and depends on the image contrast and the
estimated stroke width. This method is proposed for binarization
of historical document images suffering from various types of degradation,
such as non-uniform background, faint text, low contrast,
stains, bleed-through, and shadow. Solving all these problems effectively
is a challenge. Focusing on noise elimination may cause loss of
faint text. Conversely, extracting faint or low-contrast text
may produce noisy images. Therefore, we classified the investigated
images into two groups using a suggested factor. The first group
includes uniform-background images that may contain faint text or
shadow, while the second group includes non-uniform-background
images that may suffer from stains, bleed-through, shadow, or faint
text. To extract this factor, the background of the investigated image
is initially estimated, then the global multilevel Otsu method is applied,
dividing it into three regions. The difference between the average
intensities of the darkest and brightest regions is an indicator of the
image class. For each group, an adaptive and hybrid binarization
technique is suggested. For the first group, global Otsu is applied to
the grayscale image and the stroke width is estimated. Areas that
are likely to still contain missing text are identified adaptively ...
and binarized separately using a pseudo-local version of the multilevel Otsu
method, and lost text is recovered based on the stroke width.
Faint text, shadow, or background noise are distinguished based on
the image contrast. The clarity of the text is increased by using a dynamic
window size for local binarization (the Niblack method is used
for thin pen-stroke text and Otsu otherwise). For the second group,
the non-uniform background and most stains and bleed-through are
removed by normalization, then global Otsu is applied and the stroke
width is estimated. Text lost during normalization is restored. The
remaining stain and bleed-through objects are detected based on the
estimated stroke width, then locally binarized. Finally, a
post-processing step based on the estimated stroke width is applied
to remove the shadow. The proposed method is evaluated using
seven databases: DIBCO’09, H-DIBCO’10, DIBCO’11, H-DIBCO’12,
DIBCO’13, H-DIBCO’14, and H-DIBCO’16. The average F-measure
for each database is 90.7%, 89.1%, 88.9%, 88.7%, 88.9%, 93.3%, and
89.4%, respectively.
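
The class-indicator factor described above (three-region Otsu, then the gap between the mean intensities of the darkest and brightest regions) can be illustrated with a minimal sketch. It is not the thesis implementation: the background-estimation step is skipped, the input is assumed to be a flat list of 8-bit gray levels, the function names `otsu_three_class` and `contrast_factor` are hypothetical, and no cutoff for assigning an image to a group is given because the abstract does not state one.

```python
def otsu_three_class(pixels, levels=256):
    """Return thresholds (t1, t2) that maximise the between-class variance
    for the three classes [0, t1), [t1, t2), [t2, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Prefix sums of pixel counts and of intensity-weighted counts.
    cum_n = [0] * (levels + 1)
    cum_s = [0] * (levels + 1)
    for i in range(levels):
        cum_n[i + 1] = cum_n[i] + hist[i]
        cum_s[i + 1] = cum_s[i] + i * hist[i]

    def term(lo, hi):
        # n * mean^2 for the class lo..hi-1. Maximising the sum of these
        # terms is equivalent to maximising the between-class variance,
        # because the global mean does not depend on the thresholds.
        n = cum_n[hi] - cum_n[lo]
        if n == 0:
            return 0.0
        s = cum_s[hi] - cum_s[lo]
        return s * s / n

    best_score, best_t = -1.0, (0, 0)
    for t1 in range(1, levels - 1):
        for t2 in range(t1 + 1, levels):
            score = term(0, t1) + term(t1, t2) + term(t2, levels)
            if score > best_score:
                best_score, best_t = score, (t1, t2)
    return best_t


def contrast_factor(pixels, levels=256):
    """Mean of the brightest region minus mean of the darkest region."""
    t1, t2 = otsu_three_class(pixels, levels)
    dark = [p for p in pixels if p < t1]
    bright = [p for p in pixels if p >= t2]
    if not dark or not bright:
        return 0.0  # degenerate image: fewer than three populated regions
    return sum(bright) / len(bright) - sum(dark) / len(dark)


# Toy "image": dark ink, a mid-gray stain, and bright paper.
toy = [20] * 50 + [120] * 30 + [230] * 100
factor = contrast_factor(toy)
```

On the toy data the darkest region holds only the value 20 and the brightest only 230, so the factor is their difference. The exhaustive search over threshold pairs is kept for readability; it is quadratic in the number of gray levels, which is acceptable for 256 levels.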
The second proposed method is suitable for normally illuminated document
images and combines the advantages of the Otsu and spectral
clustering algorithms. To overcome the noise problem, a preprocessing
step is applied to the document image. The
resulting image is then binarized using Otsu, producing a binary image. As
a final step, the spectral clustering algorithm is applied locally to the
original image, with the aid of the binary image, to retrieve faint text.
The proposed design of spectral clustering significantly reduces
the computation time and size of the similarity matrix without
affecting the quality of clustering. The proposed method is evaluated
using the uniform-background images taken from DIBCO’09,
H-DIBCO’10, DIBCO’11, H-DIBCO’12, DIBCO’13, and H-DIBCO’14.
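
To make the idea of applying spectral clustering locally more concrete, the following is a generic two-way spectral clustering sketch on the gray levels of one small window. It is an illustration under stated assumptions, not the reduced similarity-matrix design proposed in the thesis: it builds the full Gaussian similarity matrix, does not use the Otsu binary image as a guide, and the function name `spectral_bipartition`, the kernel width `sigma`, and the iteration count are all hypothetical choices.

```python
import math


def spectral_bipartition(values, sigma=15.0, iters=200):
    """Split gray levels into two clusters; True marks the darker cluster."""
    n = len(values)
    # Gaussian similarity between every pair of intensities.
    w = [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
         for a in values]
    deg = [sum(row) for row in w]
    # Symmetrically normalised affinity D^(-1/2) W D^(-1/2).
    m = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # Its leading eigenvector is proportional to sqrt(degree).
    total = math.sqrt(sum(deg))
    u1 = [math.sqrt(d) / total for d in deg]

    # Power iteration with deflation to reach the second eigenvector.
    mean = sum(values) / n
    v = [x - mean for x in values]
    for _ in range(iters):
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        dot = sum(a * b for a, b in zip(v, u1))
        v = [a - dot * b for a, b in zip(v, u1)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:
            return [False] * n  # no second cluster (e.g. a flat window)
        v = [a / norm for a in v]

    # The sign of the second eigenvector gives the two-way partition.
    side = [a > 0 for a in v]
    pos = [x for x, s in zip(values, side) if s]
    neg = [x for x, s in zip(values, side) if not s]
    if not pos or not neg:
        return [False] * n
    dark_is_pos = sum(pos) / len(pos) < sum(neg) / len(neg)
    return [s == dark_is_pos for s in side]


# Window with faint strokes (about 150) on a brighter background (about 200).
window = [200, 205, 198, 150, 202, 148, 152, 201]
mask = spectral_bipartition(window)
```

Because the similarity matrix grows with the square of the number of pixels, a full matrix like this one is only practical for small windows, which is the cost the thesis's reduced design is said to address.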