International publications

2018
(2018), Tool for automatic tuning of binarisation techniques. IET Image Processing : The Institution of Engineering and Technology, http://digital-library.theiet.org/content/journals/10.1049/iet-ipr.2018.5132

Abstract: Most of the proposed binarisation methods include parameters that must be set correctly before use. The values of these parameters are usually determined manually after several trials. However, the optimal parameter values differ from one image to another, so the parameterisation should be carried out for each image separately. As this task is very difficult, even impossible, for large collections of images, the tuning is usually done once for the entire collection. In this study, the authors propose a tool for automatic and adaptive parameterisation of binarisation techniques for each image separately. The adopted methodology uses an artificial neural network (ANN) to learn, from image features, the optimal parameter values of a binarisation method on a training set of images, and then applies the trained ANN to determine the optimal parameter values for new, unseen images. Several experiments have been conducted on images of degraded documents and the obtained results are encouraging.

2016
(2016), Structural feature-based evaluation method of binarization techniques for word retrieval in the degr. International Journal on Document Analysis and Recognition (IJDAR) : Springer, http://link.springer.com/article/10.1007/s10032-015-0254-y

Abstract: One of the most important and necessary steps in the process of document analysis and recognition is binarization, which allows extracting the foreground from the background. Several binarization techniques have been proposed in the literature, but none of them was reliable for all image types. This makes the selection of one method to apply in a given application very difficult; performance evaluation of binarization algorithms therefore becomes vital. In this paper, we are interested in the evaluation of binarization techniques for the purpose of retrieving words from the images of degraded Arabic documents. A new evaluation methodology is proposed, based on the comparison of the visual features extracted from the binarized document images with ground truth features, instead of comparing images between themselves. The most appropriate thresholding method for each image is the one for which the visual features of the identified words in the image are “closer” to the features of the reference words. The proposed technique was used here to assess the performances of eleven algorithms based on different approaches on a collection of real and synthetic images.

2014
(2014), Text Extraction from Historical Document Images by the Combination of Several Thresholding Technique. Advances in Multimedia, Volume 2014, Article ID 934656 : Hindawi Publishing Corporation, http://www.hindawi.com/journals/am/2014/934656/

Abstract: This paper presents a new technique for the binarization of historical document images characterized by deteriorations and damages that make their automatic processing difficult at several levels. The proposed method is based on hybrid thresholding, combining the advantages of global and local methods, and on the mixture of several binarization techniques. The method proceeds in two stages. In the first stage, global thresholding is applied to the entire image and two different thresholds are determined, from which most of the image pixels are classified as foreground or background. In the second stage, the remaining pixels are assigned to the foreground or background class based on local analysis. In this stage, several local thresholding methods are combined and the final binary value of each remaining pixel is chosen as the most probable one. The proposed technique has been tested on a large collection of standard and synthetic documents, compared with well-known methods using standard measures, and was shown to be more powerful.
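
As an illustrative aside, the two-stage scheme described in this abstract can be sketched as follows. The threshold values, window size, and the use of a simple local-mean test in place of the paper's combination of several local methods are all assumptions made for the sake of the example, not the paper's actual configuration:

```python
import numpy as np

def hybrid_binarize(img, t_low=60, t_high=180, win=7):
    """Two-stage hybrid thresholding (illustrative sketch).

    Stage 1: pixels darker than t_low are confidently foreground and
    pixels brighter than t_high confidently background.
    Stage 2: each remaining (undecided) pixel is classified by
    comparing it to the mean of its local window, a stand-in for the
    paper's vote among several local thresholding methods.
    """
    img = np.asarray(img, dtype=float)
    fg = img <= t_low                       # confidently foreground (ink)
    bg = img >= t_high                      # confidently background
    out = np.where(fg, 0, 255).astype(np.uint8)

    pad = win // 2
    padded = np.pad(img, pad, mode="edge")  # edge-pad so windows fit
    rows, cols = np.nonzero(~fg & ~bg)      # undecided pixels only
    for r, c in zip(rows, cols):
        window = padded[r:r + win, c:c + win]  # win x win neighborhood
        out[r, c] = 0 if img[r, c] < window.mean() else 255
    return out
```

Uniformly dark regions fall entirely into stage 1; only pixels between the two global thresholds trigger the (slower) local analysis, which is the efficiency argument behind hybrid schemes.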

(2014), Foreground-Background Separation by Feed-forward Neural Networks in Old Manuscripts. Informatica, vol.38, n°4 : The Slovenian Society Informatika, http://www.informatica.si/PDF/38-4/13_Kefali%20-%20Foreground-Background%20Separation%20by%20Feed-forward%20Neural%20Networks%20in%20Old%20Manuscripts.pdf

Abstract: Artificial Neural Networks (ANNs) are widely used techniques in image processing and pattern recognition. Despite their power in classification tasks for pattern recognition, they show limited applicability in earlier stages such as foreground-background separation (FBS). In this paper, a novel FBS technique based on an ANN is applied to old documents with a variety of degradations. The idea is to train the ANN on a set of pairs of original images and their respective ideal black-and-white versions, relying on global and local information. We ran several experiments on benchmark and synthetic data and obtained better results than state-of-the-art methods.

2011
(2011), Recognition-free Retrieval of Old Arabic Document Images. Computación y Sistemas 15(2) : SciELO, Mexico, http://www.scielo.org.mx/pdf/cys/v15n2/v15n2a6.pdf

Abstract: Searching old document images is a relevant issue today. In this paper, we tackle the problem of retrieving old Arabic document images, which form a good part of our heritage and possess an inestimable scientific and cultural richness. We propose an approach for indexing and searching degraded document images without recognizing the textual patterns, in order to avoid the high cost and difficult effort of optical character recognition (OCR). Our basic idea consists in casting the problem of document image retrieval from the field of document analysis to the field of information retrieval. Thus, we can combine symbolic notation and semic representation and exploit techniques from the two fields, in particular suffix trees and approximate string matching. Each document of the collection is assigned an ASCII file of word codes. Words are represented by their topological features, namely ascenders, descenders, etc. So, instead of searching in the image, we look for word codes in the corresponding code file. The tests performed on two types of documents, Arabic historical documents and Algerian postal envelopes, have shown the good performance of the proposed approach.

2007
(2007), State-of-the-art of Off-line Arabic Handwriting Segmentation. International Journal of Computer Processing of Languages : IJCPOL @ World Scientific, http://www.worldscientific.com/action/doSearch?searchText=toufik+sari&publicationFilterSearch=ijcpol

Abstract: Computer processing of off-line Arabic handwriting is very difficult. The cursiveness of the script and the high variability of the handwritten symbols make the automatic recognition of Arabic a very challenging task. Character segmentation is an important pre-processing stage in any intelligent character recognition system. This paper surveys the field of off-line Arabic character segmentation. Four classes of segmentation approaches are identified, based on the features used and on where segmentation points are located. Text curve analysis, outer contour detection and following, stroke scrutiny, and singularities versus regularities are the most used techniques. Instructive examples of each category are described and some comments are given. Keywords: Arabic handwriting segmentation; contour following; topological rules

(2007), Overview of Some Algorithms of Off-Line Arabic Handwriting Segmentation. International Arab Journal of Information Technology (IAJIT). vol.4, n°4, pp.289-300 : Colleges of Computing and Information Society (CCIS), Association of Arab Universities, Zarqa University, Jordan, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.167.2004&rep=rep1&type=pdf

Abstract: We present in this paper an overview of work in the field of automatic segmentation of off-line Arabic handwriting. Arabic writing is cursive in nature, whether printed or handwritten. The shapes of characters vary considerably according to their positions within the word. Word shapes also change depending on whether letters are horizontally or vertically ligatured, i.e. superposed letters. This variability makes the decomposition of words into letters very delicate and not always assured, which explains the lack of robust commercial systems. The objective of this paper is to present a state of the art of the different techniques for off-line Arabic handwriting segmentation proposed in the literature. Keywords: Arabic handwriting segmentation, contour following, topological rules, ligatures.

2006
(2006), Adaptive Instructional Planning Using Neural Networks in Intelligent Learning Systems. Int. Arab J. Inf. Technol : Zarqa University, Jordan, http://ccis2k.org/iajit/?option=com_content&task=view&id=188&Itemid=268

Abstract: This paper investigates the use of computational intelligence for adaptive lesson sequencing in a distance-learning environment. A connectionist method for adaptive pedagogical hypermedia document generation is proposed and implemented in a prototype called AppSys. The proposed methodology is based on the use of ontologies and learning object metadata. The generated didactic plan is adapted to the learner's goals, abilities and preferences. Several experiments have shown the effectiveness of the proposed method. Keywords: Intelligent learning environment, web-based course, adaptive and automatic course sequencing, learner model, domain ontology, neural networks.

2005
(2005), Cursive Arabic Script Segmentation and Recognition System. International Journal of Computers and Applications : Actapress, http://www.actapress.com/Abstract.aspx?paperId=20444

Books

2016
(2016), Reconnaissance et correction des erreurs dans les textes arabes. ISBN-13: 978-3-8416-7466-1, ISBN-10: 3841674666, EAN: 9783841674661 : Les Éditions universitaires européennes, https://www.morebooks.de/store/fr/book/reconnaissance-et-correction-des-erreurs-dans-les-textes-arabes/isbn/978-3-8416-7466-1

Abstract: Language is a means used to communicate ideas through speech or other expressive signs, while writing is the act of setting these ideas down in order to record them. Since writing is a transcription of language, handwriting recognition systems must imperatively integrate language-related processing. Recognition and correction must cooperate in order to make the right decisions, exploiting linguistic and contextual knowledge. The correction of recognition errors is part of the decision-making process, given the multidisciplinary nature of its techniques. Arabic handwriting recognition does not yet employ electronic dictionaries, usage statistics of language elements, or spell checkers. We therefore aim to develop a system for the automatic recognition of unconstrained Arabic texts. We combine knowledge of the Arabic language with contextual knowledge of recognition errors. The experiments carried out demonstrate the performance of our approach and open a very interesting avenue of research.

International conference papers

2018
(2018), Trends in linked data-based educational studies: a review of contributions in SSCI journals. 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA) : IEEE/ACS, https://ieeexplore.ieee.org/abstract/document/8612842/

2012
(2012), An MLP for binarizing images of old manuscripts. Proceedings of International Conference on Frontiers in Handwriting Recognition, Bari Italy : IEEE, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6424400

Abstract: The processing and analysis of ancient Arabic manuscripts are very difficult tasks and are likely to remain open problems for many years to come. In this paper we tackle the problem of foreground/background separation in old documents. Our approach uses a back-propagation neural network to directly classify image pixels according to their neighborhood. We tried several multilayer perceptron topologies and experimentally found the optimal one. Experiments were run on synthetic data obtained by image fusion techniques. The results are very promising compared to state-of-the-art techniques.
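
The pixel-classification idea can be illustrated with a minimal sketch. The network size, the plain gradient-descent training, and the synthetic dark-stroke image below are assumptions chosen to keep the example self-contained, not the paper's actual topology or data:

```python
import numpy as np

rng = np.random.default_rng(0)

def patches(img, k=3):
    """Extract the k x k neighborhood around every pixel (edge-padded)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    feats = np.empty((h * w, k * k))
    i = 0
    for r in range(h):
        for c in range(w):
            feats[i] = p[r:r + k, c:c + k].ravel()
            i += 1
    return feats / 255.0  # scale grey levels to [0, 1]

def train_mlp(x, y, hidden=8, lr=0.5, epochs=200):
    """One-hidden-layer MLP (tanh / sigmoid) trained by full-batch
    gradient descent on cross-entropy; y is 1 for background pixels."""
    w1 = rng.normal(0, 0.5, (x.shape[1], hidden))
    w2 = rng.normal(0, 0.5, (hidden, 1))
    for _ in range(epochs):
        h = np.tanh(x @ w1)
        out = 1 / (1 + np.exp(-(h @ w2)))        # sigmoid output
        grad_out = out - y[:, None]              # cross-entropy gradient
        w2 -= lr * h.T @ grad_out / len(x)
        grad_h = (grad_out @ w2.T) * (1 - h ** 2)  # tanh derivative
        w1 -= lr * x.T @ grad_h / len(x)
    return w1, w2

def binarize(img, w1, w2):
    """Classify each pixel from its neighborhood; True = foreground."""
    h = np.tanh(patches(img) @ w1)
    prob = 1 / (1 + np.exp(-(h @ w2)))
    return prob.reshape(img.shape) < 0.5
```

Training pairs come from a clean ground-truth image and its degraded version, exactly the pairing the abstract describes; at test time only the degraded image is needed.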

2010
(2010), Evaluation of several binarization techniques for old Arabic documents images. The First International Symposium on Modeling and Implementing Complex Systems MISC : Constantine University, Algeria, http://elearn.umc.edu.dz/vf/images/misc/session2A/14-2A-paper3-Kefali_1.pdf

Abstract: Binarization is an important stage in every image processing and analysis pipeline. Many techniques have been proposed in the literature for binarizing colour and grey-level images, each appropriate to a particular type of image, but unfortunately none of them stands out in the case of old manuscript documents. These documents are characterized by their poor quality, due to the various deteriorations suffered during the document life cycle and to the materials used for storage. This paper presents an evaluation of binarization techniques frequently cited in the literature. In a first stage, we studied, implemented and tested twelve binarization algorithms on old Arabic manuscript documents. We then compared them using a new method based on features extracted from each image. The goal of our work, in the medium term, is a comparative study allowing us to choose the best binarization algorithm for use in our Arabic document retrieval system.

2008
(2008), A search engine for Arabic documents. Dixième Colloque International Francophone sur l'Ecrit et le Document, Rouen France : HAL archives ouvertes, http://hal.archives-ouvertes.fr/hal-00334402/

Abstract: This paper is an attempt at indexing and searching degraded document images without recognizing the textual patterns, thus circumventing the cost and laborious effort of OCR technology. The proposed approach deals with text-dominant documents, either handwritten or printed. After preprocessing and segmentation stages, all the connected components (CC) of the text are extracted using a bottom-up approach. Each CC is then represented with global indices such as loops, ascenders, etc. Each document is associated with an ASCII file of codes built from the extracted features. Since no feature extraction technique is reliable enough to locate all the discriminant global indices modelling handwriting or degraded prints, we apply an approximate string matching technique based on the Levenshtein distance. As a result, the search module can efficiently cope with imprecise and incomplete pattern descriptions. Tests performed on Arabic historical documents showed good performance.
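
The approximate matching step can be illustrated with a short sketch. The single-letter feature codes ('A' for ascender, 'D' for descender, 'L' for loop) and the edit-distance tolerance are hypothetical, chosen only to show the mechanism of searching word codes rather than images:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def search(query_code, document_codes, max_dist=1):
    """Return indices of word codes within max_dist edits of the query,
    tolerating missed or spurious feature detections."""
    return [i for i, code in enumerate(document_codes)
            if levenshtein(query_code, code) <= max_dist]

# A document becomes a list of feature codes, one per word:
doc = ["AL", "ALD", "DD", "AALD"]
hits = search("ALD", doc)  # exact and near matches
```

Allowing a small edit distance is what lets the search cope with the imprecise, incomplete descriptions the abstract mentions: a word whose descender was missed by feature extraction still matches its query code.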

2002
(2002), MOrpho-LEXical Analysis for Correcting OCR-Generated Arabic Words (MOLEX). IWFHR Canada : IEEE Computer Society, http://www.computer.org/portal/web/csdl/abs/proceedings/iwfhr/2002/1692/00/16920461abs.htm?CFID=1371
(2002), Off-Line Handwritten Arabic Character Segmentation Algorithm: ACSA. IWFHR Canada : IEEE Computer Society, http://www.computer.org/portal/web/csdl/doi/10.1109/IWFHR.2002.1030952