Abstract: Pronunciation teaching is an important stage in language learning. This article tackles the pronunciation scoring problem, for which research has shown relatively low human-human and human-machine agreement rates, making teachers skeptical about the relevance of automatic scores. To overcome these limitations, a fuzzy combination of two machine scores is suggested. The experiments were carried out in the context of Algerian pupils learning to read Arabic. Although the native language of Algerian pupils is a dialect of Arabic, Modern Standard Arabic remains difficult for them, with sounds that are hard to master and letters that are close in pronunciation. The article presents a fuzzy evaluation system covering both oral reading fluency and intelligibility. The fuzzy system shows that, despite the disparities between human ratings, its scores match at least one of the human ratings and are most often in the learners' favor. Fuzzy logic, being more lenient than thresholding systems, therefore encourages learners to pursue their training.
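As a rough illustration of how two machine scores can be combined fuzzily instead of being thresholded, the sketch below uses a zero-order Sugeno-style rule base. It is not the system described in the article: the triangular membership functions, the three labels, the rule consequents, and the choice of fluency and intelligibility scores normalized to [0, 1] as the two inputs are all assumptions made for the example.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed fuzzy sets over normalized scores in [0, 1] (hypothetical values).
SETS = {
    "low": (-0.5, 0.0, 0.5),
    "mid": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
# Assumed crisp level attached to each label.
LEVELS = {"low": 0.0, "mid": 0.5, "high": 1.0}


def fuzzy_score(fluency, intelligibility):
    """Combine two scores in [0, 1] into one graded score in [0, 1]."""
    f = min(max(fluency, 0.0), 1.0)
    i = min(max(intelligibility, 0.0), 1.0)
    num = 0.0
    den = 0.0
    for f_label, f_params in SETS.items():
        for i_label, i_params in SETS.items():
            # Rule strength: minimum of the two membership degrees.
            strength = min(tri(f, *f_params), tri(i, *i_params))
            # Assumed consequent: mean of the two label levels.
            consequent = (LEVELS[f_label] + LEVELS[i_label]) / 2
            num += strength * consequent
            den += strength
    # Weighted average of the rule consequents.
    return num / den
```

Unlike a hard threshold, a score just below a pass mark here still yields a graded result, so a small change in the input produces a small change in the output.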
Abstract: Song retrieval is a complex process with multiple facets. Nowadays, the most common way of searching for songs is through a combination of factual and cultural metadata, or by lyrics. Owing to poor indexing techniques, these systems cannot fully satisfy users' requests. In this paper, we suggest using a domain ontology and semantic links to index a collection of songs and to perform a conceptual search that usefully complements metadata-based methods. A set of experiments was carried out on a dedicated dataset and shows the superiority of our approach over the classical one.
Abstract: Building a large vocabulary continuous speech recognition (LVCSR) system requires many hours of segmented and labelled speech data. Arabic, like many other low-resource languages, lacks such data, but automatic segmentation has proved to be a good alternative for making these resources available. In this paper, we suggest combining hidden Markov models (HMMs) and support vector machines (SVMs) to segment and label the speech waveform into phoneme units. The HMMs generate the sequence of phonemes and their boundaries; the SVM refines the boundaries and corrects the labels. The resulting segmented and labelled units may serve as a training set for speech recognition applications. The HMM/SVM segmentation algorithm is assessed using both the hit rate and the word error rate (WER); the resulting scores were compared with those obtained from manual segmentation and from the well-known embedded learning algorithm. The results show that the speech recognizer built upon the HMM/SVM segmentation outperforms the one built upon the embedded learning segmentation by about 0.05% in terms of WER, even with background noise.
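The hit rate mentioned above is commonly understood as the share of reference boundaries that have an automatic boundary within a small time tolerance. The sketch below shows one way to compute it; the 20 ms default tolerance, the greedy one-to-one matching, and the function name are assumptions for illustration and are not taken from the paper.

```python
def hit_rate(reference, predicted, tolerance=0.020):
    """Fraction of reference boundaries (in seconds) matched by a
    predicted boundary lying within `tolerance` seconds.

    Each predicted boundary can match at most one reference boundary.
    """
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    unused = sorted(predicted)
    hits = 0
    for ref in sorted(reference):
        if not unused:
            break
        # Closest predicted boundary that has not been matched yet.
        best = min(unused, key=lambda p: abs(p - ref))
        if abs(best - ref) <= tolerance:
            hits += 1
            unused.remove(best)
    return hits / len(reference)
```

With reference boundaries at 0.10 s, 0.25 s and 0.40 s and predicted boundaries at 0.11 s, 0.29 s and 0.405 s, two of the three reference boundaries are matched within 20 ms.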
Abstract: The aim of a computer-assisted language learning (CALL) system is to improve the language skills of learners. Such systems often include grammar and vocabulary components, whereas pronunciation, which seems to be the hardest step in the language learning process, has received little attention among the components required in CALL systems. When learning pronunciation, learners want to know whether their pronunciation is good or bad, and, when it is bad, it is helpful to give them some advice. The goal of this work is the early detection of pupils with reading difficulties; we are particularly interested in deciding whether their pronunciation is good or not. For this purpose, we treat this question as a classification problem and use a statistical approach to make the decision; this approach allows us to extend the investigation to the pronunciation of every phoneme in the word or sentence.
Abstract: One of the most important and necessary steps in document analysis and recognition is binarization, which extracts the foreground from the background. Several binarization techniques have been proposed in the literature, but none of them is reliable for all image types, which makes it very difficult to select a method for a given application. Performance evaluation of binarization algorithms is therefore vital. In this paper, we are interested in evaluating binarization techniques for the purpose of retrieving words from images of degraded Arabic documents. A new evaluation methodology is proposed, based on comparing the visual features extracted from the binarized document images with ground-truth features, instead of comparing the images themselves. The most appropriate thresholding method for each image is the one for which the visual features of the words identified in the image are “closest” to the features of the reference words. The proposed technique is used here to assess the performance of eleven algorithms, based on different approaches, on a collection of real and synthetic images.
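The selection principle described above, choosing the method whose word features lie closest to the ground-truth features, can be sketched as follows. This is only an illustration under stated assumptions: the use of Euclidean distance, the averaging over words, the one-to-one alignment of extracted and reference words, and all names are hypothetical, and the paper's actual features and distance are not specified here.

```python
import math


def mean_feature_distance(word_features, reference_features):
    """Average Euclidean distance between aligned feature vectors."""
    if not word_features or len(word_features) != len(reference_features):
        raise ValueError("feature lists must be non-empty and aligned")
    total = sum(
        math.dist(word, ref)
        for word, ref in zip(word_features, reference_features)
    )
    return total / len(word_features)


def best_method(features_by_method, reference_features):
    """Name of the binarization method whose word features are,
    on average, closest to the reference features."""
    return min(
        features_by_method,
        key=lambda name: mean_feature_distance(
            features_by_method[name], reference_features
        ),
    )
```

A possible call, with made-up two-dimensional features for two words and two placeholder method names:

```python
reference = [(1.0, 2.0), (3.0, 4.0)]
candidates = {
    "method_a": [(1.0, 2.5), (3.0, 4.0)],
    "method_b": [(1.0, 2.1), (3.1, 4.0)],
}
print(best_method(candidates, reference))
```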
Abstract: Artificial Neural Networks (ANNs) are widely used in image processing and pattern recognition. Despite their power in classification tasks, they have seen limited use in the earlier stages of pattern recognition, such as foreground-background separation (FBS). In this paper, a novel ANN-based FBS technique is applied to old documents with a variety of degradations. The idea is to train the ANN on pairs of original images and their respective ideal black-and-white versions, relying on global and local information. We ran several experiments on benchmark and synthetic data and obtained better results than state-of-the-art methods.
Abstract: The paper describes a system for singing voice classification in commercial music productions. The first step in our system is to separate the singer’s voice from the music. Based on the vocal part, two sets of parameters are formed, one for the singing voice type and the other for the singing voice quality. Each set of parameters contains a number of MPEG-7 low-level descriptors and other descriptors. At the classification stage, the paper suggests an extension of Gaussian Mixture Models (GMMs), namely Type-2 Fuzzy Gaussian Mixture Models (Type-2 FGMMs). Results show substantial improvements compared with similar works.
Book chapters
Abstract: Speech disorders are disabilities that are widespread among young people, although adults may also develop them after certain physical problems. In this context, the detection and subsequent correction of such disabilities may be handled by Automatic Speech Recognition (ASR) technology. The first works on speech disorder detection began in the early 1970s and seem to have followed the same evolution as those on ASR. Indeed, these early works relied mostly on signal processing techniques. Progressively, systems dealing with speech disorders have incorporated more ideas from ASR technology; in particular, Hidden Markov Models, the state-of-the-art approach in ASR systems, are used. This chapter reviews systems that use ASR techniques to evaluate the pronunciation of people who suffer from speech or voice impairments. The authors survey the existing systems and present the main innovations and some of the available resources.