Speech Disfluency Enhancement and Augmentation for Improved Voice Technology Interaction
Speech disfluencies, common in disorders such as stuttering, significantly impede interaction with voice technologies, as current Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems often fail to process atypical speech effectively. This project introduces a novel end-to-end pipeline designed to transform disfluent speech into fluent, natural-sounding audio while preserving the original speaker’s identity and affective qualities. Key contributions include: (1) a disfluency-aware ASR system using Whisper[1] fine-tuned on disfluent corpora, which reduces Word Error Rate (WER) by 33% relative (from 18.5% to 12.3%); (2) a hybrid disfluency detection framework combining signal processing and Random Forest[2] classifiers, achieving 89.5% accuracy; (3) a T5[3]-based text reconstruction module with high semantic fidelity (BLEU: 0.78, ROUGE-L: 0.82); and (4) a zero-shot TTS synthesis stage (Bark[4]/YourTTS[5]) that produces fluent speech with strong speaker similarity (cosine similarity: 0.86) and naturalness (Mean Opinion Score: 4.1). The system is designed for real-time operation (latency < 500 ms) and is evaluated on public datasets (SEP-28k[6], UCLASS[7]) using objective metrics and subjective listening tests. Motivated by the author’s personal experience with severe stuttering, this research addresses the accessibility gap in voice technology and offers a deployable solution for inclusive human-computer interaction. Potential applications include voice assistants, communication aids, and therapeutic tools.
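The 33% WER figure is a relative reduction: (18.5 - 12.3) / 18.5 ≈ 0.335. As a minimal sketch of how the two headline objective metrics are commonly computed, the Python below implements word-level WER as a Levenshtein distance normalized by reference length, plus cosine similarity between two speaker-embedding vectors. The function names are hypothetical, and the abstract does not say which speaker-embedding model produced the 0.86 score, so any vectors passed in here are placeholders rather than real embeddings.

```python
import math

def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words.

    Assumes a non-empty reference; computed as word-level Levenshtein distance.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)

def relative_reduction(before: float, after: float) -> float:
    """Fractional improvement, e.g. WER 18.5 -> 12.3 gives about 0.335."""
    return (before - after) / before

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For example, a hypothesis that repeats two words of a six-word reference ("i i want to to book a flight" against "i want to book a flight") scores 2 insertions / 6 words ≈ 0.33. Repetitions count as errors only when the reference transcript omits them, so whether references are verbatim or cleaned matters when WER is reported on disfluent speech.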
Graph-Based Framework with GNNs for Detecting Neurodegenerative Disorders
Parkinson’s disease (PD), the second most common neurodegenerative disorder, affects nearly 10 million people worldwide. Early diagnosis remains challenging because initial symptoms are subtle and the misdiagnosis rate among general practitioners can reach up to 50%. This study applies deep learning architectures (CNN, LSTM, Vanilla GNN, GCN) to the early detection of PD from keystroke dynamics in the Tappy dataset. By modeling the temporal and structural relationships between keystrokes, the Vanilla GNN and GCN models detect subtle motor signatures, reaching accuracies of 91.81% and 93%, respectively. To focus on early stages, only patients with mild forms of the disease were included; moderate and advanced cases were excluded. The proposed pipeline, combining preprocessing, graph construction, and binary classification, offers a non-invasive, automated solution adaptable to clinical or mobile settings. Ethical concerns such as data privacy are also addressed, as are technical challenges such as inter-individual variability, the latter to improve model robustness. Finally, future integration of multimodal data and attention mechanisms could pave the way for predictive, personalized, and accessible medicine capable of transforming the management of neurodegenerative diseases.
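The abstract does not specify how the keystroke graphs are built or how the GCN is configured. The following is a minimal NumPy sketch under stated assumptions: each keystroke is a node carrying its timing features, consecutive keystrokes are linked (a hypothetical temporal chain graph), and two GCN layers using the symmetric normalization D^{-1/2}(A + I)D^{-1/2} of Kipf and Welling are followed by a mean-pool readout and a sigmoid for the binary PD vs. control decision. The weights are random and untrained, so the output probability only illustrates the data flow, and all function names are hypothetical.

```python
import numpy as np

def chain_graph(n_nodes: int) -> np.ndarray:
    """Hypothetical keystroke graph: each keystroke is linked to the next one typed."""
    adj = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN propagation matrix D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    # Element (i, j) becomes a_hat[i, j] / sqrt(deg_i * deg_j)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_predict(x, adj, w1, w2, w_out) -> float:
    """Two GCN layers, mean-pool readout, sigmoid: P(PD) for one typing session."""
    a_norm = normalized_adjacency(adj)
    h = np.maximum(a_norm @ x @ w1, 0.0)  # layer 1: aggregate neighbors, ReLU
    h = np.maximum(a_norm @ h @ w2, 0.0)  # layer 2
    logit = h.mean(axis=0) @ w_out        # graph-level embedding -> scalar score
    return float(1.0 / (1.0 + np.exp(-logit)))

rng = np.random.default_rng(0)
n_keys, n_feats, hidden = 6, 3, 8
x = rng.random((n_keys, n_feats))  # synthetic hold/flight/latency values per keystroke
adj = chain_graph(n_keys)
w1 = rng.normal(scale=0.5, size=(n_feats, hidden))  # untrained weights
w2 = rng.normal(scale=0.5, size=(hidden, hidden))
w_out = rng.normal(scale=0.5, size=hidden)
p_pd = gcn_predict(x, adj, w1, w2, w_out)
```

The self-loops let each keystroke keep its own features during aggregation, and the symmetric normalization stops high-degree nodes from dominating; a Vanilla GNN variant would typically replace the normalized matrix with a plain (A + I) sum.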
Graph-Based GNN Approach for Parkinson’s Disease Detection
This thesis explores the early detection of Parkinson’s Disease (PD) through a novel graph-based approach using keystroke dynamics, a behavioral biometric that captures typing patterns. Because Parkinson’s is a progressive neurological disorder whose early symptoms are subtle, it often eludes early diagnosis. Leveraging digital biomarkers extracted from typing data, such as hold time, flight time, and latency, this work proposes a data-driven framework based on Graph Neural Networks (GNNs) that captures relational motor behavior across users. A graph is constructed from similarities between users’ features, allowing the GNN to model complex interactions and detect motor anomalies more effectively than traditional methods. Experimental results show that the GCN model outperforms conventional deep learning architectures such as CNNs and LSTMs in accuracy and robustness, especially for early-stage cases. This research highlights the potential of combining behavioral biometrics and GNNs for non-invasive, accessible, and scalable PD diagnosis.
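The timing features and user-similarity graph described above can be sketched in plain Python. The feature definitions are assumptions chosen to match common usage (hold time: release minus press of the same key; flight time: next press minus current release; latency: next press minus current press), and so is the graph rule: features are z-score standardized and each user is linked to its k most cosine-similar users. The abstract states only that the graph is built from feature similarities, so k-nearest neighbors with cosine similarity is one plausible choice, not necessarily the one used in the thesis. All function names and the sample logs are hypothetical.

```python
import math

def keystroke_features(events):
    """Mean hold, flight, and latency times for one user.

    events: (press_ms, release_ms) pairs in typing order, at least two keystrokes.
    Feature definitions are assumptions, not taken from the thesis.
    """
    holds = [release - press for press, release in events]
    flights = [events[i + 1][0] - events[i][1] for i in range(len(events) - 1)]
    latencies = [events[i + 1][0] - events[i][0] for i in range(len(events) - 1)]
    return [sum(v) / len(v) for v in (holds, flights, latencies)]

def standardize(rows):
    """Z-score each feature column so no single timing scale dominates similarity."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def knn_similarity_graph(features, k):
    """Undirected edges (i, j), i < j, linking each user to its k most similar users."""
    edges = set()
    for i, fi in enumerate(features):
        sims = sorted(
            ((cosine(fi, fj), j) for j, fj in enumerate(features) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Synthetic (press_ms, release_ms) logs for three hypothetical users.
logs = [
    [(0, 100), (150, 240), (300, 390)],
    [(0, 110), (160, 260), (320, 420)],
    [(0, 180), (260, 450), (520, 700)],
]
features = standardize([keystroke_features(log) for log in logs])
edges = knn_similarity_graph(features, k=1)
```

The resulting edge set can be converted into the adjacency matrix consumed by a GCN layer. A fixed similarity threshold or a Gaussian kernel on Euclidean distance would be equally reasonable alternatives to k-nearest neighbors.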