Audio Deep Fake Detection Using an Ensemble Learning Approach
Keywords:
Audio deepfake detection, Libyan dialect, deep learning, MFCC, stacking ensemble

Abstract
The rapid and continuous evolution of tools and techniques based on artificial intelligence poses a serious and growing concern, particularly regarding the spread of deepfake technologies and their impact on the credibility and authenticity of audio recordings. To address this challenge, a hybrid model was designed and developed to detect deepfake voices. The proposed model relies on recent deep learning techniques, combining a one-dimensional convolutional neural network (CNN) with a multilayer perceptron (MLP) and an optimized XGBoost classifier within a stacking ensemble. The approach extracts Mel-frequency cepstral coefficients (MFCC) from the audio files used for training and testing. Because no suitable, publicly available dataset of Libyan-dialect Arabic audio exists for this purpose, a dedicated dataset was compiled, containing genuine audio clips as well as fake clips generated with various artificial intelligence techniques. In evaluation, the model achieved high accuracy across clip lengths: 96.67% for three-second clips, 97.50% for five-second clips, and 97.00% for seven-second clips, confirming its ability to detect subtle and hidden anomalies.
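The sketch below is a minimal illustration of the kind of pipeline described in the abstract: MFCC features are extracted with librosa, a small 1-D CNN, an MLP, and an XGBoost classifier serve as base learners, and a logistic-regression meta-learner stacks their probabilities. The file paths (data/real/*.wav, data/fake/*.wav), clip length, MFCC count, layer sizes, and all other hyperparameters are illustrative assumptions, not the configuration reported by the authors.

```python
# Minimal stacking-ensemble sketch for audio deepfake detection.
# All paths and hyperparameters are assumed placeholders.
import glob
import numpy as np
import librosa
import xgboost as xgb
import tensorflow as tf
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

SR, DURATION, N_MFCC = 16000, 3.0, 40  # 3-second clips, 40 MFCCs (assumed)

def mfcc_features(path):
    """Load a fixed-length clip and return a (frames, n_mfcc) MFCC matrix."""
    y, _ = librosa.load(path, sr=SR, duration=DURATION)
    y = librosa.util.fix_length(y, size=int(SR * DURATION))
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=N_MFCC).T

X, y = [], []
for label, pattern in enumerate(["data/real/*.wav", "data/fake/*.wav"]):
    for path in glob.glob(pattern):
        X.append(mfcc_features(path))
        y.append(label)
X, y = np.array(X), np.array(y)

# One split for the test set, one more to hold out data for the meta-learner.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
X_fit, X_meta, y_fit, y_meta = train_test_split(X_tr, y_tr, test_size=0.25, stratify=y_tr, random_state=0)

# Base learner 1: a small 1-D CNN over the MFCC time axis.
cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=X.shape[1:]),
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
cnn.compile(optimizer="adam", loss="binary_crossentropy")
cnn.fit(X_fit, y_fit, epochs=20, batch_size=16, verbose=0)

# Base learners 2 and 3 operate on the flattened MFCC matrix.
def flat(a):
    return a.reshape(len(a), -1)

mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500).fit(flat(X_fit), y_fit)
xgb_clf = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(flat(X_fit), y_fit)

# Stack the base learners' probabilities and fit a logistic-regression meta-learner.
def base_probs(a):
    return np.column_stack([
        cnn.predict(a, verbose=0).ravel(),
        mlp.predict_proba(flat(a))[:, 1],
        xgb_clf.predict_proba(flat(a))[:, 1],
    ])

meta = LogisticRegression().fit(base_probs(X_meta), y_meta)
print(f"stacked ensemble accuracy: {meta.score(base_probs(X_te), y_te):.4f}")
```

In a fuller setup, the held-out meta split would typically be replaced by out-of-fold predictions so the meta-learner never sees probabilities produced on the base learners' own training data.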
License
Copyright (c) 2025 Mona M Bouaisha, Mohammed Elsheh

This work is licensed under a Creative Commons Attribution 4.0 International License.
