Forensic Analysis of Deepfake Audio Detection
Girija Chiddarwar1, Nayan Bansal2, Sushanth Bangera3, Nikhilesh Sakhare4, Sakshi Pawar5
1Dr. Girija Chiddarwar, Department of Computer Engineering, MMCOE, Pune (Maharashtra), India.
2Nayan Bansal, Department of Computer Engineering, MMCOE, Pune (Maharashtra), India.
3Sushanth Bangera, Department of Computer Engineering, MMCOE, Pune (Maharashtra), India.
4Nikhilesh Sakhare, Department of Computer Engineering, MMCOE, Pune (Maharashtra), India.
5Sakshi Pawar, Department of Computer Engineering, MMCOE, Pune (Maharashtra), India.
Manuscript received on 15 May 2025 | First Revised Manuscript received on 02 June 2025 | Second Revised Manuscript received on 09 July 2025 | Manuscript Accepted on 15 July 2025 | Manuscript published on 30 July 2025 | PP: 32-37 | Volume-14 Issue-2, July 2025 | Retrieval Number: 100.1/ijrte.B826214020725 | DOI: 10.35940/ijrte.B8262.14020725
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The rise of deepfake audio technologies poses significant challenges to authenticity verification, necessitating effective detection methods. Traditional techniques, such as manual forensic analysis, basic machine learning approaches, speech-to-text conversion, and Short-Time Fourier Transform (STFT) analysis, have been employed to identify manipulated audio. However, these methods often fall short due to their time-consuming nature, inability to handle complex sequential data, and susceptibility to high-quality synthetic audio. This paper presents an approach that leverages Long Short-Term Memory (LSTM) networks and Mel-Frequency Cepstral Coefficients (MFCC) for deepfake audio detection. By harnessing the power of deep learning, LSTMs can effectively capture temporal dependencies in audio data, allowing for the identification of subtle anomalies that indicate manipulation. The use of MFCC enables the extraction of robust audio features that align closely with human auditory perception, enhancing the model’s sensitivity to synthetic alterations. Additionally, our methodology incorporates enhanced preprocessing techniques to ensure high-quality input data, further improving detection accuracy. The proposed system demonstrates a significant advancement in deepfake audio detection, providing a more reliable solution against increasingly sophisticated audio manipulations.
Keywords: Deepfake Audio Detection, Long Short-Term Memory (LSTM), Mel-Frequency Cepstral Coefficients (MFCC), Audio Pre-Processing, Authentic vs Fake Audio.
Scope of the Article: Artificial Intelligence and Methods
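The abstract describes extracting MFCC features, which the LSTM then consumes as a per-frame sequence. As an illustrative sketch only (the paper's actual parameters and pipeline are not given here; all function names and values below are assumptions), a minimal NumPy-only MFCC extraction looks like this: frame the waveform, take the power spectrum, apply a triangular mel filterbank, take the log, and decorrelate with a DCT-II.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the mel scale, mapped to FFT bins.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mfcc(signal, sr, n_mfcc=13, frame_len=400, hop=160, n_filters=26):
    # Frame the signal (25 ms windows, 10 ms hop at 16 kHz) with a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2 / frame_len
    # Mel filterbank energies -> log -> DCT-II gives the cepstral coefficients.
    energies = np.maximum(power @ mel_filterbank(n_filters, frame_len, sr).T, 1e-10)
    log_energies = np.log(energies)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2.0 * n_filters)))
    return log_energies @ dct.T  # shape: (n_frames, n_mfcc)
```

The resulting (frames × coefficients) matrix is the kind of sequential input an LSTM classifier would process frame by frame; in practice a tested library extractor (e.g. librosa's `mfcc`) would typically replace this hand-rolled version.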
