Enhancing Audio Steganography Through CGAN-Generated Cover Audio and Adaptive LSB Embedding: A Hybrid Approach

Usman Ibrahim Musa; Farida Ridzuan; A H Azni; Nur Hafiza Zakaria; Ahmed A. AlSabhany

doi:10.33102/5n8ve769

Authors

Usman Ibrahim Musa Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
Farida Ridzuan CyberSecurity and Systems Research Unit, Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
A H Azni Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
Nur Hafiza Zakaria CyberSecurity and Systems Research Unit, Faculty of Science and Technology, Universiti Sains Islam Malaysia, Nilai 71800, Negeri Sembilan, Malaysia.
Ahmed A. AlSabhany Computer Center, University of Fallujah, Fallujah, Anbar, Iraq.

Keywords:

CGAN, LSB, Generative Adversarial Networks, Hybrid, audio steganography

Abstract

Audio steganography hides secret messages inside audio files, enabling covert communication without drawing attention. Audio steganography methods aim to achieve high imperceptibility, robustness, and high payload capacity. While traditional techniques such as Least Significant Bit (LSB) coding offer good imperceptibility, they are highly vulnerable to statistical steganalysis and signal manipulation. Existing hybrid methods struggle to maintain quality across diverse audio content and lack adequate robustness mechanisms, which makes it difficult to balance imperceptibility, payload capacity, and robustness. This paper proposes a novel hybrid approach that combines Conditional Generative Adversarial Networks (CGANs) with LSB coding to address these limitations. The CGAN is trained on the LibriSpeech dataset to generate audio patterns that simulate spontaneous speech for use as adaptive covers. The model was implemented in PyTorch, and performance was evaluated using Signal-to-Noise Ratio (SNR), Perceptual Evaluation of Speech Quality (PESQ), Bit Error Rate (BER), and robustness to audio transformations. Experimental results showed a PESQ value of 4.05 and a mean SNR of 33.1 dB, indicating excellent audio quality at a substantial payload capacity of 1.27 kbps. The method achieved a BER of 2.23% and 87% robustness to compression, filtering, and resampling operations. The CGAN-LSB hybrid improves capacity, imperceptibility, and robustness because the CGAN generates statistically natural audio covers, while adaptive LSB integration preserves data integrity during signal processing operations. This makes the method highly suitable for secure audio communication and audio watermarking applications. Although the generated covers exhibit distributional properties similar to those of genuine audio, direct validation against specific steganalysis detectors remains an important direction for future empirical evaluation.
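The abstract names LSB coding, SNR, and BER but gives no formulas. The minimal sketch below is illustrative only and is not the authors' CGAN-LSB pipeline. It shows plain 1-bit LSB embedding into 16-bit PCM samples, along with the standard SNR and BER definitions.

Several parts are assumptions:

- The cover is a synthetic 440 Hz sine tone. It stands in for a CGAN-generated cover.
- The function names `embed_lsb`, `extract_lsb`, `snr_db`, and `ber` are hypothetical.
- The magnitude-threshold rule in `eligible` is a hypothetical stand-in for the paper's adaptive criterion, which the abstract does not describe.

```python
import math
import random


# Hypothetical adaptive rule (an assumption, not the paper's criterion): embed only
# in samples whose magnitude, ignoring the LSB, is at least `threshold`. The
# arithmetic shift (x >> 1) is unchanged when the LSB changes, so the receiver can
# recompute the same embedding positions from the stego signal alone.
def eligible(sample, threshold):
    return abs(sample >> 1) >= threshold


def embed_lsb(cover, bits, threshold=0):
    """Replace the LSB of each eligible 16-bit PCM sample with one payload bit."""
    stego = list(cover)
    positions = [i for i, s in enumerate(cover) if eligible(s, threshold)]
    if len(bits) > len(positions):
        raise ValueError("payload exceeds the capacity of the eligible samples")
    for pos, bit in zip(positions, bits):
        # Python ints use two's-complement semantics for &, so this also works
        # for negative samples and never leaves the int16 range.
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract_lsb(stego, n_bits, threshold=0):
    """Read n_bits payload bits back from the LSBs of the eligible samples."""
    positions = [i for i, s in enumerate(stego) if eligible(s, threshold)]
    return [stego[i] & 1 for i in positions[:n_bits]]


def snr_db(cover, stego):
    """SNR in dB: 10 * log10(sum(x^2) / sum((x - y)^2))."""
    signal = sum(x * x for x in cover)
    noise = sum((x - y) ** 2 for x, y in zip(cover, stego))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def ber(sent, received):
    """Bit error rate: fraction of payload bits that differ after extraction."""
    errors = sum(a != b for a, b in zip(sent, received))
    return errors / len(sent)


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at 16 kHz, standing in for a generated cover.
    cover = [int(8000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(16000)]
    payload = [rng.randint(0, 1) for _ in range(1000)]
    stego = embed_lsb(cover, payload, threshold=64)
    recovered = extract_lsb(stego, len(payload), threshold=64)
    print(f"SNR: {snr_db(cover, stego):.1f} dB, BER: {ber(payload, recovered):.2%}")
```

With no channel distortion, extraction returns the payload exactly, so BER is zero. A non-zero BER, such as the 2.23% reported above, arises only once the stego signal passes through operations such as compression, filtering, or resampling. The SNR of this toy example is not comparable to the paper's 33.1 dB, because the cover, payload size, and embedding rule all differ.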

References

[1] W. Rehman and A. Waheed, “A Novel Approach to Image Steganography Using Generative Adversarial Networks,” arXiv, 2024, doi: 10.48550/arXiv.2412.00094.

[2] L. Chen et al., “Learning to Generate Steganographic Cover for Audio Steganography Using GAN,” IEEE Access, vol. 9, pp. 88098–88107, 2021, doi: 10.1109/access.2021.3090445.

[3] W. Zhou, J. Zhou, and S. Yang, “An Imperceptible and Robust Audio Watermarking Algorithm Based on SNGAN,” 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, pp. 1–8, 2024, doi: 10.1109/ijcnn60899.2024.

[4] A. Martín et al., “Evolving Generative Adversarial Networks to Improve Image Steganography,” Expert Systems with Applications, vol. 222, p. 119841, Jul. 2023, doi: 10.1016/j.eswa.2023.119841.

[5] V. Moorthy and R. Venkataraman, “Generative Adversarial Analysis Using U-LSB Based Audio Steganography,” 2021 IEEE 18th India Council International Conference (INDICON), pp. 1–6, Dec. 2021, doi: 10.1109/indicon52576.2021.9691515.

[6] H. M. El-Hoseny, M. A. Farahat, and N. A. El-Hag, “An Efficient Stego-OptDehaz Algorithm for Image Dehazing and Metadata Concealment,” Journal of Optics, vol. 53, pp. 2441–2451, 2024, doi: 10.1007/s12596-023-01364-x.

[7] Z. Chen et al., “A Generic Blockchain-Based Steganography Framework with High Capacity via Reversible GAN,” IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, pp. 241–250, May 2024, doi: 10.1109/infocom52122.2024.10621377.

[8] Y. An et al., “ACGAN Based Coverless Image Steganography Method,” 7th International Symposium on Advances in Electrical, Electronics, and Computer Engineering, p. 63, Oct. 2022, doi: 10.1117/12.2639718.

[9] S. Liu et al., “CGAN BeiDou Satellite Short-Message-Encryption Scheme Using Ship PVT,” Remote Sensing, vol. 15, no. 1, p. 171, Dec. 2022, doi: 10.3390/rs15010171.

[10] D. Zhang, M. Ma, and L. Xia, “A Comprehensive Review on GANs for Time-Series Signals,” Neural Computing & Applications, vol. 34, pp. 3551–3571, 2022, doi: 10.1007/s00521-022-06888-0.

[11] R. N. Abirami et al., “Deep CNN and Deep GAN in Computational Visual Perception-Driven Image Analysis,” Complexity, vol. 2021, no. 1, 2021, doi: 10.1155/2021/5541134.

[12] C. Niloor, R. Agarwal, and P. Mishra, “Using MNIST Dataset for De-Pois Attack and Defence,” in Recent Trends in Communication and Intelligent Systems (ICRTCIS 2023), Algorithms for Intelligent Systems, Springer, Singapore, 2023, doi: 10.1007/978-981-99-5792-7_17.

[13] M. Veksler and K. Akkaya, “Good or Evil: Generative Adversarial Networks in Digital Forensics,” in Adversarial Multimedia Forensics, Advances in Information Security, vol. 104, Springer, Cham, 2024, doi: 10.1007/978-3-031-49803-9_3.

[14] Y. Heng et al., “HLSNC-GAN: Medical Image Synthesis Using Hinge Loss and Switchable Normalization in CycleGAN,” IEEE Access, vol. 12, pp. 55448–55464, 2024, doi: 10.1109/access.2024.3390245.

[15] H. H. Ramandi et al., “VidaGAN: Adaptive GAN for Image Steganography,” IET Image Processing, vol. 18, no. 11, pp. 3139–3152, 2024, doi: 10.1049/ipr2.13177.

[16] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “LibriSpeech: An ASR Corpus Based on Public Domain Audio Books,” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210, 2015, doi: 10.1109/ICASSP.2015.7178964.

[17] C. Veaux, J. Yamagishi, and K. MacDonald, “CSTR VCTK Corpus: English Multi-Speaker Corpus for CSTR Voice Cloning Toolkit,” The Centre for Speech Technology Research (CSTR), University of Edinburgh, 2017.

[18] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, “DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM,” NIST Speech Disc 1-1.1, NASA STI/Recon Technical Report N, 93-27403, 1993.

[19] International Telecommunication Union, “Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs,” ITU-T Recommendation P.862, Feb. 2001.