Secure Cloud-based Audio Reverberation

What is Reverberation?

Reverberation is a series of delayed and attenuated sound waves, reflected within an acoustic environment, that reach the listener less than 0.1 seconds after the original sound wave. The human auditory system cannot resolve such a short delay and interprets the original sound wave and its delayed reflections as one prolonged sound. This differs from echo, where the delays exceed 0.1 seconds and the delayed sounds are perceived distinctly as decaying copies of the original sound. Reverberation is one of the most widely used delay effects in audio editing and reproduction, alongside flanging, phasing and chorus. It adds an acoustic environment to an audio recording to make it sound realistic: the resulting reverb effected audio inherits the characteristics of that environment and sounds as if the recording had been made there.
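The idea of delayed, attenuated reflections arriving within 0.1 seconds can be illustrated with a small sketch. The function name `add_reflections`, the sample rate and the delay and gain values are all hypothetical, chosen only for illustration; a real room produces far more reflections than the three used here.

```python
def add_reflections(signal, sample_rate, reflections):
    """Mix delayed, attenuated copies of `signal` into the original.

    reflections: list of (delay_seconds, gain) pairs (illustrative values).
    """
    max_shift = max(int(round(d * sample_rate)) for d, _ in reflections)
    out = list(signal) + [0.0] * max_shift  # room for the longest delay
    for delay_s, gain in reflections:
        shift = int(round(delay_s * sample_rate))
        for n, x in enumerate(signal):
            out[n + shift] += gain * x
    return out


# A single click sampled at 1 kHz, with three reflections, all under 0.1 s.
click = [1.0, 0.0, 0.0, 0.0]
wet = add_reflections(click, 1000, [(0.010, 0.6), (0.025, 0.4), (0.060, 0.2)])
```

Because every delay stays below 0.1 seconds, a listener would hear the result as one prolonged sound rather than as separate echoes.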

In this work, we securely add reverberation effects to an audio secret over the cloud for the purposes of audio editing and reproduction. Our proposed scheme is based on the (K,N) SSS threshold scheme and convolution reverb, which is the digital convolution of an audio recording with an impulse response of the target acoustic space. The audio secret to be reverb effected is encrypted on the client's system by creating N shares with (K,N) SSS. The client then uploads the N shares to N non-colluding CDCs, one share per CDC. Each CDC convolves its share with the impulse response. Reverb impulse responses are public signals in plaintext, so the client does not need to transmit or upload them: each CDC can obtain a library of all impulse responses. An authorized user then reconstructs the reverb effected secret by combining at least K of the N processed shares.
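The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes that SSS refers to Shamir's secret sharing over a prime field, that audio samples and the impulse response have been quantized to non-negative integers, and that the prime `P` is large enough that the convolved values never wrap around. The function names, the prime and the sample values are hypothetical.

```python
import secrets

P = 2_147_483_647  # prime 2**31 - 1 (assumed large enough to avoid wrap-around)


def make_shares(samples, k, n):
    """Split each sample into n Shamir shares; any k of them reconstruct it."""
    shares = [[] for _ in range(n)]
    for s in samples:
        # Random polynomial of degree k-1 whose constant term is the sample.
        coeffs = [s] + [secrets.randbelow(P) for _ in range(k - 1)]
        for i in range(n):
            x, y = i + 1, 0
            for c in reversed(coeffs):  # Horner evaluation at x
                y = (y * x + c) % P
            shares[i].append(y)
    return shares


def convolve_mod(signal, h):
    """Discrete convolution of `signal` with public impulse response `h`, mod P."""
    out = [0] * (len(signal) + len(h) - 1)
    for i, a in enumerate(signal):
        for j, b in enumerate(h):
            out[i + j] = (out[i + j] + a * b) % P
    return out


def reconstruct(points):
    """Lagrange interpolation at x = 0 over (x, processed_share) pairs."""
    xs = [x for x, _ in points]
    result = [0] * len(points[0][1])
    for xi, ys in points:
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        weight = num * pow(den, P - 2, P) % P  # modular inverse via Fermat
        for t, y in enumerate(ys):
            result[t] = (result[t] + weight * y) % P
    return result


secret = [120, 45, 200, 90]  # hypothetical quantized samples
h = [4, 2, 1]                # hypothetical integer impulse response

shares = make_shares(secret, k=2, n=3)               # client side
processed = [convolve_mod(s, h) for s in shares]     # one call per CDC
recovered = reconstruct([(1, processed[0]), (3, processed[2])])  # any 2 of 3
plain = convolve_mod(secret, h)                      # plaintext reference
```

The sketch works because convolution is linear: convolving every share with the same public impulse response yields valid shares of the convolved secret, so `recovered` equals `plain` here. How signed samples and real-valued impulse responses are quantized is not covered by this sketch.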

[Audio clips: modeled room impulse response; audio secret; 1st share; processed 1st share; reconstructed reverb effected audio (processed in encrypted domain); reverb effected audio (processed in plaintext domain)]

The diagram below represents our proposed scheme. We test our method with a modeled impulse response of a living room. After applying the method to a sample audio file from our dataset, we also present in the diagram below the audio secret, its shares, the impulse response, the processed shares after convolution and the reconstructed reverb effected audio. Listening to the playback of the audio clips makes it evident that (i) the shares and the processed shares are noise and reveal no information about the audio secret over the cloud, and (ii) the reconstructed reverb effected audio sounds different from the original audio as a result of applying the impulse response.

Time domain plots of the sample audio file we processed in the encrypted domain (ED) are presented below, showing the audio secret, one of its shares, the reverb effected reconstructed secret in ED and the reverb effected signal processed in the plaintext domain (PD). The time series reveal that (1) the share is noise and is likely to have equal power across all frequencies; (2) the amplitude series of the reverb effected reconstructed secret shows some low-amplitude regions compared to the audio secret, which results from the delay and decay effect of the impulse response and verifies that the audio secret has been reverb effected; and (3) the reverb effected signal in ED is identical to the signal processed in PD.

Security and performance analysis suggests that (i) our scheme is information-theoretically secure, meaning that an adversary with unlimited computing power cannot obtain any information about the secret; (ii) the transmission overhead of a share from client to CDC is not high; and (iii) our proposed ED method for adding the reverberation effect yields results identical to PD processing, with minimal losses, while maintaining security and privacy.
