
Secure Cloud-based Speech Noise Reduction

During the acquisition of speech by recording, noise may contaminate the signal. This degrades its quality, making it (i) unpleasant for human listeners and (ii) a source of inaccuracies in speech processing applications such as speech transcription and speech recognition.

In this work we propose the secure denoising of three types of noise from quality-degraded speech signals outsourced to CDCs. The noise types are: (i) white noise, which is characterized by higher frequencies; (ii) humming noise, characterized by tonal frequencies at harmonics of 60 Hz; and (iii) wind noise, characterized by lower frequencies below 500 Hz. We choose these noise types because they can be removed with linear filters, which are feasible under homomorphism in the encrypted domain. That is:

(1) Low pass filter to denoise white noise.

(2) Comb filter to attenuate humming noise.

(3) High pass filter to denoise wind noise.

Because these filters are linear time-invariant (LTI) systems, they are implemented in the time domain using their difference equations and convolution with their impulse responses.
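To make the time-domain view concrete, here is a minimal Python sketch of the three filter types, each written as a convolution with a short impulse response. The specific taps (a moving average, a first difference, and a feedforward comb), the function names, and the toy sampling rate are illustrative assumptions, not the filter designs used in this work.

```python
import math

def fir_filter(x, h):
    """Time-domain convolution y[n] = sum_k h[k] * x[n-k], with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

def lowpass_taps(length):
    """Moving average: a basic low-pass impulse response with unit gain at DC."""
    return [1.0 / length] * length

def highpass_taps():
    """First difference, y[n] = 0.5 * (x[n] - x[n-1]): zero gain at DC."""
    return [0.5, -0.5]

def comb_taps(delay):
    """Feedforward comb, y[n] = 0.5 * (x[n] - x[n-delay]): nulls at multiples of fs/delay."""
    return [0.5] + [0.0] * (delay - 1) + [-0.5]

fs = 6000            # toy sampling rate in Hz (assumed so that fs / 60 is an integer)
delay = fs // 60     # 100 samples, which places the comb nulls at 0, 60, 120, ... Hz
hum = [math.sin(2 * math.pi * 60 * n / fs) for n in range(3 * delay)]
residual = fir_filter(hum, comb_taps(delay))
hum_left = max(abs(v) for v in residual[delay:])  # close to zero once the delay line fills
```

Each output sample is a weighted sum of input samples, so all three filters are linear. Note that this particular comb also places a null at 0 Hz; a practical design would likely shape the notches more carefully.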

 

The figure below illustrates our proposed method for secure noise reduction over the cloud. Shares of the quality-degraded speech signal (contaminated with noise) are created on the client system with the (K, N) SSS threshold scheme. The client then uploads the N shares to N non-colluding CDCs, one share per CDC. Each CDC performs the noise reduction operation on its hosted share (that is, it processes the encrypted speech signal without knowing the secret). The authorized user then reconstructs the enhanced (denoised) secret by combining at least K of the N processed shares. In this way the CDCs store the encrypted signals and perform the quality enhancement operations on them without having access to the plaintext secret data.
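The share, process, and reconstruct steps can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the implementation used in this work: it assumes SSS denotes Shamir's secret sharing over a prime field, that samples are integers, and that the filter taps have been scaled to integers (here [0.25, 0.5, 0.25] multiplied by 4). The prime, the sample values, and all function names are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus, large enough to hold every filtered value

def make_shares(secret, k, n):
    """Split one integer sample into n shares; any k of them recover it."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append(y)
    return shares

def share_signal(samples, k, n):
    """Return n share signals; element i is the share uploaded to server i + 1."""
    per_sample = [make_shares(s, k, n) for s in samples]
    return [[row[i] for row in per_sample] for i in range(n)]

def filter_share(share, h):
    """Server-side step: FIR filtering of one share with integer taps, modulo PRIME."""
    out = []
    for m in range(len(share)):
        acc = 0
        for j, hj in enumerate(h):
            if m - j >= 0:
                acc = (acc + hj * share[m - j]) % PRIME
        out.append(acc)
    return out

def reconstruct(points):
    """Lagrange interpolation at x = 0 from (x, y) pairs, mapped back to a signed integer."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total - PRIME if total > PRIME // 2 else total

K, N = 3, 5
SCALE = 4
h = [1, 2, 1]                                        # integer-scaled low-pass taps
noisy = [120, -340, 560, -780, 900, -1000, 40, 75]   # toy integer samples

shares = share_signal(noisy, K, N)
processed = [filter_share(s, h) for s in shares]     # each server filters only its own share

chosen = [0, 2, 4]                                   # any K of the N servers will do
denoised_ed = [
    reconstruct([(i + 1, processed[i][m]) for i in chosen]) / SCALE
    for m in range(len(noisy))
]
# Plaintext reference: the same filter applied directly to the samples
denoised_pd = [
    sum(h[j] * noisy[m - j] for j in range(len(h)) if m - j >= 0) / SCALE
    for m in range(len(noisy))
]
```

Because the filter is a linear combination of samples, applying it to the shares yields valid shares of the filtered signal, so `denoised_ed` matches `denoised_pd` in this sketch. With real-valued filter coefficients, the rounding introduced by integer scaling would be one possible source of small differences.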

 

We use the database of the Language Technologies Institute at Carnegie Mellon University (CMU) to test our proposed method. For each noise type, the table below presents speech files of the noisy speech secret, its 1st share, the processed 1st share, and the denoised signal in the encrypted domain (ED) and in the plaintext domain (PD). Listening to the speech files reveals that: (1) the share and the processed share sound like noise and reveal no information about the quality-degraded speech secret, (2) the quality of the signal denoised in ED is much better than that of the quality-degraded speech secret, and (3) perceptually, the signals denoised in ED and in PD sound identical. Security and performance analysis suggests that: (i) our scheme is information-theoretically secure, (ii) transmission of a share from the client to a CDC is efficient, and (iii) our proposed method for noise reduction in ED yields results practically identical to PD processing, with negligible loss in accuracy, while maintaining security and privacy.
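One way to put a number on how closely the ED output tracks the PD output is a signal-to-noise ratio that treats the PD result as the reference. The sketch below is an illustrative assumption about how such a check could be done; the function name and the choice of metric are hypothetical and are not taken from the analysis reported here.

```python
import math

def snr_db(reference, test):
    """SNR in dB of `test` against `reference`; higher means closer agreement."""
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - t) ** 2 for r, t in zip(reference, test))
    if error_power == 0:
        return math.inf       # the two signals are sample-for-sample identical
    if signal_power == 0:
        return -math.inf      # an all-zero reference with a non-zero error
    return 10 * math.log10(signal_power / error_power)
```

For example, `snr_db(pd_output, ed_output)` would return infinity when the two outputs agree exactly, and a large finite value when they differ only by small rounding errors.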

 
