> Mixing in the enhanced track as needed works well. Using only the enhanced track may sound artificial at times.
Funnily enough, when this sort of model is used as a front end for speech recognition, the literature often recommends mixing some of the original noisy audio back in. That reduces the impact of artifacts introduced by the enhancement, which would otherwise hurt ASR quality because of domain shift in the data.
Guess humans and computers have similar needs in this case. :)
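For anyone who wants to try it, here is a minimal sketch of that mix-back step as a plain weighted sum of the two tracks. The function name `mix_back` and the default weight of 0.8 are my own placeholders, not values from any paper or from the project being discussed, and it assumes both signals are already time-aligned sample sequences of equal length.

```python
def mix_back(noisy, enhanced, enhanced_weight=0.8):
    """Blend the enhanced track with the original noisy track.

    Hypothetical helper: enhanced_weight=0.8 is an arbitrary starting
    point, not a recommended value. Tune it by ear or on a dev set.
    """
    if len(noisy) != len(enhanced):
        raise ValueError("signals must be time-aligned and equal in length")
    if not 0.0 <= enhanced_weight <= 1.0:
        raise ValueError("enhanced_weight must be in [0, 1]")
    w = enhanced_weight
    # Per-sample convex combination: w * enhanced + (1 - w) * noisy
    return [w * e + (1.0 - w) * n for n, e in zip(noisy, enhanced)]
```

A weight of 1.0 gives you the enhanced track alone and 0.0 gives back the untouched recording, so the useful range is somewhere in between.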
These are impressive results; the audio mostly sounds like you gave the guy a lapel mic. :P