Understand the Fragmentation Problem in Voice Activity Detection

Voice Activity Detection (VAD) aims to detect which parts of an audio signal are spoken by a person. To do that, it has to tell speech apart from silence and noise.

VAD is a classification problem. To implement VAD, we split an audio signal into fragments (frames), then judge whether each fragment is silence, noise, or speech.
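The split-then-classify idea can be illustrated with a minimal sketch. It is not a real VAD model: a toy energy threshold stands in for the classifier, and it only separates two classes (speech-like vs. silence/noise) instead of three. The function names, `frame_len`, and `threshold` are assumptions made for this illustration.

```python
def split_into_frames(samples, frame_len):
    """Split a list of samples into non-overlapping frames (an incomplete tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def classify_frames(samples, frame_len, threshold):
    """Label each frame 1 (speech-like) or 0 (silence/noise) by its energy.

    The energy threshold is a stand-in for a trained classifier.
    """
    return [1 if frame_energy(f) > threshold else 0
            for f in split_into_frames(samples, frame_len)]


# Toy signal: silence, a loud burst, silence again.
samples = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4
labels = classify_frames(samples, frame_len=4, threshold=0.1)
print(labels)  # [0, 1, 0]
```

Each frame is judged on its own here, which is exactly the setting in which the problem described next shows up.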

However, VAD differs from common binary classification problems because an audio signal is continuous, which means adjacent frames are highly correlated. This property causes the fragmentation problem: a run of consecutive fragments that belong to the same class may be split across several different classes.
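To make fragmentation concrete, here is a small sketch that counts how many separate speech segments a sequence of per-frame labels contains. The label sequences are made up for illustration, and `count_speech_segments` is a hypothetical helper, not something taken from the paper.

```python
from itertools import groupby


def count_speech_segments(labels):
    """Count maximal runs of consecutive 1s (speech) in a frame-label sequence."""
    return sum(1 for value, _ in groupby(labels) if value == 1)


def frame_accuracy(reference, predicted):
    """Fraction of frames whose predicted label matches the reference."""
    matches = sum(1 for r, p in zip(reference, predicted) if r == p)
    return matches / len(reference)


# One continuous utterance in the reference labels...
reference = [0, 1, 1, 1, 1, 1, 1, 0]
# ...but per-frame predictions flip inside it.
predicted = [0, 1, 1, 0, 1, 0, 1, 0]

print(count_speech_segments(reference))      # 1
print(count_speech_segments(predicted))      # 3
print(frame_accuracy(reference, predicted))  # 0.75
```

Only two of the eight frames are wrong, yet the single utterance has been broken into three pieces. Frame-level accuracy alone does not reveal this.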

How can this fragmentation problem be solved?

The paper End-to-End Speaker-Dependent Voice Activity Detection proposes a method to solve this problem.

On the input side, average several neighboring fragments.

On the output side, repeat the same output.
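The two steps above can be sketched as follows. This is one possible reading, not the paper's implementation: it assumes that every `group_size` consecutive feature vectors are averaged into one input, that the model makes one prediction per averaged input, and that this prediction is then repeated for every frame in the group. The function names and `group_size` are hypothetical.

```python
def average_neighbors(frames, group_size):
    """Average each group of `group_size` consecutive feature vectors into one vector."""
    averaged = []
    for start in range(0, len(frames), group_size):
        group = frames[start:start + group_size]  # the last group may be shorter
        dim = len(group[0])
        averaged.append([sum(vec[d] for vec in group) / len(group)
                         for d in range(dim)])
    return averaged


def repeat_outputs(group_labels, group_size, num_frames):
    """Repeat each group-level label so that every original frame gets a label."""
    repeated = [label for label in group_labels for _ in range(group_size)]
    return repeated[:num_frames]  # trim the padding from a shorter last group


# Five one-dimensional feature vectors, grouped in pairs.
frames = [[1.0], [3.0], [5.0], [7.0], [9.0]]
averaged = average_neighbors(frames, group_size=2)
print(averaged)  # [[2.0], [6.0], [9.0]]

# Suppose a classifier returned one label per averaged input.
group_labels = [0, 1, 1]
print(repeat_outputs(group_labels, group_size=2, num_frames=len(frames)))
# [0, 0, 1, 1, 1]
```

Under this reading, all frames in a group share one label, so the output cannot flip between classes inside a group, which limits how finely a continuous stretch can be fragmented.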