To ensure more natural dubbing results, it is important to meet the following conditions:
1️⃣ Optimal Speech Duration
Each speaker's voice should be present for at least 20 seconds.
If speech duration is too short, translation accuracy and voice generation quality may decrease.
2️⃣ Videos with Up to Two Speakers
The current dubbing feature offers the best results for videos with up to two speakers.
If a video has more than two speakers, the audio will still be translated, but voice cloning and speaker separation might be limited.
3️⃣ Try Videos Without Background Noise & Sound Effects
Background noise, including non-verbal sounds such as laughter, is not currently filtered out separately.
As a result, these sounds may be recognized as speech and translated.
4️⃣ Videos Without Noisy Environments & Fast-Paced Speech Work Best
Noisy environments (such as those with train sounds, cicadas, or background singing) may lower speech recognition and translation accuracy.
Sped-up or very fast speech may also lead to less accurate translations.
Meeting these conditions will help you achieve a smoother, more natural result from the dubbing process. 😊
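
If you prepare videos with a script, the two numeric conditions above (at least 20 seconds of speech per speaker, and up to two speakers) can be checked before uploading. The following is only a minimal sketch, not part of the dubbing feature: it assumes you already have a list of (speaker, start, end) segments in seconds from a speaker-labeling tool of your choice, and the function name and input format are hypothetical.

```python
from collections import defaultdict

# Thresholds taken from the guidelines above.
MIN_SPEECH_SECONDS = 20.0  # each speaker should speak for at least 20 seconds
MAX_SPEAKERS = 2           # best results with up to two speakers


def check_dubbing_conditions(segments):
    """Return a list of warnings for a video.

    `segments` is an assumed input format: an iterable of
    (speaker_label, start_seconds, end_seconds) tuples.
    """
    # Add up the total speaking time for each speaker.
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += max(0.0, end - start)

    warnings = []
    if len(totals) > MAX_SPEAKERS:
        warnings.append(
            f"{len(totals)} speakers found; voice cloning and "
            "speaker separation may be limited"
        )
    for speaker, seconds in sorted(totals.items()):
        if seconds < MIN_SPEECH_SECONDS:
            warnings.append(
                f"{speaker} speaks for only {seconds:.1f}s; "
                f"aim for at least {MIN_SPEECH_SECONDS:.0f}s"
            )
    return warnings


# Example with made-up segments: speaker A totals 17.5 seconds.
example = [("A", 0.0, 12.5), ("B", 12.5, 40.0), ("A", 40.0, 45.0)]
for warning in check_dubbing_conditions(example):
    print(warning)
```

An empty list means both conditions are met. Background noise and speech speed are not covered by this sketch and still need to be checked by listening to the video.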