How Text-to-Speech Works
Text-to-speech (TTS) converts written text into spoken audio. Modern TTS systems use neural networks trained on many hours of recorded human speech to produce natural-sounding voices. The Web Speech API provides TTS natively in the browser with no external dependencies, usually through the operating system's built-in voices (some browsers, such as Chrome, also expose network-based voices that need a connection). For higher quality, cloud services such as Google Cloud TTS, Amazon Polly, and Azure Cognitive Services offer neural voices.
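As a rough illustration of the cloud route, the sketch below calls Google Cloud Text-to-Speech's REST `text:synthesize` method, which takes the text, a voice selection, and an audio encoding, and returns base64-encoded audio. It assumes you already have an OAuth 2.0 access token (obtaining one, ideally on a server rather than in page code, is out of scope). `buildSynthesizeRequest` and `speakWithCloud` are hypothetical helper names, and Amazon Polly and Azure use different request formats.

```javascript
// Sketch: synthesize speech with Google Cloud Text-to-Speech (REST v1).
// Assumption: the caller supplies a valid OAuth 2.0 access token.
const TTS_ENDPOINT = 'https://texttospeech.googleapis.com/v1/text:synthesize';

// Build the JSON body for a text:synthesize request (hypothetical helper).
function buildSynthesizeRequest(text, { languageCode = 'en-US', voiceName, encoding = 'MP3' } = {}) {
  const voice = { languageCode };
  // Without a name, the service chooses a voice for the language.
  if (voiceName) voice.name = voiceName;
  return {
    input: { text },
    voice,
    audioConfig: { audioEncoding: encoding },
  };
}

// POST the request and play the returned base64 MP3 audio (hypothetical helper).
async function speakWithCloud(text, accessToken, options = {}) {
  const response = await fetch(TTS_ENDPOINT, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildSynthesizeRequest(text, options)),
  });
  if (!response.ok) {
    throw new Error(`TTS request failed: ${response.status}`);
  }
  const { audioContent } = await response.json();
  const audio = new Audio(`data:audio/mpeg;base64,${audioContent}`);
  await audio.play();
}
```

Keeping the request body in a separate pure function makes it easy to inspect or test without a network call.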
Web Speech API
```javascript
// Basic TTS with the Web Speech API
function speak(text, options = {}) {
  if (!('speechSynthesis' in window)) {
    console.error('Web Speech API not supported');
    return;
  }

  // Cancel any ongoing or queued speech
  speechSynthesis.cancel();

  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = options.lang ?? 'en-US';
  // Use ?? rather than || so valid zero values (pitch 0, volume 0) are kept
  utterance.rate = options.rate ?? 1.0;     // 0.1 to 10
  utterance.pitch = options.pitch ?? 1.0;   // 0 to 2
  utterance.volume = options.volume ?? 1.0; // 0 to 1

  // Select a specific voice; getVoices() may be empty until voices have loaded
  if (options.voiceName) {
    const voice = speechSynthesis
      .getVoices()
      .find(v => v.name === options.voiceName);
    if (voice) {
      utterance.voice = voice;
    } else {
      console.warn(`Voice "${options.voiceName}" not found; using default`);
    }
  }

  utterance.onend = () => console.log('Speech finished');
  utterance.onerror = (e) => console.error('Speech error:', e.error);

  speechSynthesis.speak(utterance);
}

// List available voices. Some browsers load voices asynchronously and fire
// voiceschanged; others have them ready immediately, so check both ways
// (the list may therefore be printed twice).
function listVoices() {
  speechSynthesis.getVoices().forEach(v => console.log(v.name, v.lang));
}
listVoices();
speechSynthesis.onvoiceschanged = listVoices;
```
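Because `getVoices()` can return an empty list until the browser has finished loading its voices, it can help to wrap voice loading in a promise and then choose a voice by language with a fallback. The sketch below assumes `speechSynthesis` is an `EventTarget` that fires `voiceschanged`; `loadVoices`, `pickVoice`, the 2-second timeout, and the fallback order (exact tag, then primary language, then the default voice) are all illustrative choices, not part of the API.

```javascript
// Sketch: wait for voices to load, then choose one for a language.
// loadVoices and pickVoice are hypothetical helpers, not browser APIs.

// Resolve with the voice list, waiting for 'voiceschanged' if it is empty.
// The timeout (an arbitrary assumption) covers browsers that never fire the event.
function loadVoices(synth = globalThis.speechSynthesis, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);
    synth.addEventListener('voiceschanged', () => {
      clearTimeout(timer);
      resolve(synth.getVoices());
    }, { once: true });
  });
}

// Normalize a language tag so 'en_US' and 'en-US' compare equal.
const normalizeLang = (tag) => tag.replace(/_/g, '-').toLowerCase();

// Prefer an exact language match, then the same primary language,
// then the browser's default voice; return null if nothing fits.
function pickVoice(voices, lang) {
  const wanted = normalizeLang(lang);
  const primary = wanted.split('-')[0];
  return (
    voices.find(v => normalizeLang(v.lang) === wanted) ??
    voices.find(v => normalizeLang(v.lang).split('-')[0] === primary) ??
    voices.find(v => v.default) ??
    null
  );
}
```

For example, `loadVoices().then(voices => pickVoice(voices, 'en-GB'))` yields a voice whose `name` could be passed to `speak()` as `voiceName`.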
Browser Support and Limitations
- Chrome and Edge have the most voices (including neural voices on Windows).
- Firefox has limited voice selection on some platforms.
- Browsers, especially on mobile, may require a user gesture (such as a click) before speech can start.
- Long texts may be cut off in some browsers; split them into sentences and queue each as a separate utterance.
- Voice availability varies by OS and installed language packs.
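One way to handle long texts is to split them into sentence-sized chunks and queue each chunk, since `speechSynthesis.speak()` appends utterances to a queue that plays in order. In this sketch `splitIntoChunks` and `speakLongText` are hypothetical helpers, the 200-character limit is an arbitrary assumption, and the regex split is naive: abbreviations such as "e.g." will be split mid-sentence, and a single word longer than the limit is kept whole.

```javascript
// Split text into chunks of at most maxLen characters, preferring
// sentence boundaries (. ! ?) and falling back to word boundaries.
function splitIntoChunks(text, maxLen = 200) {
  // Naive split: non-terminator characters followed by any . ! ? run.
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const chunks = [];
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    if (sentence.length <= maxLen) {
      chunks.push(sentence);
      continue;
    }
    // Sentence too long: pack whole words into chunks up to maxLen.
    let current = '';
    for (const word of sentence.split(/\s+/)) {
      if (current && current.length + 1 + word.length > maxLen) {
        chunks.push(current);
        current = word;
      } else {
        current = current ? `${current} ${word}` : word;
      }
    }
    if (current) chunks.push(current);
  }
  return chunks;
}

// Queue each chunk as its own utterance; the browser plays them in order.
function speakLongText(text, lang = 'en-US') {
  speechSynthesis.cancel(); // clear anything already queued
  for (const chunk of splitIntoChunks(text)) {
    const utterance = new SpeechSynthesisUtterance(chunk);
    utterance.lang = lang;
    speechSynthesis.speak(utterance);
  }
}
```

Queueing many short utterances also makes it straightforward to attach `onend` handlers per chunk, for example to highlight the sentence currently being read.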