
Text to Speech

Last modified: 13/02/2026

Overview

Text to Speech (TTS) can read card content aloud.

This helps with language study, listening practice, and accessibility.

Sprout also supports flag-aware voice routing with inline flag tokens. Flag assets are provided by HatScripts/circle-flags (MIT).

Supported card types

Card type        | TTS support | What's read
Basic            | Yes         | Question and/or answer fields
Cloze            | Yes         | Full sentence or just the cloze deletion (configurable)
MCQ              | Yes         | Question stem and options
Ordered          | Yes         | Question stem and items
Image Occlusion  | No          | Nothing (visual cards)

If a card type is unsupported, no TTS audio is generated for that card.
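To make the table concrete, here is a minimal Python sketch of how the spoken text for one side of a card might be assembled. The Card class, its field names, and the card-type strings are hypothetical and do not reflect Sprout's internal data model. The sketch also assumes Basic cards read the question on the front and the answer on the back, and it leaves cloze out because cloze output depends on the Cloze TTS mode setting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Card:
    # Hypothetical card model for illustration only; not Sprout's schema.
    kind: str                                          # "basic", "mcq", "ordered", "image-occlusion"
    question: str = ""
    answer: str = ""
    options: List[str] = field(default_factory=list)   # MCQ options or ordered items


def tts_text(card: Card, side: str) -> Optional[str]:
    """Return the text to speak for one side ("front" or "back"), or None for no audio."""
    if card.kind == "basic":
        # Assumed mapping: question on the front, answer on the back.
        return card.question if side == "front" else card.answer
    if card.kind in ("mcq", "ordered"):
        # Question stem followed by each option or item.
        return ". ".join([card.question, *card.options])
    # Image occlusion is visual-only, so no audio. This sketch also returns
    # None for types it does not model (cloze depends on the Cloze TTS mode).
    return None
```

Returning None rather than an empty string keeps "no audio for this card type" distinct from "this field happens to be empty".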

Cloze TTS options

For cloze cards, choose what is read on the answer side:

Option         | Behaviour
Deletion only  | Reads only the hidden/deleted text (e.g. "Paris")
Full sentence  | Reads the complete sentence with the deletion filled in (e.g. "The capital of France is Paris")

Configure this in Settings → Audio → Cloze TTS mode.
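A minimal sketch of what each mode produces, assuming Anki-style cloze markup such as {{c1::Paris}}. The regular expression, the function name cloze_tts, and the mode strings are illustrative assumptions, not Sprout's own code. The sketch also treats every deletion in the text as hidden, whereas a real card may hide only one of several deletions.

```python
import re

# Assumed cloze markup: {{c1::answer}}, {{c2::answer}}, ...
CLOZE = re.compile(r"\{\{c\d+::(.*?)\}\}")


def cloze_tts(text: str, mode: str) -> str:
    """Build the answer-side TTS string for a cloze card."""
    if mode == "deletion-only":
        # Speak only the hidden text(s), joined with spaces.
        return " ".join(CLOZE.findall(text))
    if mode == "full-sentence":
        # Replace each deletion marker with its answer and speak the sentence.
        return CLOZE.sub(r"\1", text)
    raise ValueError(f"unknown cloze TTS mode: {mode}")
```

For "The capital of France is {{c1::Paris}}", deletion-only mode yields "Paris" and full-sentence mode yields "The capital of France is Paris", matching the examples in the table above.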

Enabling TTS

  1. Go to Settings → Audio.
  2. Toggle Enable text-to-speech on.
  3. Choose a voice and language (see Language Settings).
  4. TTS controls then appear in the study session interface.

Flag-aware voice routing

In Settings → Audio → Flag-aware routing:

  • Use flags for language and accent (default: on)
    • Flag tokens such as {{es}} and {{es-mx}} switch the spoken language and accent.
    • A single flag anywhere in the text applies to the full spoken content.
    • Multiple flags in one field produce segmented speech, with the voice switching inline at each flag.
  • Speak language name before flag segments (default: off)
    • Adds a spoken language name before each flag-switched segment.
    • English variants are all announced as "English" while keeping their regional accent (for example, a UK or US voice).
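The routing rules above can be sketched as follows. This is an illustrative Python sketch, not Sprout's implementation: it assumes flag tokens look like {{xx}} or {{xx-yy}}, that in multi-flag fields each flag voices the text after it, and that text before the first flag uses the default voice. The LANGUAGE_NAMES table and the "Name: text" prefix format are hypothetical, and announcing every regional variant by its base language name generalises the English-only behaviour described above.

```python
import re
from typing import List, Tuple

# Assumed token shape: {{xx}} or {{xx-yy}} (language, optional region).
FLAG = re.compile(r"\{\{([a-z]{2,3}(?:-[a-z]{2})?)\}\}", re.IGNORECASE)

# Hypothetical spoken labels; a real table would cover every supported code.
LANGUAGE_NAMES = {"en": "English", "es": "Spanish", "fr": "French"}


def route(text: str, default_lang: str, speak_names: bool = False) -> List[Tuple[str, str]]:
    """Split text into (language tag, text) segments according to inline flags."""
    parts = FLAG.split(text)  # [before, tag1, after1, tag2, after2, ...]
    tags = parts[1::2]
    if len(tags) == 1:
        # A single flag anywhere applies to the whole text.
        segments = [(tags[0].lower(), parts[0] + " " + parts[2], True)]
    else:
        # No flags: one default segment. Several flags: each flag voices
        # the text that follows it; leading text keeps the default voice.
        segments = [(default_lang, parts[0], False)]
        segments += [(t.lower(), chunk, True) for t, chunk in zip(tags, parts[2::2])]

    result = []
    for lang, chunk, flagged in segments:
        chunk = " ".join(chunk.split())  # drop whitespace left by removed tokens
        if not chunk:
            continue
        if flagged and speak_names:
            # Variants share the base name ("en-gb" -> "English"); the tag keeps the accent.
            name = LANGUAGE_NAMES.get(lang.split("-")[0], lang)
            chunk = f"{name}: {chunk}"
        result.append((lang, chunk))
    return result
```

For example, route("Say {{es}} hola {{fr}} bonjour", "en") yields three segments: ("en", "Say"), ("es", "hola") and ("fr", "bonjour"). Each segment would then be handed to a voice matching its tag.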

See Flags for token syntax and supported formats, and Flag Codes for available language/region codes.

During a session

When TTS is enabled:

  • A speaker icon appears on cards during review.
  • Click it to hear the current side read aloud.
  • Auto-play can be configured to read automatically when a card is shown or when the answer is revealed.

Audio quality

TTS uses your device's built-in speech engine. Voice quality and available voices depend on your OS:

  • macOS: High-quality voices are available via System Settings (System Preferences on older versions) → Accessibility → Spoken Content.
  • Windows: Voices are available via Settings → Time & Language → Speech.
  • Linux: Available voices depend on which speech synthesis packages are installed.
  • Mobile: Uses the device's built-in TTS engine.

If no suitable voice is installed for your chosen language, pronunciation quality may be poor.
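Whether a suitable voice exists comes down to matching the requested language against what the OS has installed. Below is a minimal Python sketch of one common fallback strategy, assuming each installed voice is identified by a BCP 47-style tag such as en-GB (like the lang property of a Web Speech API voice). pick_voice is a hypothetical helper, and Sprout's actual selection logic may differ.

```python
from typing import List, Optional


def pick_voice(voices: List[str], wanted: str) -> Optional[str]:
    """Pick an installed voice tag for a requested tag such as "es-MX"."""
    def norm(tag: str) -> str:
        # Some engines report "es_MX"; compare case-insensitively with hyphens.
        return tag.replace("_", "-").lower()

    target = norm(wanted)
    for v in voices:
        if norm(v) == target:
            return v  # exact language and region match
    base = target.split("-")[0]
    for v in voices:
        if norm(v).split("-")[0] == base:
            return v  # same language, different region (accent may differ)
    return None  # no suitable voice installed
```

Falling back to the base language keeps speech in the right language when the exact regional voice is missing, at the cost of the accent; returning None signals that no voice for the language is installed at all.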

Tips

  • For language learning, set the voice language to match your target language.
  • Use deletion only mode for cloze cards when you want to practise pronunciation of specific terms.
  • Use full sentence mode when context and sentence flow matter.
  • Use inline flags when a single card mixes languages or accents.

Released under the MIT License.