Amazon Polly

  • Amazon Polly is the opposite of Amazon Transcribe: Transcribe turns speech into text, while Polly turns text into speech.
  • Definition:
    • This service uses deep learning to turn text into lifelike speech, which lets you build applications that talk.
  • For example, if you write "Hi, my name is Stephane, and this is a demo of Amazon Polly," Polly generates the corresponding speech for you.
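The demo sentence above could be sent to Polly roughly as follows. This is a minimal sketch, assuming the boto3 SDK and configured AWS credentials; the helper names, the "Joanna" voice, and the output file name are illustrative choices, not part of the original notes.

```python
def build_request(text, voice_id="Joanna", engine="neural"):
    """Assemble keyword arguments for Polly's SynthesizeSpeech call.

    The voice and engine defaults are assumptions made for this sketch.
    """
    return {
        "Text": text,
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
        "Engine": engine,
    }


def synthesize_to_file(text, path="demo.mp3"):
    """Send the text to Polly and save the returned audio.

    Requires boto3 and AWS credentials, so it is not called in this sketch.
    """
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path


request = build_request(
    "Hi, my name is Stephane, and this is a demo of Amazon Polly."
)
print(request["VoiceId"], request["OutputFormat"])
```

Calling `synthesize_to_file(...)` with the same sentence would write the spoken audio to an MP3 file, provided the account is allowed to use Polly.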

Advanced Features

Polly has several advanced features that may appear in the exam:

Lexicons

  • You define how Polly should read certain pieces of text
  • Example: you may write "AWS" but want Polly to pronounce "Amazon Web Services"
  • Example: you may write "W3C" but want Polly to say "World Wide Web Consortium"
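Polly lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) XML format. The sketch below builds such a document for the two examples above; the `build_lexicon` helper and the choice of `en-US` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-US"):
    """Return a PLS document mapping each written form to its spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>\n"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


lexicon_xml = build_lexicon({
    "AWS": "Amazon Web Services",
    "W3C": "World Wide Web Consortium",
})

# Parse the result to confirm it is well-formed XML before uploading it.
root = ET.fromstring(lexicon_xml.encode("utf-8"))
graphemes = [g.text for g in root.iter(f"{{{PLS_NS}}}grapheme")]
print(graphemes)
```

With boto3, the document would typically be uploaded once via `put_lexicon(Name=..., Content=lexicon_xml)` and then referenced by name through the `LexiconNames` parameter of `synthesize_speech`.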

SSML (Speech Synthesis Markup Language)

  • Markup tags that tell Polly how your text should be pronounced
  • Example: "Hello" + break + "how are you?" will say "Hello," then pause, then say "how are you?"
  • Polly won't say "Hello, break, how are you?"; it interprets the markup instead of reading it aloud
  • Capabilities include:
    • Whispering
    • Pronunciation control
    • Abbreviation handling
    • Word emphasis
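The capabilities above map onto SSML tags. The following is a small sketch of helpers that emit them as strings; the helper names are hypothetical, and not every tag is supported by every voice engine, so check the engine you use before relying on whispering or emphasis.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(seconds):
    """A silent break of the given length in seconds."""
    return f'<break time="{seconds}s"/>'


def whisper(text):
    """Whispered speech (an Amazon-specific SSML extension)."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def abbreviation(written, spoken):
    """Read `written` aloud as `spoken`."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def emphasize(text, level="strong"):
    """Stress a word or phrase."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def speak(*parts):
    """Wrap already-built SSML fragments in the required root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(escape("Hello"), pause(1), escape("how are you?"))
print(ssml)  # <speak>Hello<break time="1s"/>how are you?</speak>
```

To have Polly interpret the markup rather than read it aloud, the string is passed as `Text` together with `TextType="ssml"` in the `synthesize_speech` call.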

Voice Engines

Multiple voice engines are available, from oldest to newest:

  1. Standard
  2. Neural
  3. Long-form
  4. Generative

The newer engines produce the most natural, human-like voices.
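Not every voice is available on every engine, so an application may want to pick the newest engine a given voice supports. A minimal sketch, assuming the engine identifiers accepted by the `Engine` request parameter and a voice entry shaped like the items returned by `describe_voices`; the voice shown is a made-up placeholder.

```python
# Engine identifiers as passed in the Engine parameter, oldest to newest.
ENGINES_OLDEST_TO_NEWEST = ["standard", "neural", "long-form", "generative"]


def newest_supported_engine(supported_engines):
    """Return the most recent engine present in `supported_engines`."""
    for engine in reversed(ENGINES_OLDEST_TO_NEWEST):
        if engine in supported_engines:
            return engine
    raise ValueError(f"no known engine in {supported_engines!r}")


# Hypothetical entry, shaped like one item of the "Voices" list.
voice = {"Id": "ExampleVoice", "SupportedEngines": ["standard", "neural"]}

chosen = newest_supported_engine(voice["SupportedEngines"])
print(chosen)  # neural
```

The chosen value can then be supplied as `Engine=chosen` when synthesizing speech with that voice.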

Speech Marks

  • Metadata describing where elements of the speech occur in the audio
  • Shows where a word or sentence starts or ends in the audio
  • Polly can give you both the audio and the matching speech marks for the same text
  • Very helpful for:
    • Lip-syncing
    • Highlighting words as they are spoken
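Speech marks arrive as a stream of JSON objects, one per line, each carrying a time offset in milliseconds. The sketch below shows how word highlighting could be driven from them; the sample data is invented for illustration (it is not real Polly output) and the helper names are hypothetical.

```python
import json

# Hypothetical speech marks for the text "Hello how are you".
# "time" is milliseconds from the start of the audio;
# "start"/"end" are offsets into the input text.
SAMPLE_MARKS = "\n".join([
    '{"time": 0, "type": "sentence", "start": 0, "end": 17, "value": "Hello how are you"}',
    '{"time": 6, "type": "word", "start": 0, "end": 5, "value": "Hello"}',
    '{"time": 410, "type": "word", "start": 6, "end": 9, "value": "how"}',
    '{"time": 590, "type": "word", "start": 10, "end": 13, "value": "are"}',
    '{"time": 720, "type": "word", "start": 14, "end": 17, "value": "you"}',
])


def parse_speech_marks(stream_text):
    """Decode a line-delimited JSON speech-mark stream into dicts."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


def word_at(marks, elapsed_ms):
    """Return the word being spoken at `elapsed_ms`, or None before the first word."""
    current = None
    for mark in marks:
        if mark["type"] == "word" and mark["time"] <= elapsed_ms:
            current = mark["value"]
    return current


marks = parse_speech_marks(SAMPLE_MARKS)
print(word_at(marks, 500))  # how
```

With boto3, such a stream is typically requested by calling `synthesize_speech` with `OutputFormat="json"` and `SpeechMarkTypes=["word", "sentence"]`, alongside a separate request for the audio itself.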