Audio loudness (Dialogflow)

LUFS (Loudness Units relative to Full Scale) is a standard unit of loudness that enables volume normalization across many genres and production styles. The measurement is based on a model of how human hearing perceives loudness at a comfortable listening volume, and it lets audio producers avoid jumps in level that would require users to constantly adjust volume. LUFS is also known as LKFS (Loudness, K-weighted, relative to Full Scale).

When playing back audio files using SSML, the average loudness should be -16 LUFS for stereo audio content, which matches the average loudness of the Google Assistant TTS output. This level gives a good balance between overall volume control on the voice-activated speaker and ample headroom for material with variable dynamic range when compared to the Google Assistant.

For mono audio content, the average loudness should be -19 LUFS rather than -16 LUFS. The targets differ because converting mono content to stereo, by duplicating the mono track on both channels of a stereo signal, doubles the energy of the signal, which raises the LUFS measurement by 3.01 Loudness Units (LU). Conversely, when a stereo signal is converted to mono for playback on a single speaker, the mono signal is typically constructed by averaging the two channels, and that transformation lowers the LUFS measurement by the same 3.01 LU. Loudness measurements for mono and stereo content are therefore not directly comparable and must be offset by 3.01 LU.
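The 3.01 LU figure is 10·log10(2), the decibel equivalent of doubling signal energy. The following is a minimal sketch of that arithmetic, assuming a simplified BS.1770-style channel summation with K-weighting and gating left out; the function name and the sample energy value are illustrative only.

```python
import math

# Loudness change, in LU, from doubling (or halving) the summed signal energy.
OFFSET_LU = 10 * math.log10(2)


def unweighted_loudness(channel_mean_squares):
    """Sum per-channel mean-square energy and convert to a loudness value.

    Simplification: real LUFS meters apply K-weighting and gating first.
    """
    return -0.691 + 10 * math.log10(sum(channel_mean_squares))


mono = unweighted_loudness([0.01])          # a single channel
stereo = unweighted_loudness([0.01, 0.01])  # the same track duplicated on L and R

print(f"offset: {OFFSET_LU:.2f} LU")              # offset: 3.01 LU
print(f"stereo - mono: {stereo - mono:.2f} LU")   # stereo - mono: 3.01 LU
```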

Some loudness meters have options to correct for this disparity. For example, if you're using FFmpeg (see below), you can use the dual_mono (or dualmono) option, as recommended below. If you're using a loudness meter with such an option, and you have enabled that option, then the loudness target should be -16 LUFS regardless of whether the content is stereo or mono.
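The target selection described above can be summarized in a small helper. This is only a sketch: the function name and the dual_mono_corrected parameter are hypothetical, and it covers only the mono and stereo cases discussed here.

```python
def target_lufs(channels: int, dual_mono_corrected: bool = False) -> float:
    """Return the recommended integrated-loudness target for a file.

    channels: 1 for mono, 2 for stereo (other layouts are not covered here).
    dual_mono_corrected: True if the meter compensates for mono content,
    for example through a dual-mono option.
    """
    if channels not in (1, 2):
        raise ValueError("only mono and stereo content are covered")
    if channels == 1 and not dual_mono_corrected:
        return -19.0  # an uncorrected mono reading sits about 3 LU lower
    return -16.0      # stereo, or mono measured with the correction enabled


print(target_lufs(1))                            # -19.0
print(target_lufs(1, dual_mono_corrected=True))  # -16.0
print(target_lufs(2))                            # -16.0
```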

We recommend two options to measure and adjust audio loudness:

Using a DAW and LUFS meter

The following steps describe how to ensure your audio meets the -16 LUFS recommendation:

  1. Create all audio at consistently loud and balanced (equalized) levels for the entire duration of the audio, so that there are no spikes or dips in loudness.
  2. Set up a digital audio workstation (DAW) and LUFS meter to measure audio loudness compared to the Google TTS Loudness Reference.
  3. Measure and adjust the loudness of your audio so that it has an integrated average loudness of about -16 LUFS (or -19 LUFS if the content is mono).
  4. Ear check your audio by comparing its loudness to the Google TTS Loudness Reference.

Set up a DAW and LUFS meter

There are many DAWs and LUFS meters available as freeware and commercial products. If you already have a preferred DAW and LUFS meter, you can use that. Otherwise, we recommend Audacity for Windows and Linux or Reaper for Mac for DAWs and TBProAudio dpMeter II for an LUFS meter. The following sections assume you are using these tools.

Get the files

  1. Download and install a DAW: Audacity (Windows/Linux) or Reaper (Mac).
  2. Download and install dpMeter II for your OS. This tool works with both Audacity and Reaper as a VST (Virtual Studio Technology) plugin.
  3. Download the Google TTS Loudness Reference audio file. The TTS audio reads: "The integrated loudness of this sentence is about -16 LUFS". This file serves as the test audio for the meter as well as an ear check reference.

Configure dpMeter II for Audacity (Windows/Linux)

  1. Open the Google TTS Loudness Reference audio file in Audacity.
  2. Open the dpMeter II plugin by clicking on the Effect tab and choosing Add/Remove Plug-ins.
  3. Find dpMeter2 in the list, click Enable, then OK. The dpMeter II plugin now appears in the Effect drop-down menu.
  4. Click dpMeter2 from the Effect drop-down menu to open the plugin. dpMeter II defaults to RMS mode (orange color scheme). Change the mode to EBU r128 (blue color scheme) to measure LUFS.

Configure dpMeter II for Reaper (Mac)

  1. Open the Google TTS Loudness Reference audio by clicking Insert > Media file...
  2. Open the dpMeter II plugin by clicking on the green FX button (number 1 in the figure) on the left pane of the audio layer. An FX window appears.

  3. Click dpMeter2 in the list. dpMeter II defaults to RMS mode (orange color scheme). Change the mode to EBU r128 (blue color scheme) to measure LUFS.

Measuring and adjusting loudness

Different meters in different DAWs give slightly different readings. Audacity tends to measure the Google TTS Loudness Reference a little louder than other DAWs, at -15.1 LUFS, while Reaper gives a reading of -16.0 LUFS. As long as your DAW measures the loudness of the Google TTS Loudness Reference within ±2 LU of -16 LUFS, it should work fine for setting the loudness of your audio.

The basic steps for measuring and adjusting loudness are:

  1. Use dpMeter II to measure the loudness of the Google TTS Loudness Reference to establish a baseline LUFS reading. If your DAW is measuring higher or lower than -16 LUFS for the Google TTS Loudness Reference, match your audio to the baseline of your DAW. For example, in Audacity, dpMeter II measures an integrated loudness of -15.1 LUFS, so the new target loudness for your program should be -15.1 LUFS.
  2. After establishing a baseline, adjust your audio to match the baseline reading.
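The baseline step can be expressed as a short check. This is a sketch under the assumptions stated in the text (a nominal reference loudness of -16 LUFS and a tolerance of 2 LU); the function and constant names are hypothetical.

```python
NOMINAL_LUFS = -16.0  # stated loudness of the Google TTS Loudness Reference
TOLERANCE_LU = 2.0    # acceptable deviation of a meter's reading


def calibrated_target(reference_reading: float) -> float:
    """Return the target loudness to use with a particular DAW and meter.

    reference_reading is what the meter reports for the reference file.
    The target for your own audio is that same reading, provided the
    meter is within tolerance of the nominal value.
    """
    deviation = reference_reading - NOMINAL_LUFS
    if abs(deviation) > TOLERANCE_LU:
        raise ValueError(
            f"meter reads {reference_reading} LUFS for the reference, "
            f"more than {TOLERANCE_LU} LU away from {NOMINAL_LUFS} LUFS"
        )
    return reference_reading


print(calibrated_target(-15.1))  # -15.1 (the Audacity reading from the text)
```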

Measuring the Google TTS Loudness Reference

Click the green play button in dpMeter II or press play (spacebar) in your DAW (number 4 below) to measure the loudness of the file.

The following list describes major features you might use in dpMeter II:

  1. Mode: Set to EBU (instead of RMS) to measure loudness in LUFS.
  2. Gain Control: Be sure this is set to 0.0 until you are ready to change the loudness of your program.
  3. Integrated Loudness: This is a measure of the average loudness of all of the audio that the plug-in has analyzed since the reset button (5) was last clicked. Click the reset button (5) before each loudness measurement to be sure you're measuring only the loudness of the current selection.
  4. Play: This begins the loudness analysis of the audio file. (This button does not appear in all DAWs. Clicking the main play button (space bar) in your DAW should have the same effect.)
  5. Reset: Click this button between each loudness measurement.
  6. Apply: When you are ready to change the loudness of your program material to match the Google TTS Loudness Reference, this button applies the loudness change set by the Gain Control (2).

Matching loudness to the Google TTS Loudness Reference

Now that you have measured the Google TTS Loudness Reference loudness, you can measure and adjust your audio's loudness:

  1. Open your audio file and choose dpMeter2 from the Effect menu.
  2. Click the Play button and let the integrated loudness value settle to an average value for your audio file.
  3. If the integrated loudness is different from the Google TTS Loudness Reference, adjust the gain of your audio to match the reference. For example, if your audio measures at an integrated loudness of -12 LUFS, it's too loud, so decrease the gain by setting Gain Control to -4 dB and clicking Apply to bring it to the target loudness of the Google TTS Loudness Reference (-16 LUFS). You may need to repeat the measure-and-adjust cycle to reach the target loudness, because a gain change only approximates the resulting change in LUFS.
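The gain value in the example above is simply the target loudness minus the measured loudness. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the result is a starting point to re-measure rather than a guaranteed final value.

```python
def gain_to_apply(measured_lufs: float, target_lufs: float = -16.0) -> float:
    """Return the gain change, in dB, that should move the audio to the target.

    A negative result means the audio is too loud and must be turned down.
    """
    return target_lufs - measured_lufs


print(gain_to_apply(-12.0))  # -4.0, matching the example in the text
```

If your meter's baseline differs from -16 LUFS (for example -15.1 LUFS in Audacity), pass that baseline as target_lufs instead.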

Using ffmpeg

FFmpeg is a media framework with a command line tool for media conversion. The tool includes a filter called loudnorm for loudness normalization. You can use loudnorm to output a version of your audio file at the appropriate -16 LUFS loudness using dual-pass mode.

  1. Download and install FFmpeg.
  2. Navigate to the installation directory and run FFmpeg with the loudnorm filter on your input file. Make sure to enable the dual_mono option.

    ./ffmpeg -i /path/to/input.wav \
        -af loudnorm=I=-16:dual_mono=true:TP=-1.5:LRA=11:print_format=summary \
        -f null -
    

    This instructs FFmpeg to measure the audio values of your media file without creating an output file. You will get a series of values presented as follows:

    Input Integrated:    -27.2 LUFS
    Input True Peak:     -14.4 dBTP
    Input LRA:             0.1 LU
    Input Threshold:     -37.7 LUFS
    
    Output Integrated:   -15.5 LUFS
    Output True Peak:     -2.7 dBTP
    Output LRA:            0.0 LU
    Output Threshold:    -26.2 LUFS
    
    Normalization Type:   Dynamic
    Target Offset:        -0.5 LU
    

    The sample values above indicate important information about the incoming media. For instance, the Input Integrated value of -27.2 LUFS indicates audio that is much quieter than the -16 LUFS target, while the Output Integrated value is much closer to -16.0. In this sample, the Input True Peak and Input LRA (loudness range) values are already below the ceilings supplied in the command (TP=-1.5 and LRA=11); values above those ceilings would be reduced in the normalized version. Finally, the Target Offset represents the offset gain used in the output.

  3. Run a second pass of the loudnorm filter, supplying the values from the first pass (step 2) as "measured" values in the loudnorm options.

    ./ffmpeg -i /path/to/input.wav \
        -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-27.2:measured_TP=-14.4:measured_LRA=0.1:measured_thresh=-37.7:offset=-0.5:linear=true:print_format=summary \
        output.wav
    

    A file, output.wav, is created containing a loudness-normalized version of your input file.
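Copying the first-pass values into the second command by hand is error-prone, so the mapping can be scripted. The sketch below parses a summary in the format shown above and builds the second-pass filter string. The helper names are hypothetical, and the parser assumes the summary layout matches the sample exactly; it does not run FFmpeg itself.

```python
import re

SAMPLE_SUMMARY = """\
Input Integrated:    -27.2 LUFS
Input True Peak:     -14.4 dBTP
Input LRA:             0.1 LU
Input Threshold:     -37.7 LUFS

Output Integrated:   -15.5 LUFS
Output True Peak:     -2.7 dBTP
Output LRA:            0.0 LU
Output Threshold:    -26.2 LUFS

Normalization Type:   Dynamic
Target Offset:        -0.5 LU
"""

# Summary label -> loudnorm option used in the second pass.
LABEL_TO_OPTION = {
    "Input Integrated": "measured_I",
    "Input True Peak": "measured_TP",
    "Input LRA": "measured_LRA",
    "Input Threshold": "measured_thresh",
    "Target Offset": "offset",
}


def parse_summary(text: str) -> dict:
    """Extract the first-pass values needed by the second pass, as strings."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+):\s*([+-]?\d+(?:\.\d+)?)", line)
        if match and match.group(1) in LABEL_TO_OPTION:
            values[LABEL_TO_OPTION[match.group(1)]] = match.group(2)
    return values


def second_pass_filter(measured: dict, target_i=-16, true_peak=-1.5, lra=11) -> str:
    """Build the loudnorm filter string for the second pass."""
    missing = [opt for opt in LABEL_TO_OPTION.values() if opt not in measured]
    if missing:
        raise ValueError(f"summary is missing values for: {missing}")
    parts = [f"I={target_i}", f"TP={true_peak}", f"LRA={lra}"]
    parts += [f"{opt}={measured[opt]}" for opt in LABEL_TO_OPTION.values()]
    parts += ["linear=true", "print_format=summary"]
    return "loudnorm=" + ":".join(parts)


print(second_pass_filter(parse_summary(SAMPLE_SUMMARY)))
```

For the sample summary, the printed string is the same filter argument used in the second-pass command above.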

Listen to the following examples of an audio file before and after ffmpeg loudness normalization to hear how the tool works.

Before

After

Ear check your audio

Do an ear check to make sure your audio sounds good compared to the Google TTS Loudness Reference. To do this, switch back and forth between the two files, listen for any jumps in volume or balance, and adjust the gain by ear if necessary.

Loudness should sound consistent for spoken words at -16 LUFS (stereo) or -19 LUFS (mono). However, if the frequency range of your audio is excessively high (like bird calls) or excessively low (like thunder), setting levels to -16 LUFS (stereo) or -19 LUFS (mono) might make this audio sound inconsistent with the Google TTS Loudness Reference. In this case, an ear check is particularly helpful in balancing all of the audio in your program.