Convert Audio to Text with Google Cloud Speech API

Published in: Google Apps Script - Google Cloud

The Online Dictation app uses the HTML5 Speech Recognition API to transcribe your voice into digital text. If you have a pre-recorded audio file, you can turn on speech recognition inside Dictation, play the audio file and get the speech as text.

Google offers a Cloud Speech API for developers to convert audio to text. You can upload the audio file in FLAC format to Google Cloud Storage and the Speech API will transcribe it. If your audio is in MP3 format, use the FFmpeg tool to convert it to FLAC first.
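As a rough sketch of the conversion step, the helper below assembles the arguments for an FFmpeg call that turns an MP3 into a mono, 16 kHz FLAC file. The function name `buildFfmpegArgs` and the file names are hypothetical; `-i`, `-ac` and `-ar` are standard FFmpeg options, and mono at 16000 Hz is an assumption chosen to match the sample rate used in this article.

```javascript
// Hypothetical helper: builds the argument list for an FFmpeg call that
// converts an audio file to mono FLAC at the given sample rate.
function buildFfmpegArgs(inputFile, outputFile, sampleRateHertz) {
  return [
    "-i", inputFile,                          // source audio, e.g. an MP3
    "-ac", "1",                               // downmix to a single channel
    "-ar", String(sampleRateHertz || 16000),  // resample (defaults to 16 kHz)
    outputFile                                // a .flac extension selects FLAC output
  ];
}

// Equivalent shell command: ffmpeg -i audio.mp3 -ac 1 -ar 16000 audio.flac
var args = buildFfmpegArgs("audio.mp3", "audio.flac", 16000);
var ffmpegCommand = "ffmpeg " + args.join(" ");
```

The resulting string can be pasted into a terminal where FFmpeg is installed; the conversion itself runs outside Apps Script.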

Also see: Cloud Speech API with Google Service Account

In this example, we upload the .flac audio file to Google Drive (for those who don’t have Google Cloud Storage) and call the Cloud Speech API via the UrlFetchApp service. You need to enable billing in your Google Cloud console, enable the Speech API and also set up an API key or a service account.
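If you go the API key route, one option is to keep the key out of the source code. The sketch below assumes the key has been saved as a script property named `SPEECH_API_KEY` (a hypothetical name) and builds the request URL from it. `getSpeechApiKey` and `buildRecognizeUrl` are illustrative helpers, not part of the original script.

```javascript
// Hypothetical property name under which the API key is stored
var API_KEY_PROPERTY = "SPEECH_API_KEY";

// Reads the key from the script properties.
// PropertiesService is only available inside Google Apps Script.
function getSpeechApiKey() {
  return PropertiesService.getScriptProperties().getProperty(API_KEY_PROPERTY);
}

// Builds the speech:recognize endpoint URL for the given API key
function buildRecognizeUrl(apiKey) {
  return (
    "https://speech.googleapis.com/v1/speech:recognize?key=" +
    encodeURIComponent(apiKey)
  );
}
```

Inside Apps Script, `buildRecognizeUrl(getSpeechApiKey())` would then replace a hard-coded URL in the `UrlFetchApp.fetch` call.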

/*

Written by Amit Agarwal
email: amit@labnol.org
web: https://digitalinspiration.com
twitter: @labnol

*/

function convertAudioToText(flacFile, languageCode) {

  var file = DriveApp.getFilesByName(flacFile).next();
  var bytes = file.getBlob().getBytes();

  var payload = {
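    config:{
      // The uploaded file is FLAC, so the encoding must match, and the
      // v1 API names the sample rate field sampleRateHertz
      encoding: "FLAC",
      sampleRateHertz: 16000,
      languageCode: languageCode || "en-US"
    },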
    audio: {
      // You may also upload the audio file to Google
      // Cloud Storage and pass the object URL here
      content:Utilities.base64Encode(bytes)
    }
  };

  // Replace XYZ with your Cloud Speech API key
  var response = UrlFetchApp.fetch(
    "https://speech.googleapis.com/v1/speech:recognize?key=XYZ", {
      method: "POST",
      contentType: "application/json",
      payload: JSON.stringify(payload),
      muteHttpExceptions: true
    });

  Logger.log(response.getContentText());

}
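The function above only logs the raw JSON. Here is a minimal sketch of pulling the recognized text out of that response, assuming the v1 response shape in which each entry of `results` carries a list of `alternatives` with a `transcript` field. The helper name `extractTranscript` is hypothetical.

```javascript
// Hypothetical helper: returns the recognized text from the JSON string
// returned by the speech:recognize endpoint.
function extractTranscript(responseText) {
  var data = JSON.parse(responseText);

  // Failed requests come back with an "error" object instead of "results"
  if (data.error) {
    throw new Error("Speech API error: " + data.error.message);
  }

  // "results" may be missing when no speech was recognized
  return (data.results || [])
    .map(function (result) {
      // Take the first alternative of each result
      var best = result.alternatives && result.alternatives[0];
      return best && best.transcript ? best.transcript.trim() : "";
    })
    .filter(function (text) {
      return text.length > 0;
    })
    .join(" ");
}
```

With this in place, `Logger.log(extractTranscript(response.getContentText()))` would log just the text. Because the request sets `muteHttpExceptions`, error responses reach this helper as JSON and surface as a thrown `Error`.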

Here’s another example that uses the curl command-line tool to send speech recognition requests.

curl --silent --header "Content-Type: application/json" \
  "https://speech.googleapis.com/v1/speech:recognize?key=XYZ" \
  --data @payload.json

// Content of payload.json
  {
    "config": {
        "encoding":"FLAC",
        "sampleRateHertz": 16000,
        "languageCode": "en-US"
    },
    "audio": {
        "uri":"gs://ctrlq.org/audio.flac"
    }
  }
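The same request body can be produced in Apps Script for either audio source. The sketch below is a hypothetical helper, `buildRecognizePayload`, that accepts either a `gs://` Cloud Storage URI or a base64-encoded string and fills in the matching `audio` field. The FLAC encoding and 16000 Hz sample rate are assumptions carried over from this article, and the field name `sampleRateHertz` follows the v1 REST reference.

```javascript
// Hypothetical helper: builds a speech:recognize request body from either
// a Cloud Storage URI (gs://bucket/object) or base64-encoded audio content.
function buildRecognizePayload(source, languageCode) {
  // Base64 text never contains ":", so this prefix check is unambiguous
  var isCloudStorageUri = source.indexOf("gs://") === 0;

  return {
    config: {
      encoding: "FLAC",
      sampleRateHertz: 16000,
      languageCode: languageCode || "en-US"
    },
    audio: isCloudStorageUri ? { uri: source } : { content: source }
  };
}

// Example: reference a file already stored in a Cloud Storage bucket
var payloadFromBucket = buildRecognizePayload("gs://ctrlq.org/audio.flac");
```

For a Drive file, the bytes would first go through `Utilities.base64Encode` and the resulting string would be passed as `source`.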
