How to Play YouTube Audio From Your Alexa


  • by admin
  • Programming

Build an Alexa Skill to play audio from YouTube videos

Introduction

Have you ever wanted to ask Alexa to play the audio of a YouTube video? Just say “Alexa, ask Hey Tube to play Whitesnake” and listen to a great Whitesnake track from YouTube.

In this article, I will show how to build a Custom Alexa Skill to connect your Amazon Echo device to YouTube.

Note: Keep in mind that this skill will not be published to the Alexa Skills Store and can only be used in development mode. It is not an Amazon-approved skill and should only be used for educational purposes.


Alexa Custom Skill

An Alexa custom skill is a combination of two main components: the skill interface and the skill service.

 


https://github.com/bignerdranch/developing-alexa-skills-solutions/blob/master/coursebook/coursebook.pdf

The skill interface defines how your skill will behave:

  • A set of “intents” that represent actions that users can do with your skill.
  • A set of “sample utterances” that specify the words and phrases users can say to invoke those intents. You map these utterances to your intents — this mapping forms the “interaction model” for the skill.
  • An “invocation name” that identifies the skill. The user includes this name when initiating a conversation with your skill.

The skill service contains your code logic to handle the intents and perform actions. The service is deployed as a backend resource; for this exercise, it is JavaScript deployed in AWS Lambda.


Skill Flow

This is the custom skill flow to be developed:

 


  • The flow starts with a welcome message and expects a video query name.
  • When asked to play a certain video, Alexa Interface calls the service using the query as the parameter.
  • The service calls YouTube API to search for a video, then returns the URL to Alexa.
  • Alexa starts playing the audio from the video URL.
  • At any time the user can ask Alexa to stop playing. Alexa will stop the audio track and say “Goodbye”.

Let’s Start

The easiest way to create a new Alexa Skill is by using ASK CLI.

Before starting, make sure you have an Amazon Developer account and an AWS account, and that Node.js, the AWS CLI, and the ASK CLI are installed.

For a detailed guide to setting up your environment, check out my previous article, Quickly Build an Alexa Skill Using ASK CLI.

Let’s start by creating the new skill structure: $ ask new

 


Select:

  • NodeJS as the programming language
  • AWS Lambda as the backend resource
  • Hello world as the starter template
  • Choose a skill name: alexa-skill-heytube
  • Choose a folder name: alexa-skill-heytube

Change to your new directory: $ cd alexa-skill-heytube/lambda

Install your packages: $ npm install


Skill Service

Open your favourite code editor. I’m using Visual Studio Code.

This is your starting skill structure:


The skill interface files are located in the skill-package folder; the skill service code lives in the lambda folder.

The skill package contains all the resources used by the ASK platform — skill manifest, interaction models, and assets.


Interaction Models

The interaction model resolves the spoken words into specific intent events. You define the words that should map to particular intent names in the interaction model by providing a list of sample utterances. A sample utterance is a string that represents one possible way a user may talk to the skill. These utterances are used to generate a natural language understanding model, which resolves the user’s voice input to our skill’s intents.

The name of the interaction model is <locale>.json, for example en-GB.json.

Note: My Alexa is configured for en-GB; update your file name according to your configuration, such as en-US or others.

Open the interactionModels/en-GB.json file.

As with any JSON file, pay close attention to formatting:

  • Update invocationName to the name of your skill:
    "invocationName": "hey tube"
  • Delete HelloWorldIntent and AMAZON.NavigateHomeIntent
  • Create new intent to get a YouTube video:
{
   "name": "GetVideoIntent",
   "slots": [{
       "name": "videoQuery",
       "type": "VIDEOS"
   }],
   "samples": [
       "search for {videoQuery}",
       "find {videoQuery}",
       "play {videoQuery}",
       "start playing {videoQuery}",
       "put on {videoQuery}"
   ]
},

Add standard required intents for audio skills: AMAZON.PauseIntent and AMAZON.ResumeIntent:

{
   "name": "AMAZON.PauseIntent",
   "slots": [],
   "samples": []
},
{
   "name": "AMAZON.ResumeIntent",
   "slots": [],
   "samples": []
}

We will also need to define a type called VIDEOS.

Add an entry like the following in the types section. The values listed are only placeholders; replace them with a few representative search terms of your own:

{
   "name": "VIDEOS",
   "values": [
      {
         "name": {
            "value": "Whitesnake"
         }
      },
      {
         "name": {
            "value": "Here I Go Again"
         }
      }
   ]
}

Note: If necessary, rename or create a copy of your JSON file to match the language your Alexa is configured for, such as en-US.json.

Skill manifest

The skill manifest is the JSON representation of your skill and provides Alexa with all the required metadata. The interaction model and account linking schemas, if used, are separate.

Open skill.json. Update your locale's information with your skill summary, example phrases, name, and description. You can also add small and large icons of your skill in png format:

"locales": {
   "en-GB": {
      "summary": "Alexa Hey Tube Skill",
      "examplePhrases": [
         "Alexa open hey tube"
      ],
      "keywords": [
         "audio",
         "streaming",
         "youtube"
      ],
      "name": "Hey Tube",
      "description": "Alexa Skill to listen to youtube audio",
      "smallIconUri": "https://alexademo.ninja/skills/logo-108.png",
      "largeIconUri": "https://alexademo.ninja/skills/logo-512.png"
   }
}

Update the category to STREAMING_SERVICE:

"category": "STREAMING_SERVICE",

Enable the AudioPlayer interface. This allows the Alexa skill to play audio streams:

"apis": {
   "custom": {
      "interfaces": [{
         "type": "AUDIO_PLAYER"
      }]
   }
}

The complete skill.json is available here:

 

skill.json


YouTube Data API key

Your application must have authorization credentials to be able to use the YouTube data API.

Using the YouTube API does not incur any monetary cost for the entity calling the API. If you go over your quota, the API returns a 403 error. By default, free usage is limited to 10,000 units per day, and a search request costs 100 units, which is enough for personal use.

Follow the steps below to get your YouTube API key:

  1. Head over to the Google developers console
  2. Create a new project by clicking the Select project dropdown right next to the logo. Click the New Project button and give it a name.
  3. Select your project by choosing it in the Select Dropdown directly next to the logo in the header.
  4. Click the Enable APIs and Services button.
  5. Search for youtube data.
  6. Click on the YouTube Data API v3.
  7. Click the blue Enable button.
  8. In the dashboard, click Credentials on the left sidebar.
  9. Click the Create Credentials button.
  10. Select API Key.
  11. Copy your newly generated key and save it for now.

Securing the API key

Google provides the following guidelines to secure your API key:

When you use API keys in your applications, ensure that they are kept secure during both storage and transmission. Publicly exposing your credentials can result in your account being compromised, which could lead to unexpected charges on your account. To help keep your API keys secure, follow these best practices:

  • Do not embed API keys directly in code. API keys that are embedded in code can be accidentally exposed to the public. For example, you may forget to remove the keys from the code that you share. Instead of embedding your API keys in your applications, store them in environment variables or files outside your application’s source tree.
  • Do not store API keys in files inside your application’s source tree. If you store API keys in files, keep the files outside your application’s source tree to ensure your keys do not end up in your source code control system. This is particularly important if you use a public source code management system such as GitHub.
  • Set up application and API key restrictions. By adding restrictions, you can reduce the impact of a compromised API key.
  • Delete unneeded API keys to minimize exposure to attacks.
  • Periodically regenerate your API keys. You can regenerate API keys from the Credentials page by clicking the regenerate key for each key. Then, update your applications to use the newly-generated keys. Your old keys will continue to work for 24 hours after you generate replacement keys.
  • Review your code before publicly releasing it. Ensure that your code does not contain API keys or any other private information before you make your code publicly available.
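
In line with the first guideline, the skill can read the key from an environment variable rather than from the source code. A minimal sketch, assuming the variable is named YOUTUBE_API_KEY (the name used later in this article); the helper name getYouTubeApiKey is made up for illustration:

```javascript
// Read the YouTube API key from the environment instead of hard-coding it.
// getYouTubeApiKey is a hypothetical helper; the `env` parameter exists only
// so the function can be exercised without touching the real environment.
function getYouTubeApiKey(env = process.env) {
  const key = env.YOUTUBE_API_KEY;
  if (!key) {
    // Fail early with a clear message rather than sending an empty key to the API.
    throw new Error("YOUTUBE_API_KEY is not set");
  }
  return key;
}
```

Failing early like this makes a missing Lambda environment variable show up as a clear error in the logs.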

Skill Code

Now we’re ready to develop our skill in NodeJS.

Open lambda/index.js. The index.js will first list your intent handlers created by ASK CLI.

Intent handlers

An “intent” represents an action that fulfills a user’s spoken request. This will trigger intent handlers defined in your lambda handler index.js. Intent handlers receive an incoming request and return an appropriate response.

  • Built-in intents are predefined for common actions that you can implement in your custom skill without providing any sample utterances. The standard intents are used for common, general actions such as stopping, canceling, and asking for help. The list of available standard built-in intents is in the Alexa documentation on developer.amazon.com.

Let’s edit the standard intent handlers:

  • LaunchRequestHandler: this method will handle the launch of our skill.

“Alexa, open hey tube”

Edit your Welcome Message and add a re-prompt to repeat the welcome:

LaunchRequestHandler
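
A minimal sketch of what such a launch handler might look like. It follows the canHandle/handle shape used by the ASK SDK for Node.js and reads the request type directly from the request envelope so that it stands on its own; the welcome wording is illustrative, not the author's exact text:

```javascript
// Sketch of a launch handler: an object with canHandle() and handle().
const LaunchRequestHandler = {
  canHandle(handlerInput) {
    // Only react to the request sent when the user opens the skill.
    return handlerInput.requestEnvelope.request.type === "LaunchRequest";
  },
  handle(handlerInput) {
    // Illustrative wording; adjust to taste.
    const speakOutput =
      "Welcome to Hey Tube. Ask to play a video to start listening.";
    return handlerInput.responseBuilder
      .speak(speakOutput)
      .reprompt(speakOutput) // repeat the welcome if the user stays silent
      .getResponse();
  },
};
```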

  • CancelAndStopIntentHandler: this method stops the audio that is playing. It handles the “Stop”, “Cancel”, and “Pause” requests.

“Alexa, stop music”

CancelAndStopIntentHandler
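
A minimal sketch of a handler along these lines. It assumes the three built-in intents AMAZON.CancelIntent, AMAZON.StopIntent, and AMAZON.PauseIntent are all treated the same way, and it calls addAudioPlayerStopDirective directly instead of going through the controller helper described further down:

```javascript
// Built-in intents that should all end playback (an assumption of this sketch).
const STOP_INTENTS = [
  "AMAZON.CancelIntent",
  "AMAZON.StopIntent",
  "AMAZON.PauseIntent",
];

const CancelAndStopIntentHandler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return (
      request.type === "IntentRequest" &&
      STOP_INTENTS.includes(request.intent.name)
    );
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .speak("Goodbye")
      .addAudioPlayerStopDirective() // tells the device to stop the stream
      .getResponse();
  },
};
```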

Let’s create a custom intent handler to handle the GetVideo call.

  • GetVideoIntentHandler: this method will handle the search for a YouTube video and return the audio stream so Alexa can play it.

I will use a controller helper function to make the code cleaner.

“Alexa, play Whitesnake”

GetVideoIntentHandler
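
A minimal sketch of such a handler. The controller is passed in so the example stands on its own; the factory name createGetVideoIntentHandler and the signatures controller.search(query) and controller.play(handlerInput, audioInfo) are assumptions made for illustration, not the author's exact code:

```javascript
// Builds the intent handler around a controller object (hypothetical shape:
// search(query) resolves to video info, play(handlerInput, info) returns the
// Alexa response that starts playback).
function createGetVideoIntentHandler(controller) {
  return {
    canHandle(handlerInput) {
      const request = handlerInput.requestEnvelope.request;
      return (
        request.type === "IntentRequest" &&
        request.intent.name === "GetVideoIntent"
      );
    },
    async handle(handlerInput) {
      // videoQuery is the slot defined in the interaction model.
      const slots = handlerInput.requestEnvelope.request.intent.slots || {};
      const query = slots.videoQuery && slots.videoQuery.value;
      if (!query) {
        const prompt = "Which video would you like to hear?";
        return handlerInput.responseBuilder
          .speak(prompt)
          .reprompt(prompt)
          .getResponse();
      }
      const audioInfo = await controller.search(query);
      return controller.play(handlerInput, audioInfo);
    },
  };
}
```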

Helper functions

The controller helper has three methods:

  • search: Calls YouTube API to search a video based on query received by the user. The search returns video information such as the video ID and title.
  • play: Once the video information is received, it uses a second library to retrieve the video’s available streams and selects the format with itag 140, an audio-only AAC stream that Alexa can play. The method returns the stream URL and then calls addAudioPlayerPlayDirective to start playing the audio stream.
  • stop: Calls addAudioPlayerStopDirective to stop playing the audio stream.

Search videos

To search YouTube videos, I’m using the npm package youtube-search.

Let’s install it. In your lambda folder type: $ npm i --save youtube-search.

Add the import at the top of the file:

const search = require("youtube-search");

The complete method, getAudioInfo, is listed below:

getAudioInfo
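
A minimal sketch of how getAudioInfo might be written, using the callback form of youtube-search. The returned object shape ({ id, title, url }) and the extra apiKey and searchFn parameters are assumptions added so the function can be tried without network access:

```javascript
// Searches YouTube for the first video matching `query`.
// searchFn defaults to the youtube-search package and is only loaded when no
// replacement is passed in (useful for local experiments and tests).
function getAudioInfo(
  query,
  apiKey = process.env.YOUTUBE_API_KEY,
  searchFn = require("youtube-search")
) {
  const opts = { maxResults: 1, type: "video", key: apiKey };
  return new Promise((resolve, reject) => {
    searchFn(query, opts, (err, results) => {
      if (err) return reject(err);
      if (!results || results.length === 0) {
        return reject(new Error(`No video found for "${query}"`));
      }
      // Each result carries, among other fields, the video id, title and link.
      const { id, title, link } = results[0];
      resolve({ id, title, url: link });
    });
  });
}
```

Limiting the search to one result keeps the response small, since the skill only plays the top match.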

Play audio

To play the audio stream I’m using another npm package, ytdl-core. Let’s install it.

In your lambda folder type $ npm i --save ytdl-core.

Add the import at the top of the file: const ytdl = require("ytdl-core");

The complete method, getAudioUrl, is listed below:

getAudioUrl
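
A minimal sketch of how getAudioUrl might look, using ytdl-core's getInfo and chooseFormat to pick the itag 140 audio format. The ytdl parameter is an assumption added so the function can be exercised with a stand-in object:

```javascript
// Resolves a direct audio stream URL for a YouTube video id.
// `ytdl` defaults to the ytdl-core package and is only loaded when no
// replacement is passed in.
async function getAudioUrl(videoId, ytdl = require("ytdl-core")) {
  const videoUrl = `https://www.youtube.com/watch?v=${videoId}`;
  const info = await ytdl.getInfo(videoUrl);
  // itag 140 is the audio-only AAC format; chooseFormat throws if it is missing.
  const format = ytdl.chooseFormat(info.formats, { quality: "140" });
  return format.url;
}
```

The resulting URL is what a play helper would pass to responseBuilder.addAudioPlayerPlayDirective, for example with the "REPLACE_ALL" play behavior, the video id as the token, and an offset of 0.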

Skill builder

The last part is to add the request handler to SkillBuilder. The skill builder is the entry point of your skill, routing all requests and responses:
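
A minimal sketch of that wiring, using the SkillBuilders.custom() chain from ask-sdk-core. The wrapper function buildSkillHandler and its Alexa parameter are assumptions added so the snippet stands on its own:

```javascript
// Registers the handlers with the SDK's custom skill builder and returns the
// function that AWS Lambda invokes. `Alexa` defaults to ask-sdk-core and is
// only loaded when no replacement is passed in.
function buildSkillHandler(
  requestHandlers,
  errorHandlers = [],
  Alexa = require("ask-sdk-core")
) {
  return Alexa.SkillBuilders.custom()
    .addRequestHandlers(...requestHandlers) // checked in order; first match wins
    .addErrorHandlers(...errorHandlers)
    .lambda();
}

// In lambda/index.js this would typically be exported as the Lambda entry point:
// exports.handler = buildSkillHandler(
//   [LaunchRequestHandler, GetVideoIntentHandler, CancelAndStopIntentHandler],
//   [ErrorHandler]
// );
```

Because the SDK asks each handler's canHandle in registration order, more specific handlers should be listed before catch-all ones.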

The complete source code is available on my GitHub: github.com/mlomboglia/alexa-skill-heytube


Deploy Your Skill

Quickly deploy your skill with ASK CLI without going to the console. Go to your skill folder root and type: $ ask deploy.

ASK will automatically:

  • Deploy skill metadata: Package and upload the contents of the skill’s skill-package/ directory to Alexa Skills using the Skill Package API. If this is the first time you've deployed the skill, the command creates a new skill in the configured Amazon developer account.
  • Build skill code: Build the skill code under the lambda/ directory and package the build artifacts for deployment to AWS. This includes installing the skill's dependencies and producing any build artifacts. For a skill written for Node.js, this means running npm install to install dependencies declared in package.json and packaging them with the JavaScript source code.
  • Deploy skill infrastructure: Deploy the build artifacts to the configured AWS Lambda function. If this is the first time the skill has been deployed and a Lambda ARN has not been configured, the command creates a new Lambda function.
  • Enable the skill.


After a while, your skill is deployed in AWS and ready to be used in development mode (not publicly available, yet).

Validate your deployment:

  • Sign in to AWS Console → Lambda

Verify that your new skill has been created as a Lambda function.


Sign in to the Alexa developer console and verify your new skill has been created. You can click on it to see the details, but I don’t recommend editing any fields through the console.

 


Congratulations — your first skill is properly deployed in Alexa and AWS Lambda. Everything is done through the CLI without going through the console.

You can edit your code in Visual Studio Code, save it and simply type $ ask deploy to redeploy your skill. This will save you a lot of time while developing.


Last Configs

The last configuration step is to add your environment variable to Lambda. Unfortunately, I couldn’t find a way to add environment variables with ASK CLI, so I do it from the AWS Console, under Lambda.

Click on the service name and scroll to the “Environment variables” section. Add your variable, YOUTUBE_API_KEY, with the value being the key you created above in the Google Developers Console.

 


Also, increase your service timeout in “Basic settings” to 15 seconds.


Now let’s update the local service revisionId.

If you try to deploy your service now, because you’ve updated configuration through the console, you’ll get the following error:

[Error]: The current revisionId (The revision ID for Lambda ARN (arn:aws:lambda:eu-west-1:924338216746:function:ask-heytube-default-default-1595666765538) should be a4d5f786-96c3-414c-9b94-4b8ecade411b, but found 1dcd52fb-8626-4f63-a4cc-c5fc59454f0b. Please solve this revision mismatch and re-deploy again.

The easiest way to resolve this is:

  1. Open your .ask/ask-states.json file
  2. Find the "revisionId": "XXXX" entry
  3. Delete the value so that it reads "revisionId": ""
  4. Deploy the skill: $ ask deploy
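
If you hit this often, steps 1 to 3 can be scripted. A minimal sketch, assuming only that the file is valid JSON; because the exact nesting of ask-states.json is not shown here, it blanks every revisionId key it finds. The function names are made up for illustration:

```javascript
const fs = require("fs");

// Recursively sets every "revisionId" property to an empty string.
function clearRevisionIds(node) {
  if (Array.isArray(node)) {
    node.forEach(clearRevisionIds);
  } else if (node && typeof node === "object") {
    for (const key of Object.keys(node)) {
      if (key === "revisionId") {
        node[key] = "";
      } else {
        clearRevisionIds(node[key]);
      }
    }
  }
  return node;
}

// Rewrites the states file in place with all revision IDs blanked.
function resetAskStates(statesPath = ".ask/ask-states.json") {
  const states = JSON.parse(fs.readFileSync(statesPath, "utf8"));
  const cleaned = clearRevisionIds(states);
  fs.writeFileSync(statesPath, JSON.stringify(cleaned, null, 2) + "\n");
}
```

Run resetAskStates() from the skill's root folder, then deploy again with $ ask deploy.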

Test the First Skill

To test your skill from the command line, run ask dialog and specify a locale supported by your skill:

$ ask dialog --locale en-GB

The command opens an interactive terminal in which you can simulate a multi-turn conversation with Alexa:

 


Open your skill and simulate a conversation:

User  > Alexa open hey tube
Alexa > Welcome to Hey Tube. ask to play a video to start listening.
User  > play Whitesnake
Alexa > Playing  Whitesnake - Here I Go Again '87 (Official Music Video)
User  > .quit

The first test is done. If you have an Alexa device, you can test this conversation in your Alexa — you should get the same results and hear a great song from Whitesnake.

You can make any changes to your skill and redeploy it.

You can automate the replay by using a preformatted .json file to save some time in your tests.

Use .record to create a repeatable JSON test file:

User  > .record

I created happyPath.json as an example:

$ ask dialog --locale en-GB --replay happyPath.json

You can also test it through the Alexa Developers Console:

 
