Enhancing Your Alexa Skill with GPT

Alexa and GPT

The explosion of ChatGPT has opened up a myriad of possibilities, and like everyone else, I wanted to see what I could do with it as soon as I got access to the API.

Use Case

My first idea was to implement a way to ask Alexa for ingredient substitutions in the currently selected recipe. It's a genuinely useful feature and simple enough to experiment with quickly.

Overview

In this case, we’re introducing GPT to an existing Alexa Skill, so the basic idea is:

  1. Add a new intent to the Alexa interaction model so we can ask questions and hand them off to GPT.
  2. Call OpenAI’s API from our skill with the user’s question.
  3. Speak the answer back to the user via Alexa.

1. Add a new Intent

We’ll add a new intent called AskOpenAIIntent to the interaction model .json file. This intent uses the AMAZON.SearchQuery slot type, which supports capturing a simple phrase, similar to something you might enter into a search engine. The query is captured in a slot called question, and we’ve added a sample to show how we might expect users to phrase their question, e.g. “can I use plain flour instead of self raising?”:

{
  "name": "AskOpenAIIntent",
  "slots": [
    {
      "name": "question",
      "type": "AMAZON.SearchQuery"
    }
  ],
  "samples": [
    "what can I {question}"
  ]
},
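With AMAZON.SearchQuery, each sample needs a carrier phrase around the slot rather than the bare slot on its own. A few extra samples (these particular phrasings are illustrative, not from the real skill) widen the range of utterances Alexa will match:

```json
"samples": [
  "what can I {question}",
  "I want to know if I can {question}",
  "ask if I can {question}"
]
```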

2. Call OpenAI’s API

We’re using the excellent (but seemingly abandoned) Alexa Controls framework for our skill, which has some nuances I’ve left out here. The main code is applicable to regular Alexa Node.js development too:

// import the openai libraries
import { Configuration, OpenAIApi } from "openai";

// create an OpenAI Configuration with our API key
const config = new Configuration({
  apiKey: "**-your-API-Key-here-**",
});
const openai = new OpenAIApi(config);

// grab the 'question' the user asked from the request
const question = getSlotValue(input.handlerInput.requestEnvelope, "question");

// the real implementation finds the current recipe the user is interacting
// with, but we can just hard-code some data here for this example:
const recipe_title = "Oven Baked Mozzarella Wrapped in Parma Ham";
const ingredient_names = "parma ham, mozzarella, onion, tomatoes, olive oil";

const prompt =
  `Given a recipe for ${recipe_title} with ingredients ` +
  `${ingredient_names}, what can I ${question}?`;

// where we will construct the response to the user
let speakOutput;

try {
  const response = await openai.createCompletion({
    model: "text-davinci-003",
    prompt: prompt,
    temperature: 0.5,
    max_tokens: 1500,
    top_p: 1,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  });

  speakOutput =
    response.data.choices[0].text + " What more would you like to know?";
} catch (error) {
  speakOutput = "Oh dear.. something went wrong. Please try again.";
}

// carry on here and say the contents of speakOutput to the user via Alexa
// ...
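For context, here's roughly where that code sits in a standard ask-sdk request handler. This is a sketch — the handler name and wiring are assumed rather than taken from the real skill, which uses the Controls framework:

```javascript
// Sketch of a standard ask-sdk-style request handler for the new intent.
// The OpenAI call from the snippet above would live inside handle().
const AskOpenAIIntentHandler = {
  // match only IntentRequests for AskOpenAIIntent
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return (
      request.type === "IntentRequest" &&
      request.intent.name === "AskOpenAIIntent"
    );
  },
  async handle(handlerInput) {
    // ...build the prompt and call OpenAI as shown above...
    const speakOutput = "..."; // completion text plus follow-up question
    return handlerInput.responseBuilder
      .speak(speakOutput)
      .reprompt("What more would you like to know?")
      .getResponse();
  },
};
```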

The key points here are that we create an instance of OpenAIApi, then construct our prompt by prefixing the user's question with the recipe title and ingredients so GPT has some context for the question.

So in this case, perhaps the user is engaging with a recipe for Oven Baked Mozzarella but, on hearing the ingredients enumerated by Alexa, realises they do not have parma ham. We might expect them to say:

“Alexa, what can I use instead of parma ham?”

This would match our AskOpenAIIntent intent in the interaction model, and in our Node.js app we would create a prompt as follows:

Given a recipe for Oven Baked Mozzarella Wrapped in Parma Ham with ingredients parma ham, mozzarella, onion, tomatoes, olive oil, what can I use instead of parma ham?

We then send this over to OpenAI using the provided createCompletion function and await the response as shown in the code snippet above.
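That prompt assembly is easy to pull out into a tiny helper for testing. A sketch (the function name is mine, not from the skill):

```javascript
// Build the context-carrying prompt from the recipe data and the user's
// question, matching the template used in the skill code above.
function buildPrompt(recipeTitle, ingredientNames, question) {
  return (
    `Given a recipe for ${recipeTitle} with ingredients ` +
    `${ingredientNames}, what can I ${question}?`
  );
}
```

Calling it with the hard-coded recipe data and the question "use instead of parma ham" produces exactly the prompt shown above.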

3. Speak the answer back to the user via Alexa

Once we get a response from OpenAI, we have Alexa speak it back, appending a follow-up question so the user is prompted to reply. In my testing, I would get something like this, which is frankly… pretty good!

Prosciutto, bacon, or even salami would work as a substitute for parma ham. What more would you like to know?
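One wrinkle worth handling: completions from these models often arrive with leading newlines, so it's worth trimming the text before appending the follow-up. A small helper sketch (the trimming is my addition, not in the original code):

```javascript
// Trim stray whitespace from the completion and append the re-prompt so the
// conversation stays open.
function formatAnswer(completionText) {
  return completionText.trim() + " What more would you like to know?";
}
```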

Conclusion

This is a simple example, but it really shows how you can leverage the power of OpenAI with just a few lines of code and a straightforward integration.