Welcome to this workshop: an introduction to voice assistants with Google Cloud.
The workshop is split into two parts. Part one lets you explore Dialogflow to set up your own intent fulfillment, and in part two you will use webhook intent fulfillment with some JavaScript and Node.js.
You need no previous knowledge of either the Google Assistant ecosystem or JavaScript and Node.js. Basic coding will be required in part two, but we will provide you with code examples and solutions if you run into any problems.
In this workshop you will create a voice assistant app for FINN.no's amazing new service, Drone's Cream: ice cream cones delivered by drone.
Drone's Cream's value proposition is quite simple: "We deliver the world's best ice cream to you immediately after you order it." To fully exploit the world's hunger for ice cream, we need to be able to take voice orders from overheated people stuck outside in the sun.
The task is split into two parts. For the first part you need to set up a voice assistant app with both static and dynamic answers without hooking up to a separate backend. In part two you will have to connect the app from part one to a Node.js backend provided to you.
We will be using tools such as Actions on Google, Dialogflow, the Node.js Dialogflow Fulfillment Library and the Actions on Google Node.js client library, all of which will be explained throughout the workshop.
What we will be creating today is an app for Google Assistant. Google Assistant is the software that enables different devices, such as Android phones and the Google Home smart speaker, to have fluent conversations with a user.
Google Assistant creates one common interface for all these pieces of hardware, allowing us to support them all with a single implementation. The only thing that might change between devices is what kind of inputs and outputs they support.
For instance, a Google Home smart speaker does not have a screen, which limits the types of responses it can present to the user. But the same type of response is treated the same way across devices.
This workshop will focus mainly on the audio interaction, but there are other possibilities such as cards with images, Call to Action elements and more.
To understand the different parts of an interaction between the user and our app, we need to understand three concepts: conversation, intent and fulfillment.
A conversation is the overarching and simplest concept we need to know. It is simply all communication between the user and Google Assistant, starting when the user requests which app to talk to and ending when either the user or the app requests the conversation to end.
As long as the conversation is active, the app will respond to any request from the user. Before and after the conversation, the general Google Assistant app is the one that responds.
An intent can be understood as a single type of request. The user will state their intent to the app, and it must understand what the user intends and respond properly.
For instance, an intent can be the user saying 'Hello' or 'Good morning'. Even though those are different ways of greeting the app, the app can consider them the same intent and answer both with the same or a similar answer.
On the other hand, should the user ask for the temperature outside, that would be a different intent entirely.
Every user intent needs to be fulfilled. Most intents will probably not have custom fulfillment but rather a general answer such as "Sorry, I can't help you with that". As developers it is our job to identify which intents need to be answered, and how to answer them.
To help us with this, Google provides a set of tools that make it quite easy to get started.
The first tool we will use is Actions on Google. Together with a tool called Dialogflow it will do most of the difficult magic concerning understanding user intents. It is through Actions on Google that we create our action, which can be understood as the app itself.
Actions on Google manages conversations and does speech-to-text and text-to-speech. It can communicate directly with custom backends, but then leaves a lot of work to be done.
Dialogflow is a tool created to manage intents and train different agents to recognize the intent of the user. We simply need to specify which intents we want, what data will be supplied and examples of how they can be phrased by the user.
Dialogflow can also do simple intent fulfillment, answering simple questions where no custom logic or data store is required. But in most cases we will want Dialogflow to use our existing or new backend service to answer many, if not all, of our intents.
So even though it is possible to create a Google Assistant app without using Dialogflow, it is absolutely preferable to use it.
For this workshop, you will need your own Google Account. If you do not already have one, create one here. You will not have to pay or add payment info for the services we use today.
For this workshop our first task is to create an action. To do this we need to be logged in to our Google accounts.
We are now done creating our action. Let's start creating dialogs with Dialogflow.
Review your account settings
We are now going to create our first agent and connect it to our Google project.
Click Create
In this part you will learn to use Dialogflow to create your own intent fulfillment. The intents you create will be used in later stages of this workshop, so we recommend that you do not skip any of the tasks on this page.
Create an intent named welcome-drones-cream welcoming the user to the Drone's Cream virtual store using only Dialogflow. The intent should respond to the user with a welcome prompt.
The intent should respond to prompts from the user such as "Hi", "hello", "good morning" etc.
The response should include the time of day if the prompt has it, e.g. if the prompt is "Good afternoon", the response should also start with "Good afternoon".
Hint: We want to replace the Default Welcome Intent, so start out by deleting it.
Create an intent named menu-drones-cream listing the inventory of the store when a user asks which flavours are available. The inventory for Drone's Cream is Vanilla, Chocolate, Mint and Strawberry.
The intent should respond to questions such as "What kinds of flavours do you have?" and "What ice creams do you offer?".
Create your own entity iceCreamFlavour with all the available flavours of Drone's Cream. If needed, add synonyms for any or all flavours.
Create an intent named order-drones-cream to order a number of cones of ice cream using the entity created in the previous task. The user must supply one flavour and the number of cones as required parameters. In addition, the user may supply the location he or she wants the ice cream delivered to.
If either the flavour or the number of cones is missing from the request, the user should receive a prompt to supply them.
The answer should include the order details including location if it is supplied.
You can find solutions to the tasks here.
To set up custom fulfillment for our Google Assistant app we will need to set up a webhook. A webhook is a simple HTTP POST endpoint capable of parsing the body of our request and creating an appropriate response.
If our backend were to communicate directly with the Actions on Google requests, it would look something like the figure below.

Actions on Google sends POST requests with a JSON payload containing a bunch of information to communicate with the fulfillment service. The action will have done the rough work of speech-to-text, but the intent matching would be up to our app. Thankfully we have used Dialogflow to do that for us.
So in our case the communications look like the figure below.

Communicating with Dialogflow is a lot easier than communicating with Actions on Google, since Dialogflow will have detected the user intent and parsed any specified parameters into a manageable JSON format. Our backend only needs to parse that JSON and respond to the provided intent.
As stated earlier, this workshop will use the Node.js Dialogflow Fulfillment Library. We could have solved the tasks in any programming language with tools to parse and create JSON, but Node.js has the most complete client library for both Dialogflow and Actions on Google, making the work a bit easier.
But in case you are curious about how you could solve this without the client libraries, we have provided examples of the JSON requests and responses that the client library handles and creates in the tasks you will do.
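For illustration only, a bare-bones handler without the client library could look roughly like this, assuming an Express app and the Dialogflow v2 webhook format (the client library hides these details for us):

```
const express = require('express');
const app = express();
app.use(express.json());

app.post('/', (req, res) => {
  // Dialogflow puts the matched intent and parameters in queryResult
  const intent = req.body.queryResult.intent.displayName;
  if (intent === 'welcome-drones-cream') {
    res.json({ fulfillmentText: "Welcome to Drone's Cream!" });
  } else {
    res.json({ fulfillmentText: "Sorry, I can't help you with that." });
  }
});

app.listen(1234);
```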
To complete these tasks you will need to write some rudimentary JavaScript. What you need to know is:
You can put any type of data into a constant or variable. For instance, it is perfectly legal to put a string into a variable that used to contain an integer:
let x = 5;
x = 'Hello';
So sometimes you will need to make sure that the variable you are reading contains the data type you expect when doing comparisons or other operations. Maybe your x !== 5 should be x !== '5'.
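For example, a value may arrive as a string even when it looks like a number:

```
let x = '5';            // a number that arrived as a string
console.log(x !== 5);   // true: different types
console.log(x !== '5'); // false: same type and value
```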
In later versions of JavaScript, functions can be written both as lambdas (arrow functions) and as explicit function declarations.
const myFunction = x => x + 1;
const myFunction2 = (x) => x + 1;
const myFunction3 = x => { return x + 1; }
const myFunction4 = (x) => { return x + 1; }
function myFunction5(x) { return x + 1; }
All the functions above do the same thing, but the lambda version tends to give the most compact code. Be aware that if you want more than one argument, versions 1 and 3 would not work, as more than one argument must be wrapped in parentheses.
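For instance, a two-argument lambda must wrap its arguments in parentheses:

```
const add = (a, b) => a + b;  // two arguments require parentheses
// const bad = a, b => a + b; // syntax error without them
```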
Equality in JavaScript can be checked in two ways: == or ===. There are some differences in how they work, but in this workshop we suggest you only use === to avoid any confusion.
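For example:

```
console.log(5 == '5');  // true: the string is coerced to a number
console.log(5 === '5'); // false: the types differ
```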
JavaScript allows you to declare any object ad hoc. It needs no class definition, and any member can be added or removed at any time.
For instance:
const obj = {
  x: 2,
  y: 4,
};
obj.z = 7;
obj.y = undefined;
is perfectly legal; it adds the member z to obj and changes the value of obj.y.
If and else statements are written similarly to Java.
if (x === y) {
  // do something
} else if (y === z) {
  // do something else
} else {
  // do something different
}
But one useful thing is that many non-boolean values will evaluate to true or false as well.
For instance, we can check whether values contain data or objects have certain members.
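For example, empty strings, 0, null and undefined all count as false:

```
const name = '';
if (name) {
  // never reached: the empty string is falsy,
  // just like 0, null, undefined and NaN
}
```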
This can be weird. But if we need to access z in the following object:
const obj = {
  x: {
    y: {
      z: 5
    }
  }
}
We might need to check that x and y are not null or undefined. If we don't, the program could fail at runtime, similar to getting a NullPointerException in Java.
So what we do is:
if (obj && obj.x && obj.x.y) {
  console.log(obj.x.y.z);
}
The main component to the client library is the `WebhookClient` class. It is imported from the `dialogflow-fulfillment` npm package like this:
const { WebhookClient } = require('dialogflow-fulfillment');
The way we use it is by passing the request and response objects as arguments to the constructor of the class, and then using its built-in functions to do most of what we need today.
app.post('/', (req, res) => {
  const agent = new WebhookClient({ request: req, response: res });
  ...
});
const welcome = agent => {
  ... // some logic to answer the intent
}
const fallback = agent => {
  ... // some logic to answer that the intent was not recognized
}
let intentMap = new Map();
intentMap.set('Welcome', welcome);
intentMap.set(null, fallback);
agent.handleRequest(intentMap);
Here the function welcome would be used if the name of the intent coming from Dialogflow is "Welcome". Be aware that the matching is case sensitive, so the match needs to be exact. The null element of the map is used if no match for the intent name can be found in the map.
Inside a handler we answer the intent by adding text to the agent:
agent.add('Hi, this is my answer!');
If add is used multiple times for a single intent, all the texts added will be read. The `handleRequest` function then contains the logic to turn everything added to the agent into a response, using the response object passed to the constructor to answer the POST request.
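Putting the pieces together, a minimal sketch of a complete webhook could look like the following. It assumes an Express server on port 1234 (the port used in the setup section below), and the response texts are just placeholders:

```
const express = require('express');
const { WebhookClient } = require('dialogflow-fulfillment');

const app = express();
app.use(express.json()); // Dialogflow sends JSON bodies

app.post('/', (req, res) => {
  const agent = new WebhookClient({ request: req, response: res });

  const welcome = agent => {
    agent.add("Welcome to Drone's Cream!");
    agent.add('What can I do for you?'); // both texts are read to the user
  };

  const fallback = agent => {
    agent.add("Sorry, I can't help you with that.");
  };

  let intentMap = new Map();
  intentMap.set('welcome-drones-cream', welcome);
  intentMap.set(null, fallback);
  agent.handleRequest(intentMap);
});

app.listen(1234, () => console.log('Webhook listening on port 1234'));
```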
Contexts let us carry data between the turns of a conversation. We can for instance store the current parameters in a context like this:
agent.context.set({ name: 'params', lifespan: 1, parameters: agent.parameters });
and then get them using:
agent.context.get('params').parameters;
We can also delete a context if we would like to remove it even though it has a remaining lifespan greater than zero.
agent.context.delete('params');
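As a sketch, two hypothetical handlers could use a context to carry the order parameters over to the next turn of the conversation (the handler and parameter names are illustrative only):

```
const order = agent => {
  // stash the parameters Dialogflow extracted, valid for one more turn
  agent.context.set({ name: 'params', lifespan: 1, parameters: agent.parameters });
  agent.add('And where should we deliver it?');
};

const finishOrder = agent => {
  // read the parameters back out of the context
  const { iceCreamFlavour, number } = agent.context.get('params').parameters;
  agent.add(`Ordering ${number} cones of ${iceCreamFlavour}.`);
};
```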
Actions on Google also uses contexts quite a lot, to specify things such as device capabilities. So in most real-life cases there will be 5-6 contexts set. But when debugging from Dialogflow, only our own contexts are part of the requests and responses.
To ask the user for permissions, such as access to the device location, we can combine the Dialogflow agent with the Actions on Google client library:
const { Permission } = require('actions-on-google');
...
const conv = agent.conv();
if (conv) {
  // conv can be null if the intent was not called from Google Assistant
  // (for instance when using the Dialogflow testing tool)
  conv.ask(new Permission({
    context: 'To know where you are',
    permissions: 'DEVICE_PRECISE_LOCATION',
  }));
  agent.add(conv);
} else {
  agent.add('Your current device does not support location data.');
}
The following request will now include a quite deeply nested object called `originalRequest`. Depending on what kind of data we request, it will probably be contained under either user (`agent.originalRequest.payload.user`) or device (`agent.originalRequest.payload.device`) data.
In the case of location, it can be found under `agent.originalRequest.payload.device.location`. It will have data such as latitude, longitude and formatted address. It is recommended to use some console logging to get familiar with this object.
In addition, it is smart to check whether the permission actually was given; if not, the data will be missing. That can be found in one of the contexts, called `actions_intent_permission`:
agent.context.get('actions_intent_permission').parameters.PERMISSION // this is a boolean
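As a sketch, a handler for the follow-up intent could check that boolean and read the location (field names as described above; do verify them with console logging):

```
const finishOrder = agent => {
  const granted = agent.context.get('actions_intent_permission').parameters.PERMISSION;
  if (granted) {
    const location = agent.originalRequest.payload.device.location;
    agent.add(`Great, we will deliver to ${location.formattedAddress}.`);
  } else {
    agent.add('Without your location we cannot deliver your ice cream.');
  }
};
```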
<a name="installation&setup-1"></a>
## Installation & setup
<a name="installation"></a>
### Installation
Start by installing [Node.js](https://nodejs.org/en/download/). You can verify that it is correctly installed by typing `node -v` in the terminal. You will also need [npm](https://www.npmjs.com), which is installed together with Node.js.
Clone the repository to get access to the code:
git clone https://github.com/mathjoh/google-home-101.git
<a name="runyourapplication"></a>
### Run your application
To communicate with Dialogflow, you need to start both the node application and ngrok.
Run the application from the _node_ folder:
```
npm i && npm run dev
```
Set up an https proxy forwarding to localhost:1234:
```
./ngrok http 1234
```
Use the posted https address from ngrok in the tasks.
In this part you will need to adapt the intents created in the first tasks to be answered by a webhook instead of Dialogflow.
Enable webhook fulfillment for the intent created in task 1.
Extend the provided backend `tasks/index.js` to respond to the intent with the same responses as Dialogflow gave. In addition, add fallback handling for unknown intents routed to the webhook.
Test your webhook using either the testing tool in Dialogflow or Actions on Google.
Hint: You will have to enable webhook fulfillment in your intent. It might be a good idea to remove some of your responses in Dialogflow to make sure you get your responses from the application and not one of the predefined responses.
Convert the intent from task 2 to be answered by your webhook. This time the answer should reflect the actual inventory of the ice cream store. We have provided a small service with in-memory inventory handling in the file `tasks/store.js`; it should contain all the necessary business logic.
Have the answer to the prompt include both which flavours are in stock and which are out of stock. The number of cones remaining of each does not need to be included.
Test your webhook using either the testing tool in Dialogflow or Actions on Google.
Hint: The functions `remainingFlavours` and `emptyFlavours` both return an array of strings, and in JavaScript you can call `array.join(', ')` to merge the strings into a comma-separated string.
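As a sketch, a menu handler using these functions could look like this (assuming `store.js` is required from the same folder):

```
const store = require('./store');

const menu = agent => {
  const inStock = store.remainingFlavours().join(', ');
  const soldOut = store.emptyFlavours().join(', ');
  agent.add(`Today we have ${inStock}. Unfortunately we are all out of ${soldOut}.`);
};
```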
Convert the intent from task 4 to be answered by your webhook. This time the answer should reflect the actual inventory of the ice cream store.
If there are not enough cones of the requested flavour left, the response should reflect that. If there are enough cones left, the inventory numbers should be updated.
Test your webhook using either the testing tool in Dialogflow or Actions on Google.
Hint: The function `order` takes flavour and number of cones as parameters and returns `true` if the sale was successful and `false` if it is not.
Note: The backend only supports 'vanilla', 'mint', 'chocolate' and 'strawberry' as flavours, and the spelling must be correct; otherwise it will not recognize the flavour as part of the inventory. But feel free to add new flavours in `store.js` if you like.
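A sketch of the order handler, assuming the Dialogflow parameters are named `iceCreamFlavour` and `number` (adjust to whatever you named them in your intent):

```
const store = require('./store');

const orderHandler = agent => {
  const flavour = agent.parameters.iceCreamFlavour; // assumed parameter name
  const cones = agent.parameters.number;            // assumed parameter name
  if (store.order(flavour, cones)) {
    agent.add(`Great! ${cones} cones of ${flavour} are on their way.`);
  } else {
    agent.add(`Sorry, we do not have ${cones} cones of ${flavour} left.`);
  }
};
```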
In this task you need to update the intent fulfillment from the previous task.
As specified in task four, location is an optional parameter. But in order to fulfill an order, our drones need to know where to deliver the ice cream cones. To get this information you will ask the user for permission to get the location of the device used to communicate the order.
If the request contains a location, we can use the same logic as in the previous task. Otherwise, request permission to get the device location from the user.
Create a new intent named finish_order_dronescream handling the user's response to the permission request. This intent must handle both being granted the permission and being denied it.
Hint: Permission intents are triggered by an event, not by dialogue, and do not need any training phrases.
Now that we can get the location of the device, our service is ready to start serving the community. Unfortunately, it will keep asking for permission to get the device location from users who have already consented.
Update your service to check whether the user has already consented before asking for the location again.
If you would like to learn more about making voice assistant apps, we suggest you do one of the following: