Build a Serverless Walkie Talkie App in One Sitting: Service (Part 1)

Do you like Discord and Zello, but find that they’re too free, easy, and reliable for you? Are you the DIY kind of person, and want to build and run your own voice chat server? This is the workshop for you.

In an hour of development time, we’ll build a functional, serverless web walkie-talkie application on AWS, using Cognito, AppSync, CDK, React, Amplify, and ApolloClient.

We’ll learn:

How to quickly get our infrastructure up and running with AWS CDK
How to configure an AppSync schema, resolvers, and data sources without using the AWS Console
How to use Cognito for AppSync authentication & authorization
How to configure a brand-new React application using AWS Amplify and pre-existing AWS resources
How to record, process, and play audio data in the browser, sending it to and receiving it from AppSync
How to integrate ApolloClient with AppSync GraphQL subscriptions

This workshop assumes a working knowledge of GraphQL, React, and TypeScript, all of which already have excellent tutorials out there.

We’ll be creating a voice chat application, where authenticated users can join one of a number of rooms and “push to talk,” trading voice messages with other users in that room.

The finished product is available here on GitHub.

Important Notes

Delete all resources once you’re done with this workshop! It will be hard for you to reach a bill of even $1 with this application, but as with any cloud application, if it’s heavily used or abused, the costs are unbounded.
This application is not production ready, fully secure, or accessible. It does come with the built-in utilities and safeguards of AWS, but also with its own set of bugs.
The client application is not mobile-ready, because of permission restrictions in mobile browsers, especially on iOS. We’ll cover the reasons for that and possible workarounds in Part 2.

Part 0: Getting Organized

While this project would be a good candidate for Yarn Workspaces and/or Lerna, we won’t use those, in the interest of simplicity.

Find a good spot for a new project directory, and create a folder for this project. The name doesn’t matter. Within that project folder, create another directory, service.

Also, ensure that you have an AWS account with an IAM user with enough rights to manage all the resources below. I used AdministratorAccess for mine. Then, create an AWS profile to store those access keys and default region.

Part 1: The Backend

Before getting started, I highly recommend the official AWS CDK Workshop, which provides a great primer on this polished new tool from AWS.

All steps in this part are working within the service subdirectory.

Set your AWS profile with export AWS_PROFILE=whatever, then run cdk init app --language typescript followed by cdk bootstrap. In init, name the project whatever you want - I used serverless-walkie-talkie.

We’ll be creating the following resources with CDK:

A simple DynamoDB table to list the available room IDs, with an IAM role to access it
A GraphQL schema, and an AppSync API which consumes it
AppSync data sources, which AppSync uses to receive and return data per the schema
A Cognito User Pool as the authorizer for AppSync access

Note that, as of writing, AppSync does not yet have “CDK Constructs” created for it. Constructs are an extra layer of architecture abstraction on top of standard CloudFormation, and allow creating common patterns with less configuration. So, we’ll essentially be writing CloudFormation in TypeScript for those select resources, but CDK still adds utilities that make it easier to develop than vanilla CF.

This is a small enough stack that it can all fit within one file, lib/whatever-you-named-the-project.ts. When we’re done, it will look like this.

Install these dependencies, and import them at the top of that construct file:

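import appsync = require("@aws-cdk/aws-appsync");
import cognito = require("@aws-cdk/aws-cognito");
import dynamo = require("@aws-cdk/aws-dynamodb");
import iam = require("@aws-cdk/aws-iam");
// Node built-in (nothing to install), used later to locate schema.graphql
import path = require("path");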

All of the code snippets below are within that file’s constructor.

First, let’s set an application name we can reference later on:

const applicationName = "serverless-walkie-talkie";

We can create a user pool and its client, with mostly default settings, in just two statements:

const userPool = new cognito.UserPool(this, "UserPool", {
  autoVerifiedAttributes: [cognito.UserPoolAttribute.EMAIL]
});

const userPoolClient = new cognito.UserPoolClient(this, "UserPoolClient", {
  userPool
});

Run cdk synth at this point to see how much CloudFormation output that generated. How great is that!

The email verification is required not by CF but because of defaults within AWS Amplify, whose packaged withAuthenticator views assume a verification step.

Let’s add some outputs to the end of the constructor to display the values we’ll need elsewhere:

new cdk.CfnOutput(this, "UserPoolId", {
  value: userPool.userPoolId
});

new cdk.CfnOutput(this, "UserPoolProviderUrl", {
  value: userPool.userPoolProviderUrl
});

new cdk.CfnOutput(this, "UserPoolClientId", {
  value: userPoolClient.userPoolClientId
});

Run cdk deploy at this point to make sure everything’s set up correctly. You’ll see those output values printed to console at the end!

Next, let’s add a DynamoDB table to store the available room IDs. Alongside it, we’ll create an IAM role that AppSync will use to read from/write to the table:

const roomTable = new dynamo.Table(this, "RoomTable", {
  partitionKey: {
    name: "id",
    type: dynamo.AttributeType.STRING
  },
  readCapacity: 5,
  writeCapacity: 1,
  removalPolicy: cdk.RemovalPolicy.DESTROY
});

const dynamoRole = new iam.LazyRole(this, "AppsyncDynamoRole", {
  assumedBy: new iam.ServicePrincipal("appsync.amazonaws.com"),
  roleName: `${applicationName}-appsync-dynamo-role`,
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName("AmazonDynamoDBReadOnlyAccess")
  ]
});

Note that in this scenario, AppSync won’t be writing any data to DynamoDB, and so AmazonDynamoDBReadOnlyAccess is enough for us. If you make changes, though, you may prefer to use AmazonDynamoDBFullAccess. Both are “managed policies,” which are owned and maintained by AWS itself.

Run cdk deploy again, which will take a short time to create the table. Once that’s complete, use the DynamoDB console to create a couple of room IDs within that table, like abc123. Sample data is enough for our purposes here, but in reality we could add additional resolvers to create and destroy rooms via AppSync.

Now for the good part: provisioning the AppSync resources. AppSync requires a GraphQL schema, which could be defined in an inline string. However, I’d rather define that schema in a separate .graphql file, which comes with the syntax checking and highlighting of my IDE. So, create a schema.graphql file somewhere in your service directory, maybe within a new src/ subdirectory. It should look like this.
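For comparison, the inline-string alternative mentioned above might look roughly like the sketch below. The schema body is reconstructed from the fields this post queries later on (rooms, createAudioSegment, onCreateAudioSegment); the scalar types and nullability are assumptions rather than the actual schema file, and inlineSchema is a hypothetical name.

```typescript
// Hypothetical inline alternative to schema.graphql.
// Field names come from the queries used later in this post; the scalar
// types (ID, String) and the "!" nullability markers are assumptions.
const inlineSchema = `
type Room {
  id: ID!
}

type AudioSegment {
  roomId: ID!
  data: String!
  timestamp: String!
  userId: String!
}

type Query {
  rooms: [Room]
}

type Mutation {
  createAudioSegment(roomId: ID!, data: String!): AudioSegment
}

type Subscription {
  onCreateAudioSegment(roomId: ID!): AudioSegment
    @aws_subscribe(mutations: ["createAudioSegment"])
}

schema {
  query: Query
  mutation: Mutation
  subscription: Subscription
}
`;
```

A string like this could be handed to the schema resource directly, at the cost of losing the editor support that a separate .graphql file provides. The rest of this workshop sticks with the separate file.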

Then, within your construct, add these resources:

const appsyncApi = new appsync.CfnGraphQLApi(this, "AppSyncApi", {
  name: `${applicationName}-api`,
  authenticationType: "AMAZON_COGNITO_USER_POOLS",
  userPoolConfig: {
    awsRegion: cdk.Stack.of(this).region,
    defaultAction: "ALLOW",
    userPoolId: userPool.userPoolId
  }
});

const appsyncSchema = new appsync.CfnGraphQLSchema(this, "AppSyncSchema", {
  apiId: appsyncApi.attrApiId,
  definition: require("fs")
    .readFileSync(path.join(__dirname, "../src/schema.graphql"))
    .toString()
});

Some things to note here:

As mentioned above, the Cfn prefix here indicates that these are the lower-level objects which map 1-to-1 with CF resources.
Within the AppSync schema, we read the GraphQL schema from disk in order to import it as a string. This makes for more readable code than defining it inline.
We connect the API to the UserPool. The defaultAction isn’t well-documented, but of the options DENY and ALLOW, DENY appears to deny access regardless of configuration. A bug in CloudFormation means that it isn’t marked as required, but indeed it is.

Let’s also add another output at the end of the constructor, to display the URL we’ll need to connect to our GraphQL endpoint:

new cdk.CfnOutput(this, "ApiEndpoint", {
  value: appsyncApi.attrGraphQlUrl
});

So, the schema is defined within AppSync and the API can listen for requests, but it doesn’t know how to actually resolve those requests. We’ll fix that with DataSource and Resolver.

Let’s start with the simplest one: listing all the available rooms, a simple Scan in DynamoDB.

const roomDataSource = new appsync.CfnDataSource(this, "RoomDataSource", {
  apiId: appsyncApi.attrApiId,
  name: "RoomDataSource",
  type: "AMAZON_DYNAMODB",
  serviceRoleArn: dynamoRole.roleArn,
  dynamoDbConfig: {
    awsRegion: cdk.Stack.of(this).region,
    tableName: roomTable.tableName,
    useCallerCredentials: false
  }
});

roomDataSource.addDependsOn(appsyncSchema);

new appsync.CfnResolver(this, "ResolverRooms", {
  apiId: appsyncApi.attrApiId,
  dataSourceName: roomDataSource.name,
  fieldName: "rooms",
  requestMappingTemplate: ensureString(roomsRequest),
  responseMappingTemplate: ensureString(roomsResponse),
  typeName: "Query"
}).addDependsOn(roomDataSource);

Note:

The DataSource must “depend on” the schema, and the Resolver on the DataSource, to make sure they are provisioned in order. Without that, you’ll see errors in CloudFormation.
The DataSource is configured to read from the Dynamo table, using the role we provisioned before.
The Resolver resolves a field specified in the Query schema (rooms) against a DataSource (roomDataSource) using request and response “mapping templates.”
We’ll get to ensureString in a moment, but it’s a function I defined to JSON.stringify anything that isn’t already a string.

What about these mapping templates? Well, it turns out that these are much easier to create from scratch in the AWS Console, where there are templates and autocomplete. Once we deploy/create these resources, you should experiment with those to get feedback on changes.

For now, though, we can add these templates in new files under src/resolverMappings or whatever you’d like. I followed a practice of one file per resolver, with each exporting a request and response. This is the simple src/resolverMappings/rooms.ts:

export const request = {
  version: "2017-02-28",
  operation: "Scan"
};

export const response = "$util.toJson($context.result.Items)";

What this means is that when a new request hits our AppSync API, AppSync will use the request template to build its own request to the underlying data source. The response template is similarly used to transform the data source’s response into what the GraphQL client will receive as the data object. In this case, we want to retrieve all the rooms, so we use a simple Scan, the easiest of the DynamoDB operations. $util and $context are described in the documentation, and are used by AppSync to insert values from the template environment. In this case, $util.toJson will convert the result to JSON, and $context.result.Items reads the Items field returned by DynamoDB.

In order to use those, you’ll need to import them in your construct file:

import {
  request as roomsRequest,
  response as roomsResponse
} from "../src/resolverMappings/rooms";

In those resolverMappings, you may choose to use either a JS object (for dynamic content or syntax highlighting/checking) or a multiline string. So, I added an ensureString function:

const ensureString = (value: any) => {
  if (typeof value === "string") {
    return value;
  } else {
    return JSON.stringify(value, null, 2);
  }
};

You might ask: why is that needed? Can’t we just (a) “stringify” everything, or (b) write all the mappings as JS objects? Unfortunately, there are a lot of opaque errors possible here in AppSync. The Velocity Template Language (VTL) that AppSync consumes provides a number of utilities, like $util, which can output either strings or objects. Since $util isn’t defined in our application, our transpiler can’t evaluate it and return values from it. If we need the output of a function to be inserted into our rendered template without wrapping quotes, we have to write the template as a string, which won’t be evaluated on our end.
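A small sketch of the difference, runnable outside of CDK. The id key and $context.arguments.id are made up for illustration; ensureString is repeated here only so the snippet stands on its own.

```typescript
// Same helper as above, repeated so this sketch is self-contained.
const ensureString = (value: unknown): string =>
  typeof value === "string" ? value : JSON.stringify(value, null, 2);

// Written as a JS object, the VTL call has to be a string value,
// so JSON.stringify keeps it inside double quotes.
const asObject = ensureString({
  id: "$util.dynamodb.toDynamoDBJson($context.arguments.id)"
});

// Written as a string, the VTL call can sit outside any quotes,
// and ensureString passes the template through unchanged.
const asString = ensureString(`{
  "id": $util.dynamodb.toDynamoDBJson($context.arguments.id)
}`);
```

In asObject the helper call ends up quoted, so whatever it renders to is treated as a string. In asString it stays unquoted, so an object rendered by the helper stays an object.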

The list of rooms is the only data that we need to persist within our application. With our simple voice chat, we don’t store audio clips; we just “reflect” them from the sender to all the recipients who are actively listening (“subscribed”) to them.

So, when we “create” a new AudioSegment, it doesn’t need a data source:

const audioSegmentDataSource = new appsync.CfnDataSource(
  this,
  "AudioSegmentDataSource",
  {
    apiId: appsyncApi.attrApiId,
    name: "AudioSegmentDataSource",
    type: "NONE"
  }
);

audioSegmentDataSource.addDependsOn(appsyncSchema);

However, those requests still need a transformation: submitted voice clips come in and must be transformed into responses for listeners. The resolver looks like this:

new appsync.CfnResolver(this, "ResolverCreateAudioSegment", {
  apiId: appsyncApi.attrApiId,
  dataSourceName: audioSegmentDataSource.name,
  fieldName: "createAudioSegment",
  requestMappingTemplate: ensureString(createAudioSegmentRequest),
  responseMappingTemplate: ensureString(createAudioSegmentResponse),
  typeName: "Mutation"
}).addDependsOn(audioSegmentDataSource);

With this mutation, the resolver mappings get more interesting:

export const request = `{
  "version": "2017-02-28",
  "payload": {
    "roomId": "$context.arguments.roomId",
    "data": "$context.arguments.data",
    "timestamp": "$util.time.nowEpochMilliSeconds().toString()",
    "userId": "$ctx.identity.sub"
  }
}`;

export const response = "${util.toJson($ctx.result)}";

Note that some of these utility functions get their value from the environment rather than a data source or the input. The timestamp is automatically generated, and the userId is pulled from the Cognito authentication token.
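To make the data flow concrete, here is a local TypeScript model of what this pair of templates produces. It is only an illustration: modelCreateAudioSegment, its parameter names, and the sample values are hypothetical, and the real resolution happens inside AppSync rather than in our code.

```typescript
// Shape of the object the response template hands back to GraphQL.
interface AudioSegment {
  roomId: string;
  data: string;
  timestamp: string;
  userId: string;
}

// Local model of the request template: with a NONE data source, the
// "payload" built from the request becomes the result, with no storage step.
function modelCreateAudioSegment(
  args: { roomId: string; data: string }, // stands in for $context.arguments
  identitySub: string, // stands in for $ctx.identity.sub
  nowMs: number // stands in for $util.time.nowEpochMilliSeconds()
): AudioSegment {
  return {
    roomId: args.roomId,
    data: args.data,
    timestamp: nowMs.toString(),
    userId: identitySub
  };
}

// Sample call with made-up values.
const segment = modelCreateAudioSegment(
  { roomId: "abc123", data: "some-data-here" },
  "hypothetical-cognito-sub",
  1564704000000
);
```

Two of the four fields come from the caller’s arguments and two from the environment, which is why a client cannot choose its own userId or timestamp.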

Note the request is now a multiline string rather than a JS object. That’s not required here, but in some cases it is, because of what the template helpers output. Consider this version, which is what we would use if we were using a data source:

// Demonstration only, don't use this in this workshop!

export const request = `{
  "version": "2017-02-28",
  "operation": "PutItem",
  "key": {
    "roomId": $util.dynamodb.toDynamoDBJson($context.arguments.roomId),
    "timestamp": $util.dynamodb.toDynamoDBJson($util.time.nowEpochMilliSeconds().toString())
  },
  "attributeValues": {
    "data": $util.dynamodb.toDynamoDBJson($context.arguments.data),
    "userId": $util.dynamodb.toDynamoDBJson($ctx.identity.sub)
  }
}`;

See how $util.dynamodb.toDynamoDBJson(...) has no wrapping quotes? It renders within AppSync to a JSON object like { "S": "value" }. If it were wrapped in quotes, as it would have to be inside a JS object, DynamoDB would try and fail to parse a string like "{ \"S\": \"value\"}" where it instead expected an object, and you would see an error like “Unable to Parse” returned from the AppSync mutation. The difference becomes more apparent when writing these resolvers within the AWS Console.

Great, we’re almost done! Let’s talk about the last part here: the subscription. This is how listeners will be notified whenever someone adds a new audio clip. AppSync makes this simple, but there are sharp edges to look out for. No data source or resolver mapping is necessary, because the subscription will directly depend on one or more mutations. See again how we defined it in the schema.graphql:

type Subscription {
  onCreateAudioSegment(roomId: ID!): AudioSegment
    @aws_subscribe(mutations: ["createAudioSegment"])
}

The @aws_subscribe directive will link the subscription to the mutation. Any time a linked mutation is received, the fields in its response are compared against the arguments of the subscription. If all the arguments match, the subscription fires an event to all of its subscribers.

The sharp edge here is that the subscription arguments are not compared to the mutation’s input values but to its output values. This means that one client can control which notifications a different client will receive.

For example, let’s say you subscribe to onCreateAudioSegment(roomId: "abc123"). A different user, Bob, sends a mutation.

mutation {
  createAudioSegment(roomId: "abc123", data: "some-data-here") {
    timestamp
  }
}

He didn’t request the roomId back, maybe because he doesn’t need it in his application. You won’t receive a subscription event, since the response to Bob did not include the roomId field. However, if Alice sent a mutation like:

mutation {
  createAudioSegment(roomId: "abc123", data: "some-data-here") {
    roomId
    timestamp
  }
}

…you would receive the event. Weird, right? Does AWS know about this? You bet. It’s an undocumented feature.
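The Bob and Alice cases can be modeled in a few lines of TypeScript. This is a sketch of the behaviour described above, not AppSync’s actual implementation; wouldNotify and the sample timestamp are hypothetical.

```typescript
// The fields a mutation caller asked for, i.e. the mutation's response.
type MutationResponse = Record<string, string | undefined>;

// Model of the behaviour described above: every subscription argument must
// equal the same-named field in the mutation's *response*, so a field the
// caller did not select can never match.
function wouldNotify(
  subscriptionArgs: Record<string, string>,
  response: MutationResponse
): boolean {
  return Object.keys(subscriptionArgs).every(
    (name) => response[name] === subscriptionArgs[name]
  );
}

const yourSubscription = { roomId: "abc123" };

// Bob selected only `timestamp`, so `roomId` is absent from his response.
const bobNotifies = wouldNotify(yourSubscription, {
  timestamp: "1564704000000"
});

// Alice selected `roomId` as well, so the argument can be matched.
const aliceNotifies = wouldNotify(yourSubscription, {
  roomId: "abc123",
  timestamp: "1564704000000"
});
```

Under this model bobNotifies is false and aliceNotifies is true, even though both sent the same input. The practical takeaway is that every client should request, in its mutation selection set, all fields that subscriptions filter on.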

Great! We’re done with the template. Deploy it with cdk deploy, and copy the UserPoolClientId value to your clipboard.

Querying AppSync

This isn’t required, but now that the backend is set up, let’s take a moment to understand how it’s working.

First, we’ll need a Cognito user account. Open the AWS Console, navigate to Cognito, and create a new user.

Then, navigate to AppSync and find where you can run a query. Use the button there to log in with Cognito, pasting in the UserPoolClientId from the cdk deploy output (or from the Cognito console). Send a simple request for all the room IDs available:

query {
  rooms {
    id
  }
}

You should see the list of the sample data that you added earlier. Then, open the same view in a new tab, so you have two queries open. Enter the subscription in one:

subscription {
  onCreateAudioSegment(roomId: "one-of-your-room-ids") {
    roomId
    data
    timestamp
    userId
  }
}

…and press “Play.” You should see that you’re now subscribed, and will see responses load as mutations come in. Then, in the second window, enter and run a mutation:

mutation {
  createAudioSegment(roomId: "the-same-room-id-as-the-subscription", data: "Testing data") {
    roomId
    timestamp
    userId
    data
  }
}

You should see that data pop up in the first window with the subscription! You can continue to send mutations with different values for data and see them appear in the subscription.

Successful Subscription

Here, data can be any string value, but once we build the client, we’ll be sending encoded audio data instead.

If you made it all the way here, congratulations! You have a fully functional, real-time-data-ready backend with authentication. In Part 2, we’ll build a web application client that can put it to work.

Disclaimer: opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.

Published Aug 2, 2019
