Great! So, after completing Part 1, you have a service which can store and retrieve a list of names, host an unbounded number of GraphQL subscriptions, and reflect incoming messages out to all subscribed clients.
Next, we’ll make the web application client for this service, using React, ApolloClient, the AppSync SDK, and AWS Amplify. This will be the “walkie talkie” you can actually use!
This screenshot is a little misleading since mobile browser permissions prevent it from working on an actual mobile device, but hey, it looks good!
Within your base project directory, run `npx create-react-app client --typescript` to create a basically functional web application. Then, within that directory, immediately run `npm run eject`, because we’ll need to make changes that the pre-packaged `create-react-app` output doesn’t allow.
AWS Amplify is a suite of utilities to help integrate your client (web/native) with AWS services. It’s tailored to greenfield projects, and “wants” to create the entire stack from the ground up. In this case (and many other cases), we have a pre-existing service to connect to, so we won’t be able to follow the flow they describe in documentation, like `amplify add auth`. We’ll have to manually configure that ourselves.
I wish I could just link to a solid tutorial that already exists and move on, but even the official documentation has significant omissions and mistakes.
Add the CLI tool and the first set of dependencies to your application:
```
npm i -g @aws-amplify/cli
npm i aws-amplify aws-amplify-react
```
Then initialize the application within the `client` directory with `amplify init`. The project name can be whatever you’d like, but the environment should be `master` - environments line up 1-1 with git branches. Select the defaults for the rest of the steps.
Let’s assume a project structure like the one in the final repo:

```
client/
  src/
    components/
    config/
    scenes/
    types/
    utilities/
    App.tsx
    ...
```
Within `config`, add `amplify.ts`, and swap in the values from the stack you created with CDK in Part 1:
```typescript
export default {
  Auth: {
    region: "us-east-1",
    userPoolId: "us-east-1_xxxxxxx",
    userPoolWebClientId: "70jja1s66qa4u812b0xxxxxxxx",
    mandatorySignIn: true // This should stay true
  }
};
```
There are ways to automatically generate these values, but for now we’ll maintain this config manually.
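One low-tech alternative, if you’d rather not hard-code the IDs, is to read them from create-react-app environment variables. This is a sketch: the `REACT_APP_*` names below are my own, defined in a local `.env` file, not anything generated by Amplify:

```typescript
// config/amplify.ts - same shape, but the values come from environment
// variables that create-react-app inlines at build time.
// The REACT_APP_* names are placeholders of my own choosing.
export default {
  Auth: {
    region: process.env.REACT_APP_COGNITO_REGION || "us-east-1",
    userPoolId: process.env.REACT_APP_COGNITO_USER_POOL_ID || "",
    userPoolWebClientId: process.env.REACT_APP_COGNITO_CLIENT_ID || "",
    mandatorySignIn: true // This should stay true
  }
};
```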
Then configure Amplify in `App.tsx`, and wrap the exported component in `withAuthenticator` to get the pre-packaged authentication views:
import Amplify from "aws-amplify";
import { withAuthenticator } from "aws-amplify-react";
import amplifyConfig from "config/amplify";
Amplify.configure(amplifyConfig);
const App: React.FC = () => {
...
}
export default withAuthenticator(App);
Great! Now try to run it with `npm run start`… and discover that (at the time of writing) there are no types exported from `aws-amplify-react`. Amplify itself is typed, but that library is not. Without a `@types/aws-amplify-react` package available, or volunteering to do that work for AWS, the best alternative is to add a declaration in a new file, `types/aws-amplify-react.d.ts` (or whatever you want), and insert a single line:
declare module "aws-amplify-react";
Now you’ll get a different TS error, and this is why we ejected from `create-react-app`. In your `tsconfig.json`, change `"isolatedModules"` to `false`, and restart `npm run start`. Great! On viewing your application, you should get redirected to a login utility, which should work with the user account from Part 1. If it doesn’t, add a comment and I will help to debug. It’s not just you - most time spent working with Cognito is spent wrangling opaque errors, but at least we’re not building it from scratch.
Once that’s working, we’re good to go! Drop-in authentication is in place and ready to talk to AppSync.
First, let’s install the dependencies:

```
npm i apollo-client react-apollo graphql graphql-tag aws-appsync aws-appsync-react
```
If you’ve worked with ApolloClient before, you’ve probably configured your client’s caching, links, headers, etc. in a `client.ts` file or similar. In this case, we won’t be using ApolloClient’s own client, but rather the AppSync SDK client, which (aside from a type mismatch) is effectively a drop-in replacement.
Configure that client in a new `client/appsyncClient.ts` file, and swap in the actual values from your CDK stack:
```typescript
import AWSAppSyncClient from "aws-appsync";
import { Auth } from "aws-amplify";

const client = new AWSAppSyncClient({
  url:
    "https://<your identifier here>.appsync-api.us-east-1.amazonaws.com/graphql",
  region: "us-east-1",
  auth: {
    type: "AMAZON_COGNITO_USER_POOLS",
    jwtToken: async () => {
      const session = await Auth.currentSession();
      const token = session.getIdToken().getJwtToken();
      return token;
    }
  },
  // This is necessary to prevent the QuotaExceededError DOMException
  // triggered in part by writing text-encoded audio to the redux store.
  // We don't use offline functionality anyway.
  disableOffline: true
});

export default client;
```
Then import that client into your `App.tsx` and wrap your application in the standard `ApolloProvider`. You’ll also need the `Rehydrated` wrapper, per the official documentation.
```tsx
import { ApolloProvider } from "react-apollo";
import { Rehydrated } from "aws-appsync-react";

const App: React.FC = () => {
  ...
  return (
    // @ts-ignore - it doesn't like the appsync client being passed to apollo
    <ApolloProvider client={client}>
      <Rehydrated>
        ...
```
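If the `@ts-ignore` bothers you, one alternative (a sketch - the mismatch is only at the type level) is to cast the AppSync client once where it crosses the Apollo boundary:

```typescript
import ApolloClient from "apollo-client";
import client from "client/appsyncClient";

// The AppSync client extends ApolloClient, but its type parameters don't line
// up with what react-apollo expects, so cast it once at the boundary.
const apolloClient = (client as unknown) as ApolloClient<any>;

// ...then: <ApolloProvider client={apolloClient}> ... </ApolloProvider>
```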
Why do we need this, though? Why can’t we just use the standard ApolloClient components and flow, since it’s just normal GraphQL? Well - for queries and mutations, we could. However, subscriptions add the extra layer of complexity in needing to maintain a connection, and Apollo and AppSync have different mechanisms for doing that. Here, AppSync SDK performs that heavy lifting for us, so we can focus on making our sweet new app.
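For illustration, here’s a rough sketch of what driving a subscription directly from that client looks like, outside of React entirely. The `ON_CREATE_AUDIO_SEGMENT` document and its field names are assumptions of mine - match them to the schema you deployed in Part 1:

```typescript
// A sketch only: subscribe imperatively with the AppSync client.
// The operation and field names below are assumptions based on Part 1's schema.
import gql from "graphql-tag";
import client from "client/appsyncClient";

const ON_CREATE_AUDIO_SEGMENT = gql`
  subscription OnCreateAudioSegment($roomId: ID!) {
    onCreateAudioSegment(roomId: $roomId) {
      roomId
      data
    }
  }
`;

const subscription = client
  .subscribe({ query: ON_CREATE_AUDIO_SEGMENT, variables: { roomId: "lobby" } })
  .subscribe({
    next: result => console.log("new audio segment", result.data),
    error: err => console.error("[subscribe] error", err)
  });

// Call subscription.unsubscribe() when leaving the room.
```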
Since this isn’t a React tutorial, I will gloss over the general layout of the application, which you can find here. Rather, I’ll focus on two specific components. Within a `ChatRoom` view, I used a simple `Subscription` component to receive new subscription events and play their audio data. Receive audio -> play audio -> done. Two points about that:

- We could easily have written a `client.subscribe` function outside of the entire component tree instead (like the sketch above), but doing it this way allows it to affect the rendered component’s state, if we decide we want that in the future. For example, we could add a switch element to enable and disable the subscription.
- Often you’d want a `Subscription` to actually be updating the results of a `Query` in real time to prevent polling, and so would probably opt for a `Query` component with a `subscribeToMore` prop. However, in this case, we don’t maintain any persistent state - once an AudioSegment is played, it’s discarded (just like a real walkie talkie. We’re not in a courtroom here!). So, a plain `Subscription` is a good fit.

Let’s take a look at that component (`src/components/ChatRoom.tsx`):
```tsx
// I manually defined the types (ie AudioSegment) in `types/index.ts`
// That was easy for this small application, but you can always use a codegen instead
interface Data {
  onCreateAudioSegment: AudioSegment;
}

<Subscription<Data, {}>
  subscription={ON_CREATE_AUDIO_SEGMENT}
  onSubscriptionData={({ subscriptionData: { data } }) =>
    data && playSoundData(data.onCreateAudioSegment.data)
  }
  variables={{ roomId }}
>
  ...display components
```
The `src/utilities/playSoundData.ts` utility is simple:
```typescript
const playSoundData = (dataUrl: string) => {
  new Audio(dataUrl).play().catch(err => {
    console.error(
      `[PlaySoundData] Error playing sound:`,
      err,
      err.message,
      dataUrl
    );
  });
};

export default playSoundData;
```
There is a catch: for that `Audio` object to be able to `play`, the user has to interact with the page “enough,” which is browser-dependent. For Chrome on desktop, clicking on an element is “enough.” Read more about Chrome in particular here. Note that if a sound fails to play, you’ll see that error in the web console, and can diagnose the cause from that.
The bar is even higher on mobile devices, especially iOS, which is the driver for the “important” note at the beginning of Part 1. However, assuming that you’re running this on desktop, it will work great for our purposes.
How does it work? It simply receives a data url of any playable type, which in our case will be Base64-encoded, and plays it.
You can do a sanity check at this point by using the AWS Console to make mutations with sample data. You could download the short mp3 from here, encode it into a dataUrl here, then run the mutation like we did at the end of Part 1. You should hear the sound play on the client! If not, stop and diagnose, and add a comment if I can help!
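If you’d rather do the sanity check in code, the same mutation can be fired straight from the AppSync client. A sketch - `CREATE_AUDIO_SEGMENT` is the mutation document (sketched later), its import path here is hypothetical, and the data url is a placeholder:

```typescript
// Sanity-check sketch: publish one test segment through the AppSync client.
// CREATE_AUDIO_SEGMENT and its import path are assumptions; the data value
// below is a placeholder - paste in a real Base64-encoded data url.
import client from "client/appsyncClient";
import { CREATE_AUDIO_SEGMENT } from "graphql/mutations"; // hypothetical path

client
  .mutate({
    mutation: CREATE_AUDIO_SEGMENT,
    variables: { roomId: "lobby", data: "data:audio/mpeg;base64,..." }
  })
  .then(() => console.log("test segment sent"))
  .catch(err => console.error("mutation failed", err));
```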
This part is more complex than playing audio. There are multiple ways to record and process audio in the browser - some deprecated, and some not yet fully supported. We’ll use the `MediaRecorder` API with a polyfill because of its ease of use.
First, we’ll work out how to record audio in segments, and then we’ll encode that data and send it off to AppSync to send to other users. There are two good intros into these APIs by Twilio and Google.
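Before diving into the component, here’s a minimal sketch of the core flow using the standard browser APIs (no polyfill yet), just to make the moving parts concrete. Depending on your TypeScript lib version, you may need the same `@ts-ignore` treatment the component uses below:

```typescript
// A minimal sketch of the core recording flow: ask for the microphone,
// record in 250ms chunks for five seconds, and hand each encoded chunk
// to a callback.
async function recordForFiveSeconds(onChunk: (chunk: Blob) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);

  // Each "dataavailable" event carries a Blob of encoded audio.
  recorder.addEventListener("dataavailable", (e: BlobEvent) => onChunk(e.data));

  // Release the microphone so the browser stops its "recording" indicator.
  recorder.addEventListener("stop", () =>
    stream.getTracks().forEach(track => track.stop())
  );

  recorder.start(250); // emit a chunk every 250 milliseconds
  setTimeout(() => recorder.stop(), 5000);
}
```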
I chose to encapsulate all of the recording logic within the `RecordButton` itself, but it could certainly be broken out to use utility functions instead:
```tsx
import React, { useState } from "react";
import { Button, Icon } from "semantic-ui-react";

import getUserMedia from "utilities/getUserMedia";
import binaryToBase64 from "utilities/binaryToBase64";

interface Props {
  onRecordAudio: (base64: string) => void;
}

const RecordButton: React.FC<Props> = ({ onRecordAudio }) => {
  const [recording, setRecording] = useState(false);
  const [mediaRecorderObject, setMediaRecorderObject] = useState<MediaRecorder>();

  const startRecording = () => {
    setRecording(true);
    if (getUserMedia) {
      console.log("[RecordButton] getUserMedia supported");
      getUserMedia(
        { audio: true },
        stream => {
          const mediaRecorder = new MediaRecorder(stream);

          mediaRecorder.addEventListener("dataavailable", e => {
            // @ts-ignore (data property is unknown)
            binaryToBase64(e.data).then(data => onRecordAudio(data));
          });

          mediaRecorder.addEventListener("error", error =>
            console.error(`[MediaRecorder] Error`, error)
          );

          // Without this, the browser continues to believe that the application
          // is "listening", and displays a warning banner to the user
          mediaRecorder.addEventListener("stop", e => {
            stream.getTracks().forEach((track: any) => track.stop());
          });

          setMediaRecorderObject(mediaRecorder);
          mediaRecorder.start(250); // Slice into chunks for processing
        },
        err => {
          console.error(`[RecordButton] Error getting audio device`, err);
          // For debugging only
          alert(
            `[RecordButton] Error getting audio device: ${JSON.stringify(err)}`
          );
        }
      );
    } else {
      console.error(`[RecordButton] getUserMedia not supported!`);
    }
  };

  const endRecording = () => {
    setRecording(false);
    try {
      mediaRecorderObject!.stop();
    } catch (_) {}
  };

  return (
    <Button
      fluid
      size="huge"
      circular
      color={recording ? "red" : undefined}
      icon
      onClick={recording ? endRecording : startRecording}
    >
      <Icon
        name={recording ? "circle notched" : "microphone"}
        loading={recording}
      />
      {recording ? "Transmitting..." : "Push to Talk"}
    </Button>
  );
};

export default RecordButton;
```
Whoa! That’s a lot. Let’s break it down:
- The button was originally a true push-to-talk, driven by `onTouch[Cancel|End|Start]` and `onMouse[Down|Up|Leave]` handlers. Unfortunately, that was very glitchy, especially in mobile browsers. Now, it’s a toggle switch, which is much more reliable.
- Recording depends on `getUserMedia`, which I’ve added a polyfill for, and we’ll cover in a second. There are many situations where that’s not available, and in that case we won’t be able to record.
- We call `getUserMedia` with the constraint `{ audio: true }` to ask the browser for only audio and for any audio input available. When we do this the first time, it will prompt the user for permission. If the user denies permission, the third argument in `getUserMedia`, the error handler, will be called.
- On success, we take the resulting `stream` and send it to the `MediaRecorder`, which abstracts away much of the processing for us. We add three listeners to it:
  - `error` is straightforward.
  - `stop` tells the browser to turn off all monitoring streams associated with it, so the browser knows we aren’t listening any more. If we don’t do this, a big red banner pops up across the top of mobile devices, even with the browser window closed, telling the user we’re listening. That’s bad UX, and also it consumes memory to maintain those streams.
  - `dataavailable` will be called periodically during recording, and also at the end of the recording. Each call will make a `Blob` of recorded audio data available, which we will process and send off to the backend, thus streaming the recording in real time.
- We keep the `MediaRecorder` in component state using a hook, since we’ll need to `stop()` it once the recording is ended.
- We start the `MediaRecorder` and, using its `timeslice` parameter, set it to call `ondataavailable` in intervals of 250 milliseconds. This interval is important for two reasons:
- `endRecording` calls `stop()` on the `MediaRecorder` instance. This event may be fired multiple times due to browser behavior, and subsequent calls to `stop()` will result in errors, so we catch and discard those.

The data blob is encoded to a data url simply and easily using a FileReader:
```typescript
const binaryToBase64 = (data: Blob) =>
  new Promise<string>((resolve, reject) => {
    const reader = new FileReader();

    reader.onloadend = () => {
      resolve(reader.result as string);
    };
    reader.onerror = reject;

    reader.readAsDataURL(data);
  });

export default binaryToBase64;
```
When that resolves, it’s sent by `RecordButton`’s `onRecordAudio()` to its parent component, `ChatRoom`, where a Mutation sends it to the backend:
```tsx
const ChatRoom: React.FC<Props> = ({ roomId }) => (
  <Mutation<MutationData, any> mutation={CREATE_AUDIO_SEGMENT}>
    {mutate => {
      const handleReceiveData = (data: string) => {
        mutate({ variables: { roomId, data } });
      };
      ...
      return (
        ...
        <RecordButton onRecordAudio={handleReceiveData} />
        ...
      )
      ...
```
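The `CREATE_AUDIO_SEGMENT` document itself isn’t shown in this post. As a hedged sketch (again, match the operation and field names to your Part 1 schema):

```typescript
import gql from "graphql-tag";

// Sketch only - the operation and field names are assumptions.
export const CREATE_AUDIO_SEGMENT = gql`
  mutation CreateAudioSegment($roomId: ID!, $data: String!) {
    createAudioSegment(roomId: $roomId, data: $data) {
      roomId
      data
    }
  }
`;
```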
Finally, let’s add that polyfill for `getUserMedia`. `navigator.mediaDevices.getUserMedia` is the modern implementation and returns a Promise, but it isn’t universal yet, so this polyfill falls back on the deprecated flavors of `navigator.getUserMedia`:
```typescript
type MediaCallback = (stream: any) => void;

let getUserMediaFn;
let getUserMedia:
  | ((
      constraints: any,
      onSuccess: MediaCallback,
      onError: MediaCallback
    ) => Promise<any> | void)
  | undefined;

if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
  getUserMedia = (constraints, onSuccess, onError) =>
    navigator.mediaDevices.getUserMedia
      .bind(navigator.mediaDevices)(constraints)
      .then(onSuccess)
      .catch(onError);
} else {
  getUserMediaFn =
    navigator.getUserMedia ||
    // @ts-ignore
    navigator.webkitGetUserMedia ||
    // @ts-ignore
    navigator.mozGetUserMedia ||
    // @ts-ignore
    navigator.msGetUserMedia;

  getUserMedia = getUserMediaFn ? getUserMediaFn.bind(navigator) : undefined;
}

export default getUserMedia;
```
The typings in that absolutely stand to be improved.
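For example, here’s a sketch of slightly tighter typings, declaring the legacy vendor-prefixed methods by hand. The shapes below are my own assumptions, since lib.dom doesn’t include these prefixed variants:

```typescript
// A sketch of stricter typings for the same polyfill. The callback-style
// signature mirrors the deprecated navigator.getUserMedia variants.
type LegacyGetUserMedia = (
  constraints: MediaStreamConstraints,
  onSuccess: (stream: MediaStream) => void,
  onError: (error: unknown) => void
) => void;

declare global {
  interface Navigator {
    webkitGetUserMedia?: LegacyGetUserMedia;
    mozGetUserMedia?: LegacyGetUserMedia;
    msGetUserMedia?: LegacyGetUserMedia;
  }
}

export {};
```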
You’ll also have to add the basic components for user flow. In my application, I use the excellent `semantic-ui-react` library to provide basic styling, but you can use whatever floats your boat. I used a basic `RoomList` component to display all rooms from a `Query` to AppSync, `react-router` to parse a `roomId` out of the active path, and the `ChatRoom` above to manage the chat activity within a room.
Before you try this out, you’ll have to make sure your `webpack-dev-server` is running on HTTPS, since browsers may not allow you to record audio on an insecure page, but will quietly fail and leave you scratching your head. It’s easy to do from this react template:

```
HTTPS=true npm run start
```
So, now, try it out! Navigate to https://localhost:8000, pick a room, and try to send a voice clip. You’ll likely receive a couple of errors:
- In browsers that don’t implement `MediaRecorder` at all, recording fails outright.
- Where it is supported, the recording uses the `audio/webm` mimetype, which has a header. That header is only included within your first chunk of any recording. You’ll probably see an error like `"failed to load because no supported source was found"` - the data urls for segments after the first one are corrupted!

We’re about to fix both of those problems with one line of code.
Andrey Sitnik created an excellent drop-in polyfill for MediaRecorder, audio-recorder-polyfill, which is trivial to add (although the README.md itself can be confusing). Thanks, Andrey! It takes one line added in your `index.tsx` file, which wraps and overwrites the window’s `MediaRecorder` class. Make sure to install it first with `npm i audio-recorder-polyfill`.
```typescript
window.MediaRecorder = require("audio-recorder-polyfill");
```
Done. Now when you use your app you’ll notice two things:

- Recording now works, even in browsers without a native `MediaRecorder`.
- The recordings come out as `audio/wav`, which is both good and bad: every segment now plays back correctly, but wav is much larger than `audio/webm` was, which means higher costs and latency.

There’s a caveat here; we’re not quite ready to go to market against Zello. Mobile browser permissions are so restrictive, especially on iOS, that we’re essentially dead in the water there, making this functionally a desktop-only web application. One problem is that audio cannot play without direct user input, which doesn’t line up with a walkie-talkie application, where user input (clicks) happens independently from audio arriving. There also isn’t always support for the APIs we’re calling. So, you can add a guardrail in your application to prevent users from being confused:
```tsx
...
import getUserMedia from "utilities/getUserMedia";
...

const App: React.FC = () => {
  if (!getUserMedia || typeof MediaRecorder === "undefined")
    return (
      <Segment placeholder>
        <Header icon>
          <Icon name="thumbs down outline" />
          Your browser or device doesn't support this app. Sad.
        </Header>
      </Segment>
    );
  ...
```
Note that in this case I’m using style elements from `semantic-ui-react`; you can import those like I did, use standard elements, or use your own library of choice.
Let’s deploy this out to the real world! Amplify makes this too easy. Once it’s building like you want with `npm run build`, run `amplify add hosting` and follow the prompts. Make sure to include CloudFront, since you’ll need HTTPS for audio recording. Once that’s configured, run `amplify publish` to build and deploy your app to S3 and CloudFront! That will take 15 minutes or more to deploy, but when it’s complete, it will display the CloudFront URL.
So that’s it! Hopefully you now have a fully (well, mostly) functioning live voice chat/walkie talkie application. It’s rapidly scalable, stable, and secure, and it’s all managed for you.
How much does it cost? Let’s run the numbers from the official pricing guide. The main driver is the sheer volume of AppSync calls - every 250ms audio chunk is a mutation, fanned out to every subscriber in the room - so there are a couple of easy ways to reduce costs:

- Compress the audio into a smaller format (mpeg rather than wav) before sending it.
- Adjust `MediaRecorder`’s timeslice to batch audio segments into fewer, larger calls.

How could we improve this application? Fork the repo, experiment with some of these, and show off with a link in the comments!
- Track the current `userId`, and don’t play back audio from yourself!
- Keep a queue of incoming audio segments and a handful of `<audio />` elements. On first render, present a button which plays a short, inaudible sound through each one, thus “authorizing” it within the browser to play more audio in the future. Then, pull items off the audio segment queue in order, set the `someAudioElement.src` value to your `dataUrl`, and play away! If you have separate queues for each userId, and introduce a brief delay before playing the first clip in each, you can also reduce jitter in the audio in between clips, and prevent overlapping clips as well.
- Build a `react-native` application to get around the limitations of mobile browsers.
- Add `createRoom` and `destroyRoom` mutations to be able to create new chat rooms from the web application.
- Add user groups, and restrict certain mutations (`createRoom` or `deleteRoom`, maybe) to them using a resolver mapping.
- Give the app a proper walkie-talkie look, and attach the `RecordButton` to that graphic.

I hope you enjoyed this workshop and, better yet, learned something from it. Please comment, star, and send PRs to the demo repo on GitHub. Happy coding!
Disclaimer: opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.