Great! So, after completing Part 1, you have a service which can store and retrieve a list of names, host an unbounded number of GraphQL subscriptions, and reflect incoming messages out to all subscribed clients.
Next, we’ll make the web application client for this service, using React, ApolloClient, the AppSync SDK, and AWS Amplify. This will be the “walkie talkie” you can actually use!
This screenshot is a little misleading since mobile browser permissions prevent it from working on an actual mobile device, but hey, it looks good!
Within your base project directory, run `npx create-react-app client --typescript` to create a basically functional web application. Then, within that directory, immediately run `npm run eject`, because we’ll need to make changes that the pre-packaged `create-react-app` output doesn’t allow.
AWS Amplify is a suite of utilities to help integrate your client (web/native) with AWS services. It’s tailored to greenfield projects, and “wants” to create the entire stack from the ground up. In this case (and many other cases), we have a pre-existing service to connect to, so we won’t be able to follow the flow they describe in documentation, like `amplify add auth`. We’ll have to manually configure that ourselves.
I wish I could just link to a solid tutorial that already exists and move on, but even the official documentation has significant omissions and mistakes.
Add the CLI tool and the first set of dependencies to your application:
```
npm i -g @aws-amplify/cli
npm i aws-amplify aws-amplify-react
```
Then initialize the application within the `client` directory with `amplify init`. The project name can be whatever you’d like, but the environment should be `master` - environments line up 1-1 with git branches. Select the defaults for the rest of the steps.
Let’s assume a project structure like the one in the final repo:

```
client/
  src/
    components/
    config/
    scenes/
    types/
    utilities/
    App.tsx
    ...
```
Within `config`, add `amplify.ts`, and swap in the values from the stack you created with CDK in Part 1:
```typescript
export default {
  Auth: {
    region: "us-east-1",
    userPoolId: "us-east-1_xxxxxxx",
    userPoolWebClientId: "70jja1s66qa4u812b0xxxxxxxx",
    mandatorySignIn: true // This should stay true
  }
};
```
There are ways to automatically generate these values, but for now we’ll maintain this config manually.
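One low-tech alternative, if you’d rather not hard-code the IDs, is to read them from create-react-app environment variables. This is a sketch: the `REACT_APP_*` names below are my own, defined in a local `.env` file, not anything generated by Amplify:

```typescript
// config/amplify.ts - same shape, but the values come from environment
// variables that create-react-app inlines at build time.
// The REACT_APP_* names are placeholders of my own choosing.
export default {
  Auth: {
    region: process.env.REACT_APP_COGNITO_REGION || "us-east-1",
    userPoolId: process.env.REACT_APP_COGNITO_USER_POOL_ID || "",
    userPoolWebClientId: process.env.REACT_APP_COGNITO_CLIENT_ID || "",
    mandatorySignIn: true // This should stay true
  }
};
```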
Then configure Amplify in `App.tsx`, and wrap the exported component in `withAuthenticator` to get the pre-packaged authentication views:
import Amplify from "aws-amplify";
import { withAuthenticator } from "aws-amplify-react";
import amplifyConfig from "config/amplify";
Amplify.configure(amplifyConfig);
const App: React.FC = () => {
...
}
export default withAuthenticator(App);
Great! Now try to run it with `npm run start`… and discover that (at the time of writing) there are no types exported from `aws-amplify-react`. Amplify itself is typed, but that library is not. Without a `@types/aws-amplify-react` package available, or volunteering to do that work for AWS, the best alternative is to add a declaration in a new file, `types/aws-amplify-react.d.ts` (or whatever you want), and insert a single line:
declare module "aws-amplify-react";
Now you’ll get a different TS error, and this is why we ejected from `create-react-app`. In your `tsconfig.json`, change `"isolatedModules"` to `false`, and restart `npm run start`. Great! On viewing your application, you should get redirected to a login utility, which should work with the user account from Part 1. If it doesn’t, add a comment and I will help to debug. It’s not just you - most time spent working with Cognito is spent wrangling opaque errors, but at least we’re not building it from scratch.
Once that’s working, we’re good to go! Drop-in authentication is in place and ready to talk to AppSync.
First, let’s install the dependencies:

```
npm i apollo-client react-apollo graphql graphql-tag aws-appsync aws-appsync-react
```
If you’ve worked with ApolloClient before, you’ve probably configured your client’s caching, links, headers, etc. in a `client.ts` file or similar. In this case, we won’t be using ApolloClient’s own client, but rather the AppSync SDK client, which (aside from a type mismatch) is effectively a drop-in replacement.
Configure that client in a new `client/appsyncClient.ts` file, and swap in the actual values from your CDK stack:
```typescript
import AWSAppSyncClient from "aws-appsync";
import { Auth } from "aws-amplify";

const client = new AWSAppSyncClient({
  url:
    "https://<your identifier here>.appsync-api.us-east-1.amazonaws.com/graphql",
  region: "us-east-1",
  auth: {
    type: "AMAZON_COGNITO_USER_POOLS",
    jwtToken: async () => {
      const session = await Auth.currentSession();
      const token = session.getIdToken().getJwtToken();
      return token;
    }
  },
  // This is necessary to prevent the QuotaExceededError DOMException
  // triggered in part by writing text-encoded audio to the redux store.
  // We don't use offline functionality anyway.
  disableOffline: true
});

export default client;
```
Then import that client into your `App.tsx` and wrap your application in the standard `ApolloProvider`. You’ll also need the `Rehydrated` wrapper, per the official documentation.
```tsx
import { ApolloProvider } from "react-apollo";
import { Rehydrated } from "aws-appsync-react";

const App: React.FC = () => {
  ...
  return (
    // @ts-ignore - it doesn't like the appsync client being passed to apollo
    <ApolloProvider client={client}>
      <Rehydrated>
        ...
```
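If the `@ts-ignore` bothers you, one alternative (a sketch - the mismatch is only at the type level) is to cast the AppSync client once where it crosses the Apollo boundary:

```typescript
import ApolloClient from "apollo-client";
import client from "client/appsyncClient";

// The AppSync client extends ApolloClient, but its type parameters don't line
// up with what react-apollo expects, so cast it once at the boundary.
const apolloClient = (client as unknown) as ApolloClient<any>;

// ...then: <ApolloProvider client={apolloClient}> ... </ApolloProvider>
```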
Why do we need this, though? Why can’t we just use the standard ApolloClient components and flow, since it’s just normal GraphQL? Well - for queries and mutations, we could. However, subscriptions add the extra layer of complexity in needing to maintain a connection, and Apollo and AppSync have different mechanisms for doing that. Here, AppSync SDK performs that heavy lifting for us, so we can focus on making our sweet new app.
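For illustration, here’s a rough sketch of what driving a subscription directly from that client looks like, outside of React entirely. The `ON_CREATE_AUDIO_SEGMENT` document and its field names are assumptions of mine - match them to the schema you deployed in Part 1:

```typescript
// A sketch only: subscribe imperatively with the AppSync client.
// The operation and field names below are assumptions based on Part 1's schema.
import gql from "graphql-tag";
import client from "client/appsyncClient";

const ON_CREATE_AUDIO_SEGMENT = gql`
  subscription OnCreateAudioSegment($roomId: ID!) {
    onCreateAudioSegment(roomId: $roomId) {
      roomId
      data
    }
  }
`;

const subscription = client
  .subscribe({ query: ON_CREATE_AUDIO_SEGMENT, variables: { roomId: "lobby" } })
  .subscribe({
    next: result => console.log("new audio segment", result.data),
    error: err => console.error("[subscribe] error", err)
  });

// Call subscription.unsubscribe() when leaving the room.
```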
Since this isn’t a React tutorial, I will gloss over the general layout of the application, which you can find here. Rather, I’ll focus on two specific components. Within a `ChatRoom` view, I used a simple `Subscription` component to receive new subscription events and play their audio data. Receive audio -> play audio -> done. Two points about that:

- We could easily have written a `client.subscribe` function outside of the entire component tree instead (like the sketch above), but doing it this way allows it to affect the rendered component’s state, if we decide we want that in the future. For example, we could add a switch element to enable and disable the subscription.
- Often you’d want a `Subscription` to actually be updating the results of a `Query` in real time to prevent polling, and so would probably opt for a `Query` component with a `subscribeToMore` prop. However, in this case, we don’t maintain any persistent state - once an AudioSegment is played, it’s discarded (just like a real walkie talkie. We’re not in a courtroom here!). So, a plain `Subscription` is a good fit.

Let’s take a look at that component (`src/components/ChatRoom.tsx`):
```tsx
// I manually defined the types (ie AudioSegment) in `types/index.ts`
// That was easy for this small application, but you can always use a codegen instead
interface Data {
  onCreateAudioSegment: AudioSegment;
}

<Subscription<Data, {}>
  subscription={ON_CREATE_AUDIO_SEGMENT}
  onSubscriptionData={({ subscriptionData: { data } }) =>
    data && playSoundData(data.onCreateAudioSegment.data)
  }
  variables={{ roomId }}
>
  ...display components
```
The `src/utilities/playSoundData.ts` utility is simple:
```typescript
const playSoundData = (dataUrl: string) => {
  new Audio(dataUrl).play().catch(err => {
    console.error(
      `[PlaySoundData] Error playing sound:`,
      err,
      err.message,
      dataUrl
    );
  });
};

export default playSoundData;
```
There is a catch: for that `Audio` object to be able to `play`, the user has to interact with the page “enough,” which is browser-dependent. For Chrome on desktop, clicking on an element is “enough.” Read more about Chrome in particular here. Note that if a sound fails to play, you’ll see that error in the web console, and can diagnose the cause from that.
The bar is even higher on mobile devices, especially iOS, which is the driver for the “important” note at the beginning of Part 1. However, assuming that you’re running this on desktop, it will work great for our purposes.
How does it work? It simply receives a data url of any playable type, which in our case will be Base64-encoded, and plays it.
You can do a sanity check at this point by using the AWS Console to make mutations with sample data. You could download the short mp3 from here, encode it into a dataUrl here, then run the mutation like we did at the end of Part 1. You should hear the sound play on the client! If not, stop and diagnose, and add a comment if I can help!
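If you’d rather do the sanity check in code, the same mutation can be fired straight from the AppSync client. A sketch - `CREATE_AUDIO_SEGMENT` is the mutation document (sketched later), its import path here is hypothetical, and the data url is a placeholder:

```typescript
// Sanity-check sketch: publish one test segment through the AppSync client.
// CREATE_AUDIO_SEGMENT and its import path are assumptions; the data value
// below is a placeholder - paste in a real Base64-encoded data url.
import client from "client/appsyncClient";
import { CREATE_AUDIO_SEGMENT } from "graphql/mutations"; // hypothetical path

client
  .mutate({
    mutation: CREATE_AUDIO_SEGMENT,
    variables: { roomId: "lobby", data: "data:audio/mpeg;base64,..." }
  })
  .then(() => console.log("test segment sent"))
  .catch(err => console.error("mutation failed", err));
```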
This part is more complex than playing audio. There are multiple ways to record and process audio in the browser - some deprecated, and some not yet fully supported. We’ll use the `MediaRecorder` API with a polyfill because of its ease of use.
First, we’ll work out how to record audio in segments, and then we’ll encode that data and send it off to AppSync to send to other users. There are two good intros into these APIs by Twilio and Google.
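Before diving into the component, here’s a minimal sketch of the core flow using the standard browser APIs (no polyfill yet), just to make the moving parts concrete. Depending on your TypeScript lib version, you may need the same `@ts-ignore` treatment the component uses below:

```typescript
// A minimal sketch of the core recording flow: ask for the microphone,
// record in 250ms chunks for five seconds, and hand each encoded chunk
// to a callback.
async function recordForFiveSeconds(onChunk: (chunk: Blob) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);

  // Each "dataavailable" event carries a Blob of encoded audio.
  recorder.addEventListener("dataavailable", (e: BlobEvent) => onChunk(e.data));

  // Release the microphone so the browser stops its "recording" indicator.
  recorder.addEventListener("stop", () =>
    stream.getTracks().forEach(track => track.stop())
  );

  recorder.start(250); // emit a chunk every 250 milliseconds
  setTimeout(() => recorder.stop(), 5000);
}
```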
I chose to encapsulate all of the recording logic within the `RecordButton` itself, but it could certainly be broken out to use utility functions instead:
```tsx
import React, { useState } from "react";
import { Button, Icon } from "semantic-ui-react";

import getUserMedia from "utilities/getUserMedia";
import binaryToBase64 from "utilities/binaryToBase64";

interface Props {
  onRecordAudio: (base64: string) => void;
}

const RecordButton: React.FC<Props> = ({ onRecordAudio }) => {
  const [recording, setRecording] = useState(false);
  const [mediaRecorderObject, setMediaRecorderObject] = useState<MediaRecorder>();

  const startRecording = () => {
    setRecording(true);
    if (getUserMedia) {
      console.log("[RecordButton] getUserMedia supported");
      getUserMedia(
        { audio: true },
        stream => {
          const mediaRecorder = new MediaRecorder(stream);

          mediaRecorder.addEventListener("dataavailable", e => {
            // @ts-ignore (data property is unknown)
            binaryToBase64(e.data).then(data => onRecordAudio(data));
          });

          mediaRecorder.addEventListener("error", error =>
            console.error(`[MediaRecorder] Error`, error)
          );

          // Without this, the browser continues to believe that the application
          // is "listening", and displays a warning banner to the user
          mediaRecorder.addEventListener("stop", e => {
            stream.getTracks().forEach((track: any) => track.stop());
          });

          setMediaRecorderObject(mediaRecorder);
          mediaRecorder.start(250); // Slice into chunks for processing
        },
        err => {
          console.error(`[RecordButton] Error getting audio device`, err);
          // For debugging only
          alert(
            `[RecordButton] Error getting audio device: ${JSON.stringify(err)}`
          );
        }
      );
    } else {
      console.error(`[RecordButton] getUserMedia not supported!`);
    }
  };

  const endRecording = () => {
    setRecording(false);
    try {
      mediaRecorderObject!.stop();
    } catch (_) {}
  };

  return (
    <Button
      fluid
      size="huge"
      circular
      color={recording ? "red" : undefined}
      icon
      onClick={recording ? endRecording : startRecording}
    >
      <Icon
        name={recording ? "circle notched" : "microphone"}
        loading={recording}
      />
      {recording ? "Transmitting..." : "Push to Talk"}
    </Button>
  );
};

export default RecordButton;
```
Whoa! That’s a lot. Let’s break it down:
- The button was originally a true push-to-talk, driven by `onTouch[Cancel|End|Start]` and `onMouse[Down|Up|Leave]` handlers. Unfortunately, that was very glitchy, especially in mobile browsers. Now, it’s a toggle switch, which is much more reliable.
- Recording depends on `getUserMedia`, which I’ve added a polyfill for, and we’ll cover in a second. There are many situations where that’s not available, and in that case we won’t be able to record.
- We call `getUserMedia` with the constraint `{ audio: true }` to ask the browser for only audio and for any audio input available. When we do this the first time, it will prompt the user for permission. If the user denies permission, the third argument in `getUserMedia`, the error handler, will be called.
- On success, we take the resulting `stream` and send it to the `MediaRecorder`, which abstracts away much of the processing for us. We add three listeners to it:
  - `error` is straightforward.
  - `stop` tells the browser to turn off all monitoring streams associated with it, so the browser knows we aren’t listening any more. If we don’t do this, a big red banner pops up across the top of mobile devices, even with the browser window closed, telling the user we’re listening. That’s bad UX, and also it consumes memory to maintain those streams.
  - `dataavailable` will be called periodically during recording, and also at the end of the recording. Each call will make a `Blob` of recorded audio data available, which we will process and send off to the backend, thus streaming the recording in real time.
- We keep the `MediaRecorder` in component state using a hook, since we’ll need to `stop()` it once the recording is ended.
- We start the `MediaRecorder` and, using its `timeslice` parameter, set it to call `ondataavailable` in intervals of 250 milliseconds. This interval is important for two reasons:
- `endRecording` calls `stop()` on the `MediaRecorder` instance. This event may be fired multiple times due to browser behavior, and subsequent calls to `stop()` will result in errors, so we catch and discard those.

The data blob is encoded to a data url simply and easily using a FileReader:
```typescript
const binaryToBase64 = (data: Blob) =>
  new Promise<string>((resolve, reject) => {
    const reader = new FileReader();

    reader.onloadend = () => {
      resolve(reader.result as string);
    };
    reader.onerror = reject;

    reader.readAsDataURL(data);
  });

export default binaryToBase64;
```
When that resolves, it’s sent by `RecordButton`’s `onRecordAudio()` to its parent component, `ChatRoom`, where a Mutation sends it to the backend:
```tsx
const ChatRoom: React.FC<Props> = ({ roomId }) => (
  <Mutation<MutationData, any> mutation={CREATE_AUDIO_SEGMENT}>
    {mutate => {
      const handleReceiveData = (data: string) => {
        mutate({ variables: { roomId, data } });
      };
      ...
      return (
        ...
        <RecordButton onRecordAudio={handleReceiveData} />
        ...
      )
      ...
```
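The `CREATE_AUDIO_SEGMENT` document itself isn’t shown in this post. As a hedged sketch (again, match the operation and field names to your Part 1 schema):

```typescript
import gql from "graphql-tag";

// Sketch only - the operation and field names are assumptions.
export const CREATE_AUDIO_SEGMENT = gql`
  mutation CreateAudioSegment($roomId: ID!, $data: String!) {
    createAudioSegment(roomId: $roomId, data: $data) {
      roomId
      data
    }
  }
`;
```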
Finally, let’s add that polyfill for `getUserMedia`. `navigator.mediaDevices.getUserMedia` is the modern implementation and returns a Promise, but it isn’t universal yet, so this polyfill falls back on the deprecated flavors of `navigator.getUserMedia`:
```typescript
type MediaCallback = (stream: any) => void;

let getUserMediaFn;
let getUserMedia:
  | ((
      constraints: any,
      onSuccess: MediaCallback,
      onError: MediaCallback
    ) => Promise<any> | void)
  | undefined;

if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
  getUserMedia = (constraints, onSuccess, onError) =>
    navigator.mediaDevices.getUserMedia
      .bind(navigator.mediaDevices)(constraints)
      .then(onSuccess)
      .catch(onError);
} else {
  getUserMediaFn =
    navigator.getUserMedia ||
    // @ts-ignore
    navigator.webkitGetUserMedia ||
    // @ts-ignore
    navigator.mozGetUserMedia ||
    // @ts-ignore
    navigator.msGetUserMedia;

  getUserMedia = getUserMediaFn ? getUserMediaFn.bind(navigator) : undefined;
}

export default getUserMedia;
```
The typings in that absolutely stand to be improved.
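For example, here’s a sketch of slightly tighter typings, declaring the legacy vendor-prefixed methods by hand. The shapes below are my own assumptions, since lib.dom doesn’t include these prefixed variants:

```typescript
// A sketch of stricter typings for the same polyfill. The callback-style
// signature mirrors the deprecated navigator.getUserMedia variants.
type LegacyGetUserMedia = (
  constraints: MediaStreamConstraints,
  onSuccess: (stream: MediaStream) => void,
  onError: (error: unknown) => void
) => void;

declare global {
  interface Navigator {
    webkitGetUserMedia?: LegacyGetUserMedia;
    mozGetUserMedia?: LegacyGetUserMedia;
    msGetUserMedia?: LegacyGetUserMedia;
  }
}

export {};
```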
You’ll also have to add the basic components for user flow. In my application, I use the excellent `semantic-ui-react` library to provide basic styling, but you can use whatever floats your boat. I used a basic `RoomList` component to display all rooms from a `Query` to AppSync, `react-router` to parse a `roomId` out of the active path, and the `ChatRoom` above to manage the chat activity within a room.
Before you try this out, you’ll have to make sure your `webpack-dev-server` is running on HTTPS, since browsers may not allow you to record audio on an insecure page, but will quietly fail and leave you scratching your head. It’s easy to do from this react template:

```
HTTPS=true npm run start
```
So, now, try it out! Navigate to https://localhost:8000, pick a room, and try to send a voice clip. You’ll likely receive a couple of errors:
- In browsers that don’t implement `MediaRecorder` at all, recording fails outright.
- Where it is supported, the recording uses the `audio/webm` mimetype, which has a header. That header is only included within your first chunk of any recording. You’ll probably see an error like `"failed to load because no supported source was found"` - the data urls for segments after the first one are corrupted!

We’re about to fix both of those problems with one line of code.
Andrey Sitnik created an excellent drop-in polyfill for MediaRecorder, audio-recorder-polyfill, which is trivial to add (although the README.md itself can be confusing). Thanks, Andrey! It takes one line added in your `index.tsx` file, which wraps and overwrites the window’s `MediaRecorder` class. Make sure to install it first with `npm i audio-recorder-polyfill`.
```typescript
window.MediaRecorder = require("audio-recorder-polyfill");
```
Done. Now when you use your app you’ll notice two things:

- Recording now works, even in browsers without a native `MediaRecorder`.
- The recordings come out as `audio/wav`, which is both good and bad: every segment now plays back correctly, but wav is much larger than `audio/webm` was, which means higher costs and latency.

There’s a caveat here; we’re not quite ready to go to market against Zello. Mobile browser permissions are so restrictive, especially on iOS, that we’re essentially dead in the water there, making this functionally a desktop-only web application. One problem is that audio cannot play without direct user input, which doesn’t line up with a walkie-talkie application, where user input (clicks) happens independently from audio arriving. There also isn’t always support for the APIs we’re calling. So, you can add a guardrail in your application to prevent users from being confused:
```tsx
...
import getUserMedia from "utilities/getUserMedia";
...

const App: React.FC = () => {
  if (!getUserMedia || typeof MediaRecorder === "undefined")
    return (
      <Segment placeholder>
        <Header icon>
          <Icon name="thumbs down outline" />
          Your browser or device doesn't support this app. Sad.
        </Header>
      </Segment>
    );
  ...
```
Note that in this case I’m using style elements from `semantic-ui-react`; you can import those like I did, use standard elements, or use your own library of choice.
Let’s deploy this out to the real world! Amplify makes this too easy. Once it’s building like you want with `npm run build`, run `amplify add hosting` and follow the prompts. Make sure to include CloudFront, since you’ll need HTTPS for audio recording. Once that’s configured, run `amplify publish` to build and deploy your app to S3 and CloudFront! That will take 15 minutes or more to deploy, but when it’s complete, it will display the CloudFront URL.
So that’s it! Hopefully you now have a fully (well, mostly) functioning live voice chat/walkie talkie application. It’s rapidly scalable, stable, and secure, and it’s all managed for you.
How much does it cost? Let’s run the numbers from the official pricing guide. The main driver is the sheer volume of AppSync calls - every 250ms audio chunk is a mutation, fanned out to every subscriber in the room - so there are a couple of easy ways to reduce costs:

- Compress the audio into a smaller format (mpeg rather than wav) before sending it.
- Adjust `MediaRecorder`’s timeslice to batch audio segments into fewer, larger calls.

How could we improve this application? Fork the repo, experiment with some of these, and show off with a link in the comments!
- Track the current `userId`, and don’t play back audio from yourself!
- Keep a queue of incoming audio segments and a handful of `<audio />` elements. On first render, present a button which plays a short, inaudible sound through each one, thus “authorizing” it within the browser to play more audio in the future. Then, pull items off the audio segment queue in order, set the `someAudioElement.src` value to your `dataUrl`, and play away! If you have separate queues for each userId, and introduce a brief delay before playing the first clip in each, you can also reduce jitter in the audio in between clips, and prevent overlapping clips as well.
- Build a `react-native` application to get around the limitations of mobile browsers.
- Add `createRoom` and `destroyRoom` mutations to be able to create new chat rooms from the web application.
- Add user groups, and restrict certain mutations (`createRoom` or `deleteRoom`, maybe) to them using a resolver mapping.
- Give the app a proper walkie-talkie look, and attach the `RecordButton` to that graphic.

I hope you enjoyed this workshop and, better yet, learned something from it. Please comment, star, and send PRs to the demo repo on GitHub. Happy coding!
Disclaimer: opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.