---
date: "2024-01-26T20:37:31.000Z"
title: "2024-01-26"
draft: false
---

For several days now, I've been looking into recording audio in a browser and streaming it to a backend over a websocket with the intent to do speech to text translation with an AI model.
I know the pieces are all there and I've done something like this before (streamed audio from a Twilio IVR to a node backend, the send that to a Google [Dialogflow CX](https://cloud.google.com/dialogflow/cx/docs) agent).
The current challenge is finding which pieces I want to connect.
I've used a lot of Next.js lately.
I like the developer experience.
It enjoyable to use to build frontends.
It also has [route handlers](https://nextjs.org/docs/app/building-your-application/routing/route-handlers), which are backend functions that are deployed on Lambda if you deploy on Vercel.
These route handlers can't really support a websocket backend because they aren't designed to be long lived, something I learned when I worked around if by creating a secondary route handler as an [async function](/til/nextjs/async-functions).
Apparently, these can now run for up to five minutes.[^1]
Route handlers on Vercel can now run for a maximum of five minutes, which is an increase from the previous limit. This allows for more complex operations to be handled directly within these functions.
I would need to stand up a separate backend.
That seemed fine and fair enough, so I started looking at Deno, which I've also used recently and enjoyed.
Deno supports websockets out of the box.
It also supports importing npm modules -- I plan to use `@google-cloud/speech` to do speech to text conversion.
The remaining question is how I can stream audio captured in the browser with `navigator.getUserMedia` over a websocket to forward to Google to convert to text.

[^1]: https://vercel.com/changelog/serverless-functions-can-now-run-up-to-5-minutes