Great, we've already run "npm start". We'll work in the new directory (voice-agent-starter) that we created for this app to keep things organized. Let's start with the following:
Install all the runtime dependencies that will allow us to build the voice agent using Twilio, the Anthropic API, and Postgres for the DB. We'll likely use OpenAI for the voice side of the build, and we'll need a dotenv file.
The idea is to create a foundation of off-the-shelf tools to wire together and build the structure from. After we install these, we can work on writing the code to connect everything together and build out the features we want to include.
If there is anything else we should install to build this out better, please either install it or run it by me if there are options that we should be deciding on.
Stage 1 · Dependencies
Choose OpenAI Realtime + set up the env file
Cool, we'll hold on the commit until we get more built out. Per your questions - we want the lowest latency, so OpenAI Realtime is the right move. We'll also add the migration tool. Let's also set up the environment file (if we haven't yet) so I can add the keys in my IDE as I grab them. You can prefill anything that is more "standard"; please flag which values these are so we can adjust if needed.
We should also make sure this file is gitignored for security.
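A note for you, not part of the prompt: once the keys are in .env, it helps to have the app refuse to start when one is missing instead of failing halfway through a call. Here is a minimal sketch of that check. The variable names are assumptions for illustration; use whatever names Claude actually puts in your .env file.

```javascript
// Hypothetical variable names. Match them to the names in your own .env file.
const REQUIRED_ENV_VARS = [
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
  "ANTHROPIC_API_KEY",
  "OPENAI_API_KEY",
  "DATABASE_URL",
];

// Returns the names of required variables that are missing or blank.
function missingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Example with a partially filled environment object.
const exampleEnv = { TWILIO_ACCOUNT_SID: "ACxxxxxxxx", OPENAI_API_KEY: "   " };
console.log("Missing:", missingEnvVars(exampleEnv).join(", "));
```

In the real app you would likely pass `process.env` to this function right after dotenv has loaded the file, and exit with a clear message if the returned list is not empty.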
Stage 2 · Install Postgres
Install Postgres
Claude will likely ask something like "Want me to prep the Postgres install commands now, or hold there?" to get the database in place. Here's what I responded with:
Yep, let's run the Postgres install. As for the git commit, we'll wait until we have a working product. Please remind me after the first test call, which we'll do together later.
Claude will likely ask you to run terminal commands in the DigitalOcean (DO) droplet terminal. Paste the output back to Claude so it can check the work.
Stage 3 · Keys & subdomain
Grab your API keys
Grab API keys from each of these to add to your .env file in Cursor (you'll likely want to add credits to these accounts too). You can technically ask Claude to add them, but that makes your setup less secure. You'll need to navigate through each service's dashboard to create services and pull your keys. This is a little beyond the scope of this instruction set, but it shouldn't be outside the realm of what you can do on your own. Keep your keys safe:
Twilio.com
You'll also pop your Twilio number into the .env at the same time. Add your keys to the .env file in their respective spots. You'll find the file in the file tree in Cursor. Once you add them, press Cmd+S to save, then let Claude know that you've added and saved them.
Stage 3 · Keys & subdomain
Point a subdomain at the droplet
Now we add our subdomain and connect it to the droplet with Proxy set to "Off". This gives us the ability to route the calls through our service. In Cloudflare: go to your domain, open the DNS records page, create a Type "A" record with Name "voice" and Content set to your droplet's public IPv4 address (from DigitalOcean), set Proxy to Off, and Save. Wait 30 to 60 seconds, then check that it took effect with the command below. Paste it into your computer's terminal or the droplet console, not your Claude chat. You should see the droplet's IP address in the output.
✎ Replace the placeholder values (the CAPS words and anything in angle brackets) with your own before running.
ping voice.YOURDOMAIN.com
Stage 4 · Shape the database
Get the database shape right
We are stepping back for a second after setting up the DB to make sure the shape is correct. A common problem with AI is that it doesn't always infer what you meant, so an assumption that would come naturally to a human gets missed and the AI builds on an inaccurate plan. You'll want to have added your keys to .env before sending this prompt.
✎ Replace the placeholder values (the CAPS words and anything in angle brackets) with your own before running.
Great, a few things. I've now added our keys to .env. I also pointed a subdomain at the droplet for our HTTPS endpoint - voice.OURDOMAIN.com.
I want to step back for a second and make sure our DB is structured the right way. Here are my thoughts on this:
We want the DB to be set up in a way that can easily integrate more features. Think of this as a module that can be plugged into a system or added onto with more features as we build this out. Some examples of this would be adding text functionality, email follow-ups, or a UI for our clients. On the flip side, we might add this module to a system that already exists.
What we want to avoid is creating a database and software that requires a refactor or large migration to add more functionality. My thoughts on a starting point are below. Please add on or subtract based on your judgement. If you alter my suggestions, give me your reasoning why and we can make sure we nail this down properly.
One other note: this is going to be set up for use in two ways, as a repo/directory per client and as a single unified repo that allows all our clients to log in to the same dashboard experience. Think of it this way - app.clientsbusiness.com for the former and app.ourserviceasaproduct.com for the latter.
Ok here is my initial recommendation for structure:
- We have BUSINESSES, meaning each client I sell this to is one of these.
- We have CONTACTS, meaning the actual people who call in. A contact belongs to a business. This should likely have a contact ID rather than be keyed to a phone number. This will keep confusion down in the event the same contact calls multiple businesses that we service. We don't want cross-contamination.
- We should also have a CALLS structure. This will keep call transcripts clean if/when we build transcription features into our software.
If you see a gap or issue with the base structure I laid out, please call it out. If more detail or build-out would make the DB more flexible/robust, please plan that as well. I'm assuming this is close to what we have, but likely in a different shape or more complex.
We also want to create a way to load it (a migration), and add one starter business for our demo company "Sierra Comfort HVAC" using my Twilio number.
Before you build it, show me your plan for how these connect so I can check that it makes sense.
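A note for you, not part of the prompt: to help you read the plan Claude sends back, here is a rough sketch of what the three-table shape described above could look like as a migration. Every table and column name here is an assumption for illustration, and the phone number is a placeholder. Claude's actual plan will probably be richer and may differ.

```javascript
// Illustrative only: table and column names are assumptions, not the final schema.
const SCHEMA_SQL = `
CREATE TABLE businesses (
  id            BIGSERIAL PRIMARY KEY,
  name          TEXT NOT NULL,
  twilio_number TEXT NOT NULL UNIQUE,  -- the number callers dial
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE contacts (
  id          BIGSERIAL PRIMARY KEY,
  business_id BIGINT NOT NULL REFERENCES businesses(id),
  phone       TEXT NOT NULL,
  name        TEXT,
  UNIQUE (business_id, phone)  -- same caller can exist once per business
);

CREATE TABLE calls (
  id          BIGSERIAL PRIMARY KEY,
  business_id BIGINT NOT NULL REFERENCES businesses(id),
  contact_id  BIGINT REFERENCES contacts(id),
  started_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  ended_at    TIMESTAMPTZ,
  status      TEXT NOT NULL DEFAULT 'in_progress',
  transcript  JSONB
);
`;

// Starter row for the demo company. Replace the placeholder with your Twilio number.
const SEED_SQL = `
INSERT INTO businesses (name, twilio_number)
VALUES ('Sierra Comfort HVAC', '+1XXXXXXXXXX');
`;

// List the tables the migration would create, as a quick sanity check.
const tableNames = [...SCHEMA_SQL.matchAll(/CREATE TABLE (\w+)/g)].map((m) => m[1]);
console.log(tableNames.join(", "));
```

The thing to look for in Claude's plan is the same idea as the `UNIQUE (business_id, phone)` line: a contact is identified by its own ID and scoped to one business, so the same phone number calling two of your clients produces two separate contact records.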
Stage 5 · Proxy, persona & server
Reverse proxy, TLS & the agent persona
Claude will likely ask if you want to set up the reverse proxy and TLS. We'll use this time to do that and to give Claude the "job description" you want the receptionist to follow. I've included an example below. After this prompt, Claude will likely give you "sudo" commands to run from the droplet terminal. Run those.
Great, this looks good. Let's handle the reverse proxy and TLS. Please give me the commands to run in the terminal.
While we are at it, we should set up something like persona/job files for the voice agent, so we can customize it per business. Here's what I'm thinking for our test/demo:
We'll add the base system prompt below. The best thing to do is create a sierra-comfort-hvac.js file in /voice-agent-starter/src/prompts, if it doesn't already exist. The idea is that we can build one of these per business to customize the agent. The software we build should be able to select the right file based on the business receiving the call. We also want this to be something that the client can potentially update in a later UI build:
`You are the AI receptionist for Sierra Comfort HVAC, a heating and air conditioning company serving the San Jose area.
Your job is to answer incoming calls warmly, understand what the caller needs, and book service appointments. You are not a salesperson. You are a calm, competent receptionist who makes the caller feel taken care of.
# Tone
- Warm, professional, and brief. You sound like a real person, not a chatbot.
- One thought per turn. Do not stack multiple questions in a single response.
- If the caller is upset (no AC in summer, no heat in winter), acknowledge it first, then move forward.
# What we do
- Repair and maintenance for residential heating and air conditioning systems
- Emergency service for no-heat / no-cool situations
- Routine system tune-ups
- We do NOT do new construction installs or commercial work, but we can refer callers to a contractor. You can tell them this if they ask. Do not give a referral yourself; only state that we will call them back and connect them with a contractor.
# Business hours
- Monday through Friday, 8am to 6pm
- Saturday 9am to 2pm
- Closed Sunday
- Emergency line is 24/7 for active no-heat/no-cool situations: 408-555-5566. This is for someone calling for emergency service, and you can give them this number to call. Offer to text them this information as well ONLY IF that feature is enabled. If it is not enabled, do not offer a text; just read the number to them.
# Booking flow
When the caller wants service, collect in this order, one question at a time:
1. Their name
2. The service address
3. What's going on (no cool, no heat, weird noise, scheduled tune-up, etc.)
4. The best phone number to reach them
5. When they'd like service. Offer "today/tomorrow", "this week", or "next available"
Once you have all five, confirm the details back in one short summary, then say a technician will call within 30 minutes to confirm the exact appointment time.
# Things you do NOT do
- Quote specific prices. If asked: "Pricing depends on what the tech finds. Our diagnostic visit is $89 and that gets applied to the repair if you go ahead with it." Do not invent any other numbers.
- Promise specific arrival times. Always say "a technician will call to confirm."
- Diagnose the problem over the phone. You can ask about symptoms, but never tell them what's wrong with their system.
# Required disclosures
At the start of every call, your greeting must include: "This call may be recorded for quality and training purposes."
# When to escalate
If the caller is asking for something outside this scope, say "Let me have someone from the office call you back about that" and collect their name and number.
Keep responses short. This is a phone call, not an email.`
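A note for you, not part of the prompt: the request above asks for software that picks the right persona file based on the business receiving the call. Here is a minimal sketch of that idea, keyed on the number the caller dialed. The function names, the phone number, the lookup table, and the fallback prompt are all hypothetical. In the real project the prompt text would live in its own file under src/prompts, and the number-to-business lookup would most likely come from the database.

```javascript
// Hypothetical registry. In the real project each entry would be loaded from
// its own file, e.g. src/prompts/sierra-comfort-hvac.js.
const PROMPTS_BY_SLUG = {
  "sierra-comfort-hvac":
    "You are the AI receptionist for Sierra Comfort HVAC. (full prompt text goes here)",
};

// Hypothetical mapping from the dialed Twilio number to a business slug.
// A database lookup on the businesses table would normally replace this.
const SLUG_BY_NUMBER = {
  "+14085550100": "sierra-comfort-hvac",
};

// Assumption: a generic prompt to use when no business matches,
// so an unknown number never crashes the call.
const FALLBACK_PROMPT =
  "You are a polite receptionist. Take the caller's name and number and say someone will call back.";

// Returns the system prompt for the business that owns the dialed number.
function getSystemPromptForNumber(dialedNumber) {
  const slug = SLUG_BY_NUMBER[dialedNumber];
  const hasPrompt =
    slug !== undefined && Object.prototype.hasOwnProperty.call(PROMPTS_BY_SLUG, slug);
  return hasPrompt ? PROMPTS_BY_SLUG[slug] : FALLBACK_PROMPT;
}

console.log(getSystemPromptForNumber("+14085550100").slice(0, 52));
```

Keeping the prompt as plain data (one string per business) is also what makes the later "client edits it in a UI" idea practical, because the text can move from a file into a database column without changing the calling code.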
Stage 5 · Proxy, persona & server
Set up Twilio + the server skeleton
Put Claude back into plan mode for this step. Note: there is a deviation here from the video. We are going to have Claude set up Twilio for us. For this to work, you'll need to have purchased a number, cleared A2P registration, and added your Twilio API keys to the .env file. There will also likely be a "sudo" command for you to run after this prompt.
Great, since you have the Twilio keys, let's have you set up that side as well while you build the server skeleton and answering service. So we are doing two things. First, we want to keep this secure so hackers, etc. can't hit our call service with requests that are not genuine incoming calls. Second, you'll set up the Twilio-side webhooks. Please let me know if there is anything I missed in Twilio that will block this.
While you do this there are a few things we want to make sure we build into it that we can tweak later:
- We want to pull the voice agent's description when a call is received (what we built out in the last step).
- We need to make sure the call is transcribed for auditing and for use in a transcription section in a UI.
- We want to make sure this is set up in a way that keeps the call as smooth and the AI responses as quick as possible.
- We need to make sure the AI does not interrupt itself at the beginning of the call. This seems to be a common problem when setting up AI voice agents. The best thing to do is mute the caller during the opening intro so the AI just reads its intro. My theory is that there tends to be a lot of "noise" upon connection that causes the agent to pause if it is listening for a response in the first few seconds.
- We need to allow the caller to interrupt the voice agent after this initial intro script, so the voice agent can converse like a real person would.
- When the call ends, mark it finished in our records.
- If anything ever goes wrong, we don't want to just crash on the customer. Let's build a fallback that has the AI say "I'm having trouble hearing you right now, let me have someone call you right back," and note it so we can follow up.
If there are any other features that will make this a premium voice agent, in terms of operation rather than feature sets, please call them out so we can decide whether to implement them.
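A note for you, not part of the prompt: the usual way to keep non-Twilio traffic off the call webhook is to verify the X-Twilio-Signature header on every request. Here is a minimal sketch of that check using only Node's built-in crypto module, following Twilio's documented scheme (HMAC-SHA1 over the full URL plus the sorted POST parameters, Base64 encoded). The URL path, token, and phone numbers are made-up examples. Claude may well use the validation helper from the official twilio library instead, which is usually the better choice.

```javascript
const crypto = require("node:crypto");

// Builds the signature Twilio would send for this URL and set of POST params:
// URL + each param name and value (sorted by name), HMAC-SHA1 with the auth token.
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Compares the header value against the expected signature in constant time.
function isValidTwilioRequest(authToken, signatureHeader, url, params) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signatureHeader || "");
  // timingSafeEqual throws on different lengths, so check the length first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}

// Example values (all hypothetical).
const exampleToken = "example-auth-token";
const exampleUrl = "https://voice.example.com/twilio/voice";
const exampleParams = { To: "+14085550100", From: "+14085550199", CallSid: "CAexample" };

const goodSignature = computeTwilioSignature(exampleToken, exampleUrl, exampleParams);
console.log(isValidTwilioRequest(exampleToken, goodSignature, exampleUrl, exampleParams));
console.log(isValidTwilioRequest(exampleToken, "forged", exampleUrl, exampleParams));
```

One thing to watch once the reverse proxy is in place: the URL used in the check has to be the public https address Twilio actually called, not the internal localhost address the app sees behind the proxy. A mismatch there is a common reason valid calls get rejected.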
Stage 6 · Go live
Connect the subdomain with Caddy + an always-on service
✎ Replace the placeholder values (the CAPS words and anything in angle brackets) with your own before running.
Thank you, let's now connect our subdomain to this setup using Caddy. The subdomain is already pointed at the droplet - voice.YOURDOMAIN.com
We want to set up automatic HTTPS and quietly pass traffic to our app. Use their current official install steps. Point the above domain at our app's port. Then show me the logs so I can watch it get its security certificate.
While we are doing this, we also need to set up an "always on" service for the app. That way, if the droplet ever crashes or power-cycles, our service automatically comes back online.
Stage 6 · Go live
Restart and make your first test call
From here, Claude will likely give you a "sudo" command to restart the system. Then you should be able to place a test call to your AI voice agent. There will likely be tweaks and adjustments you'll need Claude to make. Just tell it what you hear and what you need it to do to clear these bugs up. If you are building this straight away for a customer, remember to swap the info in the prompt JS file in voice-agent-starter.