If you're the voice of your business, staying visible can start to feel like a full-time job. Voice cloning can give you breathing room without making you sound like a bad AI knockoff.
When I first heard about "voice cloning," I almost dismissed it. I pictured stiff, robotic voices that didn’t sound anything like a real person. The kind of thing you’d hear in a glitchy chatbot from 2012.
But here’s what I’ve learned: when you use voice cloning the right way, it doesn’t make you sound fake. It helps you protect your energy without losing the parts of your voice that make people trust you in the first place.
That’s the real goal.
How I Got Here
A few months ago, I realized I was spending way too much time recording things I didn’t need to record live. Newsletter intros, short tutorials, onboarding videos.
It wasn’t just a time problem. It was an energy drain. By the end of the week, my voice felt tired and my focus was slipping. And I still had client work that actually needed my full attention.
At first, I resisted the idea of cloning my voice. It felt weird. I didn’t want to sound like a robot or make people feel like I was phoning it in.
But after looking at the mountain of small tasks piling up, I figured it was worth testing. Worst case, I’d waste an afternoon. Best case, I’d get a little breathing room.
I tried ElevenLabs—and it turned out to be a lot more useful than I expected.
What Voice Cloning Actually Is
Voice cloning isn’t magic. It’s not perfect either. It’s a tool.
ElevenLabs studies how you naturally speak—your tone, pacing, and quirks—and builds a model that can read new text in your voice. You feed it audio, and it learns your style.
If you want a voice that actually sounds like you (not a stiff AI version of you), you need about two to three hours of clean recordings. No background music, no heavy editing, no echoey Zoom calls. Just clear, natural talking.
The better the recordings, the better the voice model.
Simple as that.
What I Did (And What I’ll Do Differently Next Time)
I pulled together about two hours of old recordings. Podcast episodes. YouTube videos. A few voice memos I had lying around.
Some of it had music in the background. Some was edited differently. It wasn’t exactly clean, but it was what I had.
The AI voice I’m using for this newsletter came from that mix. It’s decent for lighter content. But if you listen closely, there are places where it sounds a little flat or too careful—like it’s trying too hard to get the words right instead of just talking.
This summer, I’m planning to record a new set of clean audio. Just me talking naturally in a quiet room. No music, no edits. Probably some rambling about systems and why half the things people overcomplicate could be fixed with a shared Google Doc.
Because with this stuff, like anything else, the quality of what you put in shapes what you get out.
How to Set Up Your Own Voice Clone
If you’re thinking about trying this, here’s what you’ll need to do:
Step 1: Gather your audio
Pull together two to three hours of clear recordings. Ideally:
Podcast episodes where you’re speaking naturally
YouTube videos without background noise
Long voice memos
Skip anything with intro music or heavy editing if you can.
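If you want a quick sanity check before uploading, you can total up the length of your clips to see whether you've actually hit the two-to-three-hour mark. Here's a minimal sketch that assumes your recordings are WAV files sitting in one folder (the `recordings` folder name is just a placeholder, not anything ElevenLabs requires):

```python
# Rough sanity check: how many hours of audio have I gathered?
# Assumes plain WAV files in a single folder ("recordings" is a placeholder).
import wave
from pathlib import Path

def clip_seconds(path: Path) -> float:
    """Duration of a single WAV file, in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()

def total_hours(folder: str) -> float:
    """Sum the durations of every .wav file in the folder, in hours."""
    seconds = sum(clip_seconds(p) for p in Path(folder).glob("*.wav"))
    return seconds / 3600

if __name__ == "__main__":
    hours = total_hours("recordings")
    print(f"{hours:.2f} hours of audio gathered")
    if hours < 2:
        print("You may want more material before training a full voice model.")
```

If your clips are MP3s or video files, a tool like ffprobe can report durations the same way; the point is just to know where you stand before you start uploading.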
Step 2: Create an ElevenLabs account
Go to ElevenLabs.io and sign up. Look for the VoiceLab section—that’s where you’ll upload your files.
Note: the Professional voice cloning I tested is only available on a paid plan.
I just had to try it… but you can definitely give the free version a go to see how you like it. You’ll only need a few minutes of audio to train the basic model.
Step 3: Upload and train
Upload your files. Let ElevenLabs process them. It usually takes a few hours, so it’s a good time to organize a few sample scripts you might want to test once your voice model is ready.
Step 4: Test and tweak
Once it’s done, you can adjust settings like:
Stability (how strictly it sticks to your tone)
Expressiveness (how lively or flat it sounds)
Clarity (how natural the sound feels)
It’s worth playing with the sliders until it feels right.
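If you ever move from the web sliders to the ElevenLabs API, the same knobs show up as a `voice_settings` object you send with each request. The field names below reflect my reading of the API (Expressiveness roughly maps to `style`, Clarity to `similarity_boost`), so treat this as a sketch rather than gospel:

```json
{
  "stability": 0.5,
  "similarity_boost": 0.75,
  "style": 0.3,
  "use_speaker_boost": true
}
```

Lower stability lets the voice vary more from read to read; a higher similarity boost pulls it closer to your training audio.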
Step 5: Start small and improve later
Your first version probably won’t be perfect. That’s normal.
Start with what you have. Use it where it makes sense. You can always retrain a stronger version later once you have better recordings.
Start where you are.
Adjust as you go.
It’s not supposed to be perfect the first time.
Where Voice Cloning Helps
Voice cloning helps me with the parts of my work that don’t need a live recording every single time.
For newsletters like this, I don’t sit down and write every word manually. I’ll usually have ChatGPT draft a conversational script based on my ideas, then I tweak it until it sounds like me. From there, I send it through ElevenLabs for the audio version.
It’s saved me hours each week—not by cutting corners, but by not spending extra energy on things that don’t need my live voice.
Same for short videos.
Instead of setting up my mic and blocking off half an hour to record a captioned clip, I can write the text, use my AI voice, and keep moving.
It’s a small shift, but it protects my focus. And honestly, it’s nice not having to fight with a microphone stand for every five-minute project.
When Voice Cloning Works (And When It Doesn’t)
Voice cloning works best when your audience already knows your real voice.
Sabrina Romanov, a creator who uses ElevenLabs and Synthesia, built an AI avatar that blends naturally with her brand.
Because she already had strong trust with her audience, adding an AI element didn’t feel awkward. It felt aligned.
People trust people.
The tech just supports that trust—if you use it carefully.
If you’re still early in your business, I’d focus on real human connection first. Talk to your people. Make the videos. Let them hear your voice with all its natural quirks.
The better your connection, the easier it is for AI tools to fit in without feeling weird later.
Keeping It Human
The real challenge with voice cloning isn’t the technology. It’s the tone.
Your voice isn’t just what you sound like. It’s the way you make people feel when you’re explaining something, encouraging them, letting them know you understand.
If you want your AI voice to sound real, start with writing the way you actually talk. Natural rhythms. Real pacing. A little messy, a little human.
Save the AI voice for lighter content—things like intros, simple updates, or quick tutorials.
When it’s time for deeper conversations, you’re better off showing up live.
The point isn’t to outsource your voice.
It’s to protect it.
It’s About Protection, Not Perfection
Voice cloning isn’t about scaling faster or flooding every platform with content. It’s about protecting your energy so you can keep doing the work that actually matters.
You don’t need to use it everywhere. You don’t need it to be flawless. You just need to find the places where it supports you best—and helps you stay creative, not exhausted.
I’m still figuring out where those lines are. Some weeks, the AI voice is a huge help. Some weeks, I realize I’d rather record a quick voice memo and send it the messy way.
That’s part of it too.
Because at the end of the day, it’s not about sounding perfect. It’s about having enough energy left to say something worth listening to.
Talk soon,
Tam
P.S.
The audio version of this newsletter was generated by my current AI voice clone.
It’s not perfect, and it definitely needs improvement, but I plan to retrain the model later on and let you know how it goes.