
Who is that you are chatting with? Oh, just ChatGPT

When you think of what a voice programmed by artificial intelligence would sound like, you might picture something robotic and stilted, with a staccato cadence incapable of capturing the inflections, speed and emotion required to sound even somewhat human. But this is 2024, and the robots have gotten a serious upgrade. Now they can imitate voices, accents and intonation to an almost creepy degree – for better or worse.

ChatGPT’s new Advanced Voice Mode feature, released to most ChatGPT users last week, is an audio version of the original chatbot, which uses artificial intelligence to respond conversationally to text prompts. Advanced Voice Mode works the same way, but with audio: users speak into the app, and the voice responds automatically. Users can choose one of nine voices and then, through conversation and prompts, teach those voices to talk in a way that they like.

It works well enough that OpenAI, the company that owns ChatGPT, issued a safety report warning that people could become emotionally reliant on the feature.

“It recognizes from both the words you are using, as well as the inflections in your tone, but also informed by the context of the words that you’re leveraging, to respond in a way that best makes sense,” explained Celia Quillian, an artificial intelligence expert. “ChatGPT has always been a predictive model, right? So it’s just predicting what the most likely response should be based off of the input you give it, and now it’s doing that with sound.”

The new feature has prompted a host of TikTok users to post videos showing off Advanced Voice Mode’s capabilities, including speaking in slang, dialects and even the occasional regional accent. The results are surprising and often hilarious.

When Morissa Schwartz, a New Jersey native and entrepreneur, asked ChatGPT to mimic what she sounded like, the app didn’t hesitate.

“Ay, how you doin’? It’s Morissa, down the shore, grabbin’ a slice of the finest pizza,” the voice said in an exaggerated accent. “You know, the kinda pizza that makes you wanna move here just for the food – fuhgeddaboudit.”

“Is that really what I sound like to you?” Schwartz asked incredulously in a video of the exchange posted to her TikTok account. “I knew I had a little Jersey accent, but is it really that noticeable?”

“Well you definitely have a touch of that Jersey flair,” the app responded. “It’s got character and warmth, just like you.”

In an interview, Schwartz said that as a sixth-generation New Jersey native, she found the app’s impression of her to be “a hilarious surprise.”

“I had no idea it could do impressions,” she said. “A couple of days after I got Advanced Mode, I noticed it changing accents, so on a whim, I asked if it could impersonate me. And I was shocked at how good it was, like, ‘Did this thing secretly binge-watch my TikToks?’”

Schwartz added that chatting with the bot “used to feel like asking questions to a super brainy professor.” Now, though, she said, “it feels like that brainy professor is my bestie.”

Advanced Voice Mode often relies on tropes and stereotypes when trying to communicate in the ways users ask, which can result in responses that some users could find offensive. When an influencer, Noah Miller, asked Advanced Voice Mode to talk “super gay,” the app responded in kind: “Absolutely darling, let’s have a fabulous kiki. What’s the tea? What’s on your mind? Spill it.” (Miller, for his part, appeared to find the exchange hilarious.)

Though most users currently seem to be using Advanced Voice Mode for entertainment, Quillian said that she already sees some practical applications for the feature, including helping auditory learners with homework, doing real-time translation and acting as a fill-in therapist.

“It’s going to sound kind of odd – I’m a big advocate for therapy and human therapists,” Quillian said. “But if you can’t afford it and you just need a listening ear, talking to this thing can feel very much like you’re talking to a person who’s empathizing with you, who’s asking questions back, who’s trying to dig into how you’re feeling.” She added, “I can see it as kind of a way to sort through problems out loud if you don’t have anyone else around in a very empathetic way.”

Still, Quillian acknowledged that there could be a downside to people becoming too emotionally invested in a robot – just as OpenAI warned – no matter how smart and lifelike the voice may seem.

“I think that there is a well-grounded fear among many that with these kinds of emotive tools, people might start replacing AI for a relationship where it becomes scary, right?” she said. “And I’d agree with that. I think it’s that balance of you know, we can’t forget our humanity and our relationships to each other in the context of using these tools, but we can use these tools to advance our connections and our relationships to each other as well and to ourselves.”


This article originally appeared in The New York Times.
