Hidden Brain XX
[0] This is Hidden Brain.
[1] I'm Shankar Vedantam.
[2] At some point in our lives, many of us realize that the way we hear our own voice isn't the way others hear our voice.
[3] Shea had that realization as a child, helping out at the family business, a deli, in southwest Virginia.
[4] We did a lot of business with a deli, a lot of call-in orders, and I had to answer the phone with the name of the business.
[5] Often, when the phone rang, the same thing happened over and over again.
[6] I vividly remember always being confused for my mother.
[7] They would always say, oh, hey, Judy.
[8] You know, and either start into their questions or whatever they were looking to speak to my mother about.
[9] This case of mistaken identity became a running joke.
[10] You know, it was ha-ha, they thought that he was just
[11] Judy.
[12] Shea didn't correct the callers.
[13] That's because Shea didn't mind being mistaken for Judy.
[14] It was just comforting to me because it felt natural.
[15] Shea was raised as a boy.
[16] But now, decades later, Shea identifies as a transgender woman.
[17] We're not using her legal name at her request because it's a man's name.
[18] Shea's experience at the deli became a template for the rest of her life.
[19] She listened to her voice, and she listened to the way others heard her voice.
[20] There was always a gap between the two.
[21] She first tried to sound more masculine to fit in with the way the world saw her.
[22] So I would consciously make an effort to try to talk a little deeper.
[23] It was, you know, I practiced it.
[24] One way she practiced was to sing along with the Doors, Tool, and Nine Inch Nails.
[25] I tended to sort of go towards heavier music.
[26] You know, it's raspy, deep, yelling almost, voices.
[27] Sounding more masculine became second nature.
[28] But it wore on her.
[29] My entire life, I have been playing the role of a boy.
[30] And it is exhausting.
[31] It truly is.
[32] Years and years passed like this.
[33] A divorce, a second wife, two kids, and a cancer scare later, she began to reconsider how she wanted to sound.
[34] Instead of trying to sound more masculine, she now started to try to sound more feminine.
[35] Even prior to accepting that I was trans, before I could put a label on what I was, I consciously made an effort to not sound as masculine.
[36] And that started in my early 30s.
[37] Once again, she used music as a way to practice.
[38] I would always sing in the car alone.
[39] And I would attempt Britney Spears.
[40] I think I did it again.
[41] I made you believe we're more than just friends.
[42] It seems silly, but I would, when I'm singing in the car, I would sort of turn my head towards my driver's side window.
[43] Because it would reflect the sound back to me a little more loudly so that I could hear the pitch and tone of my voice.
[44] And I would try to make my voice sound at a higher pitch without it sounding like I was trying.
[45] What did you hear?
[46] Did you hear the voice that you wanted to hear?
[47] No. No, I've never, I don't know that I have ever actually been able to hear the same voice that I hear come from me. Shea has spent a lifetime being dissatisfied with the way she sounds.
[48] She viscerally knows something the rest of us often forget.
[49] Our voices shape who we are.
[50] They shape how other people think of us.
[51] This week on Hidden Brain, we look at the relationship between our voices and our identities.
[52] Voice is about who you are. Our voice signals things about our personality.
[53] Plus, how technology might help people with vocal impairments find voices that reflect who they are.
[54] Once you close your eyes and let your mind relax, it doesn't take long to escape to the beautiful beach.
[55] And the ethical quandaries that arise when we can create personalized, customized voices.
[56] This is huge.
[57] They can make us say anything now, really anything.
[58] Jackie Kirk used to love the sound of her voice.
[59] She spent her 20s in San Francisco.
[60] And like many young people living in the big city, she enjoyed going out with her friends.
[61] She danced to electronic music at clubs and drank at bars.
[62] She was outgoing.
[63] And she was a flirt.
[64] I have to admit, I've always enjoyed flirting.
[65] And so I was quite the flirter.
[66] It was sort of a fun activity to help pass the day in doing something mundane at work.
[67] For Jackie, flirting was also a demonstration of her confidence.
[68] It reinforces my own identity, how I felt about myself.
[69] Fun, I'm somebody that people are attracted to, not just physically or sexually, but a person who people like.
[70] Jackie liked to be liked.
[71] She liked being someone people wanted to be around.
[72] When she thinks about who she was back then, she refers to that person as Voice One Jackie.
[73] Voice One Jackie was really fun-loving, always joking, pretty carefree.
[74] For years, Jackie had been doing backpacking trips with her then boyfriend.
[75] All of our gear on our back, pounds, 60 pounds, et cetera, even carrying bottles of wine in the backpack.
[76] She'd been doing these backpacking trips for several years, but during a trip to a national forest in California, Jackie came up short.
[77] I couldn't go one step further.
[78] I, you know, in being young, you think, you know, what could it be?
[79] There's nothing wrong with me. I'm 20, whatever, 20-something years old.
[80] And I noticed getting really short of breath, and it was a real struggle.
[81] I finally made it, of course, but it was really slow and a real struggle to make it back.
[82] Hiking became too taxing for her, so she cut back and switched to ballet.
[83] But one day, during a series of relevés, a more aerobically challenging dance move, Jackie felt lightheaded and dizzy.
[84] It was serious.
[85] I had a seizure, and, uh, you know, all of the medical follow-up led me to discover that I had a lung disease.
[86] She was diagnosed with idiopathic pulmonary hypertension.
[87] It's a rare progressive disease where the blood vessels in the lungs shrink and oxygen is not distributed properly.
[88] There's no cure for it except for a lung transplant.
[89] In 2008, at 32, Jackie received a double lung transplant.
[90] The surgery was successful.
[91] Within weeks, she was out of the hospital and back in the dance studio.
[92] In 2010, she left San Francisco to explore Latin America and Europe.
[93] But her body began to reject her new lungs.
[94] Before long, Jackie was back in the hospital, this time in Switzerland, waiting for another lung transplant.
[95] I had the surgery in January of 2013, and I was asleep still for another month and a half.
[96] When she woke up, she found her surgery had been successful, except for one very important thing.
[97] I couldn't speak.
[98] I couldn't speak.
[99] During the operation, the medical staff had used a ventilator to help her breathe.
[100] So I was intubated.
[101] That means they basically have a tube that they put inside your mouth, and it goes down your throat, and they send that down into your lungs.
[102] And during this intubation, the tube was
[103] rigid enough to cause some damage to my vocal cords.
[104] Jackie began speech therapy, and within a few weeks, she slowly regained the ability to speak.
[105] But the voice that came out of her mouth, it wasn't her voice.
[106] My voice changed.
[107] It's raspier.
[108] It's broken a bit.
[109] Ever since, speaking has been hard work.
[110] Yeah, I really have to push my vocal.
[111] I feel it.
[112] It's actually a physical effort.
[113] I'm actually squeezing the vocal cords
[114] as hard as I can to make the loudest sound possible to get, to be heard.
[115] And it's very tiring.
[116] The harder it became to produce sound, the more self-conscious she became about the way she sounded.
[117] I feel less confident.
[118] I'm aware of how people might perceive me, so I'm a little more shy.
[119] I don't approach people like I used to.
[120] Jackie believes the change in her voice has led to a dramatic change in her personality.
[121] For much of her life, her voice was a manifestation of her confidence.
[122] I used to go to clubs quite a bit, you know.
[123] But, you know, when you have a normal voice, you can still talk to people
[124] in those environments where it's kind of loud and noisy, or bars to meet friends or to flirt, like I like doing.
[125] But, you know, those places now, you know, I don't really go to anymore.
[126] Jackie, a woman who once described herself as carefree and outgoing, who took pride in her ability to flirt, became withdrawn, reserved.
[127] You know, I have tons of scars all over
[128] my body.
[129] And that plays on my confidence as well.
[130] But in public life, people can't see those scars.
[131] And I feel like my voice is that, you know, that scar they can hear, you know, they know something's wrong.
[132] And they know, oh, maybe she's weak, maybe she's sick.
[133] Just by hearing my voice, it's this signal.
[134] Our voices communicate so much more than mere information.
[135] They communicate our feelings, our temperament, our identity.
[136] When we come back, how scientists are weaving this insight into custom-built voices.
[137] I can't wait for my friends to hear my new voice.
[138] Scientists have been trying for more than two centuries to analyze the human voice, decode its components, and recreate it.
[139] An early success came from a man named Homer Dudley.
[140] He developed an organ-like machine that he called the Voder.
[141] It worked using special keys and a foot pedal and was capable of creating about 20 different electronic buzzes and sounds.
[142] When those sounds were combined, they formed words.
[143] The Voder fascinated people at the 1939 World's Fair in New York.
[144] Well, we've heard the Voder make a word, and by combining words, of course, we got a sentence.
[145] For example, Helen, will you have the Voder
[146] say, she saw me?
[147] She saw me. That sounded awfully flat.
[148] How about a little expression?
[149] Say the sentence in answer to these questions.
[150] Who saw you?
[151] She saw me. Whom did she see?
[152] She saw me. Well, did she see you or hear you?
[153] She saw me. The Voder was an early example of electronic speech, but it was cumbersome to operate and required special training.
[154] Over the next 40 years, speech scientists continued studying the components of the human voice.
[155] They eventually developed methods to mathematically map the acoustic patterns and phonetic properties of natural speech: vowels, syllable constructions, consonants.
[156] Welcome to the Stockholm Speech Communication Seminar.
[157] Hello, I am teacher's while reading machine.
[158] Welcome to Mid -Mazata Library.
[159] To be or not to be?
[160] That is the question.
[161] I can read stories and speak them aloud.
[162] I do not understand what the words mean when I read them.
[163] This is a complete report of track speaking.
[164] You are listening to the voice of a machine.
[165] By the 80s, speech synthesis was no longer
[166] the stuff of science demonstrations at shows and fairs.
[167] Text-to-speech systems are beginning to be applied in many ways, including aids for the handicapped, medical aids, and teaching devices.
[168] The first kind of aid to be considered is a talking aid for the vocally handicapped.
[169] The research of Dennis Klatt at MIT paved the way for the voices we might be familiar with today, many of them used in assistive communication devices.
[170] I am Beautiful Betty, the standard female voice.
[171] I am Huge Harry, a very large person with a deep voice.
[172] I am the standard male voice, Perfect Paul.
[173] This last one, by the way, became famous after Stephen Hawking adopted it.
[174] Speech technology has come a long way in the years since Homer Dudley unveiled the Voder.
[175] But in many ways, synthetic voices still sounded synthetic.
[176] They didn't convey all the information that's packed into the human voice.
[177] Voice is identity, right?
[178] Voice is about who you are.
[179] Our voice signals how old we are.
[180] Our voice signals our gender.
[181] Our voice signals, you know, things about our personality.
[182] Rupal Patel is a speech scientist at Northeastern University.
[183] Perhaps more than many people, she has thought a lot about the human voice.
[184] When she misses her mother, for instance, Rupal has a special technique to evoke her presence.
[185] That's right.
[186] My parents now live in L.A. and I live here in Boston.
[187] And oftentimes I find myself imitating my mom.
[188] You know, I'll say, oh, Betta, how are you today?
[189] You know, or something like that.
[190] I'll imitate her the way she might say something.
[191] I might say that the same way to my daughter or something like that.
[192] But what I'm evoking is my mother's voice primarily to feel the closeness of her here.
[193] In 2002, Rupal took these ideas with her to Denmark, where she was scheduled to speak at a conference for researchers and patients.
[194] I was presenting some of my early work showing that individuals with very severe speech disorder still have the ability to make sound, and those sounds have some communicative content in them, some information that could be used.
[195] After her presentation, she walked out to the exhibit hall, and that's where she noticed something.
[196] Lots of people were using devices that produced synthetic voices.
[197] What was odd was that many of the voices didn't seem to match the people using them.
[198] At that point back in 2002, we had very limited synthetic voice options available.
[199] And so I heard a little girl or a young girl using a device to talk with an adult male voice and having a conversation with another person, a middle-aged man, who also was using the same voice.
[200] And so they were using different devices, but their voices were identical.
[201] She had just presented on the idea that our individual voices carry something unique about us.
[202] So why was this not reflected in these synthetic voices?
[203] Why are we giving them the same black box to speak through?
[204] There's got to be something that we can do, that we can harness the quality of the voices that they have, and imprint those or use that to give them a prosthetic voice that somehow reflects who they are and not just the same voice for everyone.
[205] Could a synthetic voice capture the richness of natural human speech?
[206] Rupal launched a company to answer this question.
[207] It's called VocaliD, and it uses machine learning and other artificial intelligence technologies to create personalized voices.
[208] So what synthetic speech is is taking recordings of anyone, and then taking those recordings and building a model of the voice quality, of the enunciation abilities, right?
[209] You aren't necessarily analyzing it and from a top down saying, well, this person has a high-pitched voice or this person has a low-pitched voice.
[210] You're taking the recordings as basically the raw ingredients to feed to a machine learning algorithm or set of algorithms, really,
[211] and they're learning the patterns of the clarity of the person's S, the, you know, how that sound is changed in the different phonetic environments, the voice quality aspects. All of these are learned by the machine.
[212] It's really re-emulating the human voice by a machine.
[213] In other words, the idea is to build a model of how a person sounds.
[214] To do so, you use a vast range of examples of that
[215] person's speech.
[216] Then you use the model to produce spoken language that incorporates all the idiosyncrasies and texture of that person's voice.
[217] One of Rupal's early clients was a young girl, Maeve Flack.
[218] Maeve was born with cerebral palsy.
[219] She's in a beautiful family where she has two older sisters.
[220] Rupal's goal was to give Maeve her own unique synthetic voice, one that could express not just her words, but her identity.
[221] The first step was to record her.
[222] So those sounds were the kinds of sounds that are unique to Maeve.
[223] Those are the sounds that she makes, where if, you know, when she's in a classroom with several other kids who also have communication disabilities, when she makes that sound, you know it's Maeve speaking.
[224] So we harnessed those sounds of Maeve's to create her unique voice for her.
[225] Then Rupal turned to Maeve's older sisters, Erin and Megan, who volunteered to record their voices so they could be blended with Maeve's.
[226] Ice cream is my guilty pleasure.
[227] That man ran fast.
[228] Erin and Megan read hundreds of sentences and phrases and uploaded them to our website for Rupal.
[229] Like a painter mixing a palette, Rupal took elements of Maeve's voice and mixed them with those of her sisters and other vocal donors to create what she calls a bespoke voice.
[230] I can't wait for my friends to hear my new voice.
[231] My parents are really happy.
[232] I'm not addicted to Fortnite.
[233] I want to meet Taylor Swift.
[234] So we're hearing, you know, Maeve at this age in terms of her sound as well as her siblings' recordings being combined and being produced through this speech synthesis engine.
[235] It's possible that Maeve may decide as she gets older that her voice needs to age with her.
[236] She'll need a new bespoke voice at that point.
[237] The same technology can also be used to preserve a person's existing voice.
[238] Sometimes this is done when a person faces the prospect of losing his voice.
[239] These could be individuals who are losing their voice to degenerative conditions, so slowly their voice is changing, such as ALS or Parkinson's disease.
[240] And then those who, the trauma is actually far more pronounced for individuals with something like head and neck cancer, where they learn that they're going to have their voice box removed within a couple of weeks.
[241] Lonnie Blanchard confronted this traumatic news in 2018.
[242] Doctors had diagnosed him with cancer and said surgery was the only option.
[243] Lonnie had to have his tongue removed.
[244] Here he is speaking to the BBC.
[245] Now that I know I'm going to lose my voice, I've got to get some sayings down on a personal recorder to get what I would normally say to my wife and kids, but every time I go to do that, I draw up a blank.
[246] By the time Lonnie started working with Rupal, he only had a few weeks to bank his voice.
[247] We helped him get set up in terms of the microphone he would need and things like that.
[248] Rupal worked with Lonnie to build a database of sound samples before his surgery.
[249] He recorded sentences that gave Rupal and her colleagues the different kinds of sounds they would need to build a new voice.
[250] I wish we could get acquainted.
[251] I'm going to be a teacher when I grow up.
[252] After Lonnie banked his voice, Rupal used the recordings to create a personalized voice for him.
[253] The difference between his voice and Maeve's voice is that Rupal didn't need to blend voices from donors.
[254] Lonnie was his own donor.
[255] Those voice samples then are used, are cleaned up and annotated by machine, actually, and then used to feed into the algorithms we have to create the synthetic voice.
[256] Similar to Maeve, Lonnie uses an assistive device, in his case, an iPad.
[257] He can type out what he wants to say and hear his voice speaking to his family.
[258] Once you close your eyes and let your mind relax, it doesn't take long to escape to the beautiful beach.
[259] It's really empowering.
[260] It's continued to be a way that he can connect to family members
[261] and feel that part of him is not fully lost.
[262] While most of us will never have the experience of losing our voices and having to obtain synthetic voices as replacements, increasingly, many of us are coming into contact with these voices.
[263] Hey, Siri.
[264] How many ounces are in a cup?
[265] One cup is eight fluid ounces.
[266] Hey, Siri, set a timer.
[267] For how long?
[268] 56 minutes.
[269] Okay, sure.
[270] 56 minutes.
[271] Starting now.
[272] Alexa.
[273] Hey, Google.
[274] Can you play music?
[275] Play some jazz.
[276] There's a station you might like.
[277] Synthetic voices are already changing our lives, and it's likely we're going to become even more reliant on them.
[278] In May 2018, Google revealed a new program it was working on.
[279] CEO Sundar Pichai presented it to an audience of software developers.
[280] The technology is called Google Duplex.
[281] It allows you to make a restaurant reservation through a voice assistant.
[282] Hi, how may I help you?
[283] Hi, I'd like to reserve a table for Wednesday the 7th.
[284] For seven people?
[285] It's for four people.
[286] Four people. When?
[287] Today? Tonight?
[288] Next Wednesday at 6 p.m. Oh, actually we only reserve for, like, upwards of, like, five people.
[289] For four people, you can come.
[290] How long is the wait usually to be seated?
[291] For when? Tomorrow, or a weekday, or?
[292] For next Wednesday.
[293] Uh, the 7th.
[294] Oh, no, it's not too busy.
[295] You can come for four people, okay?
[296] Oh, I got you.
[297] Thanks.
[298] Bye-bye.
[299] The audience is laughing and applauding because the man making the call isn't a man, but a machine.
[300] It didn't seem like it was a robotic voice.
[301] The robotic voices we're used to are the voices like, like when you are in a parking garage and you hear the, you know, please place your ticket with the stripe facing to the right.
[302] Very, very canned sort of speech.
[303] This was far more sophisticated and much more like you and I talk with hesitations and pauses and ums and ahs.
[304] You think it's a human on the other end.
[305] Now, of course, one of the things about that voice that Google had is that it did seem like a convincing voice.
[306] But if you need to convince me that that voice is not just a human voice, but a particular human's voice, you need to convince me that this is not just anyone calling for a restaurant reservation, but it's Barack Obama calling for a restaurant reservation.
[307] Presumably now the bar is much, much higher.
[308] That's right.
[309] It is.
[310] Barack Obama, though, does have a ton of his audio on the Internet, and there's a lot more audio to make his voice than there is my voice, for example.
[311] And so, yeah, but it's absolutely possible to learn,
[312] and if you have long enough, you can learn anybody's voice, if you have enough data.
[313] It's not hard to see how bad actors could misuse this.
[314] Create havoc in people's lives, trouble at companies, political misinformation.
[315] That's absolutely, I mean, we're seeing deepfakes in video.
[316] We've seen, you know, President Obama's face being manipulated and the audio coming out, you know, people creating these fake media in video.
[317] You're also seeing it in audio. That's exactly why the security aspects of what we're doing are trying to detect:
[318] Is that fake audio or is that real?
[319] Is part of that fake audio or is part of that real, right?
[320] So it is, this isn't completely sci-fi.
[321] It isn't so far away.
[322] It isn't necessarily 20, you know, 2028.
[323] It's probably 2020.
[324] So we've got to get our defenses up in terms of questioning the authenticity of audio just as we do video.
[325] In 2017, a Canadian company called Lyrebird showed how audio deepfakes might work in politics.
[326] This is huge.
[327] They can make us say anything now, really anything.
[328] The good news is that they will offer the technology to anyone.
[329] This is huge.
[330] How does their technology work?
[331] Hey, guys, I think that they use deep learning and artificial neural networks.
[332] By 2019, deepfake audio technology had gotten even better.
[333] Shortly after critics panned the final season of Game of Thrones, a YouTube channel called Eating Things with Famous People put out this tongue-in-cheek video showing the supposed remorse of the lead character, Jon Snow.
[334] It's time for some apologies.
[335] I'm sorry we wasted your time.
[336] I'm sorry we didn't learn anything from the ending of Lost.
[337] I have more lines in this video than I had in the last season.
[338] I'm sorry we wrote this in like six days or something.
[339] Now, let's burn the script of season eight
[340] and just forget it forever.
[341] Spoofing a TV show is one thing.
[342] But imagine such high-quality deepfakes occurring in a more high-stakes setting.
[343] Voices are increasingly being used by financial institutions to authenticate the identities of consumers.
[344] Recently, Rupal worked with a bank to assess
[345] how vulnerable it was to vocal hacking.
[346] We tested their authentication system by creating synthetic samples or synthetic voices of particular individuals who are enrolled in their authentication system, and we tried to test those voices against the system to see if we could get through with the synthetic voices.
[347] And we were not able to do that for every single voice, but we were able to do it for some voices.
[348] And so it just starts to show
[349] that there is a vulnerability in this technology.
[350] So how would you guard against it?
[351] Well, there are many ways to guard against it.
[352] One is you can classify the difference between, is the audio signal I'm listening to,
[353] is it synthetic or is it human?
[354] As the synthetic voices become better and better sounding, that will be a more difficult decision to make.
[355] And it is something that if we can proactively solve, I think, or at least start to address, we're going to be way ahead of the curve than if we're trying to clean up our mess after the fact.
[356] Despite the potential risks of these new technologies, Rupal is also optimistic.
[357] Voice synthesis tools have the potential to allow people to craft the voice they hear on the outside so that it matches the identity they feel on the inside.
[358] Ideally, in the future, these decisions are made by the end user themselves.
[359] Like, oh, I actually want that to sound a little breathier.
[360] I'd love that to sound a little bit more confident.
[361] And, I mean, how does that translate to the acoustics?
[362] We don't quite know yet.
[363] But that's actually, I think, where when we can finally give the control of what the voice sounds like to the individual, I mean, that's the Holy Grail.
[364] She saw me. She saw me. I am teacher while reading the machine.
[365] You are listening to the voice of a machine.
[366] The first kind of aid to be considered is a talking aid for the vocally handicapped.
[367] I am Beautiful Betty, the standard female voice.
[368] I am the standard male voice, Perfect Paul.
[369] I was sad because there was no ice cream in the freezer.
[370] The sky is clear and the stars are twinkling.
[371] I can't wait for my friends to hear my voice.
[372] One cup is eight fluid ounces.
[373] You are listening to a machine.
[374] This is huge.
[375] How does their technology work?
[376] Hey guys, I think that they use deep learning and artificial neural networks.
[377] Hi, I'd like to reserve a table for Wednesday the 7th.
[378] This week's show was produced by Thomas Lu.
[379] It was edited by Tara Boyle and Rhaina Cohen.
[380] Our team includes Parth Shah, Jenny Schmidt, and Laura Kwerel.
[381] Special thanks to Brent Baughman, Greg Sauer, and Kvon Jones.
[382] Our unsung hero this week is Rebecca Ralph.
[383] She's part of NPR's team looking at our changing interactions with smart speakers.
[384] She helped us record some of the smart devices you heard in this week's episode.
[385] Thanks, Rebecca.
[386] If you like this episode, please share it with a friend.
[387] We're always looking for new people to discover Hidden Brain.
[388] I'm Shankar Vedantam, and this is NPR.