LAUREN LEFFER: At the end of November, it’ll be one year since ChatGPT was first made public, rapidly accelerating the artificial intelligence arms race. And a lot has changed over the course of 10 months.
SOPHIE BUSHWICK: In just the past few weeks, both OpenAI and Google have introduced big new features to their AI chatbots.
LEFFER: And Meta, Facebook’s parent company, is jumping in the ring too, with its own public-facing chatbots.
BUSHWICK: I mean, we learned about one of these news updates just minutes before recording this episode of Tech, Quickly, the version of Scientific American’s Science, Quickly podcast that keeps you updated on the lightning-fast advances in AI. I’m Sophie Bushwick, tech editor at Scientific American.
LEFFER: And I’m Lauren Leffer, tech reporting fellow.
[CLIP: Show theme music]
BUSHWICK: So what are these new features these AI models are getting?
LEFFER: Let’s start with multimodality. Public versions of both OpenAI’s ChatGPT and Google’s Bard can now interpret and respond to image and audio prompts, not just text. You can speak to the chatbots, kind of like the Siri feature on an iPhone, and get an AI-generated audio reply back. You can also feed the bots pictures, drawings or diagrams and ask for information about those visuals and get a text response.
BUSHWICK: That is awesome. How can people get access to this?
LEFFER: Google’s version is free to use, while OpenAI is currently limiting its new feature to premium subscribers who pay $20 per month.
BUSHWICK: And multimodality is a big change, right? When I say “large language model,” that used to mean text and text only.
LEFFER: Yeah, it’s a really good point. ChatGPT and Bard were initially built to parse and predict just text. We don’t know exactly what’s happened behind the scenes to get these multimodal models. But the basic idea is that these companies probably added together aspects of different AI models that they’ve built—say existing ones that auto-transcribe spoken language or generate descriptions of images—and then they used those tools to expand their text models into new frontiers.
BUSHWICK: So it sounds like behind the scenes, we’ve got this sort of Frankenstein’s monster of models?
LEFFER: Sort of. It’s less Frankenstein, more kind of like Mr. Potato Head, in that you have the same basic body just with new bits added on. Same potato, new nose.
Once you add in new capacities to a text-based AI, then you can train your expanded model on mixed-media data, like photos paired with captions, and boost its ability to interpret images and spoken words. And the resulting AIs have some really neat applications.
BUSHWICK: Yeah, I’ve played around with the updated ChatGPT, and this ability to analyze photos really impressed me.
LEFFER: Yeah, I had both Bard and ChatGPT try to describe what type of person I am based on a photo of my bookshelf.
BUSHWICK: Oh my god, it’s the new internet personality test! So what does your AI book horoscope tell you?
LEFFER: So not to brag, but to be honest, both bots were pretty complimentary. (I have a lot of books.) But beyond my own ego, the book test demonstrates how people could use these tools to produce written interpretations of images, including inferred context. You know, this might be helpful for people with limited vision or other disabilities, and OpenAI actually tested their visual GPT-4 with blind users first.
BUSHWICK: That’s really cool. What are some other applications here?
LEFFER: Yeah, I mean, this sort of thing could be helpful for anyone—sighted or not—trying to understand a photo of something they’re unfamiliar with. Think, like, bird identification or repairing a car. In a totally different example, I also got ChatGPT to correctly split up a complicated bar tab from a photo of a receipt. It was way faster than I could’ve done the math, even with a calculator.
BUSHWICK: And when I was trying out ChatGPT, I took a photo of the view from my office window, asked ChatGPT what it was (which is the Statue of Liberty), and then asked it for directions. And it not only told me how to get the ferry, but gave me advice like “wear comfortable shoes.”
LEFFER: The directions thing was pretty wild.
BUSHWICK: It almost seemed like magic, but, of course…
LEFFER: It’s definitely not. It’s still just the result of lots and lots of training data, fed into a very big and complicated network of computer code. But even though it’s not a magic wand, multimodality is a significant enough upgrade that it might help OpenAI attract and retain users better than it has been. You know, despite all the news stories going around, fewer people have actually been using ChatGPT over the past three months. Usership dropped by about 10 percent for the first time in June, another 10 percent in July, and about 3 percent in August. The prevailing theory is that this has to do with summer break from school, but still, losing users is losing users.
BUSHWICK: That makes sense. And this is also a problem for OpenAI, because it has all this competition. For instance, we have Google, which is keeping its own edge by taking its multimodal AI tool and putting it into a bunch of different products.
LEFFER: You mean like Gmail? Is Bard going to write all my emails from now on?
BUSHWICK: I mean, if you want it to. If you have a Gmail account, or even if you use YouTube or Google, if you have files stored in Google Drive, you can opt in and give Bard access to this individual account data. And then you can ask it to do things with that data, like find a specific video or summarize text from your emails. It can even offer specific location-based information. Basically, Google seems to be making Bard into an all-in-one digital assistant.
LEFFER: Digital assistant? That sounds kind of familiar. Is that at all related to the virtual chatbot pals that Meta is rolling out?
BUSHWICK: Sort of! Meta just announced it’s not introducing just one AI assistant, it’s introducing all these different AI personalities that you’re supposedly going to be able to interact with in Instagram or WhatsApp or its other products. The idea is it’s got one main AI assistant you can use, but you can also choose to interact with an AI that looks like Snoop Dogg and is supposedly modeled off specific personalities. You can also interact with an AI that has a specialized function, like a travel agent.
LEFFER: When you’re listing all of these different versions of an AI avatar you can interact with, the only thing my mind goes to is Clippy from the old-school Microsoft Word. Is that basically what this is?
BUSHWICK: Sort of. You can have, like, a Mr. Beast Clippy, where when you’re talking with it, it does—you know how Clippy kind of bounced and changed shape—these images of the avatars will sort of move as if they’re actually participating in the conversation with you. I haven’t gotten to try this out myself yet, but it does sound pretty freaky.
LEFFER: Okay, so we’ve got Mr. Beast. We’ve got Snoop Dogg. Anyone else?
BUSHWICK: Let’s see, Paris Hilton comes to mind. And there’s a whole slew of these. And I’m kind of interested to see whether people actually choose to interact with their favorite celebrity version or whether they choose the less anthropomorphized versions.
LEFFER: So these celebrity avatars or whichever form you’re going to be interacting with Meta’s AI in—is it also going to be able to access my Meta account data? I mean, there’s like so much concern out there already about privacy and large language models. If there’s a risk that these tools could regurgitate sensitive information from their training data or user interactions, why would I let Bard go through my emails or Meta read my Instagram DMs?
BUSHWICK: Privacy policies depend on the company. According to Google, it’s taken steps to ensure privacy for users who opt into the new integration feature. These steps include not training future versions of Bard on content from user emails or Google Docs, not allowing human reviewers to access users’ personal content, not selling the information to advertisers and not storing all this data for long periods of time.
LEFFER: Okay, but what about Meta and its celebrity AI avatars?
BUSHWICK: Meta has said that, for now, it won’t use user content to train future versions of its AI. But that might be coming soon. So privacy is still definitely a concern, and it goes beyond these companies. I mean, literal minutes before we started recording, we read the news that Amazon has announced it’s training a large language model on data that is going to include conversations recorded by Alexa.
LEFFER: So conversations that people have in their homes with their Alexa assistant… That sounds so scary to me. I mean, in my mind, that’s exactly what people have been afraid of with these home assistants for a long time: that they’d be listening, recording and transmitting that data to somewhere that the person using it no longer has control over.
BUSHWICK: Yeah, anytime you let another service access information about you, you are opening up a new potential portal for leaks and also for hacks.
LEFFER: It’s completely unsettling. I mean, do you think that the benefits of any of these AIs outweigh the risks?
BUSHWICK: So it’s really hard to say right now. Google’s AI integration, multimodal chatbots and, I mean, just these large language models in general, they are all still in such early experimental stages of development. I mean, they still make a lot of mistakes, and they don’t quite measure up to more specialized tools that have been around for longer. But they can do a whole lot all in one place, which is super convenient, and that can be a big draw.
LEFFER: Right. So they’re definitely still not perfect, and one of those imperfections: they’re still prone to hallucinating incorrect information, correct?
BUSHWICK: Yes, and that brings me to one last question about AI before we wrap up: Do eggs melt?
LEFFER: Well, according to an AI-generated search result gone viral last week, they do.
BUSHWICK: Oh, no.
LEFFER: Yeah, a screenshot posted on social media showed Google displaying a top search snippet that claimed “an egg can be melted,” and then it went on to give instructions on how you might melt an egg. Turns out, that snippet came from a Quora answer generated by ChatGPT and boosted by Google’s search algorithm. It’s more of that AI inaccuracy in action, exacerbated by search engine optimization—though at least this time around it was pretty funny and not outright harmful.
BUSHWICK: Google and Microsoft are both working to incorporate AI-generated content into their search engines. But this melted egg misinformation struck me because it’s such a perfect example of why people are worried about that happening.
LEFFER: Mmm—I think you mean eggs-ample.
[CLIP: Show theme music]
Science, Quickly is produced by Jeff DelViscio, Tulika Bose, Kelso Harper and Carin Leong. Our show is edited by Elah Feder and Alexa Lim. Our theme music was composed by Dominic Smith.
LEFFER: Don’t forget to subscribe to Science, Quickly wherever you get your podcasts. For more in-depth science news and features, go to ScientificAmerican.com. And if you like the show, give us a rating or review!
BUSHWICK: For Scientific American’s Science, Quickly, I’m Sophie Bushwick.
LEFFER: I’m Lauren Leffer. See you next time!