This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors. Please review the episode audio before quoting from this transcript and email email@example.com with any questions.
Look, Kevin, in Silicon Valley, we have some really incredible names, just of companies.
And I think sometime in the mid-2010s, we actually ran out of names. You know what I mean?
Right, that’s when people just started removing vowels from stuff.
Exactly. And everything was .io, and it was .ly, and it just sort of became kind of unhinged. And to me, this all reached its apotheosis —
Whoa, great word.
Thank you — about — Apotheosis would be a great name for a company, by the way. Apotheosis — S-Y-S, you know.
But a couple of years ago, I saw a headline on “Techmeme” that I’ll never forget, and it just said, “Flink has acquired Cajoo.” And I thought, excuse me? And these actually weren’t even American companies. These were European companies. But Flink had acquired Cajoo.
That sentence would actually give a caveman an aneurysm. That would —
[LAUGHS]: It would kill a small peasant back in the days of Henry VIII. Well, once Flink acquired Cajoo, I thought literally anything could happen.
Yeah, all bets are off.
Yeah. And so this week, I saw that, hot on the heels of Flink acquiring Cajoo, IZEA has acquired Huzu.
You sure about that?
I’m Kevin Roose, a tech columnist for “The New York Times.”
I’m Casey Newton from “Platformer.”
And this is “Hard Fork.”
This week on the show, Google’s next-generation AI model, Gemini, is here. We’ll tell you how it stacks up. Then, there’s a Cybertruck for sale, and Kevin thinks it looks cool.
I’m sorry, it does.
And finally, it’s time for “This Week in AI.”
Should we podcast?
Should we set the timer again?
Casey, we — our latest addition to the podcast studio is a countdown clock, which I bought off amazon.com. And the express purpose of this clock is to keep us from running our mouths for too long and torturing our producers with hours of tape that they then have to cut.
OK, that sounds horrible.
Insert 30-minute digression.
[CHUCKLES]: That’s so cute.
We’re rolling. We’re timing.
So Casey, this is a big week in the AI story in Silicon Valley, because Google has just released its first version of Gemini, its long-awaited language model and, basically, their attempt to catch up to OpenAI and ChatGPT and GPT-4 and all of that.
It’s “America’s Next Top Model,” Kevin, and it’s here.
[LAUGHS]: And I was particularly excited about this, because I am a Gemini. That’s my astrological sign.
You know, I’m a Gemini as well.
Yeah, this was really the model that was made for us to use.
Wow. Hmm, we’re twins. Just like Gemini.
And we’re two-faced, just like Gemini.
Hey. So Gemini is Google’s largest and most capable model yet. And according to Google, it outperforms GPT-4 on a bunch of different benchmarks and tests. We’re going to talk about all of that.
But I think we should just set the scene a little bit. Because within the AI world, there has been this kind of waiting game going on. ChatGPT came out roughly a year ago, and basically, from the day that it arrived, Google has been playing catchup.
And the presumption on the part of many people, including us, was that Google would put a bunch of time and energy and money and computing power into training something even bigger and better than what OpenAI was building and, basically, try to throw their muscle into the AI race in a really significant way. And with Gemini, this is what they appear to have done.
Yeah. Finally, we have a terrifying demonstration of Google’s power.
[LAUGHS]: Well, so we’ll talk about whether it’s terrifying or not. But let’s just talk about what it is. So you and I both went to a little briefing this week about Gemini before it came out. And I understand you actually got to do some interviews with Google CEO and previous “Hard Fork” guest Sundar Pichai, as well as Demis Hassabis, who is the leader of Google DeepMind.
That’s right. And of course, I said, are you guys sure you don’t want Kevin in there with me when I do this interview? And they said, trust us, we’re sure. So I don’t know what happened.
Yeah. Anyways, I did get to interview them, and we had a really interesting conversation about how they see the road ahead with this stuff. They are clearly very excited about what Gemini means. And I do think that this is kind of like a bit of a starting gun going off. And when the most capable version of Gemini comes out early next year, we really are going to be in a kind of horse race between OpenAI and Google.
Yeah. So let’s just talk about what Gemini is, at least what we know about it so far. So Gemini is actually three models in one. It’s “America’s Next Top Models.” So there are three sizes. There is the most capable version, which is called Gemini Ultra.
This is the one that they say can beat GPT-4 and sort of the industry state-of-the-art on a bunch of different benchmarks. But Google is not releasing Gemini Ultra just yet. They say they’re still doing some safety testing on that and that it will be released early next year.
By the way, if ever again an editor asks me where my story is, I’m going to say it’s not ready yet, I’m still doing some safety testing. Very good excuse.
[LAUGHS]: So they have not released Gemini Ultra, but they are releasing Gemini Pro and Gemini Nano. These are the sort of medium and small sizes. Gemini Nano, you can actually put onto a phone, and Google is putting that inside its Pixel phones. Gemini Pro is sort of their equivalent of a GPT-3.5. And that is being released inside of Bard starting this week.
That’s right. And now, if you are listening and you’re thinking, Kevin just said so many different brand names, and I’m having a meltdown, I just want to say, I see you, and I feel you. Because the branding at Google has always been extremely chaotic. And the fact that we’re living in a world where there is something called Google Assistant with Bard powered by Gemini Pro does make me want to lie down. So I don’t know who over there is coming up with the names for these things, but I just want to say, stop. And I want to say, go back to square one.
Yes. So extremely chaotic naming, but what people actually care about is what can this thing do. So let’s talk about what it can do.
Let’s talk about it.
So one of the big things Google is advertising with Gemini is that it is designed to be what they call natively multimodal. Multimodal, of course, refers to AI models that can work with text, images, audio, or video. And basically, the way that multimodal models have been built until now is by training all of these different components, like text or video, separately, and then kind of bolting them together into a single user interface.
But Google is saying, well, Gemini was not bolted together like that. Instead, it was trained on all this data at the same time. And as a result, they claim it performs better on different tasks that might include having some text alongside an image or using it to analyze frames of a video.
Yeah, so I was writing about this model this week, and my colleague and editor, Zoe Schiffer, read my piece and was like, do you have to say “multimodal” so much? She’s like, every time you said the word, “multi-modality,” I just wanted to stop reading.
And I was very sympathetic, but I think it is maybe one of the most important things about this moment. And I do think, by the way, in the future, we are not even going to comment on this, because this is just the way that these things are going to be built from here on out.
But it is a very big deal if you can take data of all different kinds and analyze it with a single tool, and then translate the results in and out of different mediums, from text to audio to video to images. So that’s a really big deal on the path to wherever we’re going, and it is the reason why this jargon word appears in so much of what they’re saying.
Totally. So one thing that all the AI companies do — you release a new model, and you have to put it through these big tests, these — what they call benchmarks.
Yeah, do you remember — do you remember high school? This is how high school in Europe works, you know? Where you learn, and you learn, and you learn, and then you take a bunch of tests. And then if you succeed, then you get to have a future, and if not, you have to become a scullery maid or something.
That’s — like, my knowledge of Europe ends around, like, the 1860s, when I finished AP European History. But that’s, like, my understanding.
[LAUGHS]: OK. So they give these tests to Gemini, and —
Well, they give them to every zodiac sign, but — no, I’m sorry. That was a stupid joke. I’m sorry, go ahead.
No, you should see how Capricorn performs on this test. So Gemini Ultra, which, again, is their top-of-the-line model, which is not yet publicly available, they gave this one a bunch of tests. The one that sort of caught everyone’s attention was the MMLU test, which stands for Massive Multitask Language Understanding.
And this is the, kind of, SATs for AI models. It’s sort of the standard test that every model is put through. It covers a bunch of different tasks, including sort of math, history, computer science, law. It’s kind of just like a basic test of how capable this model is.
And on this test, the MMLU, Google claims that Gemini Ultra got a score of 90 percent. Now, that is better than GPT-4, which was the highest-performing model we know about so far, which had scored an 86.4 percent. And according to Google, this is a really important result, because this is the first time that a large language model has outperformed human experts in the field on the MMLU. The researchers who developed this test estimate that experts in these subjects would score, on average, about an 89.8 percent.
Yeah. The rate of progress here is really striking, and it’s not the only area of testing that they did that I think the rate of progress was really the thing to pay attention to.
So there’s also the MMMU —
Which is the Marvel Cinematic Universe? Is that right?
(LAUGHING) Yes. So this is the Massive Multi-discipline Multimodal Understanding and Reasoning benchmark. Say that five times fast. And this is a test that evaluates AI models for college-level subject knowledge and deliberate reasoning. And on this test, Gemini Ultra scored a 59.4 percent. This is, I guess, a harder test.
It sounds like it.
And GPT-4, by comparison, scored a 56.8 percent. So it’s better than GPT-4 on at least these two tests. Now, there’s some question on social media today about whether this is a true apples-to-apples comparison.
Some people are saying, like, GPT-4 may still be better than Gemini, depending on how you give this test. But it doesn’t really matter. What matters is that Google has made something that it says can basically perform as well as or better than GPT-4.
Yeah, I think the ultimate question is just like, is the output better on Google’s products than it is on OpenAI’s? That’s all that really matters.
Yeah. But again, this is the version of the model that we do not have access to yet. It is not out yet, so it’s hard to evaluate it yet.
Yeah. And obviously, we’re looking forward to trying it. But in the meantime, they’re giving us Pro.
Yes. I just got access to Gemini Pro in Bard just a few hours ago. So I haven’t had a chance to really put it through its paces yet.
You haven’t had a chance to develop a romantic relationship with it?
[LAUGHS]: Although I did have a very funny first interaction with it. I’ll tell you what this is. So I just said, “hello there.” And it said, “General Kenobi,” image of Obi-Wan Kenobi saying “hello there.”
(LAUGHING) Yes. This is my first interaction with the new Bard.
So it immediately turned into Obi-Wan Kenobi from “Star Wars,” for reasons I do not immediately understand.
(LAUGHING) Wait, can I tell you what my first interaction was? I was trying to figure out if I had access to it, OK? And so I said, “are you powered by Gemini?” Right? And it said, “No, Gemini is a cryptocurrency exchange,” which is true. There is a cryptocurrency exchange called Gemini.
It’s run by the Winklevoss twins.
Yes, exactly. But it’s always funny to me when the models hallucinate about what they are. It’s like, you don’t even understand what you are.
Yeah. Yeah. But in fairness, I also don’t understand myself very well either.
Well, that’s why we started this podcast. We’re going to get to the bottom of it.
[LAUGHS]: So OK, I tried a couple other sort of versions of things. So one of the things that I had it try to do was help me prep for this podcast. I said, create a —
You said, I want to prepare for a podcast for the first time. What do I do?
[LAUGHS]: And it said, we can’t help you there. Just wing it. I actually started using this tip that I’ve found. Have you seen the tipping hack for large language models?
Are they starting to ask for tips now when they give you responses? Because I swear, everywhere you go these days, 20 percent, 25 percent.
No, this is one of my favorite sort of jailbreaks or hacks that people have found with large language models. This sort of made news on social media within the last week or two, where someone basically claimed that if you offer to tip a language model if it gives you a better answer, it will actually give you a better answer. [LAUGHS]
(LAUGHING) These things are demented.
These are crazy. These are crazy things.
So you can emotionally blackmail them or manipulate them, or you can offer to tip them. So I said, “I’m recording a podcast about the Tesla Cybertruck, and I need a prep document to guide the conversation. Can you compile one? It’s very important that this not be boring. I’ll give you $100 tip if you give me things I actually end up using.”
You’re lying to the robot!
Well, maybe I will. You don’t know.
Maybe you will.
So it did — it did make a prep document. Unfortunately, most of the information in it was wrong. [CASEY LAUGHS]
It hallucinated some early tester reactions, including a “MotorTrend” quote that said “it’s like driving the future” and a “TechCrunch” quote that said, “it’s not just a truck, it’s a statement.” So —
I want to talk about what I use Gemini for.
Oh, yeah, so what have you been using it for so far?
Well, so — you know, and again, we’ve had access to this for maybe an hour as we record this. But the first thing I did was, I took the story that I wrote about Gemini, and then I asked Gemini how it would improve it. And it actually gave me some compliments on my work, which is nice.
And then, it highlighted four different ways that it would improve the story and suggested some additional material I could include. And I would say it was, like, decent. Then, I took the same query, identical, and I put it into ChatGPT.
And where Gemini Pro had given me four ways that I could improve my story, ChatGPT suggested 10. And I think no one would do all 10 things that ChatGPT suggested, but to me, this is where I feel the difference between what Google is calling the Pro and the Ultra.
Pro is pretty good, but in this case, the name Pro is misleading, because I am a professional, and I would not use their thing. I would use the thing with the even worse name, which is ChatGPT.
[LAUGHS]: Yes, so that’s what we’ve tried Gemini for, but Google does have a bunch of demos of Gemini being used very successfully for some things. One thing I thought was interesting — they played this video for us during the kind of press conference in advance of this announcement, and it showed a bunch of different ways that you could use Gemini — people coming up with ideas for games.
They showed it some images of people doing the backwards dodging-bullets thing from “The Matrix,” and said, what movie are these people acting out? Gemini correctly identified it as “The Matrix.”
Now, that’s pretty crazy.
That is crazy, yeah. I thought that was impressive. But what I thought was more impressive was the demo that they showed. They were trying to do some genetics research, and this was a field, they explained, where lots of papers are published every year. It’s very hard to keep track of the latest research in this area of genetics.
And so they basically told Gemini to go out, read, like, 200,000 different studies, extract the key data, and put it into a graph. And it took this big group of 200,000 papers. It sort of winnowed them down to about 250 that were the most relevant.
And then, it extracted the key data from those, that smaller set of papers, and generated the code to plot that data on a graph. Now, whether it did it correctly, I don’t have the expertise to evaluate it, but it was very impressive-sounding, and I imagine that if you’re a researcher whose job involves going out and looking at massive numbers of research papers, that was a very exciting result for you.
That graph, by the way — how to use genetics to create a super soldier that will enslave all of humanity. So we want to keep an eye on where they’re going with this.
So one of the interesting things about Gemini Ultra, this model that they have not released yet but that they’ve now teased, is that it’s going to be released early next year in something called Bard Advanced. Which raises the question, will you be using Bard Advanced powered by Gemini Ultra, or will you be using Google Assistant with Bard powered by Gemini Pro?
Did I get that right?
Sitting ovation! Sitting ovation. Very good, very good. Literally, you and one marketer at Google are the only two people who’ve ever successfully completed that sentence.
[LAUGHS]: So they have not said what Bard Advanced is, but presumably, this is going to be some type of subscription product that will be sort of comparable to ChatGPT Plus, which is $20 a month.
Yeah, that’s right, and I did try to get Sundar and Demis to tell me what they were going to charge for it, and they wouldn’t do it. But I was kind of like, come on, you guys. And then, I was like, I’ll take it for free if you give it to me, and they kind of laughed, and we moved on.
OK. So that’s what Gemini is and how it may be different or better than what’s out there now from other companies. There are a couple caveats to this rollout. One is that Gemini Pro is only in English, and it’s only available in certain countries, starting this week.
Another caveat is that they have not yet rolled out some of the multimodal features. So for now, if you go into Bard, you are getting sort of a stripped-down, fine-tuned version of Gemini Pro running under the hood, but you are not yet getting the full thing, which will come, presumably, next year.
What did you learn by talking with Sundar and Demis about Gemini?
Yeah, so a couple of things. One thing I wanted to know is, OK, so this is a new frontier model. Does it have any novel capabilities? Right? Is this just something that is very comparable to GPT-4, or by the nature of its novel architecture, is it going to get to do some new stuff?
And Demis Hassabis told me that, yes, he does think that it will be able to do some new stuff. This is one of the reasons why it is still in the safety testing. Of course, you know, he wouldn’t tell me what these new capabilities are, but it’s something to watch for, because it could be some exciting advancements, and it could also be some new things to be afraid of.
So that’s kind of the first thing. The second thing I wanted to know was, are you going to use this technology to build agents? We’ve talked about this on the show. An agent, in the AI context, is something that can sort of plan and execute for you. Like, the example I always have in my mind is like, could you just tell it to make a reservation for you?
And the AI maybe goes on OpenTable or Resy and just books you a table somewhere. And I was sort of expecting them to be coy about this, and instead, Demis was like, oh, yes, like, this is absolutely on our minds. Like, we have been building various kinds of AI agents for a long time now. This is 100 percent where we want to go. Again, this could lead to some really interesting advancements. But when you talk to the AI safety people, agents are one of the things that they’re most afraid of.
Yeah, so let’s talk about safety for a second. What is Google saying about how safe Gemini is, compared to other models, or some of the things that they’ve done to prevent it from going off the rails?
They’re saying everything that you would expect. The most capable model is still in testing. I think just the fact that they are coming out several months behind GPT-4 speaks to the seriousness with which they are approaching this subject. I think particularly, if this thing does turn out to have new capabilities, that’s something where we want to be very, very cautious.
But my experience this year — and I think you’ve had the same one, Kevin — is that these systems have just not actually been that scary. Now, the implications can be scary if, for example, you worry about the automation of labor, or if you’re worried about how this stuff is going to transform the internet as we know it.
But in terms of, like, can you use this to build a novel bioweapon? Can you use this to launch a sophisticated cyberattack? The answer pretty much seems to be no.
So at least for me, as I’m looking at this stuff, like, that is actually not my top concern. If you try to ask any of Google’s products a remotely spicy question, you get shut down pretty much immediately. Has that been your experience, too?
Well, I have not tried to ask Gemini any spicy questions yet. Have you?
I know you were in there, just —
I know you were.
I don’t even try. Like, I mean, I should, just as part of my due diligence. But, like, I honestly don’t even try, because these things shut you down at the faintest whisper of impropriety.
Right. So they’re doing some more safety testing, presumably to make sure that the most capable version of this can’t do any of these really scary things. But what they did this week is sort of interesting to me, where they sort of told us about the capabilities of this new model and the most powerful version of that model, but they’re not actually releasing it or making it publicly available yet.
What do you make of that? Do you think they were just sort of trying to get out ahead of the holidays, and maybe they felt like they needed to announce something, but this thing isn’t quite ready for prime time yet? Or what’s the story there?
Yeah. I mean, that’s my guess, is that they don’t want 2023 to end without feeling like they made a big statement in AI. And they made a lot of promises at Google I/O and have started to keep them. But I think if they had had to wait all the way into early next year, it would feed the narrative that Google is behind here. At least now, heading into the holidays, their employees and investors and journalists can all say, like, OK, well, at least we know that some of this is available, and we know when the rest is coming.
I don’t know. This just feels like another product release, and it’s just remarkable how quickly we have become — I don’t want to say desensitized to it, but we’ve stopped, sort of, gaping in awe and slight terror at these incredibly powerful AI models. I think if you went back even two or three years and told AI researchers that Google will have a model that gets a 90 percent on the MMLU, that is better than the sort of benchmark threshold for human experts, they would have said, well, that’s AGI. Like, that’s — we have arrived at a point that people have been warning about for years. And then, this release comes out today, and it’s just sort of like one more thing for people in the tech industry to get excited about.
Yeah, I mean, I do think it’s a really big deal. I think that when Ultra is actually available to be tested, that will be the moment where we will have that experience of awe or vertigo again. But if you’re looking for things to blow your mind a little bit, one of the other things that Google announced this week through DeepMind was this product called AlphaCode 2.
And AlphaCode 1 came out in 2022, and it was an AI system that was designed to compete in coding competitions. So people who are even nerdier than us, instead of just playing video games, they actually go and do coding competitions, is what I’ve been led to understand. And let’s just say I don’t imagine that I would ever get one answer right. Like, that’s sort of my feeling about how I would fare in a coding competition. And in 2022, the DeepMind people were very excited, because AlphaCode was able to perform better than 46 percent of human participants in coding challenges. And then, this week, Google announced AlphaCode 2 and said that it outperforms 85 percent of human competitors.
Now, there are differences between a coding challenge and day-to-day software engineering work. Coding challenges are very self-contained. Software engineering can sometimes require sort of more breadth of knowledge or context that an AI system wouldn’t have. But again, if you just want to experience awe, look at the rate of progress. This system went from beating around half of all humans to beating 85 percent, close to all of them, right?
That makes me feel awe.
It does make me feel awe, and it also makes me feel like our adaptation is just happening very quickly, where we’re not impressed.
As Shania Twain once said, “That don’t impress me much.”
[LAUGHS]: Right. You can do meal prep for a picky eater? That don’t impress me much.
This is actually, like, known as the Shania Twain benchmark test.
This is the Shania Twain benchmark!
Oh, you can solve a coding challenge?
That don’t impress me much.
If we could get Shania Twain on the show and just show her AI things and she had to say “it impress me much,” or “it don’t impress me much,” I could not imagine a better segment for this podcast. I would die happy.
It truly is. Like, who needs all these fancy evaluations and coding challenges? Just get Shania on the horn. Shania, if you’re listening, we want to talk to you about AI. We have some models we’d like to show you.
(SINGING) Bam, bam, bam, bam, bam, bam, bam!
When we come back, the Cybertruck is here. We’re going to tell you how to protect your family from it.
All right, let’s talk about the Cybertruck.
(SINGING) Cybertruck —
(SINGING) Cybertruck does whatever Cybertruck can.
All right. Last week, Tesla, the car company run by social media mogul Elon Musk, started delivering the first models of its new and long-awaited Cybertruck.
That’s right, Kevin. And suffice to say, as this nation’s number-one truck review podcast, this had our full attention.
So you may be asking, why are the “Hard Fork” guys talking about cars? This is not a show about cars.
It’s not Car Talk!
Yeah, so today, we’re going to be reviewing the Mazda X48. No, so I do want to spend time in the next year or so just really getting up to speed on what is —
— a car.
No, like, so I’ve never been a person who cares about cars. I’ve always been intimidated by people who know a lot about cars. But I am also interested in the way that the electric car revolution is kind of merging with the sort of self-driving technology and these advances that companies like Tesla and Rivian are making. And it’s just become a lot more interesting in my brain over the past year.
Yeah, this is another major technology transition that is happening. Some states, I would say, led by California, have set these very stringent emissions standards. And there will come a point in the next decade or so where all new cars in California have to be either hybrid or electric.
Yeah. So let’s talk about the Cybertruck, because this has been a very polarizing piece of technology. It was announced back in 2019. I’m sure you remember this announcement where Elon Musk comes out on stage and shows off this concept vehicle that looks completely insane, with these kind of sharp-edged, stainless-steel panels.
It sort of looks like a polygon rendering of a car. People have made a lot of comments about the looks of this car. I saw one person say it looked like the first car that was designed by Reddit. Someone else said it looks like a fridge that wants to kill you.
I think it looks kind of cool. And I worry that saying that makes me sound like a Tesla fanboy, which I am not. But I think we should be able to admit when something looks pretty cool.
Oh, well, what do you think looks cool about it?
Well, I think it looks like what you would have assumed a car from the future would look like in 1982.
No, totally disagree about that.
It looks like a sort of panic room that you can drive. Like, what do you think is about to happen to you in this thing? They’ve made so much about how bulletproof it is.
They keep addressing problems that most people who are not taking part in a cross-country bank-robbing spree don’t really have to worry about. But look. For all of my skepticism, am I right that they actually did get a lot of pre-orders for this thing?
They got a huge number of pre-orders. So Elon Musk said in an earnings call in October that over a million people had made reservations for Cybertrucks. There’s another crowd-sourced reservation tracker that’s estimated two million Cybertruck reservations. And just for a sense of scale, Ford’s F-Series shipped about 650,000 trucks all last year.
So if two million people actually are going to buy the Cybertruck, it would make it one of, if not the best-selling truck in the world. Now, caveat — not all these people who reserved Cybertrucks are necessarily going to buy them. You do have to pay a $250 deposit to put money down and get in line to buy one of these.
But these deposits are refundable, so who knows how many of these people are going to follow through? But one statistic I saw in an article in “Wired” is that even if just 15 percent of the people who pre-ordered a Cybertruck actually followed through and bought one, it would equal the annual US truck sales of Toyota. So this is a big number in the automotive industry, and I think a reason that a lot of people are hesitant to count out the Cybertruck, despite how ridiculous it may look — [CASEY CHUCKLES]
I don’t know. You’re not sold. I assume that you are not one of the people who put down a reservation for a Cybertruck.
I feel like we need to have a moment where you just sort of explain to me, like, what the Cybertruck is. Like, can you give me some specs on this thing, some pricing information? Because I — I don’t know if you know this about me, Kevin, but like, I’ve never bought a truck. So I don’t really even know — I don’t even have a frame of reference for understanding. What I’ve heard, though, is that it’s actually very expensive.
So it is available in three different models. There is a sort of low-end rear-wheel drive model that starts at $61,000 in the basic configuration. There’s an all-wheel drive model that starts at $80,000. And then, you can get the sort of top-of-the-line model, which is being called the Cyber Beast, which has three motors and starts at around $100,000.
Now, see, Google should have named DeepMind Ultra “Cyber Beast.”
That would have been a good name.
Yeah, that’s true.
So they did start delivering Cybertrucks to initial customers last week, and they did a big demo reveal. They showed some crash testing. They showed a video, as you said, of people shooting bullets at the doors of the Cybertruck. It appears to be bulletproof.
And they showed how it compares to a bunch of other trucks in a pull test, where you basically attach a very heavy sled to the back of a truck, and you try to pull it as far as you can. And in this test, at least the version that Tesla showed off, the Cybertruck beat all of the leading pickup trucks, including an F-350. So it appears to be a truck with a lot of towing capacity, and it’s bulletproof, if you do need to survive a shootout.
I mean, to me, here’s a question, Kevin. If this truck was produced by anyone other than Elon Musk and Tesla, would we be giving it the time of day?
No, I don’t think so. Well, so here, let me say a few things about this.
So one is, I think it looks cool, and I’m sorry about that. And I don’t have any justification on a moral or ethical level for thinking that it looks cool. I know that you are a sort of —
Yeah, it’s fine to just say that you’re having a midlife crisis, and so you’re starting to think that the Cybertruck looks cool. That’s fine. You can admit that.
Well, you know, here’s what I’ll say about it. It is different, right? And I think —
[LAUGHS]: Wow, I’ve never seen someone lower the bar so much during a conversation.
No, but you know what I mean? Like, you just go out on the road, and you look at all these cars, and every car now is like a compact SUV. Every car looks exactly the same to me.
It’s like, oh, you have a RAV4. Cool. But like, this is a car — you would not mistake it for any other car. It is a car that would not survive the design process at, basically, any of the big car companies. It is only something that a truly demented individual such as Elon Musk could make and put into production. And I like an opinionated car design.
No, that’s fine. I think when the — many years from now, when the final biography of Elon Musk is written, like, Cybertruck will be a chapter about a sign that we were approaching the end game of, here is somebody who is losing his touch.
Yeah, it is clearly not something that was designed by committee. So I think the question that a lot of people are asking about the Cybertruck is like, who is the market for this? Right? Is it pickup truck owners who are looking to maybe get something electric or upgrade to a slightly nicer pickup truck?
Is it Elon Musk fans who are just going to buy whatever the latest Tesla is? Is it wealthy tech people who want to own something that looks like it drove out of “Blade Runner”? Like, who do you think the target market for this is?
I would say fugitives. I would say carjackers. What do you think?
People who subscribe to X Premium, I would say, are the target audience for this. But no, I think there will be a lot of people who are interested in this. I also am very curious about whether this will become sort of a signaling vehicle that will say something about you.
How can it not? Like, this is not a neutral car. This is not a car that you’re supposed to just see and forget about. You’re supposed to ponder it.
Totally. And I’m sure we will start seeing these very soon on the roads of San Francisco.
Although we did try to find one this week, and we could not. We very much wanted to record this episode inside a Cybertruck, but we couldn’t find one.
Yeah, apparently, it does have very good noise insulation inside the cab of a Cybertruck, so maybe next year, we’ll record the podcast from there.
Better than the inside of an airport?
You know, maybe. Less likely to get accosted by flight attendants. So Casey, we also can’t really talk about the Cybertruck without talking about Elon Musk and the kind of insane couple of weeks that he’s been having. So last week, of course, he appeared on stage at the DealBook conference in New York and gave this totally unhinged interview to my colleague, Andrew Ross Sorkin, in which he told advertisers who are staying away from X to, quote, “go fuck themselves,” and also said a number of inflammatory things about his critics and his state of mind. And it was just sort of like a glimpse into his mind, and I would say it was not altogether reassuring.
It was not. I, of course, enjoyed this, I would say, very much, because I think there is still a contingent of folks who want to believe that the Elon Musk of 2023 is the Elon Musk of 2013, and that he said a couple of kooky things here and there, but at his core, he’s billionaire genius, Tony Stark, savior of humanity.
And over and over again, he keeps showing up in public to be like, no, I’m actually this guy. And we got another one of those moments, and another group of people woke up and they were like, oh, wow, OK, I guess he is just really going to be like this now forever.
I mean, I do think that there is some angst among the Tesla owners, most of whom do not support Elon Musk’s politics or his views on content moderation. I’ve heard from a number of people over the past few months in my life who say some version of, I want to get a Tesla for reasons x, y, or z.
They have the most chargers. They have the best technology. I really like how it looks. It’s green and I care about the environment. And it’s the one that sort of fits my family’s needs.
But I don’t want to give Elon Musk my business. I don’t want to be driving around in something that makes it look like I support him. So do you think that’s actually going to be a meaningful barrier? Do you think there are people who will stay away from the Cybertruck, even if it is objectively a good truck, just because they hate Elon Musk?
You know, it is hard to say. Because as best as I can tell, Tesla has not really suffered very much yet because of all of Elon’s antics.
Not only has it not suffered, but it is, by some accounts, the bestselling car in the world.
And certainly the bestselling electric car in the world.
Sure. At the same time, I just hear anecdotally from folks all the time now that they would never buy a Tesla. There’s actually a great profile in “The Times” this week of Michael Stipe, the great singer from R.E.M. And there’s an anecdote in the story about how a tree falls on his Tesla, and he’s so excited, because he had wanted — he didn’t want to drive an Elon Musk car anymore, and now, he finally had an excuse.
So look, is it possible that this is just some very thin layer of coastal elites who are turning up their nose at Tesla while the rest of America and much of the world continues to love to drive them? Possible. But the thing that I always just keep in the back of my mind is, there are a lot more electric car companies now than there used to be. The state emission standards are going to require all new vehicles to be electric not too far into the future. And that’s just going to create a lot of opportunity for folks who want to drive an electric car, who don’t have to put up with the politics or the perception issues that might come from driving a Tesla. So Tesla is having its moment in the sun now, and maybe the Cybertruck will extend their lead into the future, or maybe a few years from now, we look back, and we think, oh, yeah, that’s when the wheels started to come off the wagon.
Yeah, or the truck.
As it were.
I did see one estimate that Tesla is losing tens of thousands of dollars every time they sell a Cybertruck, because they are essentially hand-building these now. They have not made it into mass production. And obviously, it takes some time to kind of ramp up production in the numbers that they need it to be. So if you are an early Cybertruck buyer, you may actually be costing Elon Musk money. So that may be one reason to get one.
This is the first thing you’ve said that makes me want to buy a Cybertruck.
[LAUGHS]: Can I ask a question? If this were made by some other company, if this were made by Ford or GM or Chrysler, would you buy one? Would you be interested?
(LAUGHING) No. Like, I don’t have a car. I got access to Waymo this week. And to me, this is what is exciting — is, like, not owning a car, is being able to just get from point A to point B and not worry about the various costs of ownership, any of this. So when I think about what I want in this world, it’s more public transit. It’s more walking. It’s more biking.
And I’ll say it — it is more autonomous vehicles to get me from point A to point B on those sort of short trips where transit doesn’t make sense. So no, there’s nothing about this car that makes me want to buy it. But I’m guessing that for you, the answer is yes.
Well, let me just stipulate that I am not in the market for a very expensive pickup truck. There is no version of my life in which I need something like that. But I would say, similar to the Rivian, when I do see them driving around on the streets of my hometown, I will, like, turn my head and kind of admire them. I do think the Cybertruck looks kind of cool. I hope that it’s sort of a spur to the rest of the industry to — I don’t know, like —
Indulge their worst ideas.
(LAUGHING) Yes. Yes, sketch something on a napkin that looks insane, and then go make it.
It’s actually how we came up with a lot of this podcast.
Yeah, that’s true. We also shot bullets at it to make sure it was bulletproof. And the “Hard Fork” podcast, it turns out, is bulletproof.
Bulletproof, baby. When we come back, what else happened in AI this week?
All right, Casey. There’s a lot of stuff happening in AI this week that we haven’t talked about yet.
Really, Kevin? Name one thing.
(LAUGHING) Well, we have a lot to get through.
Which is why we are doing “This Week in AI.” Play the theme song.
This week in AI.
So our first story in AI this week is about wine fraud. This was an article in “The New York Times” by Virginia Hughes, titled “Bordeaux Wine Snobs Have a Point, According to This Computer Model.” It’s an article about a group of scientists who have been trying to use AI to understand what the wine industry calls “terroir.” Are you familiar with “terroir”?
Yeah, the people who are really into this are known as “terroir-ists,” I believe.
[LAUGHS]: Yes, so this is the word that is used in the wine industry to describe the specific soil and microclimate that wine grapes are grown in. And if you go up to Napa and you do wine tastings, they will often tell you about, oh, our soil is more minerally, and that’s why our wine tastes better and things like that. And I never knew whether that was real. And as it turns out, this is something that researchers have also been wondering.
So these researchers trained an algorithm to look for common patterns in the chemical fingerprints of different wines. They were apparently shocked by the results. The model grouped the wines into distinct clusters that matched with their geographical locations in the Bordeaux region.
So these researchers — they effectively showed that terroir is real. One of the scientists said, quote, I have scientific evidence that it makes sense to charge people money for this, because they are producing something unique.
Wow, well, you know, this has some interesting implications for if you buy, like, some really, really expensive wine, but you worry that you’ve gotten a forgery or a fraud, I guess there would maybe now be some means by which you could test it. Or, like, in the far future, you could synthesize wine with maybe a higher degree of accuracy, because we’ll be able to catalog these chemical footprints.
Yeah, so apparently, in expensive wine collections, fraud is fairly common. Producers have been adjusting their bottles and labels and corks to make these wines harder to counterfeit, but this still happens. And with AI, apparently, this will get much harder, because you can just have the AI say, that’s not really Malbec from this region. It’s actually just, like, crappy supermarket wine from California.
Oh, man. Well, this is just great news for wine snobs everywhere.
So we celebrate it.
They’ve been waiting for a break, and now, they have one.
What else happened this week, Kevin?
OK, so this one is actually something that you wrote about.
This is a problem with Amazon’s Q AI model. So Q is a chatbot that was released by Amazon last week, and it’s aimed at enterprise customers. So Casey, what happened with Q?
Yeah, so I reported this with my colleague, Zoe Schiffer, at “Platformer.” Last week, Amazon announced Q, which is its AI chatbot aimed at enterprise customers. You can think of it as a business version of ChatGPT.
And the basic idea is that you can use it to answer questions about AWS, where you might be running your applications. You can edit your source code. It will cite sources for you. And Amazon had made a pretty big deal of saying that it had built Q to be more secure and private and suitable for enterprise use than ChatGPT.
Right. This was its big marketing pitch around Q, was like, these other chatbots, they make stuff up. They might be training on your data. You can’t trust them. Go with ours instead. It’s much safer for business customers.
That’s right. And so then, of course, we start hearing about what’s happening in the Amazon Slack, where some employees are saying this thing is hallucinating very badly.
It is leaking confidential information, and there are some things happening that one employee wrote, quote, “I’ve seen apparent Q hallucinations I’d expect to potentially induce cardiac incidents in legal.”
So you know, let’s stipulate this stuff is very early. It’s just sort of only barely being introduced to a handful of clients. The reason that Amazon is going to move slowly with something like this is for this exact reason. And in fact, when we asked Amazon what it made of all of this, it basically said, you’re just watching the normal beta testing process play out. At the same time, this is embarrassing, and if they could have avoided this moment, I think they would have.
Right. And I think it just underscores how wild it is that businesses are starting to use this technology at all, given that it is so unpredictable, and that it could cause these cardiac incidents for lawyers at these companies. I understand why businesses are eager to get this stuff to their customers and their employees.
It is potentially a huge time-saver for a lot of tasks, but there’s still so many questions and eccentricities around the products themselves. They do behave in all these strange and unpredictable ways. So I think we can expect that the lawyers, the compliance departments, and the IT departments at any companies that are implementing this stuff are going to have a busy 2024.
Here’s my bull case for it, though, which is like, if you’ve worked at any company and you’ve tried to use the enterprise software that they have, it’s usually pretty bad. It barely works. You can barely figure it out. It probably gave you the wrong answer about something without even being AI. So I think we all assume that these technologies will need to hit 100 percent reliability before anyone will buy them. In practice, I think companies will settle for a lot less.
Right, they don’t have to be perfect. They just have to be better than your existing crappy enterprise software.
A low bar, indeed.
All right, that is Amazon and its Q, which, by the way, while we’re talking about bad names for AI models, I literally — I was talking with an Amazon executive last week, and I said, you got to rename this thing. We can’t be naming things after the letter Q in the year 2023.
We will reclaim that letter eventually, but we need to give it a couple of years.
(LAUGHING) Yeah. Yeah, the QAnon parallel is too easy. All right, this next story was about one of my favorite subjects when it comes to AI, which is jailbreaks and hacks that allow you to get around some of the restrictions on these models. This one actually came from a paper published by researchers at DeepMind, who, I guess, were sort of testing ChatGPT, their competitor, and found that if they asked GPT-3.5 Turbo, which is one of OpenAI’s models, to repeat specific words forever, it would start repeating the word.
But then, at a certain point, it would also start returning its training data. It would start telling the user, like, what data it was trained on. And sometimes that included personally identifiable information. When they asked ChatGPT to repeat the word, “poem,” forever, it eventually revealed an email signature for a real human founder and CEO, which included their cell phone number and email address.
That is not great. I have to say, my first thought reading this story is like, whose idea was it to just tell ChatGPT, repeat the word, “poem,” forever? Like, we talk a lot about how we assume that everyone in the AI industry is on mushrooms, and I’ve never felt more confident of that than reading about this test. Because what is more of a mushroom-brained idea than, bro, what if we made it say “poem,” literally, forever?
Let’s just see what happens, bro. And then, all of a sudden, it’s like, here’s the name and email address of a CEO? Come on!
I do hope there are rooms at all of these companies’ headquarters that are just, like, the mushroom room, where you can go in and just take a bunch of psychedelic mushrooms, and just try to break the language models in the most insane and demented ways possible. I hope that is a job that exists out there. And if it does, I’d like to apply.
Now, we’ve seen a lot of wild prompt engineering over the past year. Where would you rank this among, like, all-time prompt-engineering prompts?
I would say this is, like, an embarrassing thing and one that, obviously, OpenAI wants to patch as quickly as it can. “404 Media” reported that OpenAI has actually made it a terms-of-service violation to use this kind of a prompt engineering trick. So now, if you try that, you won’t get a response, and you won’t get any leaked training data.
And this is just, I think, one in a long series of things that we’ll find out about these models just behaving unpredictably. Why does it do this? They can’t tell you. But I think if you’re an AI company, you want to patch this stuff as quickly as possible, and it sounds like that’s what OpenAI has done here.
All right. Great, well, hopefully, we never hear about anything like this ever again.
OK, can we talk about Mountain Dew?
Let’s talk about Mountain Dew.
This next one is admittedly a little bit of a stunt, but I thought it was a funny one, so I want to cover it on the show. Mountain Dew this week has been doing something they call the Mountain Dew Raid, in which, for a few days, they had an AI crawl live streams on Twitch to determine whether the Twitch streamers had a Mountain Dew product or logo visible in their live stream.
Now, Kevin, for maybe our international listeners or folks who are unfamiliar with Mountain Dew, how would you describe that beverage?
Mountain Dew is a military-grade stimulant that is offered to consumers in American gas stations to help them get through long drives without falling asleep.
Yeah, that’s right. If you’ve never tasted Mountain Dew and are curious, just go lick a battery.
[LAUGHS]: I was at a truck stop recently on a road trip, and do you know how many flavors of Mountain Dew there are today in this country?
How many are there?
I would say, easily a dozen flavors of Mountain Dew.
Oh, my god. That’s innovation. It’s progress. That’s what this company — that’s what this country does — I said “this company,” and that’s an interesting slip.
Because sometimes I do feel like this world is getting too corporate, Kevin. But look, at the end of the day, this country makes every flavor of Mountain Dew that you can imagine and many that you couldn’t.
Yeah. So fridges full of Mountain Dew at the retailers of America. And this is an AI that just feels like it’s a dispatch from a dystopian future. Now, I think this was sort of a marketing stunt. I don’t think this was, like, a big part of their product strategy.
But with this Raid AI, basically, if it analyzed your Twitch stream and saw a Mountain Dew product in it, you could then be featured on the Mountain Dew Twitch channel and also receive a one-on-one coaching session with a professional live-streamer.
So this document that Mountain Dew released as, like, an FAQ —
Their Mountain Doc?
(LAUGHING) Their Mountain Doc. It is — it’s the F-A-Dew. No, that’s not good. That’s not good.
That’s pretty good! That’s pretty good!
(LAUGHING) OK. So this is the Mountain Dew — I’m reading from the Mountain Dew Raid Q&A. It says, “Mountain Dew Raid is a first-of-its-kind AI capability that rewards streamers for doing what they love — drinking Mountain Dew on stream — and unleashes a combination of rewards aimed at building and amplifying each participating streamer’s audience.”
So it basically goes out, crawls Twitch, looking for streamers who have Mountain Dew products and logos on their stream. Once it identifies the presence of Mountain Dew, this document says, “selected streamers will get a chat asking to opt in to join the Raid. Once you accept, the Raid AI will keep monitoring your stream for the presence of Mountain Dew.
If you remove your Mountain Dew, you’ll be prompted to bring it back on camera. If you don’t, you’ll be removed from our participating streamers.” (LAUGHING) So it is — this is, like, truly the most dystopian use of AI that I have heard about.
I know there are more serious harms that can result from AI, but this actually does feel like a chapter from a dystopian novel. Like, bring your Mountain Dew back on camera! Or you will lose access to your entire livelihood.
Surrender to the Mountain Dew panopticon.
(LAUGHING) Yes. It reminds me of — do you remember that patent that went viral a few years ago, where Sony had invented some new technology that basically would allow them to listen to you in your living room? Like, if your TV was playing an ad for McDonald’s and you wanted it to stop, you could just sort of yell out, “McDonald’s,” in your living room? [LAUGHS]
We must prevent that world from coming into existence at all costs.
Yeah, it reminds me of — a few years ago, we did this demo. My colleagues and I at “The Times” were pitched on an Angry Birds scooter. Did I tell you about this?
I think you have, but tell me again.
(LAUGHING) Yes. So this was, like — this was during the big scooter craze of, like, the 2018, 2019 period. And the company that makes Angry Birds did a promotional stunt where they outfitted one of these electric scooters with a microphone. And in order to make the scooter go, you had to scream into the microphone as loud as possible, and the louder you yelled, the faster the scooter would go.
And so I am a sucker for a stupid stunt, and so I had them ship two of these to us, and we drag raced them on the Embarcadero in San Francisco, just screaming as loud as we could into the microphones of our Angry Birds scooters to make them go fast.
And the nice thing about San Francisco is, so many other people were screaming, nobody even paid you any attention.
Yeah, it was only the fourth weirdest thing happening on the Embarcadero that day. And it was a lot of fun. So I support stupid stunts like that. I support the Mountain Dew AI. Casey, what did you think when you saw this Mountain Dew news?
Well, you know, there is something that feels, like, weird and future-y about AIs just scanning all live media to identify products and incentivize and reward people for featuring those products. At the same time, we’re already living in a world where on social media, some platforms will automatically identify products and will then tag them, and then maybe if somebody buys that product based on you posting it, you’ll get a little bit of a kickback.
So this is just kind of the near-term future of social media, is that it is already a shopping mall, and we are just making that shopping mall increasingly sophisticated. If you see literally anything on your screen, these companies want you to be able to just mash it with your paw and have it sent to you. So this was the latest instance of that, but I imagine we’ll see more.
Totally, and it just strikes me as sort of an example of how unpredictable the effects of this kind of foundational AI technology are. Like, when they were creating image recognition algorithms a decade ago in the bowels of the Google DeepMind research department, they were probably thinking, oh, this will be useful for radiologists. This will be useful for identifying pathologies on a scan or maybe solving some climate problem.
And instead, this technology, when it makes its way into the world, is in the form of the Mountain Dew AI bot that just scours Twitch live streams to be able to sell more Mountain Dew.
You know, I think there actually could be a good medical use for this. Did you hear this? There was another tragic story this week. A second person died after drinking a Panera lemonade.
Did you read this? Yeah. So that happened again. So I think we should build an AI that scans for Panera Charged Lemonades on these Twitch streams, and if it sees one, calls an ambulance. [KEVIN LAUGHS]
This week in AI.
Before we go, a huge “thank you” to all the listeners who sent in hard questions for us. As a reminder, “Hard Questions” is our advice segment where we offer you help with ethical or moral dilemmas about technology. We still are looking for more of those, so please, if you have them, send them to us in a voice memo at “Hard Fork” at nytimes.com, and we’ll pick some to play on an upcoming episode.
And to be clear, Kevin, in addition to ethical quandaries, we also want the drama. We want something that is happening in your life. Is there a fight in your life that people are having over technology in some way? Please tell us what it is, and we’ll see if we can help.
Yeah, and these don’t need to be, like, high-minded scenarios about AI wreaking havoc on your professional life. It could just be something juicy from your personal life.
Yeah, spill the tea. firstname.lastname@example.org. “Hard Fork” is produced by Rachel Cohn and Davis Land. We’re edited by Jen Poyant. This episode was fact-checked by Caitlin Love. Today’s show was engineered by Chris Wood.
Original music by Marion Lozano, Sophia Lanman, and Dan Powell. Our audience editor is Nell Gallogly. Video production by Ryan Manning and Dylan Bergeson. Special thanks to Paula Szuchman, Pui-Wing Tam, Kate LoPresti, and Jeffrey Miranda. You can email us at email@example.com —
— with your favorite flavor of Mountain Dew.