Artificial Intelligence and Auto Safety with Phil Koopman – Part 2

We’re back with Phil Koopman to continue our discussion about AI and Auto Safety. Phil walks us through four crucial areas: safety engineering, security, machine learning, and human factors. Can large language models be used for safety? How do we acknowledge robot drivers? When does Fred return? All that and more in this episode.

Donate for Safety.

More from Phil over at Substack.

Subscribe using your favorite podcast service.

Transcript

note: this is a machine-generated transcript and may not be completely accurate. This is provided for convenience and should not be used for attribution.

Hey, listeners, welcome back.

Phil Koopman Introduces His Book on AI Safety

Anthony: We’re continuing with Phil Koopman who’s going to talk a little bit more about his book coming up. Hopefully this fall, we’re all looking forward to it. So Phil, can you break us down the four core areas of what’s coming out?

Phil Koopman: Sure.

The Four Core Areas of AI Safety

Phil Koopman: The book is about embodied AI safety.

And that’s not just robotaxis, but it uses robotaxis as lessons learned, ’cause we learned so much. And I try and zoom back up. It’s not just a technical book about here’s how to do the machine learning or here’s how to define safety, although all those things are in it and are important. [00:01:00] I start out by saying that if you wanna play the embodied AI safety game, robotaxis or otherwise, household robots, whatever it is, there’s four things you need to really understand.

And a lot of the problems we’ve seen, especially in robotaxis, are because teams were weak in one of the four areas.

Safety Engineering and Security

Phil Koopman: The first area is you need to know safety engineering, and the big idea for safety engineering is you identify the hazards and you mitigate the expected risks due to those hazards.

That’s the core of safety engineering. If you’re just driving around to see what happens, you are not doing safety engineering, and it’s astonishing: people think if there’s no bug, it’s safe. No, you can have perfect software that does something dangerous, and it’s not safe. So the first thing is safety engineering, hazard analysis.

The second thing is you need to know security, because if you’re not secure, an attacker’s gonna compromise your safety. And so the book talks about identifying threats, identifying vulnerabilities and countermeasures, and just basic security principles.

Machine Learning and Human Factors

Phil Koopman: The next [00:02:00] thing you need to know is machine learning.

And you need to understand that machine learning breaks the usual assumptions behind safety and security. For example, safety likes software to always do the same thing the same way every single time it’s run. Yeah, that doesn’t work for machine learning. And safety also implicitly assumes there’s a person

who’s responsible for decisions like: it’s too much rain, you shouldn’t fly; or it’s too much ice, or it’s too dark, you shouldn’t put this robotaxi on the road. There’s a person who’s there to take responsibility for enforcing limitations, and if you’re autonomous, now the computer has to do that.

And most of the safety standards don’t talk about things like this. The traditional safety standards assume that you have a defined environment and a defined use, and they don’t talk about misuse and abuse. And they don’t talk about what happens if the environment changes in an unexpected way. So you need to understand the limits of machine learning.

And the basic idea there is that everything is [00:03:00] statistical rather than deterministic. And statistical can be really amazing for very cool functionality. But the further you go down the cool functionality path, the harder you make safety, so you need to understand what’s going on. And the fourth one, that not near enough people talk about, is you have to understand people and the limits to people.

People have a limit to what they can comprehend, how fast they can react, and those things are coupled together. People make mistakes. That’s what people do. And if you want to get safety, blaming people for being people is not gonna get you there. So you need to understand safety engineering, security engineering.

You need to understand machine learning, and you need to understand human factors, especially perception response time. You need to understand those things if you’re gonna make safe embodied AI, and anyone who goes into the game with a big gaping hole in their knowledge on any one of those four things is not gonna get there.

And a lot of what we’ve seen is teams who have too much of a gap in one of those [00:04:00] areas putting products out there that turn out to have problems.

Challenges with Autonomous Vehicles

Anthony: I wanna ask about the last part, the perception response time. So we have autonomous vehicles on the road. Are you referring to the response time of the pedestrians out on the street?

The other drivers? Are you referring to all of it?

Phil Koopman: All of it.

Anthony: Okay.

Phil Koopman: All of it. But the place that really bites you the most is in these hybrid systems.

Remote Drivers and Response Time

Phil Koopman: And it isn’t just cars. Anywhere you have a person and a machine learning based system that are collaborating somehow, that’s where you get in trouble.

And even robotaxis — all robotaxis right now have a human driver. Many of them don’t want to call it a driver, but they all have a human driver, and by driver I mean someone who can, if they make a mistake, cause a crash. And as far as we know — we have pretty good information available — the Teslas operating in Austin right now actually have remote drivers with steering wheels and brakes and gas.

And they’re basically acting the same as an [00:05:00] FSD supervisor, except they’re remote. And they’re expected to jump in and prevent a crash. But just as if they were in the car, if the car does something crazy faster than they can respond, they can’t respond.

And response times can be not just handfuls of seconds; they can be tens of seconds depending on what’s going on. But even with the Waymos and Cruises of the world — there was a Waymo mishap where it was stopped at a red light, but it wasn’t sure it was a red light. And so it phoned home and said, is that light red?

And the remote assistant said, no, it’s green. So it went through the intersection, and a light motorcycle, like a moped — the rider wiped out trying to avoid hitting it. Now, in what world is that not a driver,

Anthony: right?

Phil Koopman: The driver made a decision that caused a wreck. How is that not a driver? Now, Waymo says the car is responsible for safety.

So they didn’t blame the driver, which is good. But it turns out that remote assistant, if they can control any aspect of vehicle behavior, even if they just get a vote — [00:06:00] if they’re the penny that tips the balance from one side to the other, how is it that they’re not participating in the bad thing that happened?

Anthony: So how does that work? So for example, with Tesla — let’s pretend it’s one-to-one right now; I think our impression is it is one-to-one. So you have one robotaxi on the road and you have one person and then

Phil Koopman: helping somewhere. That’s my expectation as well. And when China launched robotaxis, that’s what they did too.

So they’re not the only folks who have launched a robotaxi like this.

Anthony: But eventually their goal is to scale out for economic reasons, and we’ll have one person monitoring X number of vehicles.

Phil Koopman: One expects. And while I don’t have firsthand info, I’ve heard from reasonable sources that this is already going on in China — that there are already multiple vehicles per supervisor, because that’s the only way you can possibly afford to build a robotaxi fleet.

Anthony: So you’ve talked about this issue on past episodes, where there’s that response time, that latency, that delay. Because obviously we’re not connected in real time to those cars.

There’s always [00:07:00] gonna be some level of latency. They’re connected over a cellular network, and we know calls and data get dropped all the time. This is normally a question I’d ask Fred: Hey Fred, do some math. These cars are driving 30 miles per hour. Someone in a remote office sees an issue.

What’s their response time — if they’re fully engaged, let’s pretend they’re fully engaged, fully caffeinated, not distracted by anything — to engage and cause the robotaxi to not cause an accident? Is that possible? At what speed does it no longer become possible?

Phil Koopman: The videos I saw for Austin on the first day of operations, they were at 35, 36, 37, 38 miles an hour, which is pretty borderline. If you hit a pedestrian at 35, you’re gonna do some damage. It’s well above 20, which is the much less fatal speed. But I think people get it wrong here.

They say we’re gonna measure from the time something bad happens: [00:08:00] the camera has to scan it, then it has to be sent over a data link, then you have to see it on your display and you have to turn the steering wheel, and then the steering wheel input has to be sent back to the car.

For sure, that data transmission and steering wheel and display and cognitive processing and all that stuff back and forth takes time. And if you do the math, there’s not enough time to respond. But if you do the math, a lot of times there isn’t enough time to respond even if you’re in the car.

Maybe you have a little more time in the car to slam on the brakes. But for sure, having a remote driver makes it worse. There’s no question there will be crashes where that time difference is what made the difference.

Anthony: Okay,

Human Drivers vs. Autonomous Systems

Phil Koopman: But I’m gonna go a different way, and this has more to do with why computers sometimes cannot be as safe as people.

If you look at the data for human drivers, as you get older, your reflexes slow. Like, a lot. I do not have the reflexes I had when I was 17. [00:09:00] There’s no way. Not even close. Okay. But if you look at the AAA data for fatal crashes, fatalities go down and keep going down till you’re near your sixties.

The main reason they go up in your seventies and eighties is not ’cause you’re old and senile. It’s because by the time you’re 80, it doesn’t take much of a crash to kill you. It’s much more about that than it is about anything else. So we have this really weird thing where a young driver is just scary as anything.

And you have to be like in your twenties before you are down in the sweet spot. And so as you’re slowing down, as your reflexes degrade, you’re getting safer. What’s going on there? Okay. Now that’s data, that’s well-established data. I’ll tell you my own personal theory: wisdom trumps reaction time.

People said cars will be safer because they can see everything. We talked about the Cruise bus last time. Just ’cause they can see it doesn’t mean they won’t hit it. And also, they can react faster than a human — but if they react stupidly, it’s not gonna avoid it. [00:10:00] So a lot of this safety driving is knowing that something bad’s about to happen and avoiding it before the last-second twitch response.

And the problem you’re gonna have with remote drivers is they’re gonna lack the feel of the vehicle. It’s gonna be a little bit harder, and they’re looking through a camera. It’s gonna be that much harder if they don’t have the audio cues as well. It’s gonna be that much harder to pick up the subtle things remotely.

And it’s gonna require them to anticipate and avoid crashes. But if they’ve been told, don’t intervene unless you have to, ’cause we need good stats — then they’re gonna have to let the car do dangerous things, and by the time they figure out it’s not gonna avoid a crash, it’ll be too late to jump in.

And this has been a problem for safety drivers who are in the vehicles. And I think being remote is gonna make it worse, because of the time lag and because their sensor information is gonna be degraded in quality. It just won’t be the same as being there. I tried driving in a simulator once and I was terrible [00:11:00] at it.

And I finally figured out it’s ’cause I wasn’t getting road feel on corners.

Anthony: Oh, that’s fascinating. That’s what was nailing

Phil Koopman: me. I kept flying off corners ’cause I didn’t have red fuel on corners. And I was looking for the banking, the super elevation. I was looking for these very subtle hints.

And I’m a terrible remote driver. I’m a terrible simulated driver because I depend too much on the kinesthetic cues. There’s stuff going on there. Now, if they’re trained and experienced and they have a crack team of a dozen or two dozen of the world’s best drivers, can Tesla make the robotaxis safe?

It’s plausible. I don’t know how you scale that to a million vehicles though.

Anthony: Yeah, and I don’t see how you save any money hiring crack drivers to do this remotely. But that’s a different issue; that’s not a safety issue.

Phil Koopman: The story they’re gonna give is, at first, of course we do this because we’re paranoid about safety.

Any minute now it will no longer need interventions, and then we can go to supervising two and four and eight and 32, and then eventually switch to what Waymo does, which is they don’t [00:12:00] supervise at all. So the other difference is: is it your job to jump in and save the day?

Which is really hard, for all those reasons. Or is it the robotaxi’s job to know when it’s in trouble, get itself to a safe place, and ask permission by phoning home, which is what Waymo does? So Tesla’s looking at a big transition when they can go from having to keep an eye on it to having it call home.

’Cause when you have it call home, then the time pressure is off. The perception response time is a huge deal when you are dealing with complex situations with limited information, and if you don’t act fast enough, someone dies, right? That’s really hard. The idea for Waymo and other companies is you take that out of the equation a bit by having the robotaxi get itself safe and stopped and waiting for help, and then have the person remotely help it.

And that sounds great, except there’s cars honking at you while you’re waiting.

Anthony: Sure.

Phil Koopman: And then it’s gonna be a [00:13:00] call center. And a lot of call centers, they get judged on how many seconds it takes to answer your question, because economics and metrics. And it’s going to end up there sooner rather than later.

And so we’re back to: they jump into a complicated situation, they have no idea what’s under the car, they have no idea the car just ran someone over — I’m thinking of the Cruise mishap — and the time pressure’s on, and it’s complicated. And if they make a mistake, it’s bad. And of course, people make mistakes.

That’s what people do. And so that’s where the Waymos and everyone else who doesn’t need constant supervision ends up. And I’m not saying anyone does it poorly, I’m just saying it’s a tough gig, and it’s really challenging to design a system where you’re going to have very few bad outcomes.

Anthony: This is my ideal childhood fantasy. I guess it’s never gonna come true, huh?

Phil Koopman: If you get into a robotaxi and there’s no driver in the front, do you really care what’s going on behind the scenes, as long as it has a safe outcome?

Anthony: I guess not, but I liked [00:14:00] in Tesla’s terms of service where it says, we cannot guarantee that you’ll be dropped off at the location you requested. Which I thought was amazing.

Phil Koopman: Yeah. But they’ve all had this. And then you get into personal safety. What if it drops you off in a really dangerous place? We’ve seen issues with that. Yeah.

Anthony: I, so I wanna go back to perception response time. ’cause now I wanna talk to the pedestrians outside the vehicle.

Because we’ve talked about this, where you come to a four-way stop and no one has their DMV manual in there. Wait, who has the right of way? Is it the car to my right that got there first, or is it to the left? How do we do this? So as humans, we all just make eye contact and figure out who’s gonna go for it.

We do this with pedestrians. Are they actually crossing the street? Are they waving me on? But there’s no one in the car. How do you have that nonverbal communication?

Phil Koopman: That’s tough, right? And you can do some of it without eye contact — people who aren’t great at eye contact do this too; not everyone is good at eye contact, it turns out.

Okay, sure: you inch your car forward a little bit and see if anyone else inches forward. You can use vehicle motion as a communication mechanism successfully, and I expect [00:15:00] that’s what the robotaxis are gonna have to do, right? But there’s another thing, from a purely engineering point of view, that gets really tough.

The law says whoever arrives first goes; if it’s at the same time, it’s the person on the right. There’s two issues. What if all four arrive at the same time? We’ve all been there. Okay, but the other one is: what if the guy on your left arrived one-tenth of a second before you?

What if it’s a thousandth of a second? How many people can measure a thousandth of a second while they’re driving? Nobody. Nobody, okay? But the computer can. So even though it arrives second, because it’s on the right, it should go first, because the person isn’t gonna realize that they have the right of way, and you’re gonna have to wait.

And so, if you’re building a computer that has better perception — raw perception, not understanding, but better sensor ability and better timing estimation ability than a person — you have to model what the person’s experiencing to be able to act like a [00:16:00] human in those situations. ’Cause the goal isn’t to not be blamed; the goal is to not have the crash.

And those are often two different things. So you need a model of how people act, so you can act like a person if you’re in mixed traffic, or you can have bad outcomes.

Anthony: We should do better training and force these systems to read the US Coast Guard’s rules of the road.

Phil Koopman: Ah, they should all follow that. I was quite proficient at the rules of the road at one point in my life.

There you go. But even there, there’s a lot of really weird situations which don’t make sense, but if you do the right thing, it’s your fault. So you do something crazy, because you never want it to be your fault.

Anthony: I was told the perfect example. I gave some obscure, convoluted example.

And the captain looked at me and goes, don’t hit the other boat.

Phil Koopman: Yeah.

Anthony: I’m like, yeah. Oh yeah. That’s the goal. Yeah. But

Phil Koopman: that’s one of the goals. But the other goal is, there’s a rule that you’re not supposed to turn right, you’re supposed to turn left. So do you turn right one degree, or do you turn left 359?

The answer is you turn left 359; then it’s not your fault. Exactly.

Lidar Debate and Data Collection

Anthony: So going back to your table of contents, [00:17:00] your index — one of the great titles that we grabbed onto was “To Lidar or Not to Lidar.”

Phil Koopman: Yeah. That is the question. You set it up for me; I couldn’t resist.

Anthony: Oh, good.

It is the question, of course. And this is the big debate, ’cause all these financial analysts that I think are clowns are like, Tesla’s system’s gonna be cheaper because they haven’t added all these things. And I’m like, yeah, if you remove seat belts, the car will be cheaper too.

Phil Koopman: As a hint, there are a few sections where an early version shows up in my Substack, and that’s one of ’em — you can put a show note to point to that one. But I’ll give you the short version. The vast majority of the book is stuff no one’s ever seen. To be clear, it’s not just a clip show. Okay? It’s new writing.

It’s not Eclipse show. Okay? It’s new writing. But there, what I, everyone goes Tesla, lidar, everyone else, I’m sorry, Tesla, not Lidar, everyone else, LIDAR or Lidar, bad lidar, whatever. And I make the point that’s the symptom, not actually what’s going on. And so the way I look at it is the business choice is that you can have a [00:18:00] few number of test vehicles with really good sensors.

or you can have a huge number of vehicles, but with poor sensors. ’Cause 10 years ago you couldn’t afford to put lidar on every one of a million cars. It was just economically infeasible — or even if it was theoretically economically feasible, Tesla chose to do something else with their money, right?

It’s a huge amount of money. So if you have a million vehicles and you don’t have lidar, because you are financing it by selling vehicles as opposed to financing it by having billions of dollars of venture capital — that’s certainly good for you in a business sense. Okay, but you don’t have lidar.

What do you do? It sends you down a different development path. If you have lidar and you have these great sensors, you do the componentized machine learning, which I talked about in the last episode, where you have chunks of machine learning: this chunk does perception, this chunk does planning, this chunk does something else.

And often the chunks are even finer grained than that. If you have really good quality data and you have people label it — that’s a car, that’s a road, that’s a person, that’s a whatever — it’s a really [00:19:00] expensive way to get and curate data, but you get super high quality data and it’s really easy to train little chunks on it.

But if you don’t wanna spend the money on the lidar, but you have a big fleet, then you have a lot of data that is lower value per data sample, ’cause it’s not labeled. You can’t afford to label hundreds of thousands of hours of data; it’s not happening. Okay. And by the way, not Waymo specifically, but other teams like them had thousands of full-time employees and contractors just labeling data.

So you’re not gonna scale that up by a factor of a hundred. But you have these huge amounts of data — what do you do with it? You go to end-to-end machine learning. You just throw a bunch of data in, and you judge it on whether it goes faster or slower, left or right — rolled-up newspaper, good dog, bad dog — when it goes the wrong way. You say, here’s a video clip; did you want to go left? Bad dog. Okay. And eventually it starts learning to go right on that data. And so it’s a lot cheaper. It’s a lot [00:20:00] easier to build a system if what you have is a ton of unlabeled data.

And so the componentized versus end-to-end choice is motivated by this choice about whether to put the lidar in or not. So people have it backwards. People say lidar good, no lidar bad. No — it’s a different design choice, and really it boils down to componentized versus end-to-end is where you end up.

Now, I think end-to-end is problematic for safety, because dealing with edge cases is really difficult. Okay. But it’s not because there’s no lidar; it’s because the technology is tough, and using cheap cameras certainly isn’t gonna help. Although the newer Tesla cameras are better resolution, and I’m not sure if that’s enough resolution. But it ends up there.

And so people get hung up on lidar. It’s actually a package deal. It’s a whole bundle of design choices: to use these high value sensors, to use a certain kind of training data. And now there’s a third option. Some companies have this third option where they’re not gonna use road data [00:21:00] — ’cause Tesla primarily trains on road data, and if you have a big fleet you’ll see a lot of stuff.

So all that makes sense, and Waymo started by having to have their own test track where they set stuff up, and they use simulations. But the third option is to use large language models — that other piece of AI technology — and ask the large language model to create a bunch of synthetic simulation data out of thin air.

So you can use the end-to-end learning, but you don’t have to go out on the road and collect data, ’cause the LLM just coughs up a bunch of data. Here’s a clip of street data — could you make that car into a truck? And could you change this from red to green? And can you have a kid chase a ball in from the side? And it’ll come up with a picture and you train on it.

So that’s the third option. There’s three choices, right? There’s component-based with high quality data. There’s end-to-end with huge amounts of lower quality data — not necessarily bad data, just not high quality data. And the third one is using end-to-end with synthetic data from large language models.

So that’s the three games in play right now. [00:22:00] And it’s too early to tell who’s gonna win. Waymo’s ahead, but we’re in the first innings. The game’s not anywhere near over,

Anthony: Right? That last option sounds incredibly flawed to me, ’cause who’s creating these scenarios? And they have a limited worldview.

Phil Koopman: You take real data and you have the large language model trained on real data. And you say, given this frame, what’s likely to come next? It’s just like a large language model: you put a bunch of words in, you say, what’s the next word? You put a bunch of frames of video in and say, what’s the next frame?

And then you say, yeah, that’s cool, but I want you to make this change; now what’s the next frame? And you use that instead of a physics-based simulator. That is what some companies are doing. A few have been talking about it publicly, but they’re not the only ones looking at that kind of thing.

Anthony: Yeah. I just think, as we’ve talked about, safety is about the edge cases. All these accidents happen with things that we don’t think about. We don’t think, oh, change this to a truck, or a boy chasing a ball type thing.

Phil Koopman: If you think about all the changes and put ’em in — and the physics is generated by the large-language-model-based simulation; [00:23:00] it’s not actually the text one trained on the internet,

it’s all vision based, but it’s the same technology — if you think of all the things to train it on, maybe it gets them. But you have to make sure you think of all the things, and you have to make sure that the potential situations you’re thinking of match up with

where the behavior discontinuities are in the technology. So it may have really weird notions of things that it has trouble with that would never occur to a person. And so you say, fine, we trained on this simulator and the simulator did everything possible, so we know it’s safe. And my answer is: define “everything possible.”

And how do you know your simulator really did that? And of course it’s not gonna be perfect. And not being perfect is okay, ’cause it’s not about perfection. It’s okay to not be perfect. It’s: do you meet all the definitions of safety? Which is a whole different discussion we had. It’s not just reduced body count.

It’s also: are you driving recklessly? Are you putting passengers at better safety at the cost of more risk to pedestrians? There’s a bunch of things you have to look at.

Anthony: Did the street planners put [00:24:00] the utility pole on the curb or in the road?

Phil Koopman: Was it protected by a curb, or hanging out in the middle of the road just waiting for a robotaxi to crash into it?

Yeah.

Anthony: Okay.

Conclusion and Final Thoughts

Anthony: Before we finish, Michael, is there anything you want to add? Nope. Michael’s gotta run for it. Phil, anything you wanna wrap up on?

Phil Koopman: We missed the obligatory Piggly Wiggly reference ’cause Fred wasn’t here. Oh

Anthony: yes.

Phil Koopman: But that’s now been corrected, Fred. So hope you enjoy it this week when you listen to the episode.

Anthony: There you go, Fred — Piggly Wiggly’s taken care of. While you wait for Phil’s next book, you can pick up his last book, How Safe Is Safe Enough? I’ve read it. It’s actually great. It is totally accessible. I know a lot of people come into this field and they think, oh, I can’t read this book, there’s no way I could understand this.

Not true. Phil’s an excellent writer. Very easy to get a grasp on things. Highly recommend it, and then come back in the fall and I’ll highly recommend your next book, unless I don’t understand it.

Phil Koopman: No, we’ll see how that goes. Thanks for having me on, Anthony and Michael, really appreciate it.

Anthony: Thanks a lot, Phil. Bye. For more information, visit

Phil Koopman: [00:25:00] www.autosafety.org.