Autonomous Vehicle Research at Waymo

Jerod:

Welcome to the Practical AI Podcast, where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Blue Sky to stay up to date with episode drops, behind the scenes content, and AI insights. You can learn more at practicalai.fm.

Jerod:

Now onto the show.

Sponsor:

Well, friends, when you're building and shipping AI products at scale, there's one constant: complexity. Yes. You're bringing the models, data pipelines, deployment infrastructure, and then someone says, let's turn this into a business. Cue the chaos. That's where Shopify steps in, whether you're spinning up a storefront for your AI powered app or launching a brand around the tools you built.

Sponsor:

Shopify is the commerce platform trusted by millions of businesses and 10% of all US ecommerce, from names like Mattel and Gymshark to founders just like you. With literally hundreds of ready to use templates, powerful built in marketing tools, and AI that writes product descriptions and headlines for you, and even polishes your product photography, Shopify doesn't just get you selling, it makes you look good doing it. And we love it. We use it here at Changelog.

Sponsor:

Check us out at merch.changelog.com. That's our storefront, and it handles the heavy lifting too: payments, inventory, returns, shipping, even global logistics. It's like having an ops team built into your stack to help you sell. So if you're ready to sell, you are ready for Shopify.

Sponsor:

Sign up now for your one dollar per month trial and start selling today at shopify.com/practicalai. Again, that is shopify.com/practicalai.

Daniel:

Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?

Chris:

Doing great today, Daniel. As always, lots of AI and autonomy to talk about. And you know what? We have way more to talk about as well.

Daniel:

We have way more to talk about. Yeah. Speaking of way more, we're very excited to welcome back Drago Anguelov, who is the vice president and head of the AI foundations team at Waymo. Welcome, Drago.

Drago:

Thank you, guys. It's great to be back after five years or so, right?

Daniel:

After five years, yeah. We were commenting before we started the recording that the last episode with Drago was on September 1, 2020. That was episode 103. So a few things have changed in the world generally, but certainly in relation to AI. I'm wondering if you could maybe just catch us up at a high level, Drago, in terms of driverless cars, autonomous vehicles.

Daniel:

How do you see the world differently now than you did in 2020?

Drago:

One thing I would say is in October 2020, we opened our Waymo One service in Phoenix East Valley to everybody, so just one month after we talked. But since then, we have launched and scaled quite dramatically in what is now five major metros. That's San Francisco, Los Angeles, Phoenix, Atlanta, and Austin. We are also serving hundreds of thousands of rides a week to paying customers. We are expanding.

Drago:

We announced expansion to at least half a dozen more cities that will be going on through next year, and we may announce yet more. In the cities we're in, we continue reporting the safety performance of our autonomous driver, and we are over 100,000,000 autonomous miles driven on the road at this point, so it's fairly statistically significant. Over those miles, our safety study at close to 100,000,000 miles showed that we are five times less likely to get into accidents with critical injuries, and over 10 times, I think 12 potentially, less likely to get into collisions with or injure pedestrians, so that has been happening. And we are on to doing more and more right now. We continue to work on improving the driver further.

Drago:

We have a sixth generation vehicle coming up. We have started partnering with different companies. For example, we're partnering with Uber in Austin and in Atlanta, so our vehicles show up on their app in those cities. We have partnered with Lyft in Nashville, I believe, and we partnered with DoorDash to explore delivery. So we're exploring and expanding the scope and the partnerships that we are doing as well, but I think in 'twenty five, I would say a lot more people have had and continue having the opportunity to try Waymo. I'm quite a convert myself.

Drago:

To me, probably the bigger moment was in 'twenty two, when I rode in San Francisco by myself fully autonomously. Since then it took some time for more people to get exposed, but now I think the phenomenon is out there. I think also the autonomous vehicle industry went through cycles. There was certainly, around 2022 and 2023, a time of pessimism about autonomous vehicles, but I think through our success, through generative AI, and through other companies now, it's again a very lively space. There are others that are also trying to push what's possible with autonomous driving and robotics, so it's again a very, very happening place. And we are contributing to that.

Drago:

I would like to think it's the most advanced version of an embodied physical AI that you can try out today.

Chris:

That's fantastic. I got to say, as a native Atlantan, I'm so happy that you guys are in my city. And we're a very, very car centric city as well. You really have to have a vehicle to get around. And I noticed, as you were naming the cities that you guys are in, that tended to be the case for them as well.

Chris:

Does that play into any of the way that you guys think about testing? Atlanta traffic, for its size, is notoriously bad, and I would love to see ever more Waymos and other autonomous vehicles here, because I am terrified of all the drivers around me, with our daily collage of traffic accidents and stuff like that. So I keep telling everyone, just wait. Autonomous vehicles are coming. I'm kinda curious how you pick these different testing cities that you guys engage in, and what are some of the things that you're testing for that maybe those locations are particularly apt for helping out on.

Drago:

It's a bit of a combination of both technical and business reasons. I think we're trying to do large metros where autonomous vehicles can be a big market and help a lot of people, so that's one. Also, we've intentionally been growing our ODD, so to speak, our operational design domain. Our first service, Waymo One, in Phoenix East Valley, Chandler, that's maybe a bit suburban, with up to forty-five mile an hour arterial roads. We learned to master it and then went to San Francisco, which is dense urban with fog and some rain and hills and windy roads and narrow roads and tons of pedestrians downtown, so we dealt with that.

Drago:

Then we started expanding. I think some of this is that Atlanta is a big city, and it's also a different state. There are some differences across the various states in how people instrument the roadway and how people drive, right? So we're spreading geographically more and more. I think also we're spreading to other domains.

Drago:

A few are really top of mind. One is highways. We have been working on highways for a long time, and we've gotten to a certain point with them. Generally, to have a good taxi service, you need highways, right? It turns out that's a very fascinating, interesting problem.

Drago:

They're difficult because whenever you move at high enough speeds, like 65 miles an hour or so, the consequences of any mistake are really high, and many things can happen. So it pushes your robustness and safety capability there. We've been doing highways, and one thing I did do is ride one. Now we can give highway rides to employees, and I rode one to Millbrae Station to get to the airport, and it's fantastic. I hope to be able to bring it in the future to more and more people.

Drago:

I think that will make the service a lot more useful. Also, we announced that we will drive in other cities that have snow, potentially even in '26, right? Our sixth generation platform is the successor to the Jaguar. It's a Zeekr vehicle, a Geely Zeekr, and that Zeekr is designed with a hardware suite to be able to handle snow.

Drago:

We are also heading out to other countries. We announced that we intend to launch driverless capabilities in London next year, and London is a left side driving city, and so is Tokyo, where we currently have vehicles and we're testing, right? You can see we're trying to cover, little by little, the operational design domain of most large metros with all of their properties. We're, of course, also in Texas. That's its own unique state, but we started with more southern states, large metros, so you don't have to worry about snow at least. You want to tackle these challenges in some order, not just try to do everything at the same time.

Drago:

It's very difficult to validate your ability to do well in everything all at the same time, right? We're kind of mixing what makes business sense with actually expanding the capabilities to become a truly global driver.

Daniel:

And you mentioned the driver, the car. I'm wondering, for those out there listening, which is maybe hard to do just from an audio standpoint, but if you imagine the driverless car as a system in 2025, how would you describe that architecture or that system? What are the main components? I imagine sensors, the actual car, the compute. What does that system look like in 2025, just at a high level?

Daniel:

Then of course, I'm sure we'll get into some of the modeling things and foundation models and all of those things, but

Drago:

I mean, the car is, ultimately, a robot on wheels, right? The main distinguishing capabilities are that it has a set of sensors, in our case, camera, LIDAR, radar, and microphones. Microphones are quite helpful for many things, including listening to sirens and occasional instructions. Then you have compute on the car. It's a nontrivial amount of compute.

Drago:

It's more than you can put on a phone, and all our vehicles are electric. That was an explicit choice of the company. I'm personally quite proud of this choice. I think it's good for the environment to actually have such cars, and I think it can accelerate the transition to more electric vehicles, which I think is good personally. So you have this robot on wheels with compute and sensors, and then you have actuators, right?

Drago:

Then there is a lot of system design engineering to make sure steering and brakes and all these things have redundancy and robustness, to make sure that if any system goes wrong, or if parts of the compute go down, you have contingencies. So it needs to be designed with redundancy. You need to think: what if there are issues with the steering column? What is the redundancy? For autonomous vehicles, you need to think about these things additionally and build them into the hardware.

Drago:

It's a robot designed for safe transportation from the ground up, even though we're building on what exists. We're extending an existing platform, and we work with various automakers to do this.

Chris:

As you're doing this, and you guys have progressed over these five years since we last talked, one of the challenges is probably that not every person out there is a Chris or a Daniel who's very invested in this kind of technology going forward. You have a lot of people out there, here in the South. We joke that every other driver thinks they're a NASCAR driver and stuff. And there's that notion of control and safety, and the general population may not have as much confidence in some of these technologies because they're not following it closely and living it the way you do all the time. How do you approach that?

Chris:

And how has that changed over these last five years since we talked to you last, in terms of getting buy in from the public? The safety statistics you talk about are amazing, but how do you get people to really feel that deep down, to know they can trust and believe in this mode of transportation, and that it is in fact much, much safer than what they are typically doing on a day to day basis?

Drago:

So, I know people do not feel statistics. It's hard, right? Because they're a product of many, many rides. Doing 10 or even a 100 safely is not enough. I think what people feel is when they get into the vehicles, and this worked for me in my moment, even though I went in before, and also for my wife and friends of mine. People get comfortable really, really fast.

Drago:

You need to pass a certain bar where they feel, okay, this thing actually is a really, really good driver. My mother-in-law sat in it just a few weeks ago for the first time, and she rode around. She's like, this car drives much better than me, right? Once she thinks this way, she's immediately at ease. I think people relax after the first several minutes, which are very exciting.

Drago:

Then they relax and enjoy the experience and mind whatever they like to mind, whether that's the environment or their phone or other things. People get really used to it if you cross this threshold of, can I trust you? I think your driving immediately shows this. Now, those of us in the industry also understand that, coming back to statistics, you need to back it up. With regards to backing it up, at Waymo we believe in transparency, and we're quite open about the incidents that happen.

Drago:

We file the details and we also track the statistics and do our best estimates. We have a great safety team. They publish these reports. We evaluate and try to estimate how we are doing compared to a fleet of human taxi drivers, or human drivers driving in the area that we are handling. This is done by us, but there are also studies done by insurance companies who, of course, want to quantify this very well, and so there's a Swiss Re study also corroborating our numbers. They also believe we can significantly decrease claims of different kinds, for injuries, for accidents, and so on as well.

Drago:

That's another external validation for the kind of thing we provide. That's what I would say to people. Now, it's a process. You need to work with the local communities. You need to work with police.

Drago:

You need to work with the various city officials. We train a lot of people. We engage with them. We work overtime. I think you can see that in the cities we have been in, over time, the trust in us generally increases, and so does the satisfaction of users with Waymo, if you look at the apps in the stores. I think on the App Store, we had a five star rating, right?

Drago:

There is a bit of almost... there are people that would just use Waymo for everything now if they could, and that's a testament to the value that people see in the rides. But it comes back, of course, to safety and ultimately engaging these people, getting them comfortable. Often when people experience this, many of them become converts. I encourage people: try it. You may be the next convert if you have not tried it yet.

Drago:

I personally love it. I take it as much as I can. It's always a pleasure working on a product you enjoy yourself, so I feel blessed that way.

Sponsor:

Well, friends, it is time to let go of the old way of exploring your data. It's holding you back. But what exactly is the old way? Well, I'm here with Mark DePuy, co founder and CEO of Fabi, a collaborative analytics platform designed to help data explorers like yourself. So, Mark, tell me about this old way.

Sponsor:

So the old way, Adam: if you're a product manager or a founder and you're trying to get insights from your data, you're wrestling with your Postgres instance or Snowflake or your spreadsheets, and you maybe don't even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools, local Jupyter Notebooks, Google Colab, or even your legacy BI, to try to build these dashboards that someone may or may not go and look at. In this new way that we're building at Fabi, we are creating this all in one environment where product managers and founders can very quickly go and explore data regardless of where it is. It can be in a spreadsheet, it can be in Airtable, it can be in Postgres, stuff like that.

Sponsor:

It's really easy to do everything from an ad hoc analysis to much more advanced analysis if, again, you're more experienced. With Python built in right there, and our AI assistant, you can move very quickly through advanced analysis. The really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps, dashboards, or better yet, delivering insights as automated workflows to meet your stakeholders where they are in, say, Slack or email or spreadsheets. If this is something that you're experiencing, if you're a founder or product manager trying to get more from your data, or you're on a data team today and you're just underwater and feel like you're wrestling with your legacy BI tools and notebooks, come check out the new way and come try out Fabi.

Sponsor:

There you go. Well, friends, if you're trying to get more insights from your data, stop wrestling with it. Start exploring it the new way with Fabi. Learn more and get started for free at fabi.ai. That's fabi.ai.

Sponsor:

Again, fabi.ai.

Daniel:

Well, Drago, I understand that every driverless car company is gonna have a different approach to modeling and all of those sorts of things. You've talked a little bit about the hardware and the car, but I think it would be good for people to understand, we talk about this driver or you mentioned the driver, people might have in their mind because we do talk a lot about models now after the generative AI boom, that there's this model that can reason and blah, blah, blah. And so people might have this view of like, there is a model that drives the car. Could you help us really break down like in 2025, is this a system of models, models that do different things, a kind of combination of different types of models and even non AI pieces? Could you just help us kind of generally understand how that works?

Drago:

So when you think of the stack, right, let's talk first about what it needs to do. It needs to perceive the environment using the sensors. It needs to build some representation of this environment. It needs to use this representation of the environment to make a set of decisions. Now, autonomous vehicles have been around a long time.

Drago:

Waymo has been around over fifteen years already, right? So it's a rapidly developing technology space, but historically, people thought, okay, there are these models. There's a perception model that builds a representation of the world that can be useful for certain things, and then there is some kind of behavior prediction and planning module that reasons about what we could do, and potentially also about what others could do, to cross reference our behavior with the other folks, and then, based on all this information, eventually selects promising decisions. That's what the stack normally does. Now, there are different ways to implement it.

Drago:

Generally, the trend has been to have a few, and in some cases people claim they have one, large AI models on the car, and you could say ML or AI. For a while it was called ML. When the models became big enough, people called it AI, right? So you have these large AI models on the car, a few or one depending on the company, and they're connected in certain ways.

Drago:

You can train them end to end or not. That's also an option different companies can choose. The two are orthogonal concepts: whether you have modules and whether you train them end to end are different questions, so you can be structured and still train end to end.

Drago:

Or you can essentially have one model trained end to end. These are two axes, and different companies fall somewhere in this very coarse taxonomy, right? I think Waymo has always used AI or ML since I've been there, and it's been the backbone of our tech. I think over time, our models have streamlined and become fewer and fewer. I can say that.

Drago:

I think off board, what my team does is build these large foundation models for Waymo that are not limited by the compute and latency constraints you have on the car, and they can be quite helpful to essentially curate data, or to teach the models that actually run on the car or in the simulator. We can get to simulators later. We have experience with most aspects of these options, whether it's end to end and whether it's structured or not. Off board, I can definitely tell you we've explored a lot with large vision language models. That's one of the latest technologies that's relevant to us.

Drago:

I think in the field of robotics, people talk also about vision language action models, because you can tie into one model both understanding vision and language inputs and potentially asking for certain actions as outputs, right, which is ultimately what the robot needs to generate. That's an exciting area that has developed in 'twenty five. I think in our model, the Waymo Foundation Model, we combine the benefits of these vision language models with some bespoke Waymo architecture innovations. I think in areas such as fusing the modalities that vision language models typically are not trained on; LiDAR and radar is one. Another one is modeling the potential future evolutions of the world.

Drago:

There is some interesting Waymo technology on how to do this well that we also use, but we fuse all of this, the VLM technology, world knowledge from other bases, whether it's a world model or a visual language model, into something that then is able to do well on autonomous driving tasks. That's off board. On board, we don't typically talk about exactly what is there, but I think we're trying to get the state of the art, the best architectures that we believe solve the problem, and put them together on the car. I think it's a really, really high bar to have a model perform in all the conditions and all the situations we need it to, right? We also have some notion of, as you know, VLMs have this weakness of hallucination, so we have a safety harness around them to prevent hallucination, to double check what they are predicting, right?

Drago:

We also have that aspect in our stack as well, which we have worked on historically. That's what I can say at a high level. I hope that's not too scattered. Maybe, if you guys want anything specific, we can discuss that in a little more detail.
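
As a rough illustration of the harness idea Drago describes, and only that (the function, thresholds, and checks below are hypothetical stand-ins, not Waymo's actual system), one can imagine sanity-checking a learned model's proposed trajectory against simple kinematic limits before trusting it:

```python
# Hypothetical sketch: sanity-check a proposed trajectory against
# simple kinematic limits before trusting a learned model's output.
# All thresholds are illustrative, not Waymo's actual values.

def plausible(trajectory, dt=0.1, max_speed=30.0, max_accel=8.0):
    """trajectory: list of (x, y) positions sampled every dt seconds."""
    speeds = []
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        speeds.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt)
    if any(s > max_speed for s in speeds):
        return False  # teleport-like motion: reject the proposal
    for s0, s1 in zip(speeds, speeds[1:]):
        if abs(s1 - s0) / dt > max_accel:
            return False  # physically implausible acceleration
    return True

# A steady 10 m/s straight-line path passes the check.
steady = [(i * 1.0, 0.0) for i in range(20)]  # 1 m per 0.1 s step
print(plausible(steady))  # True
```

A real harness would check far more (collisions, map constraints, cross-model agreement), but the pattern is the same: an independent verifier gates what a potentially hallucinating model proposes.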

Chris:

I do have a follow-up to that, recognizing that you're not able to get into the specifics of the architectural decisions and model decisions that Waymo is engaged in. If you could abstract it a little bit and maybe just talk about the space a little bit. I'm curious, as you talk about world models and having a representation of the environment, that brings in not only AI but the notion of simulation as one of the tools in the tool chest, if you will. I suspect we have a lot of listeners that are hearing lots of different AI use cases in general, but may not have as much expertise in autonomy.

Chris:

And so, as you talk about that notion of representation of the environment, could you talk a little bit about what that problem looks like and what different things you might think of to solve it, without having to get into how you guys have done it? Just, what does that juxtaposition of simulation, AI, and representation of the world and the environment around you look like?

Drago:

So maybe, I mean, simulation, if we're gonna go there, maybe I can just juxtapose two things. I like saying this; historically, I've been doing this for a while. There are two main problems in autonomy. One is to build this onboard driver, and another one is to test and validate this onboard driver. Both are really, really hard problems, and people usually talk about the first one.

Drago:

But imagine there is some collection of models and you need to prove that it's safe enough to put them out in the real world. That's in itself a really challenging problem, arguably no simpler than putting the first model together, and, ultimately, because you need to be a bit more exhaustive, it potentially takes even longer to build the full recipe to validate things properly, right? So these are the two problems. Now, what is different in autonomy from standard AI models is a few things. One is we ultimately output actions, commands to a robot, which are a different type of data than, say, text and images.

Drago:

I think that's one. Another one is we operate under strict latency constraints. You need to react quickly. For us, what is also interesting in AV is that this is probably the first serious domain where we had to really learn how to interact with humans in the same environment, so it's a highly interactive multi agent setup, right? Then additionally, with the sensors and cameras we choose to add, we have a lot more modalities coming in, and we have a ton of data.

Drago:

Essentially, the way to think of it is: imagine you get maybe billions of sensor readings per second, or even tens of billions, a lot, and you need to make a decision. You need a context of many seconds of these sensor inputs, maybe a dozen cameras, half a dozen LIDAR and radar, and so you need to collect maybe five to ten seconds, some would argue twenty or thirty, of context to make a decision, right? The decision is fairly low dimensional. It's like, okay, steering or acceleration, but the inputs are incredibly bulky, so you need to somehow learn the mapping from this extremely high dimensional representational space to decisions. That's very hard, right, under latency constraints, under safety critical constraints. That's what makes our domain interesting. Now, a lot of the things that work in machine learning in one domain transfer to the other, right?
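
Back-of-the-envelope arithmetic makes that asymmetry concrete. Every number below is an illustrative assumption (megapixels, frame rates, point rates), not a Waymo sensor spec:

```python
# Illustrative back-of-the-envelope: high-dimensional sensor context
# versus low-dimensional control output. All numbers are assumptions.

cameras = 12                          # "maybe a dozen cameras"
pixels_per_camera = 2_000_000         # assume ~2 MP per camera
camera_hz = 10                        # assume 10 frames per second
lidar_radar = 6                       # "half a dozen LIDAR and radar"
points_per_sensor_per_s = 1_000_000   # assumed return rate per sensor

readings_per_second = (cameras * pixels_per_camera * camera_hz
                       + lidar_radar * points_per_sensor_per_s)
context_seconds = 10                  # "five to ten seconds ... of context"
context_size = readings_per_second * context_seconds

action_dims = 2                       # roughly steering + acceleration
print(f"{context_size:.1e} input values -> {action_dims} output dims")
```

Even with these conservative guesses, the model maps billions of input values onto a handful of outputs every fraction of a second, which is the mapping problem Drago describes.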

Drago:

So yes, there are, for example, very similar scaling law findings: if you have cutting edge architectures and you do proper studies and scaling, and you have a lot more data and compute and you feed it to these architectures, it pays off. Now, for every class of algorithms there are somewhat different scaling laws, but even the simpler, imitative algorithms carry over: just as people predict the next token in language, we can predict the next action, right? There are these direct parallels. You can do reinforcement learning in language. We can do reinforcement learning in our simulator, right? These are the parallels, but how exactly things translate is interesting.

Drago:

The ideas translate. The implementation is a little more creative than usual, more than just training on the Internet, because there is a bit of a domain jump to the real world, right? So that's interesting. The other part is, compared to, say, language LLMs: we actually have a paper, MotionLM, from two or three years ago, where the idea was, hey, why don't we tokenize motions to make them like language? Turns out it's a very effective idea.

Drago:

Now, that architecture, which is very LLM inspired, models future interactions of agents in the environment very well. You can think of agents talking to each other with these motions they execute simultaneously in an environment, and now you can leverage the LLM machinery. We have this paper. It's quite effective, right?
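
A minimal sketch of the tokenization idea: quantize continuous per-step displacements into a small discrete vocabulary, so motion becomes a token sequence a language-style model can predict. The bin size, ranges, and vocabulary here are made up for illustration, not MotionLM's actual scheme:

```python
# Minimal sketch of motion tokenization: quantize per-step (dx, dy)
# displacements into a discrete vocabulary, the way a text LLM works
# over word tokens. Bin size and ranges are illustrative only.

BIN = 0.5        # meters per quantization bin (assumed)
NUM_BINS = 13    # covers displacements in roughly [-3, +3] m per step

def to_token(dx, dy):
    """Map a continuous displacement to a single integer token id."""
    ix = min(max(round(dx / BIN) + NUM_BINS // 2, 0), NUM_BINS - 1)
    iy = min(max(round(dy / BIN) + NUM_BINS // 2, 0), NUM_BINS - 1)
    return ix * NUM_BINS + iy  # vocabulary of NUM_BINS**2 = 169 tokens

def tokenize(path):
    """Turn a sequence of (x, y) positions into motion tokens."""
    return [to_token(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(path, path[1:])]

# A car moving steadily +1 m per step in x yields a repeated token,
# just as repetitive text yields repeated word tokens.
path = [(i * 1.0, 0.0) for i in range(5)]
print(tokenize(path))  # four identical tokens
```

Once motion is tokens, next-token prediction over the joint sequences of several agents is what lets the LLM machinery model their interactions.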

Drago:

That's an example of this. Now, one other interesting point, though, is that text is its own simulator. Essentially, you speak text to each other; that's the full environment. You spit out text tokens, and text tokens come back. In our case, well, we predict actions, and we execute actions.

Drago:

Imagine now that you need the simulator, because based on these actions, you need to envision what the whole environment looks like, and how your, whatever, hundreds of millions to billions of sensor points look like. So you need something that generates them as you act, so you can test how you behave over time. As you make decisions at a fairly high frequency, there is a known problem called covariate shift. Essentially, your decisions can take you to places you may not have seen before in the data, and there you may have particular failure modes that you will not observe unless you push yourself and drive on policy to those places. But to drive there, you need the simulator, and the simulator needs to be realistic enough that you don't end up somewhere else entirely, as opposed to the actual place your decision making would take you. That's another very interesting point.

Drago:

Simulation is hard. If you want robust testing, simply having drivers on the road is not a particularly scalable solution if you want to keep iterating on your stack, because some of the events happen once in a million miles or more, and you would much rather test them in the simulator. But for the simulator, now you have to solve this problem, which is interesting and challenging. That's unique in our domain.
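
The open-loop versus closed-loop distinction behind covariate shift can be sketched with a toy policy and a toy world model (both are stand-ins invented for illustration, not Waymo's systems): scored against a fixed log, a slightly biased policy looks fine, but rolled out on its own actions, the bias compounds.

```python
# Schematic contrast between open-loop evaluation (score against a
# logged trajectory) and closed-loop rollout (your own actions
# produce the next state). Policy and world model are toy stand-ins.

def policy(state):
    # Toy policy with a small systematic bias (overshoots by 10%).
    return state * 0.1 * 1.1

def step(state, action):
    # Toy world model: the action perturbs the next state.
    return state + action

log = [1.0]
for _ in range(20):                    # "ground truth" log of states
    log.append(step(log[-1], log[-1] * 0.1))

# Open-loop: each prediction starts from the *logged* state,
# so the per-step error stays small and bounded.
open_loop_err = max(abs(policy(s) - s * 0.1) for s in log)

# Closed-loop: the policy drives itself, so its small bias
# compounds and the trajectory drifts away from the log.
state = log[0]
for _ in range(20):
    state = step(state, policy(state))
closed_loop_err = abs(state - log[-1])

print(open_loop_err < closed_loop_err)  # drift dominates: True
```

This is why a realistic simulator matters: only by rolling the policy out on its own decisions do the compounding-error regions of the state space get exercised.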

Sponsor:

What if AI agents could work together just like developers do? That's exactly what agency is making possible. Spelled A-G-N-T-C-Y, agency is now an open source collective under the Linux Foundation, building the Internet of Agents. This is a global collaboration layer where AI agents can discover each other, connect, and execute multi agent workflows across any framework. Everything engineers need to build and deploy multi agent software is now available to anyone building on agency, including trusted identity and access management, open standards for agent discovery, agent to agent communication protocols, and modular pieces you can remix for scalable systems.

Sponsor:

This is a true collaboration from Cisco, Dell, Google Cloud, Red Hat, Oracle, and more than 75 other companies all contributing to the next gen AI stack. The code, the specs, the services, they're dropping, no strings attached. Visit agntcy.org to learn more and get involved. Again, that's agntcy.org.

Daniel:

Well, Drago, I'm really intrigued by how you've helped me form a mental model for the types of problems that are part of the research in this area. I would definitely encourage our listeners to go check out waymo.com/research. There's a bunch of papers there that people can find and read, but there's also the Waymo Open Dataset, which supports research in autonomous driving. So, that's really cool to see. It's amazing.

Daniel:

I'm wondering, Drago, as you look at this kind of, I see all sorts of things from, scene editing to forecasting and planning to

Drago:

Did I mention you need to embody the agents in the simulator too? They're not deterministic.

Daniel:

Oh, yeah, yeah.

Drago:

Because they start doing different things, you need to, well, guide the agents to react to you in reasonable ways as well. Otherwise, they'll be reacting to an empty spot where you no longer are. Even if you collected the situation with your sensors, as you start deviating from it in the simulator, you still need the agents to do reasonable things, right?

Daniel:

Yeah, yeah, that makes sense. And I guess that really gets to my question a little bit, which is: I assume over the last five years we haven't chatted, there's been a lot of progress in certain areas, and maybe certain challenges remain holdouts that are still very, very difficult, where not as much progress has been made. So in this autonomous driving research world, can you paint in broad strokes where there has been very rapid progress as things have advanced, and maybe some of the hardest problems to solve that still remain at arm's length, if you will?

Drago:

I would say one thing for folks, especially those closer to robotics: just like the field of AI is going through some crazy inflection point, in both the methods people develop and popularity, the same is true in robotics, and the same is true in AVs. I've been in the space over ten years now just doing AVs, and I would say every couple of years our capabilities with AI and machine learning dramatically expand due to innovations, and this innovation train has not stopped. So where we are five years later compared to five years before in terms of modeling, there are still huge improvements possible. I think we're moving more and more to machine-learned stacks, and ultimately understanding how to elegantly and scalably handle this problem with data-driven solutions.

Drago:

So that's been generally an evolution, and I think we understand how models behave better. I think these latest architectures and the scaling that we mentioned are a really interesting domain. We started studying it a while back, for example. So there's this paper we have on the scaling laws of the MotionLM architecture, which is an LLM-like architecture.

Drago:

So you say, oh, what are its scaling laws? How does it compare to LLMs? We have a tech report on this, for example. Similar kinds of learnings transfer from LLMs, but there are some bespoke, really interesting things. For example, for that architecture, improving what's called open-loop prediction performance seems to correlate with improving closed-loop performance.

Drago:

That's not always true, right? We see different scaling factors compared to language. Our motion space is nowhere near as diverse as language tokens, so for a model with the same number of parameters, we actually need a lot more data, more examples of how the world behaves, to scale. These are interesting findings generally, right? So that's one. I think now, as the architectures keep evolving, there's diffusion and autoregressive models, and the question is how each compares, in open loop and in closed loop.
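As a side note for listeners, the kind of scaling-law fitting Drago alludes to can be sketched in a few lines. This is purely illustrative, with invented constants and data, not numbers from the Waymo tech report: a power law like loss = a * N^(-b) becomes a straight line in log-log space, so a least-squares line fit recovers the scaling exponent.

```python
import numpy as np

# Hypothetical (dataset size, validation loss) pairs that follow a
# power law loss = a * n**(-b); the constants here are invented.
a_true, b_true = 5.0, 0.3
n = np.array([1e5, 1e6, 1e7, 1e8, 1e9])
loss = a_true * n ** (-b_true)

# A power law is linear in log-log space:
#   log(loss) = log(a) - b * log(n)
# so fitting a degree-1 polynomial recovers the exponent b.
slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
b_est, a_est = -slope, np.exp(intercept)

print(f"estimated exponent b ~ {b_est:.3f}, coefficient a ~ {a_est:.3f}")
```

In practice one fits curves like this to held-out loss across model and dataset sizes to extrapolate how much data a given parameter count needs, which is the comparison to LLM scaling being described here.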

Drago:

These are all very interesting areas people are studying. I think generally there's this question lately as well of how you build the best simulator with machine learning, and what kinds of models are out there. And, you know, most recently there's some groundbreaking work, like the Genie model by Google. I don't know if you guys saw it. It's a controllable video model, essentially.

Drago:

You can give it motion controls, and it dreams the video, close to real time, of what the world should look like. So, essentially, you're controlling the world you're imagining a bit. Right? And you can do it in real time, or you can do it off board or offline with even larger models, potentially. Now, these models are pre-trained on a large amount of video and text, so they capture a lot of knowledge of how the real world behaves, and it somewhat complements the knowledge that vision language models capture from internet corpuses.

Drago:

How do they interrelate? How do you mix them? Which one is beneficial for which type of task? These are all interesting questions that people are exploring. Maybe one other interesting topic: there's a lot of talk about architectures for robots that are some combination of a system two and a system one architecture.

Drago:

You guys may have heard of it, right? Now, we know that large models are more capable when trained on more data and more compute, but in latency-sensitive situations, if they're too big, you can't run them in real time. So the question is, okay, what if you have a real-time model that handles most cases, but then you have a slower model that does better high-level reasoning, running at some lower hertz, which helps guide and understand additionally and provides this to the fast model when needed, while still keeping this reflexive capability? Someone jumps in front of you, you still respond, right? These are interesting questions in our domain as well. There's many, actually.
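The fast/slow split Drago describes can be sketched as a simple control loop. Everything here is invented for illustration (the rates, the function names, the observations), not Waymo's actual architecture: a reflexive policy runs every tick, a slower reasoner refreshes high-level guidance at a lower rate, and the fast loop always keeps authority to brake.

```python
FAST_HZ = 100   # reflexive policy rate (illustrative numbers)
SLOW_HZ = 2     # high-level reasoner rate

def slow_reasoner(scene):
    # Stand-in for a large, slow model doing high-level reasoning.
    return {"route_hint": "keep_lane", "caution": scene["busy"]}

def fast_policy(obs, guidance):
    # Reflexive layer: reacts immediately, no matter how stale
    # the high-level guidance is.
    if obs["pedestrian_ahead"]:
        return "brake"
    return "slow_down" if guidance and guidance["caution"] else "cruise"

guidance = None
actions = []
for tick in range(FAST_HZ):  # one second of simulated driving
    if tick % (FAST_HZ // SLOW_HZ) == 0:  # reasoner fires at its slower rate
        guidance = slow_reasoner({"busy": tick >= 50})
    obs = {"pedestrian_ahead": tick == 70}
    actions.append(fast_policy(obs, guidance))

print(actions[0], actions[70], actions[99])  # cruise brake slow_down
```

The design point is exactly the one in the conversation: the pedestrian at tick 70 triggers a brake from the fast loop alone, while the slow model only shapes behavior between emergencies.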

Drago:

It's a really, really fascinating time, and I think we're studying a lot of these questions, just as the whole field is, and we have some very interesting findings, some of them not published yet. Generally, I would encourage people: come join us. You can, well, contribute to the premier embodiment of physical AI currently out there, and you can do interesting research, right? Sounds like fun.

Drago:

Yes. These are all fascinating topics. And of course, how to control hallucinations in all these models, how to determine when these models are out of domain and potentially making clear mistakes, right? This can happen. We have research experience with VLMs like many of the current ones. We have a paper called EMMA where we tried to fine-tune a VLM for driving tasks and got a bunch of learnings.

Drago:

It can be quite good, but it has limitations too, right? So how you overcome these limitations with additional system design is very interesting.

Chris:

I'm curious, as we're talking about this, and I'm really enjoying the conversation. I work for another company in autonomy, but in a slightly different context. One of the things that is popular in the industry I'm in right now is solving for swarming behaviors, and as you're talking about many autonomous vehicles that are having to collaborate in certain ways, I'm curious whether that may or may not be an interesting problem for Waymo.

Chris:

I don't know what your thinking is on that, but I would love to know, when you look at that space, what are some of the things that you think about and that are interesting to you about the notion of many autonomous vehicles collaborating together?

Drago:

That's been a very interesting area. There was actually earlier research that I was impressed with, where people proved that if you can control groups of vehicles, you can improve traffic flow. To me, autonomous vehicles aren't exactly swarming yet. They're still a subset, a relatively small subset, of the whole traffic. When I think of swarming, I imagine, say, a crowd of 200 people on Halloween all around the car, and stuff like this. That's swarming. Or you go downtown after a Giants game as everyone is exiting, and that is swarming.

Drago:

Right? The human agents, so to speak, are still more prone to swarming these days than AVs. Maybe AVs will get more prominent. When you think of coordinating multiple AVs, in our domain they already send each other valuable information. For example, if one of our vehicles encounters some very complex construction, it can pass information about it to the others.

Drago:

If we encounter potential slowdowns or vehicles getting stuck, that kind of information can be passed. I think jointly controlling vehicles started becoming interesting now that we're getting to some kind of scale. One of the domains where this is interesting is when you want to charge them. Imagine you need to charge hundreds of vehicles in a location. How do you control all these vehicles so that they all get to the right place, don't block each other, and it's all very efficient?

Drago:

That's one example of where you're fairly swarmed, in your own warehouse, right, or a garage, where this comes up. And then down the line there are potentially opportunities to improve traffic flow for everyone, but that's still maybe in the future.
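A toy version of that depot-charging coordination might look like the following. This is a hypothetical greedy sketch with made-up vehicle and charger data, not Waymo's scheduler: sort vehicles by remaining charge and send each to the nearest free charger, so the most depleted vehicles are served first and never blocked.

```python
# Toy depot scheduler: assign each vehicle to the nearest free charger,
# most-depleted vehicles first. Purely illustrative data and logic.
vehicles = [  # (vehicle_id, battery_pct, position along the depot)
    ("v1", 40, 0.0),
    ("v2", 10, 5.0),
    ("v3", 25, 9.0),
]
chargers = {"c1": 1.0, "c2": 6.0}  # charger_id -> position

assignment = {}
free = dict(chargers)
for vid, _, pos in sorted(vehicles, key=lambda v: v[1]):  # lowest battery first
    if not free:
        assignment[vid] = None  # no charger free yet; vehicle waits
        continue
    nearest = min(free, key=lambda c: abs(free[c] - pos))
    assignment[vid] = nearest
    del free[nearest]  # charger is now occupied

print(assignment)  # {'v2': 'c2', 'v3': 'c1', 'v1': None}
```

Real fleet routing would also have to handle the blocking and flow constraints Drago mentions (vehicles can't drive through each other in a garage), which turns this from a greedy assignment into a joint motion planning problem.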

Daniel:

Well, you took us right there, Drago. As we're getting close to an end here, I'd love to talk about that future. We were talking beforehand, and I was saying I'd love for you to share just what you're excited about. That could be, of course, related to driverless research in general, or it could be in the AI ecosystem generally, something that you're excited about as you look forward or are thinking about a lot. Does anything stand out that we can ask you about?

Daniel:

Hopefully not five years from now, but maybe the next time you're on, in less than five years, we can ask you about it.

Drago:

Sounds good. Well, I'm around, so I could come back probably faster than in five years' time.

Daniel:

Yeah. In a Waymo. Yeah.

Drago:

Potentially, yes. Maybe let's go into a couple of areas. First, to parallel this chat we had earlier, maybe first about the product and then a bit about the AI. In terms of the product, with the safety studies, we've shown significant improvements over the baseline, and we've shown it at scale already, so at this point it starts to carry fairly good confidence, some statistical significance. Maybe your listeners don't realize it, but on US roads alone, I'm not talking about world roads, US roads, forty thousand people die every year from accidents.

Drago:

That's a lot. I think these gains are starting to become somewhat meaningful, so you start thinking, Hey, maybe we have a mandate to expand. We should be expanding. It will save people's lives. And you think about it, and then the question is, How can I contribute to expanding?

Drago:

I mean, beyond all that, of course I believe it's a great service. A lot of people love it for a lot of good reasons. We could potentially go into some of the reasons people have found to love it, right? But even just from the mandate, okay, it's helping in a meaningful way, and I think being out there can make quite a dent in some of these numbers. So yes, I would love it to expand more.

Drago:

Now, we're doing that. To me, then, the question is what can I do to contribute to it, right? I think one of the most scalable solutions to tackling dozens of new cities, conditions, and countries is machine learning and AI, right? So what I'm excited about is harnessing all the latest positive trends. For me, that feeds most directly into the Waymo foundation model work we're doing, where we can directly experiment with and deploy these models, and then try to push more and more of them to contribute similar benefits to the main production systems, which are the onboard driver and the simulator, right?

Drago:

That's what I think about. Now, more specifically, if you want to go into AI techniques, I think this question of, okay, how do I endow vision language models with more modalities, is a fascinating one. We actually have some good results already. How do you expand to new modalities, say, lidar and radar?

Drago:

How do you connect the model to actions? What's an effective way to do this while preserving all the world knowledge that's present in the model you're trying to build on top of? It's an interesting model and system design challenge. Then what I'm also excited about is building the simulator to be as realistic and as scalable as possible. Modern technologies like the Genie model that I mentioned, these world models, are still relatively few and far between, but there are a ton of labs working on them today.

Drago:

I think taking that kind of technology and building the most generalizable possible simulator with it is fascinating. Now, the interesting thing is, you could do that, but these models can still be very expensive to run. So it's not enough to show that the simulator can handle very realistic, interesting cases. You still need to show how you can run it without breaking the bank. Consider the amount of simulation Waymo does today to ensure that we're safe: we run millions of virtual miles every day. That's a lot of things to simulate, potentially with so many sensors on board, and so on.

Drago:

There are some very interesting questions in that space. How do I get the maximum possible simulator realism, and how do I get the maximum possible simulator scalability? There's a very interesting mix of technologies getting involved to do that.

Daniel:

That's awesome. Well, I'm certainly excited about that. Like I say, I encourage our listeners to check out Waymo's research page. Lots of amazing stuff to explore there.

Drago:

And folks can see our history there, right? I think you can see the kind of work and papers people did from, I think, 2019 to now. And there are almost 100 papers there now. And maybe it's not 100 only because we may not have uploaded the most recent ones.

Drago:

I'll try to make sure we do soon if we're missing any, so if folks go there, they can see the full set.

Daniel:

That sounds great. Well, thank you for joining us again, Drago. It's a real pleasure to have you on the show again, and let's not make it five years next time. We'll try to get you on and hear the update sooner than that, for sure.

Chris:

Don't be a stranger.

Drago:

Thank you, guys. Pleasure to be on the show.

Jerod:

Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show.

Jerod:

Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.

Creators and Guests

Chris Benson
Cohost @ Practical AI Podcast • AI / Autonomy Research Engineer @ Lockheed Martin