Software and hardware acceleration with Groq

Jerod:

Welcome to Practical AI, the podcast that makes artificial intelligence practical, productive, and accessible to all. If you like this show, you will love The Changelog. It's news on Mondays, deep technical interviews on Wednesdays, and on Fridays, an awesome talk show for your weekend enjoyment. Find us by searching for The Changelog wherever you get your podcasts. Thanks to our partners at fly.io.

Jerod:

Launch your AI apps in five minutes or less. Learn how at fly.io.

Daniel:

Welcome to another episode of the Practical AI podcast. This is Daniel Whitenack. I am CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?

Chris:

Doing very well. How's it going today, Daniel?

Daniel:

It's going great. Yeah. It's been a fun, productive week in the AI world over here at Prediction Guard, so no complaints. But I'm really excited about this episode, because it's one I've been wanting to make happen for quite a while.

Daniel:

Today, we'll be talking about both AI hardware and software with DJ Singh, who is a staff machine learning engineer at Groq. How are you doing, DJ?

DJ:

Hey, Daniel. Thanks for having me. It's been going well, yeah.

Daniel:

Yeah, good, good. Yeah. And I guess we should specify for our audience, this is Groq as in G R O Q. I imagine maybe some people get confused these days with that. But yeah, this is one that I've been really excited about, DJ, because I've been observing what Groq has been doing for some time and, of course, innovating in a lot of different ways, like I mentioned, on the hardware side and on the software side.

Daniel:

Could you maybe just set the stage for us a little bit in terms of the overall ecosystem as you see it, in terms of what may be a bloated term of, like, AI accelerator or hardware, and also the software that goes along with that, and kinda where Groq fits into that ecosystem?

DJ:

Right. So I think I'll first start and just quickly brief about Groq. So Groq is, of course, a company which provides fast AI inference solutions. So whether it's text, image, or audio, we are delivering AI responses at blistering speeds, an order of magnitude faster than traditional providers. Now you spoke of AI accelerators, and traditionally, training and inference have been done on GPUs.

DJ:

But I think in the last few years, we've seen all sorts of AI accelerators come into place. So there are those more mobile-device-oriented ones that phone companies like Samsung and Apple come up with, right? And then there's more stuff happening on the server side, part of which is what Groq is also leading towards.

Daniel:

Yeah. That's great. And on the server or hardware side, am I correct that Groq has their own hardware that they've developed over time?

Daniel:

Is that right? What's kind of been the progression of that, and its current state?

DJ:

Absolutely. So Groq developed this technology which we call the Groq LPU. It's essentially a software and hardware platform which comes together to deliver that breakthrough performance of low latency and high throughput. But how Groq got into it was first to develop that software. So we developed the software compiler first before moving on to the hardware side, kind of a shift in how traditional development was being done previously.

Daniel:

And that does seem very unique to me. So what was, I guess, the motivation or the thought process behind taking maybe that non-standard approach, kind of compiler first, then hardware?

DJ:

Yeah. No. Absolutely. So, traditionally, as I mentioned, development is done such that a new accelerator's hardware is made first, and then the software has to deal with the inefficiencies of the hardware.

DJ:

Whereas Groq decided differently. This company was founded by Jonathan Ross, our CEO, who was a co-founder of Google's TPU program, the tensor processing unit program, and based on his learnings from there, one of the key decisions was: let's develop the software first. Right? So we have developed this software compiler which helps convert AI models into the code which runs on the Groq LPU. Specifically, the compiler is responsible for scheduling each and every operation of that AI model. You can think of an AI model, in computer terms, as being made up of additions and multiplications, and the software compiler decides where and when to schedule each one.
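
To make that idea of static, ahead-of-time scheduling concrete, here is a toy sketch. This is not Groq's actual compiler; it only illustrates the principle DJ describes: when every operation's latency is known and fixed, a scheduler can assign each op a start cycle in advance, so nothing has to wait or arbitrate at runtime.

```python
# Toy illustration (not Groq's compiler) of static scheduling: with known,
# fixed latencies, every op gets a start cycle decided ahead of time.

OPS = {
    # name: (dependencies, latency_in_cycles)
    "load_a": ([], 2),
    "load_b": ([], 2),
    "mul":    (["load_a", "load_b"], 4),
    "load_c": ([], 2),
    "add":    (["mul", "load_c"], 1),
}

def schedule(ops):
    """Assign each op a fixed start cycle based on when its dependencies finish."""
    start = {}
    for name in ops:  # OPS is listed in dependency order
        deps, _ = ops[name]
        start[name] = max((start[d] + ops[d][1] for d in deps), default=0)
    return start

if __name__ == "__main__":
    for name, cycle in schedule(OPS).items():
        print(f"{name:8s} starts at cycle {cycle}")
```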

DJ:

And that goes into our various design principles, one of which, as I mentioned, is to be software first. Right? Now you might ask, why do we do this? Right? So one key consideration is that not only does the software have to deal with hardware inefficiencies.

DJ:

Right? But there are other aspects of the hardware which can add delays, whereas Groq prefers to have a deterministic system in place. So determinism, I would say, is deterministic compute and networking, to kinda have an understanding of where and when to schedule an operation. To understand this, we can consider an analogy. Imagine a car driving along a road with several stop signs.

DJ:

Stopping at every sign is essential for safety, but it does add some delays. Right? Now what if the world was perfectly scheduled, and we knew when to start each car and drive at maximum speed so that there are no collisions? Then there would be no need for these stop signs, no delays as such, and it also makes more efficient use of the road, since you can then have more cars and everybody's going at maximum or near-maximum speed.

DJ:

So to bring this analogy back to the hardware space, Groq chose to remove components which can add delays. It could be, let's say, network switches, or even algorithmic delays, some sort of algorithms which control packet switching. All these things add non-determinism into the system.

Daniel:

I did wanna maybe unpack this for some of our listeners out there. You've been talking about this compiler level, which, you know, I think of a compiler similar to what you said: hey, I'm writing some higher-level software code that's compiled to instructions that run under the hood on the actual hardware components doing, as you said, additions or whatever those numerical operations are. But people might also be confused in terms of the software stack. They may be familiar with something like CUDA, which helps, you know, provide drivers to run on certain hardware like NVIDIA GPUs. Or, I know we've worked a little bit with Intel Gaudi processors, and there's the driver package Synapse, which is similar in that sort of way; it helps translate kind of your higher-level code to run on these hardware components.

Daniel:

Could you help us kind of map out that software stack, like where this compiler fits in? And are there other components, like these drivers, that would have a parallel in the Groq world?

DJ:

Yeah. So traditionally, as you've mentioned, on, let's say, the NVIDIA ecosystem, there are tons of engineers who go and create these kernels, which are invoked when you have some sort of model operations. So there would be maybe even thousands of engineers in the company who work towards developing these very specialized kernels to go and execute things. However, due to the structure of the GPU itself, architecturally, this is not the best philosophy for design. You know, I'm sure the audience is familiar with GPUs.

DJ:

I remember playing games on them growing up and editing videos, and they grew to be more powerful in recent decades. But, you know, GPUs started in the nineties and the design hasn't changed all that much. We've had the addition of high-bandwidth memory and other hardware components, but all of it essentially still originates from the original design. It does make the system less deterministic, so that goes back to the compiler discussion here. And let's talk about the NVIDIA GPU kernels here, right?

DJ:

So they have to deal with the different hierarchies of memory, as an example. For those of the listeners who are familiar with the different memory systems in a computer, you might be familiar with an L1 cache, which has an access time of around one nanosecond, but you then have these bigger memories, the high-bandwidth memories, which are closer to 50 to 100 nanoseconds. And for a task to be processed performantly, data needs to be fetched between these different memories and the compute that's there, and that transfer of data adds more delays. And since this is a sequential system, right?
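
As a rough back-of-the-envelope illustration of why those latencies matter, here is a small sketch using the approximate numbers from the conversation (about 1 ns for L1 cache, 50 to 100 ns for HBM). It ignores bandwidth and overlap entirely, so it only shows how misses to slower memory come to dominate the time spent fetching data; the figures are assumptions, not measurements.

```python
# Back-of-the-envelope arithmetic with the rough latencies mentioned in the
# conversation; a simplification that ignores bandwidth and overlapping.

L1_LATENCY_NS = 1.0
HBM_LATENCY_NS = 75.0  # midpoint of the 50-100 ns range

def total_fetch_time_ns(num_accesses, hit_rate):
    """Total time spent fetching data for a given cache hit rate."""
    hits = num_accesses * hit_rate
    misses = num_accesses - hits
    return hits * L1_LATENCY_NS + misses * HBM_LATENCY_NS

for hit_rate in (0.99, 0.90, 0.50):
    t = total_fetch_time_ns(1_000_000, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {t / 1e6:.2f} ms for a million accesses")
```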

DJ:

So let's say you have two operations and one depends on the other; it's waiting on that first operation to complete. So it adds further delays, you know. One operation is stuck waiting on the data, and the other operation is stuck waiting on the first one. Right?

DJ:

So that kind of just incrementally adds more and more delays. So that's an example of how the traditional, I guess, compiler or kernel-based system doesn't scale as well. What Groq chooses to do, of course, is not have any kernels whatsoever, but have a compiler which controls this at a fine-grained level. So a typical system will have multiple chips. Right?

DJ:

So, you know, I'm sure people are familiar with models like Llama 70B, right? And these models tend to be spread across multiple GPUs, or across multiple Groq chips, right? And this compiler controls how that model is precisely split across these different chips and how it's executed to get the best performance out of it, down to the level of the chipset and the networking. So as I mentioned before, we've removed a lot of the hardware which adds delay, and this sort of scheduling is done by Groq's compiler, alongside some assistance from, of course, the firmware which is there.
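
For listeners who want a picture of what "splitting a model across chips" can mean, here is a minimal numpy sketch of one common scheme: splitting a single layer's matrix multiply column-wise across devices. This is a generic illustration of tensor parallelism, not Groq's actual partitioning strategy.

```python
import numpy as np

# Conceptual sketch of splitting one layer's matrix multiply across "chips"
# (column-wise tensor parallelism). Generic illustration only.

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512))       # activations
W = rng.standard_normal((512, 2048))    # one layer's weight matrix
NUM_CHIPS = 4

# Each chip holds a column slice of the weights and computes its slice of the output.
shards = np.split(W, NUM_CHIPS, axis=1)
partial_outputs = [x @ shard for shard in shards]

# Concatenating the per-chip outputs reproduces the full result.
y_parallel = np.concatenate(partial_outputs, axis=1)
assert np.allclose(y_parallel, x @ W)
print("column-parallel result matches the single-device result")
```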

Chris:

And I appreciate that. As we talk, I'm trying to get a good sense of how the whole stack looks as you're starting to dive into it. You've talked a bit about the compiler versus having a kernel at the model layer there. But with you guys covering both the hardware and the software, would you say Groq, as I try to understand the whole business model you're approaching this with, is more of an integrator that's full stack all the way from the hardware up through the OS and into the model layers? Or, at the integration layer, are you writing most of the software stack that's touching the hardware?

Chris:

How do you choose whether to go pick, and I'm just pulling things out of the air, not attributing them to you, but going and picking Linux and picking CUDA and picking this and picking that, versus what you're writing to create your own full stack? I'm trying to get a sense of how those decisions are distributed from a design standpoint.

DJ:

Yeah. No. That's a great question. So all the way from our starting stack, right, let's start at the top. Most folks, when they think about using AI models in production, end up using some sort of API.

DJ:

So our cloud organization designed a REST-compatible API. It's compatible with the OpenAI spec, which makes it very easy for developers to really integrate with it. And then that ties all the way into the rest of our stack. And to answer your question directly, yes, most of the stack has been custom written. We are, of course, using some Linux-based primitives underneath our system, and there are, of course, some components, such as, for the compiler, this MLIR system which is being used.

DJ:

MLIR is a compiler term. I don't want to go super deep into it, but it's a multi-level intermediate representation which helps to transform things in between. So overall, I would say this entire design pattern has been thought through from scratch, and it's taken the company a couple of iterations to get to that point.
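
Before the break, a quick sketch of what the OpenAI-spec-compatible REST API DJ describes might look like from a developer's side. The base URL and model id below are assumptions to verify against the GroqCloud documentation, and a GROQ_API_KEY environment variable is assumed to be set.

```python
import os
import requests

# Minimal sketch of a chat completion against an OpenAI-spec-compatible
# endpoint. The base URL and model id are assumptions; check Groq's docs.

API_URL = "https://api.groq.com/openai/v1/chat/completions"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "llama-3.3-70b-versatile",  # example model id; may change
        "messages": [{"role": "user", "content": "In one sentence, what is an LPU?"}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```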

Sponsor:

Well, friends, I am here with a new friend of mine, Scott Dietzen, CEO of Augment Code. I'm excited about this. Augment taps into your team's collective knowledge, your code base, your documentation, your dependencies. It is the most context aware developer AI, so you won't just code faster. You also build smarter.

Sponsor:

It's an ask me anything for your code. It's your deep thinking buddy. It's your Stack Overflow antidote. Okay, Scott. So for the foreseeable future, AI-assisted development is here to stay.

Sponsor:

It's just a matter of getting the AI to be a better assistant. And in particular, I want help on the thinking part, not necessarily the coding part. Can you speak to the thinking problem versus the coding problem and the potential false dichotomy there?

Sponsor:

A couple of different points to make. You know, AIs have gotten good at making incremental changes, at least when they understand customer software. So the first and biggest limitation that these AIs have today is that they really don't understand anything about your code base. If you take GitHub Copilot, for example, it's like a fresh college graduate: it understands some programming languages and algorithms, but doesn't understand what you're trying to do. And as a result of that, something like two thirds of the community on average drops off of the product, especially the expert developers.

Sponsor:

Augment is different. We use retrieval augmented generation to deeply mine the knowledge that's inherent inside your code base. So we are a copilot that is an expert and that can help you navigate the code base, help you find issues and fix them and resolve them over time much more quickly than you can trying to tutor up a novice on your software.

Sponsor:

So you're often compared to GitHub Copilot. I got to imagine that you have a hot take. What's your hot take on GitHub Copilot?

Sponsor:

I think it was a great one point zero product, and I think they've done a huge service in promoting AI. But I think the game has changed. We have moved from AIs that are new college graduates to, in effect, AIs that are now among the best developers in your code base. And that difference is a profound one for software engineering in particular. If you're writing a new application from scratch, you want a web page that'll play tic tac toe, piece of cake to crank that out. But if you're looking at a code base of tens of millions of lines, like many of our customers have. Lemonade is one of them.

Sponsor:

I mean, a 10,000,000-line monorepo. As they move engineers in and around that code base and hire new engineers, just the workload on senior developers to mentor people into areas of the code base they're not familiar with is hugely painful. An AI that knows the answer, is available seven by 24, doesn't require you to interrupt anybody, and can help coach you through whatever you're trying to work on is hugely empowering to an engineer working on unfamiliar code.

Sponsor:

Very cool. Well, friends, Augment Code is developer AI that uses a deep understanding of your large code base and how you build software to deliver personalized code suggestions and insights. A good next step is to go to augmentcode.com. That's augmentcode.com. Request a free trial, contact sales, or if you're an open source project, Augment is free for you to use.

Sponsor:

Learn more at augmentcode.com. That's augmentcode.com.

Daniel:

So, DJ, you mentioned that a lot of the focus around, you know, really that design from the hardware layer up through those software layers, and digging into all of those, was to achieve fast inference. Could you tell us a little bit about the kinds of models that you've run on Groq, and just some highlights in terms of, when you say fast performance, what does that mean in practice? Now I've seen some pretty impressive numbers on your website, so I won't steal your thunder. But yeah, just talk a little bit about what is achievable with what kinds of models on the Groq platform.

DJ:

Yeah. So first of all, you know, I'll share some numbers, but we are just getting started, so these numbers are only gonna get better with time. But let's take Llama 3 70B as an example; it tends to be one of those industry standards for comparing performance. So we've had numbers all the way from around 300 tokens per second to multiple thousands of tokens per second, depending on the use case, right?

DJ:

And, yeah, we've had some smaller models which go up to several thousand tokens per second. We've had one of our speech-to-text models, Whisper, which is again an OpenAI model, running on Groq. And with this model, I think we've gotten around a 200x real-time speed factor, as they discuss it in the audio world.
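
For context on what using Whisper through such a service can look like, here is a hedged sketch of a transcription request over an OpenAI-compatible audio endpoint. The endpoint path, model id, and file name are assumptions to confirm against the GroqCloud documentation before relying on them.

```python
import os
import requests

# Hedged sketch of transcribing audio via an OpenAI-compatible transcription
# endpoint. Endpoint path and model id are assumptions; check Groq's docs.

API_URL = "https://api.groq.com/openai/v1/audio/transcriptions"

with open("meeting.wav", "rb") as audio_file:  # hypothetical local file
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
        files={"file": audio_file},
        data={"model": "whisper-large-v3"},  # example model id; may change
        timeout=120,
    )
response.raise_for_status()
print(response.json()["text"])
```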

Daniel:

Yeah. And maybe talk a little bit, for those out there that aren't trying to process these thousands of tokens per second, about what that implies. I would say, you know, if you're using a chat interface, for example, and something is responding at thousands of tokens a second, it's potentially a wall of text; it arrives almost all at once as far as our human eyes see it. Could you talk a little bit about the implications of that?

Daniel:

So I mentioned the chat interface, and certainly some people are using chat interfaces. Right? But at the enterprise level, for true enterprise AI use cases, why is fast inference for these kinds of models important? Because in a chat interface, I can only read so much text so fast with my own human mind as it comes back to me. I certainly have my own thoughts on this, but I'm wondering what you think: why does that speed matter in enterprise use cases, and why does it matter to push it maybe further than, you know, our own speed of reading, for example?

DJ:

No. Great question. So I think if you start with Google's studies from a decade ago on the perception of search results, if it takes longer than, I think it's about two hundred milliseconds or so, somebody will, like, lose interest, you know? So speed is critical, whether it's for the enterprise or everyday people, right? I mean, we've demonstrated this several times, and you can try it out for yourself.

DJ:

You can, let's say, open ChatGPT with something like o1, have Groq on the side with one of our reasoning models, and try comparing them side by side. So what becomes more critical, as I'm coming to, is that everybody thinks of speed as, yes, being important for real-time applications, but then there is the aspect of accuracy, right? So if you could reason for longer, let's say in the case of our reasoning models, and we've had DeepSeek R1, for example, right, these models generate a lot of tokens. And if you can reason for longer, you can get higher quality results as a consequence of this, while not making the system too slow for the user, right?

DJ:

So whether again it's enterprise or it's for everyday users, speed can translate to quality as well.

Chris:

So to extend that just a little bit: we've been talking directly about inference speed and stuff like that, more from the practitioner standpoint. If you're maybe a business manager or a business owner out there, and you're looking at Groq and comparing it against the more traditional inference options that are already out there, when you're talking in terms of speed, and, for instance, being able to have the time to do the research and so on, what are some of the use cases from a business standpoint where they need to say, it's time for us to reassess the more traditional routes that we've taken on inference and look at Groq for these solutions? Could you talk a little bit about what some of those business cases would be?

DJ:

Yeah. I mean, if you care about accuracy, speed, or cost, you should consider Groq. So not only are we fast, the Groq LPU architecture allows us to give really low cost, or I would say our cost per token is really low, and we pass on those savings to all of our customers. So if you are concerned about any of these cases and you wanna work with different modalities, if you care about image, text, or audio, if you care about RAG, if you care about reasoning, we are there for you.

Daniel:

Yeah. And just to tie into that as well, some people might be listening to this and thinking in their mind, oh, Groq has this whole platform that they've designed, hardware and software. I don't have a data center. It's going to be expensive for me to spin up racks of these things. Could you talk a little bit about that? I could be mistaken, so please correct me.

Daniel:

I think that is something that can happen: there are physical systems that people can access and use and potentially bring into their infrastructure. But I know also I see a login, I see an API, as you mentioned the REST API in your previous answer about the developer experience. Maybe just talk through some of those access patterns, and also how you as a company have thought about which of those you provide. Because certainly there are advantages on the hardware side, of maybe a fixed cost, but then there's the burden to support that.

Daniel:

Just talk us through a little bit of the strategy that you all have taken. Because you are deploying this whole platform, how have you thought about providing that to users, and in what sort of access patterns, I guess?

DJ:

Right. So I'd say, to start with, one can go to our website, groq.com, and just experience the speed themselves. It's a chat interface, and it's trivial to sign up for an account over there; on a free tier, we offer tons of tokens for free. You can sign up and get access to our APIs. So once you get access to our APIs, and let's say you've already been using an existing API, say you're using OpenAI, it's pretty easy for you to switch to Groq. It's maybe a one- or two-line change; just try it out for yourself, right?
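
A minimal sketch of the kind of one- or two-line change DJ describes, using the official openai Python client pointed at an OpenAI-compatible endpoint. The base URL and model id are assumptions; verify both against the GroqCloud documentation.

```python
import os
from openai import OpenAI

# Sketch of switching an existing OpenAI integration to an OpenAI-compatible
# Groq endpoint. Base URL and model id are assumptions; check Groq's docs.

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # swapped endpoint
    api_key=os.environ["GROQ_API_KEY"],         # swapped key
)

reply = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # example model id; may change
    messages=[{"role": "user", "content": "Why does inference speed matter?"}],
)
print(reply.choices[0].message.content)
```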

DJ:

We firmly believe in letting people experience the magic themselves rather than us talking about it. I think actions just speak louder. So, yeah. For our deep enterprise customers, we do, of course, offer other services on that side, right? So we're talking about single-tenant, and then there are, of course, multi-tenant based architectures over there.

DJ:

So we do offer dedicated instances where there's a real need for that, and we do manage that. Groq deploys its own data centers, and we offer all of that over an API. So it's very easy for our customers to go and sign up and use them.

Chris:

I'm gonna ask if you could kinda talk a little bit about it, just because folks listening will go try that out afterward, and I know that we'll have links in the show notes to the site so that they can do that. But could you talk a little bit about, and you can pick your example, but you mentioned OpenAI, something that they've probably had experience with. It's one of those things that everybody has at least touched at some point, and you're providing a better experience here. When you talk about going to experience this yourself and seeing how amazing it is, could you talk through what you've seen your customers experience, just so listeners get a sense, or maybe a preview, of what they should expect?

Chris:

Having messed around with OpenAI for a while, now they're going over to Groq and they're doing that and they're going, woah, this is amazing. What is that amazing thing you're expecting them to see?

DJ:

Well, first of all, people are just amazed by the speed that they get, like the speed of the output that comes up, you know? Whether it's text or audio, you just get the output right away, right there. It's really, really fast, and I think it really makes people think of new ways of doing things. So, you know, one example from our developer community, and our developer community has grown to over a million developers now: one recent example from a hackathon was that somebody developed this snowboarding navigation system based on Groq, taking images and trying to guide people while snowboarding. I mean, my mind was blown by these creative geniuses that are out there.

DJ:

Just amazing. So all sorts of new applications out there, enabled by the speed.

Daniel:

Well, DJ, I do wanna follow up on some of what you talked about there on the developer community. So could you maybe clarify one thing for me? There are the Groq systems that you have deployed, and models that you have deployed on those systems, which, it sounds like, if I'm interpreting things right, people can just use your programming language clients or REST API to access and build off of. So in that case, it's sort of accessing models, like you say, in a similar way to how they would access OpenAI models and that sort of thing. Is there another side of the developer community that is saying, hey, we actually have our own custom models, whatever those might be?

Daniel:

What is the process? I guess my question is, what is the process of getting a model supported on Groq? You've talked mainly about the gen AI level models: LLMs, or vision, or transcription. How wide is the support for models? In terms of, hey, if I have this model, and I'm thinking in my mind of a manufacturing scenario, a very specific model that needs to run at extremely fast speeds to classify the quality of products coming off of a manufacturing line, right? But it's a custom model.

Daniel:

I say, okay, Groq has the fastest inference. What should I expect in terms of model support as of now, in terms of architectures, and then your vision for that in the future, and also how maybe people could contribute there, if there is an opportunity?

DJ:

Yeah. I think right now, one can just reach out to our sales team and we can figure it out. So based on the workload and the size of the model and things like that, we could figure out what's the best path going forward. Now going into the future, we have some very exciting developments, but I don't want to spoil that right now, since it's still a work in progress. So I guess we'll disclose that whenever we can.

Daniel:

And maybe kind of along with that, I know even my team has tried out running models on a variety of GPU alternatives. Sometimes what happens there is the latest model comes out on the market, and it's supported in certain driver ecosystems very quickly. And then on some of these alternatives, there needs to be a kind of longer pathway for support in custom software stacks that aren't GPU based. How do you all navigate that right now? I know, of course, our team is small and it's hard for us to navigate that.

Daniel:

Maybe you have people thinking about those things every day. But yeah, how do you navigate that challenge as an engineering team, to support all of these different models as they're coming out, given that you have a completely different software stack than others are working with in the ecosystem?

DJ:

Yeah. If you think about it, we don't have to write kernels at a per-model level, you know? So when a new model comes out, generally in the GPU world, and even on other custom accelerators, people typically spend a lot of time writing more optimal versions of it. So you might hear about new CUDA kernels being launched. Let's say, you know, after the original attention, there was the FlashAttention one.

DJ:

So that's a more optimal way of running some of these models on the GPU. But we don't have to do this at a per-model level. What ends up happening is, as we enhance our compiler over time, all these enhancements just reflect onto all of the models that we support, and the process to support different models on Groq is kind of similar. We end up spending some time removing vendor-specific hardcodings. Right?

DJ:

So there tends to be a lot of GPU-specific code, which we end up removing. And then we run our compiler to translate this, finally, to the Groq hardware, but there are a lot of knobs we tweak and turn to give you the best possible performance out of that. And as the compiler improves with time, we just end up passing on these improvements to all the models right away. So our effort per model is not as high, you know.
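
To illustrate the flavor of "vendor-specific hardcoding" DJ mentions removing, here is a generic PyTorch example, not Groq's internal porting process: replacing explicit CUDA calls with device-agnostic placement so the same code can target whatever backend is available.

```python
import torch

# Generic illustration of removing a vendor-specific hardcoding; not Groq's
# internal process.

# Before: CUDA is assumed, so the code only runs on NVIDIA GPUs.
#   model = MyModel().cuda()
#   batch = batch.cuda()

# After: device selection is abstracted behind a single variable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(16, 4).to(device)  # stand-in for a real model
batch = torch.randn(8, 16, device=device)
output = model(batch)
print(output.shape)  # torch.Size([8, 4])
```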

Daniel:

So, just to clarify on that point, these models would kind of roll out, and you would build into the compiler less vendor-specific, more general functionality over time, which would expand your ability to support certain types of operations. But you wouldn't necessarily be able to say, hey, I've got this random model; some research team created their own architecture, right, this crazy thing.

Daniel:

It may take some effort to map that into the Groq software stack, but maybe, if I'm hearing right, sort of less burden over time as the ecosystem develops. Is that the right way to interpret that?

DJ:

Partially, yes. But I would add that if you think about what the Groq system is at the heart of it, right, it's matrix multiplications and vector-matrix multiplications. And that's what most machine learning models are made of. Right? Yes.

DJ:

When we have a generational shift like transformers, one might want to go and look at what the new model type is and how well it maps to our hardware. We might want to have some strategies to address some of that, right? But fundamentally, models haven't changed all that much since transformers were introduced. Now, you know, you do hear about diffusion models, even in the text world most recently. But as long as these fundamentals don't change frequently, I think our core belief of supporting this wide ecosystem of models continues to hold sturdy.

DJ:

If you look at other AI accelerators, some of them have gone and hard-coded, let's say, to the transformer architecture itself, and their bet is that super-specialization is the way to go. But our belief is that we would like to support a wider range of models. And that's pretty much what our compiler system does: it maps a high-level, let's say, PyTorch model onto the Groq platform, converting it to an intermediate layer where the compiler can work independently of what model it is. So there's no hard coupling to a particular model, or even to an architecture type. It's very low coupling.
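
As a rough picture of what lowering a high-level model into a hardware-neutral intermediate form can look like, here is a hedged sketch using ONNX export as a stand-in. ONNX is only a familiar example of a vendor-independent representation; it is not necessarily what Groq's compiler consumes.

```python
import torch

# Hedged sketch: lowering a PyTorch model into a hardware-neutral intermediate
# representation. ONNX is used here only as a widely available stand-in.

model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 8),
)
example_input = torch.randn(1, 32)

torch.onnx.export(
    model,
    example_input,
    "model.onnx",  # the exported graph is independent of any vendor API
    input_names=["input"],
    output_names=["logits"],
)
print("exported model.onnx")
```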

Chris:

I'm curious. I've been really kinda spinning on the speed you're talking about in terms of inference, and some of the capabilities that your stack offers. In general, as the model ecosystem has been developing into the second half of last year and raging into this year, with agentic AI, and that's kind of evolving into physical AI, you're dealing with robotics and autonomy and things like that, and we're expecting an explosion of devices out there in the world that these systems are supporting. What is your strategy and approach going forward for thinking about the physical AI that we're evolving into, where you have agents that are interacting with physical devices that are interacting with us in the real world? So it's not all in the data center, but the data center is supporting it. How does that fit into your overall view going forward?

DJ:

Yeah, I think the AI industry evolves very rapidly. Personally, I don't think there can be any long-term strategy which will not need adjustments based on developments. But our belief is still that, rather than edge-based deployments, calling things over APIs will be the preferred interface going forward for a long time, right? So, sure, your mobile chip, let's say, might be able to perform some basic-level tasks over there. But if you need really high-accuracy, high-quality model inference, doing this over an API, I think, would get you there, compared to, you know, the model size which you can actually deploy on a mobile phone.

DJ:

So that's just another example of, like, an edge device.

Daniel:

I have one question for you, just as an engineer who has been working at the forefront of this inference technology: what have been some of the challenges that you've faced as you've really dug into these problems, maybe ones that were unexpected, or maybe they were expected for you? What have been some of the biggest challenges, and maybe some learnings from looking back on your time working on this system that you can share with the audience?

DJ:

Yeah. No. Great question. As I said, I think the AI industry moves really fast and sometimes there are these shifts. Right?

DJ:

So we saw this shift to large language models, and that's when the company itself kind of pivoted to focus on this. Meta releasing Llama and the Llama 2 series of models was really what got our company to focus on this side and really push on it, right? Similarly, we are a startup; we are always pushing on all fronts, always trying to improve on things. So whenever there's some new architectural change, we look to see how we could best adapt our system for it, to maximize throughput, right?

DJ:

So sometimes there are these kinds of changes, and, you know, this is something which actually excites me about Groq and working at such a talent-dense company. My colleagues come up with really great, exciting new ways of doing things to really push the bar on some of these things. So maybe it could be a mixture of experts, or reasoning models; whenever something new comes up, right, getting the maximum performance out of it is something we care about.

DJ:

We deeply care about it. And, yeah, I think that's been one of the key areas.

Daniel:

Awesome. Well, as we kind of get close to an endpoint here, this has been fascinating. I'm wondering, DJ, if you could just close us out by sharing some of the things that you think about personally kind of going into this next year. As you mentioned, things are moving so fast. There are shifts that are happening.

Daniel:

What are some of the things that are most exciting for you as you kind of head into this next year of development and work?

DJ:

So as a developer and amateur data scientist, I would say that, for me, the push on the coding side of the AI world has been very exciting. It helps me think about how I can have more impact, whether it's at Groq or in the world in general. So the push of AI on the coding side, reasoning models, multiple modalities, and the fusion of all of this, right? I think that's what I really look forward to for the next couple of years. There's of course the robotics bit, which we touched upon, but that, I feel, is probably a couple of years down the line.

Daniel:

Awesome. Well, thank you, DJ, for representing Groq, and congratulations on what you and the team have achieved, which is really amazing and monumental work. So great work. Keep it going. We'll be excited to follow the story, and we hope to get an update again on the podcast sometime soon.

Daniel:

Thanks.

DJ:

Sounds great, guys. Thanks for having me.

Jerod:

All right, that is our show for this week. If you haven't checked out our Changelog newsletter, head to changelog.com/news. There you'll find 29 reasons. Yes, 29 reasons why you should subscribe. I'll tell you reason number 17.

Jerod:

You might actually start looking forward to Mondays.

DJ:

Sounds like somebody's got a case of the Mondays.

Jerod:

28 more reasons are waiting for you at changelog.com/news. Thanks again to our partners at Fly.io, to Breakmaster Cylinder for the beats, and to you for listening. That is all for now, but we'll talk to you again next time.
