GenAI risks and global adoption

Jerod:

Welcome to the Practical AI podcast, where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Blue Sky to stay up to date with episode drops, behind the scenes content, and AI insights. You can learn more at practicalai.fm.

Jerod:

Now, onto the show.

Daniel:

Welcome to another episode of the Practical AI Podcast. This is Daniel Witenack. I am CEO at Prediction Guard, and I am joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are doing, Chris?

Chris:

Hey. Doing great today, Daniel. How's it going with you?

Daniel:

It's going great. I traveled a bit over the weekend to run a mini marathon, which was fun, but I am back safe at home. And, of course, safety is is something that hopefully we can talk a little bit about today with our with our guests who are, Rick Kobayashi, who is cofounder and CEO at Citadel AI, and Kenny Song, who is cofounder and CTO at Citadel AI. Welcome.

Kenny:

Thank you for having us.

Rick:

Yeah. Thank you, Hwal.

Daniel:

Yeah. It's great to have you both here. Good to get introduced to you both. I'm I'm really excited about this conversation because, of course, security and safety related to AI is very close to my own work, so it's always great to connect with people in this space. Yeah, I'm also interested to hear a little bit, and I even saw some things on your website, you know, talking a little bit about some of the AI in Japan, some of the, I guess, regulations or guidelines for for businesses that have come out there.

Daniel:

So maybe before we get into the nature of what you're building and what you're doing, any thoughts for those out there that might be in The US or in Europe and be constantly exposed to the things that are going on in AI in those jurisdictions, Any thoughts about what is the same or what stands out about what's happening with AI in Japan?

Rick:

So, yeah, anyway, thank you very much for joining us. My name is Rick. And as for the Japanese conditions, basically, I'm afraid to say it's around one, two years behind The US situation, basically, up to now. But on the other hand, surprisingly, as for Gen AI, I think that Japan is one of the most advanced countries. Not the development of the foundation models, the utilization of the Gen AI applications.

Rick:

And probably it might be related to kind of Japanese animation or cartoon. So, you know, like there are lots of say robot or some of the say animations which specify some of the AI or robot technologies and, you know, Doraemon or there are lots of animation over there. And people are very getting familiar with such a kind of talking with robot or some of these advanced computers from the childhood. So in that sense, I think that many people do not have any hesitation to work with or talk with like a chatbot or any GenAI. So in that sense, the hurdle to introduce LNM or GenAI in Japan is very low in that compared with US or other countries.

Rick:

So in essence, development side or technology side, frankly speaking, Japan is behind The US situation. But as for the usage of such kind of a generic, I think, I hope Japan is one of the most advanced countries.

Daniel:

Yeah. That's really interesting to hear the sort of perception side of things. And of course with more, maybe even more adoption or adoption that's ahead of maybe other places in the world, it could be that users or usage of the technology has hit some bumps along the road or has hit some problems, which I know is a lot of what you all are working on. What has been the situation in Japan around regulation and business usage of AI? Is it on the more regulated side or less regulated side in terms of government and, I guess, regulation in terms of security or safety, privacy, these sorts of things?

Rick:

I think Japan is in the middle between EU and The US situation. So they say that it's a soft approach. So there is no strict regulation in Japan, but there are lots of kind of guidance in Japan. And people have already noticed some of the issues on the security side or safety side of AI. And on top of that, they are trying to introduce LLM applications for contact center applications.

Rick:

So they are concerned about more reputational risks rather than security risks. So in essence, they understood the importance of the safety of the AI or trustworthy of the AI, but more concerned about reputational risks rather than security risks.

Chris:

I'm curious as we're kind of talking about the adoption and the environment around that, as you kind of mentioned that in Japan, it's very you know, people are really getting into the utilization of LLMs and generative AI. Do you have any any kind of thoughts around, you know, what is it what is it that's driving that, you know, compared to other countries that you've observed? If you're looking at that kind of adoption in Japan versus The US or versus Europe or whatever. Any thoughts around that? Because I I had I had observed that as as someone in The US that Japan seemed to be implementing, and I was kinda curious if there was you know, what was the what was kind of the driving force there that that got the utilization up, especially among just your typical average everyday folks there, not outside of the AI industry?

Chris:

I would love your thoughts about that.

Rick:

I'm not so sure the the background reasons why Japan is more aggressive to introduce such a kind of L and M applications. Probably as I said, people have lower, say, risk of perception, I must say. In some cases, in Japan, like a chatbot or kind of, say, new technology might be a kind of friend in a sense. Sure. In US side, such a kind of a new AI is kind of enemy of a human.

Chris:

Yeah. So they're kind of a a little bit of a cultural difference in terms of how that that openness to adoption and stuff. So, yeah, that would make sense. That would make sense to me.

Daniel:

Yeah. And I guess sometimes your your friends can potentially hurt you even if they even if they're not not trying, which I know, you you all are are kind of involved in both, you know, the evaluation of these systems, you know, running metrics against these systems and kind of building trust, in in a very real way. Kenny, I'm wondering if we bring you in here and and maybe just, you know, safety and security around AI is such a broad topic now with so many people addressing it from so many different perspectives. I'm wondering if you could help us zero in on maybe the kinds of problems that you all are exploring and how those fit kind of more generally into the landscape of, I guess, security risks or threats as related to AI.

Kenny:

Definitely. Yeah. So thank you for having us on the podcast. My name is Kenny. I'm the co founder and CTO of Citadel So what we do at Citadel is we build software tools to help organizations test, monitor and govern their AI systems.

Kenny:

And in the world of LLMs, when customers come to us, they usually have some proof of concept that they've been developing internally. They've been using some foundation model, they're building some chatbot or some agentic workflow, and they come to us when they want to make that proof of concept production ready. Usually their main problem is they want to mitigate some of the risks that they see. Things like hallucinations, toxicity, or users trying to prompt inject the system. They're looking for a solution to these types of problems.

Kenny:

That's where we come in as a tool provider.

Daniel:

Yeah, and how does that, I guess, how are you seeing when you're interacting with your customers or they're coming to you with these problems? What is the impact of those problems like hallucination or injection? Is that something that is causing real problems or something that they've sort of heard about that they're concerned about, but it's maybe not causing real problem? What are you seeing there in terms of, I guess, the impact? Let's say I just want to throw caution to the wind and ignore these things.

Daniel:

What's the bad side of this that could happen were I to take that more loose approach? Any thoughts?

Kenny:

Yeah, I think it's usually a mix of both. It also depends on the risk appetite of the company that's developing the system. And typically, our customers are larger enterprise companies, both inside of Japan and outside of Japan. And so they have very mature risk management practices. And before they, you know, launch these POCs into a production service, whether that's internal facing or external facing, they want to make sure that they, you know, have appropriate controls in place to manage the risk and they've properly identified the potential risks, reputational data security, so on.

Kenny:

So yeah, I think for the customers we talk to, it's generally a big concern for them. And they come to us with the problem of mitigating some of these risks that they've already identified.

Chris:

I'm curious, as you've kind of identified, you know, what that kind of the strata of customers that you're looking at there, What what is kind of is there is there something that definitively kind of separates that kind of mature larger organization from some of the smaller ones? Do they have a different set of problems that they're coping with or maybe just haven't gotten far enough along in terms of maturity on risk management? What like, at what point do you see the uptake kind of falling off within maybe smaller organizations? You know, what's what does that look like as you as you move from that upper strata into the mid tier?

Kenny:

I think my my general perspective is that there's two ends of the spectrum of like startups to enterprise company. And for startups, the risk is fairly low because you don't have a brand to protect, you don't have an existing business to think about. You can just launch these POCs out to your customers very quickly and you can iterate quickly. For enterprise customers, they tend to be more guarded. They have a closer eye on potential risks to the business.

Kenny:

Those are the customers that really want pretty robust testing before deployment, and also monitoring and potentially real time guardrails filtering after deployment. I think it depends a bit on the size of the company.

Daniel:

I'm wondering with that, as you're looking to those types I'm curious a little bit of the backstory of Citadel. Maybe Rick, you could tell us a little bit of kind of how you came to these problems because you seem like you have customers now, you've developed some of these things. When did that happen? How early was that in kind of Was that before GenAI? Was that as GenAI was coming up?

Daniel:

How did things develop in terms of your thinking around these problems and how you would bring something to the market in terms of addressing them?

Rick:

So in that sense, was before Gen AI. And actually, the person who came up with such a kind of idea is not me, but Kenny. So Kenny has worked in Google brains, and he was one of the core member of the TensorFlow team. So he's taking a leadership to develop the most advanced AI technologies in Google and found that there should be a lots of risk about the trustworthiness or safety of the AI. And he believes that probably he is the best person to explain by himself.

Rick:

But anyway, he reached the idea that some of the tools which protect sort of kind of safety or security issues should become popular, especially in the enterprise companies where they may not have a bunch of AI engineers inside. So that is a background. Our company name Citadel itself is shown what we are focusing. So Citadel is, as you know, saying like a vault or castle. So we are the company to say, protect such a kind of risk, human from AI risks.

Rick:

Such kind of a concept is say, based on our company name.

Daniel:

Yeah, anything to add there, Kenny?

Kenny:

Yeah, that's a pretty good overview of the background. When we started the company, it was at the 2020. So it was before the era of large language models. And We were initially really focused on helping organizations monitor their traditional predictive AI models, so tabular models, vision models, that kind of thing. About two years ago, we started getting a lot more interest from our customers in LLMs.

Kenny:

And how do we reliably test this new technology? How do we integrate it into our workflows and our business applications? And so these days, I think a lot of our new customers come to us with LLM types of questions. And then we still have a lot of existing customers that work more on the predictive AI side.

Chris:

So Kenny, I'd like to follow-up as you guys were kind of developing the idea going back to that a moment for the company. And Rick mentioned that you had been there at Google Brain and that you were on the TensorFlow team. I'm I'm curious how as a you know, for me as as a someone who's used TensorFlow a bit. And so that was like, wow, you know, one of the people who helped put that together. I'm curious what parts of the experiences you had there led into the formation of Citadel in your mind is, as you had some of the the insights that you might have developed in your previous employment.

Chris:

Did did any of that what you know, how how did that lead to what Citadel does? How did that kind of bring forward when you and Rick started the company up?

Kenny:

Sure. So a bit more about my personal background. So in 2017 to 2020, I was working at Google Brain as a product manager. And my team was responsible for building machine learning infrastructure at Google. This included TensorFlow and also other platforms like TFX, TensorFlow Extended, and some of the work on Google Cloud's AI platforms and so on.

Kenny:

So sort of the software foundations that power a lot of Google's machine learning applications. Think Google tends to be pretty ahead of the curve in AI adoption. Think back then it was more called machine learning rather than AI. But basically what we worked on was making models at Google more reliable at Google production scale. That usually meant building these pretty sophisticated pipelines that not only target training the models, but also serving them in production.

Kenny:

We had pretty robust systems for monitoring data drift and validating data that enters production models, monitors the output of these models. And, you know, at Google, you can afford to have hundreds of platform engineers build out this kind of infrastructure to use internally. But for most other types of organizations, they can't make that level of investment in internal platforms. And we felt that there was an opportunity to sort of build some of these model monitoring and data validation tools for other companies. And so that's where the idea of Citadel AI kind of started.

Kenny:

And when we started the company originally, we focused a lot on adversarial attacks, actually back in 2020, since it was a very hot research topic. Basically, designing noise and images and other types of input data to trick models into making the wrong predictions. We found that after a few months, this wasn't really a problem that companies were interested in. It's a very interesting research problem, but less interesting commercially. After that, we pivoted towards more observability and testing of these predictive models.

Kenny:

These days we focus a lot more on LLM testing and monitoring.

Daniel:

Maybe you could just give us, going back to you, Rick, maybe you could just give us a few examples that stand out of kinds of companies and the kinds of models that they're running and the way that or some of the things that they would be interested in tracking or detecting or observing. A few concrete examples might help our audience kind of grasp some of the cases that you're working with.

Rick:

One of the largest customers in Japan is the financial industry, surprisingly. Like banks or insurance companies or security companies are very aggressive to introduce LLM applications into their core operations, such as like a contact center or internal applications. And when they say they can go through POC stage and when they get into commercial operations, they start caring about some of the kind of safety issues. So because those financial industry is governed by the government itself. So in essence, different from regulation related AI, they have already been kind of regulated by traditional, say financial regulations.

Rick:

So if say LM applications behave badly, that damage their core businesses. So to introduce such a kind of AI application is to differentiate or advance their services compared with their competitors. But by introducing new technologies, if their core business is damaged, that would be very, say, huge negative impact. And also, if AI behave badly, as I said, the Japanese government may say, in some cases, call or punish them. So that is one of the most largest risks for them to introduce L and M generic applications.

Rick:

So what we are doing is to making sure that such kind of a bad behavior will not happen in the application or not, and making sure that they can safely introduce Sego into commercial operation. So some of the leading the bank company or insurance company is our say our customers.

Chris:

Gotcha. I'm curious if I could follow-up on that for a second. We've kind of talked about the the notion of risk management in general that that industry is is dealing with. How do they given the risks for a large, you know, organization to implement LLMs? Do you do you have any insight into how they're making the the risk management judgment on the benefits of implementing maybe a new chatbot versus the potential downside where, you know, as you pointed out, could be damaged to brand, damage to core operations, damage from a regulatory standpoint.

Chris:

That's as I'm listening to you, that does sound very risky, you know, from that perspective. How do they make how do they evaluate that? Like, say that they know that they wanna come to Citadel and get that help. What do you have a sense from your customers what a typical balance on that is of the benefit of the LLM utilization versus the downside if things go off? How do they what are they thinking when they come to you in that way?

Rick:

So in that sense, they fully understand that balancing is very important. So to protect the risk so much may say delay the service advancement. On the other hand, say, without having any security or safety test, they may get into bad situation. So they need to know how to balance such a kind of risks and benefit. And they have set up several, like, internal organization to manage such a kind of risk insight.

Rick:

So, yeah, as you can easily imagine, financial industries they are very structured company. So there are some of like risk management department inside. And in some cases, are like AI governance team inside or something like that. So they jointly work together and try to manage such a kind of a risk. And at the same time, they try to introduce the advanced system so that they can differentiate themselves from others.

Rick:

And it's true as for the contact center, probably the same situation in The US. But when the consumer end user try to reach out the banks or insurance company and call it free dial, In many cases, they can't easily reach to the contact center person. They have to wait thirty minutes or something like that. So because of that situation, they are so aggressive introduce such kind of applications into contact center and internal purposes.

Daniel:

And I'm just thinking about this scenario where you have the contact center, you're introducing whatever it is, a chatbot, a voice assistant, that sort of thing. Then I'm thinking back, to what Kenny was talking about kind of how the company started evaluating models that were not GenAI models yet. I'm wondering, Kenny, if you could help us think about you know, what needs because if I'm understanding, you know, part of what you all are doing, part of it is evaluating the risks of a particular model or system, and part of it is observing those and monitoring those in real time. And if I think about a traditional model, let's say a model that detects tumors in medical imagery, right? I can have a very nice sort of ground truth dataset and Maybe it's hard to get because there's some privacy concerns, but I can still get it.

Daniel:

I need to get it to train my model. I have very specific metrics around whatever it is, accuracy, F1 score, etcetera, can sort of grasp what the performance of that model is, maybe even compare it to human performance. With something like a call center, in some ways people might struggle to connect that to real metrics that make sense, right? Because it's like, Oh, well, people could say anything to the chatbot. How do I know, one, what's going to come in either from a normal usage or malicious usage or whatever, and how do I connect that to any sort of metric around a model?

Daniel:

I think sometimes people struggle with this idea of metrics and Gen AI models or Gen AI systems. Could you help maybe clarify, what are some of the relevant metrics that people could think about in terms of these systems that might help them understand how the systems are behaving?

Kenny:

Sure. Yeah, that's a very spot on question. I guess before I talk about specific metrics, I'll just take a step back first. If we sort of think at a high level, what is the same between predictive AI and generative AI? I think the structure of how you maintain reliability is basically the same, right?

Kenny:

You need testing before deployment, and you need monitoring after deployment. And it's also very similar to like traditional software applications, right, where you have automated tests and automated monitoring. And so I think that part is the same. But the part that's much trickier for generative AI is that usually, as you mentioned, you don't have ground truth in the same way that you do for a classification data set, for example. And so the metrics that you use for evaluation are not as well defined.

Kenny:

So you can't measure accuracy, you can't measure position or recall. And the output of a generative AI model is also much more complex than just like a probability score. And so in that environment, it's very hard to determine how do we actually evaluate these things in a quantitative and objective way. So the approach that most of our customers take and most of the industry has gone in is basically using LLM as a judge. So you can craft these evaluation prompts that ask an LLM to evaluate some quality of some generated text.

Kenny:

So it could be, you know, a very simple example is sentiment. So you evaluate the sentiment of, of some text, you could do that with traditional, like a sentiment classifier as well. But there are more sophisticated metrics such as, you know, detecting hallucinations against some ground truth document, or measuring the relevance of the answer relative to the question. Or you might have more, we call them custom metrics that are designed to be domain specific. So if you have like a refund chatbot, you can design a metric that measures if the chatbot adheres to your company's refund policy.

Kenny:

And so these metrics, they're very flexible, because you can design the evaluation prompt in natural language. In our tools and our open source libraries, we have a set of built in metrics. It's like a library of metrics you can choose from. But for many of our customers, they also extend those built in metrics to customize them to fit their business applications.

Daniel:

So Kenny, you were mentioning this sort of idea of LM as a judge, which is using a model to evaluate the model in some sort of axis of performance or some quality, which definitely seems like a flexible option. But also some people might be thrown off by this sort of circular using a model to evaluate a model. Also, there's sort of this you then have a model that's evaluating the model, so how do you evaluate the model that evaluates the model? You kind of get in this. How have you all navigated that side of things, both in terms of using the larger model, making sure that the evaluations are sound and also transferable, one model to the other, and maybe benchmarking the system over time?

Daniel:

Because also the models you might want to use as evaluators might change over time.

Kenny:

Yeah, also a very good question. It's a question that we get from our customers quite a lot as well. And the way that we generally approach the evaluation workflow, which includes designing these metrics is not usually human judgment and taste is treated as the gold standard. But the problem with having humans evaluate every experiment with your LLM system is that it's very expensive and it's very slow. In the ideal world, you would design these LLM as a judge metrics that can mimic human preferences.

Kenny:

And so in our software tooling, this is what we design specific workflows to help users to do. Usually when they when a customer starts on a valuation project, they'll, you know, of course, think about the evaluation criteria that's important. But then they'll also have humans do a small set of that evaluation. So maybe like 50 to 100 of these manual annotations. And from there, you can design LLM automated metrics and measure their correlation and accuracy against the human judgment.

Kenny:

You usually need to iterate a few times to get that custom LLM metric as close as possible to the human judgment. But then once you have that, it's very powerful, right? You have this automated metric that is a very good proxy for human judgment and it's automated, which means you can run it at scale, you can deploy it during evaluation, but also in monitoring as well, and you can also potentially use that as a production guardrail in our firewall.

Chris:

Rick, I was wondering, I know in you know, we've kind of alluded to it that, you know, the the two sides of the equation in terms of testing the models and then monitoring. Could you talk a little about Citadel Lens and Citadel Radar? How you bring them to customers and what the relationship is between those two products that you're bringing to your customers and you know, how do you how do you go to how do you present them when when somebody is interested in, in being able to bring this level of security to the models that they're interested in?

Rick:

Sure. First of all, now we are merging radar features into lens. Lens can provide both testing function and the monitoring function together at this stage. And, as for the the balancing or difference between testing and monitoring, as Kenny mentioned, especially in the case of LLM, how to customize human, say, say, setting back. So so our system is in the concept that the system should follow the human rather than human has to follow the system.

Rick:

In essence, we set the human annotation or human judge as a first priority and try to our metrics to be customized to the human judge. So that is a very important point. And to make it happen, we need to or the customer need to go through testing phase first so that all the metrics should be aligned with the human judgment so that and we can make use of the same custom metrics during the, say, the monitoring phase or fire firewall stage. So in essence, even though final goal might be a monitoring or firewall, but the two before going to that stage, how to test and customize a metrics is very important I mean, very critical, to protect the safety and security and reputational risks. So we recommend strongly recommend to start from testing phase first.

Rick:

So testing phase is not just testing, but customize your metrics into your, say, professional, the person's. So that is a testing phase. And after that, they can go into monitoring phase. So that is our approach to the customers.

Daniel:

And I'm wondering, either one of you could answer this, but why is it and this may be obvious to maybe more of the software engineering type crowd, but maybe less so to some others outside of that crowd. Why is it important to once you've tested your model to actually monitor it online for maybe the things that are are potentially problematic inputs, whether that's a security thing like a prompt injection or, maybe something outside of the you know, some type of input that you wanna filter out like IP going in or something that doesn't fit your policy. Why is it important to have that monitoring piece and not just the testing piece? Because if I test my model and I convince myself that it can't be prompt injected, which I'm saying that sort of in jest because, you know, as you all know, every model there's no perfectly aligned model. There's no every model is vulnerable to various things.

Daniel:

But let's say that I convince myself of high performance in in one of these areas. Why then is it useful and necessary then to monitor that over time or in real time?

Rick:

Technically, probably Kenny is also again the best person. But even if the customers can go through the testing phase, the market condition or say, the human reaction may change over time. So if we do are safe right now, but if anything, say something new happens today, what we say guaranteed today may not apply tomorrow. So it's a very general things. But say, in that sense, keep monitoring is very important to protect our customers, even if the market condition or the world condition change or economic condition change.

Kenny:

Yeah, and just to give a concrete example of why you may want monitoring, so we really view them as complementary, and you really need both, if you want to make a system reliable. So for example, if you have like an answer quality metric that measures how high quality and answers, you should of course use that for testing to make sure that it meets some like, you know, 8090% bar. But then in monitoring, you actually want to measure quality of the real answers that your chatbot is giving to real customers, right. So from a quality perspective, it makes a lot of sense from like a safety and risk reduction perspective. Another example is that, you know, as you mentioned, Dan, during testing, you might test a bunch of prompt injections against your system.

Kenny:

But then in deployment, have real users, some of them are adversarial, some of them are actually trying to prompt inject. They may do it in creative ways that you haven't tested before. You may want some guardrail that will automatically detect those attempts and filter them out, even if you're sure that the model is robust to 90% of these attacks.

Chris:

I'm curious, you guys have some open source available out there. I know one of the tools is lang check. Can you talk a little bit about your approach to open source?

Kenny:

For context, lang check is our open source Python library that contains a suite of metrics that are built in that you can use for evaluating the quality of text. One of our motivations for creating this library is that I think we launched it in October 2023, roughly. And around that time, there weren't a lot of sort of industry standard metrics and practices for evaluating text, particularly in non English languages as well. So there was some focus on, you know, these metrics in English, but we work with a lot of customers that have Japanese texts or Chinese or German and these other languages. We wanted to make a library of these metrics that anyone can use.

Kenny:

We view this as a pretty good starting point. If you just need one or two metrics, you're comfortable writing code, you can use LangeCheck and integrate that into your test pipeline or your monitoring system. But then if you want something production scale, and you want an easy workflow to design custom metrics and test them against manual annotations, that's where our commercial product comes in.

Daniel:

Makes sense, yeah. And as we get a little bit closer to the end here, there's so many things and this is such a, so many in-depth areas to go in, which is why it's great that there's wonderful people like yourselves exploring, the topic. But I'm wondering, if we could maybe talk just a little bit as we close out here about what you're excited about, about kind of, yes, Citadel, but maybe the kind of general ecosystem that that you're a part of. As as you look to the future, you know, what's what's exciting to each of you about how the ecosystem is developing? What's becoming possible with the technology?

Daniel:

What's inspiring to you or what are you thinking about in terms of the future? Maybe I'll start with you, Rick.

Rick:

Okay. So now the chat GPT-five is released. But when we say look back today from say five years later, I believe that, oh, this is a very premature say model or something like that. So in that sense, the technology advancement in the AI field is so rapid. And in that sense, yeah, there may be a lot of risks coming in.

Rick:

But on the other hand, there are mostly infinite opportunities. So you can find that huge variety of possibilities, not only just say the AI directly related technologies, but I believe some of the material products or machines or anything will change maybe within five or ten years. So in essence, we are in the midst of the say period where anything can change in that sense, especially technology related. So there are lots of say risks may come in and we like to protect such a kind of risk as a company. But on the other hand, people can find many possibilities, opportunities for you to try.

Rick:

So I strongly believe that so even though there are lots of issues in the world that people can enjoy or say, make best use of this opportunity, everybody.

Daniel:

Yeah, that's great. What about yourself, Kenny?

Kenny:

Yeah, I think as a consumer of AI in both my personal life and work, from a consumer perspective, it's really exciting to benefit from all the advancements in these new AI tools and models. I really loved Oh, three as a model in chat GBT. And I love using cursor. I'm excited for these tools to become more and more agentic over time. I think that's the trend that you see.

Kenny:

If you just look at ChatGPT, originally it was just 3.5 and GPT-four that just answer a question based on forward pass of the model. But now these models will search the internet and it'll sort of reason and think about what to search next. As a result, the outputs have become a lot better. So really excited for that to improve even more from a consumer perspective. And then from a business perspective, I'm really excited to help bring these capabilities to our business customers and help them use AI more reliably and more effectively in in their business.

Daniel:

That's great. Yeah. Well, thank you both for taking time to to chat with us today. And thank you both for the work and thought that you're putting into the tools that you're building and the open source projects that you're putting out there. It's a great benefit to the to the community and to the business world, of course.

Daniel:

So thank you for the work that you're doing. And, yeah, we'll look forward to, to keeping keeping, an eye on on what you, what you evaluate and and protect us from next. So, appreciate you both, hope you have a great evening. Thank you for joining.

Rick:

Thank you very much.

Kenny:

Thank you for the conversation.

Jerod:

Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show.

Jerod:

Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.

GenAI risks and global adoption
Broadcast by