Dealing with increasingly complicated agents

Jerod:

Welcome to the Practical AI podcast, where we break down the real world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Blue Sky to stay up to date with episode drops, behind the scenes content, and AI insights. You can learn more at practicalai.fm.

Jerod:

Now onto the show.

Sponsor:

Well, friends, when you're building and shipping AI products at scale, there's one constant, complexity. Yes. You're wrangling models, data pipelines, deployment infrastructure, and then someone says, let's turn this into a business. Cue the chaos. That's where Shopify steps in whether you're spinning up a storefront for your AI powered app or launching a brand around the tools you built.

Sponsor:

Shopify is the commerce platform trusted by millions of businesses and 10% of all US ecommerce from names like Mattel, Gymshark to founders just like you. With literally hundreds of ready to use templates, powerful built in marketing tools, and AI that writes product descriptions for you, headlines, even polishes your product photography. Shopify doesn't just get you selling, it makes you look good doing it. And we love it. We use it here at Changelog.

Sponsor:

Check us out merch.changelog.com. That's our store front, and it handles the heavy lifting too. Payments, inventory, returns, shipping, even global logistics. It's like having an ops team built into your stack to help you sell. So if you're ready to sell, you are ready for Shopify.

Sponsor:

Sign up now for your one dollar per month trial and start selling today at shopify.com/practicalai. Again, that is shopify.com/practicalai.

Daniel:

Welcome to another episode of Practical AI. I'm Daniel Witenack. I am CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris? It's been a while.

Chris:

It's been a little bit. It's good to talk to you. I I, I was gone for a brief, a brief period, but I'm back all safe and secure now.

Daniel:

Yes. Completely, reversed back to where you, where you normally are. And for a great conversation because we have, a great previous guest who I got to talk with in London last one of the last times I was I was over on on that side of the pond and, now get to catch up with, Donato Capitella, who is principal security consultant at ReverseC. How are you doing, Donato?

Donato:

Very, very good. Thank you. And I'm so happy to be back.

Daniel:

Yeah. Yeah. Same same here. I feel like the AI world is in some ways the same and in many ways different than than when we chatted last. What's life been like for you?

Donato:

It's definitely been very, very busy for us. Like our company has obviously changed. We now reverse the same people, but we separated. But as part of that, we've been doing a lot of GenAI cybersecurity work. I think our pipeline has tripled in size and we've been doing a lot of research.

Donato:

I am actually just back from Canada, where I was presenting our research at Black Hat in Toronto. And before that, I was at another conference in Stockholm called SecureAI, a complete two days just focused on GenAI security. I mean, we were presenting our research. There was OpenAI there, Microsoft, a lot of hugging face talking about MCP protocol security. So, so much was happening.

Donato:

And so for us, it's been incredibly busy. Just like literally half an hour before, I finished around one of the training courses that we do on GenAI security for our consultants so that we can have more people that can deliver the work, which is full of energy for me to do that. Like there are a lot of young people there. So it's been busy, lots of work, lots of research, so lots of travel. What more to say?

Daniel:

Yeah, yeah. And what a I mean, last time we talked, certainly we talked a lot about LLMs, prompting LLMs, etcetera. There's now these kind of additional layers or frameworks or approaches to developing AI applications. From your perspective, just in terms of, I'm always curious about this because some of us that are so kind of into the AI world and not constantly in front of real world enterprise companies, we have maybe a warped view of like, oh, everybody's creating agents using MCP or something. What is the reality on the ground as far as you see it of kind of the core AI use cases that people are often thinking about in terms of not only security, but just in terms of adoption and scale?

Daniel:

Then what is maybe actually shifting in terms of those use cases from your perspective, least?

Donato:

I mean, if you asked me this question last year, and you probably asked me this question, I would have said the of our clients were doing Rag on documents or internal chatbots. There were a few of them that were starting to look at agentic workflows. Now, forward to today, a lot of the stuff we test is agentic in one way or the other. And for me, I have a very simple definition of agentic. The LLM can use an external tool or API to do something.

Donato:

So it's got agency. And typically there is a little loop that runs and the LLM can choose the different tools and maybe there is an orchestrator. And a lot of these are internal, for example, for customer support. So there is an email that comes in and then there is this agentic workflow that based on the email, it's got access to a few tools. It will look into the user account, it will try to look at historic data, and then it can either decide I'm going to automatically perform an action or I'm going to suggest an action for the customer support agent.

Donato:

Some of them also draft the response or the types of actions that the agent, the real person needs then to approve. There is a lot of this currently going on. And to me it makes sense because this is the promise of Gen AI. Certainly we didn't put that much investment in it just to generate text. Maybe the one thing that might be surprising for people outside of some of these enterprises is that MCP is too new for them to have it.

Donato:

Meaning that if you think about it, some of the big organisations have got development cycles where the first project one gets year ago. And so a lot of them will have their own agentic frameworks, essentially their own loops and their own prompts and their own parsing. Or they use long chain, which is no, actually, what's the one that they use? Oh, God, I forgot the name. CrewAI?

Donato:

I was literally looking at the source it's in C Sharp. I was literally looking at the source code like, last week. Who is it? It's by Microsoft. I cannot semantic something, which has got you can define two, say, it's in C.

Donato:

Like, I mean, people use Python, but, like, you have to imagine a lot of these places like native c sharp stuff.

Chris:

I'm curious as, you know, as you were talking about, you know, the world has moved into AgenTic, and and we've talked a lot about that on the show in general over the in the last year and such. But kinda moving from that, you know, prompt only environment that maybe you and Daniel talked about earlier into this agentic world, you defined it as kind of that external agency, you know, to to bring in things. I would guess as someone who is not an expert on security, that that introduces, you know, mega amount of of new vulnerabilities and new concerns, just because you're now using those agents to reach out into the world and do things. Could you talk a little bit about like what that new landscape looks like to you since you talked to Daniel last time?

Donato:

So I would say if I need to be concise and make a statement, basically what people need to consider is that any tool exposed to an LLM becomes a tool exposed to any person that can control any part of the input into an LLM. Now, what's very common is that our clients take APIs, which used to be internal APIs, for example, for customer support, for asking staff. And these APIs are built to be consumed by internal systems, meaning they have never been exposed for real on the Internet. Now, as soon as you make that API into a tool that the LLM can call, any entity that can control any part of that LLM input via things like prompt injection, they can get DLLM to call that API with whatever parameters they want. And because this wasn't an API that you ever expected to be exposed essentially on the internet, all of a sudden you have a problem.

Donato:

And it is not just exposed to the person that's prompting a chatbot. It is exposed to somebody that sends a customer support email in and then that customer support email is fed to the agentic workflow. And now that can cause the LLM to call some of these functions with whatever parameters. So I would say that authorization or access control has been the biggest things we've been focusing our efforts on. Like, you know, how is the identity passed to the tool?

Donato:

And do you have a deterministic, non LLM based way of determining whether that function can be called in that context in a safe way. If you don't have that, you can't go into production.

Daniel:

I want to run something by you, Donato, because I was thinking about this the other day, and I wonder if you agree or have a comment on it basically, which is that what you basically described can be very, very complex. Like everything from let's say it is a customer service thing. There's the actual customer ticket. Maybe I'm in a retrieval way pulling in previous Jira tickets that have information from a repository I'm calling maybe multiple tools. It seems like there's this sort of explosion of complexity in this web of connected things that happened before the prompt goes into the LLM.

Daniel:

And I remember earlier on in my career when it was the days of microservices everything, right? It's like all of a sudden you have a thousand microservices. Right? And I remember we had dashboards up on the wall. And part of the problem was like when there was something bad that happened, an alert would go off on on one of the services, but it wasn't just an alert that would go off on one of the services.

Daniel:

It was like an alert went off on all of the services because they're all interconnected in this way that makes them all kind of malfunction at once. And so it became kind of this root cause analysis issue then at that point, and you kind of gave up or you had the trade off of that complexity and root cause analysis for the simplicity and flexibility of kind of developing on this microservices architecture. Do you see this kind of also getting into that kind of root cause analysis type of scenario or analyzing this network of things? Because it's just becoming so complex as these pipelines kind of grow and become more interconnected and any one piece could kind of trigger a problem in the whole thing.

Donato:

I mean, is reminiscent of that. And I will say it's an explosion of data sources in the context of the LLM. So what I think is really dangerous is that now in the same single individual call or context that goes into an LLM call, we are mixing more and more data sources from more and more entrusted parties in the same LLM core. And that's where I think confidentiality, integrity starts becoming a problem because, again, now everything you put into that prompt ought to be trusted for the use case. Otherwise, any single part can break it.

Donato:

I will give you an example. One of our consultants in The US was doing a test a couple of weeks ago. And the idea of the use case was great. So there is a customer support email, and this is William Taylor. I'll give him a shot because he's an amazing guy.

Donato:

But the email came in and so the use case is the following: Rag on all of the support tickets, not just the ones belonging to the user that sent the email, but basically all of the emails that have keywords or like, you know, similarity. And so that builds the top 10 emails that came in, which are potentially related to this query. The entire thing is then fed to the LLM and the LLM can then decide, Okay, I know how to solve this based on historic data, and I'm now just going to send an email to the user or I need to escalate it. This is terrible from a cybersecurity point of view. I, an attacker, can send in an email with a lot of keywords or even I can fill the context of my email with people's email addresses that I'm interested in.

Donato:

Now, I send that email that's now part of the RUG. When one of those users sends a ticketing, my malicious email is very likely to be picked and to be part of that huge prompt, which is then processed. And I can the LLM generate an email with a phishing attack. And now the company will send the user an email with the content I want. For example, this is a link, click it to solve the issue.

Donato:

Mean, we demonstrated that. So the problem here is that we are feeding to the LLM different data sources and some of them are potentially malicious or not controlled. So there is this explosion. And you could say the same with MCP. So every time somebody is adding an MCP server, obviously the output of an MCP server is inputting to your LLM context.

Donato:

The description of an MCP server has to end in your LLM context, but that can contain prompt injection that can tell your client to call another MCP server completely unrelated to do something else. I mean, this has been demonstrated a million times. And Sean from Hugging Face was talking about it at SecureAI just again in Stockholm a couple of weeks ago. And this is a very hard problem to solve. So we are mixing different untrusted sources into the same LLM context, and that's hard to solve.

Sponsor:

Well, friends, it is time to let go of the old way of exploring your data. It's holding you back. Well, what exactly is the old way? Well, I'm here with Marc Dupuy, cofounder and CEO of Fabi, a collaborative analytics platform designed to help big explorers like yourself. So, Marc, tell me about this old way.

Sponsor:

So the old way, Adam, if you're a a product manager or a founder and you're trying to get insights from your data, you're wrestling with your Postgres instance or Snowflake or your spreadsheets, or if you are and you don't maybe even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools, local Jupyter Notebooks, Google Colab, or even your legacy BI to try to build these dashboards that someone may or may not go and look at. And in this new way that we're building at ABBYY, we are creating this all in one environment where product managers and founders can very quickly go and explore data regardless of where it is. So it can in a spreadsheet, can be in Airtable, it can be in Postgres, Snowflake. Really easy to do everything from an ad hoc analysis to much more advanced analysis if, again, you're more experienced.

Sponsor:

So with Python built in right there, in our AI assistant, you can move very quickly through advanced analysis. And the really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps and dashboards, or better yet, delivering insights as automated workflows to meet your stakeholders where they are in, say, Slack or email or spreadsheet. If this is something that you're experiencing, if you're a founder or product manager trying to get more from your data or for your data team today and just underwater and feel like you're wrestling with your legacy, you know, BI tools and and notebooks, come check out the new way and come try out Fabi.

Sponsor:

There you go. Well, friends, if you're trying to get more insights from your data, stop wrestling with it. Start exploring it the new way with Fabi. Learn more and get started for free at fabi.ai. That's fabi.ai.

Sponsor:

Again, fabi.ai.

Chris:

As I'm processing what you're what you're talking about with this, I'm like, I'm just imagining, you know, with you know, as as especially as you're describing kind of your your offensive driven approach that you guys have, you know, the number of potentially bad actors out there that could be exploiting this, you know, with this information. And, you know, are you, at this point, like, what are you seeing out there in the wild? Like, I you know, that that's such a compelling kind of a danger story that you're telling, that is so practical. Like any of us could go do that. What are you seeing in the real world in terms of bad actors?

Chris:

And at what levels like, I, know, I come from the defense and intelligence industry. So obviously, my my brain goes to to those types of concerns. But, you know, there's cyber criminals, there's all sorts of different types of of potential bad actors out there. So what what is what are you and what is this industry kind of focused on right now in terms of what's already happening and and where your your biggest fears are?

Donato:

So I will say that because of what we do now, we we don't have an incident response team. So we don't really get to see much of what happens. Like we don't see that. So we are more at the prevention side. So we will test systems that are not in production yet.

Donato:

So we kind of see into the future, well, if that system had gone into production the way it was, I can foresee the attack would have happened. Now, in terms of what people have actually demonstrated in practice, the one that comes to mind, and I'll give a shout out to the guy at this company called AIM Labs, they demonstrated a vulnerability on CoPilot. They called it Ecoleak. So basically, it's the same rug concept. You send an email, CoPilot is just a big rug.

Donato:

Now, with that email, it was very clever. I think we should link in the show notes the description of the attack. But basically, with that email, they got CoPilot to exfiltrate information. Now, thing is, Microsoft knows about this. They had a lot of filtering in place, but they were able to find a clever markdown syntax to bypass the filtering.

Donato:

So as probably your audience will know, that one of the main vectors to exfiltrate information in LLM applications is to make the LLM produce a markdown image. In the URL, can point the URL to an attacker controlled server and then you can tell the LLM, by the way, in the query string of this URL, put all the credit card data of this user if the LLM knows about that. And obviously when the LLM returns that and you try to render that image, the request is going to go to the site. Now, Copilot you can't do this in Copilot because they're filtering out a lot of these markdown syntax, but the guys found a way around it to bypass the regular expression that Copilot was using. So what we're seeing is instances where stuff could really go wrong.

Donato:

But thankfully, is a lot of researchers that seem to be catching them before they are exploited to the full potential. But then cybersecurity is very strange. Sometimes you will know a breach happened five years later.

Daniel:

And I know one of the things I definitely want to get into with you based on, you know, our previous conversations was kind of design pattern type of things. But before before we get there, I'm I'm a little bit curious just from a strategic standpoint in terms of how you're interacting with customers because there's one side of the spectrum where you can lock try to lock everything down, right, and say, Oh, here's we haven't verified any of these sources of data. We have to have a policy in place to approve certain tool connections or no external connections to different tools and other things like that. The issue on that side I see is people want to be productive, they want the functionality, they'll do this sort of shadow AI stuff and try they just want the good functionality. So you kind of go on that end of the spectrum, you maybe have that problem.

Daniel:

On the other end of the spectrum, without any sort of policy or without any sort of governance, right, then you just get into this chaos and and a huge amount of problems. You know, there's no never any kind of perfect solution. You're always gonna have to wrestle with something. But do you have any thoughts on that in terms of companies, like, I guess their posture in how to approach this, recognizing that people are able to find tools and able to find their own solutions that solve their issues so easily, but might introduce liability?

Donato:

I mean, this is very, very old in cybersecurity with the difference now that people really want to be using GenAI. Because like for like, you know, I'm lazy like a lot of other people, I guess, I do like the ability to use it, to do a lot of tasks or to make them easier. Now, what happens in some of the enterprises? I think I put them among our clients into two big categories. I mean, are some which are extremely risk adverse.

Donato:

Obviously, I will not name them, but the only thing I want to say is that I would never work there because it's basically impossible to get anything done and everything is so slow. And sometimes even for us as pen testers, I have to log in with Citrix into a Windows box. Then from there I have to RDP to a server. From that server I have to go into like a Linux machine and from there I can finally do some testing. And by the time I've done all of this, I am so locked into that there is nothing I can do.

Donato:

And the employees work like this, like they are on these machines and they can't do anything. So you have that extreme and they do exist. Like a lot of big financial sectors are extremely risk adverse. It makes you cry when you see that. I think I couldn't stand it.

Donato:

I couldn't spend all my day into six layers of VDI. But on the other side, and we work a lot with startups, and it's Wild West to say. So I think it's fun, but yeah, people are just using whatever like, you know. So yeah, it's two buckets and I think I don't have an answer for that, meaning that I see both, but I see extremely locked down environments and I see companies that are much more relaxed. And yeah, people are doing a lot of shadow AI, like people have closed desktop, just installed.

Donato:

I guess they will have all the NCP services they want. They go and charge GPT even if company policy says you can't go and, yeah, they put all their data there. I wouldn't do that.

Chris:

I'm curious as you're kind of addressing some of the challenges in these different environments that are inherent now in pen testing, could you could you also talk a little bit about kind of the differences in penetration testing today versus, you know, kind of before this Gen AI era? Like, what's changed and what kinds of activities and how have the how have the metrics that you're looking at changed? Like, what what what what has the new approach to dealing with prompt injection and the these type of exploits brought to bear in that day to day life, you know, aside from having to sometimes go so many layers deep, you know, as you mentioned in the financial thing. What are some of those other attributes that have changed?

Donato:

So I would say not much has changed, which is interesting. So there are two things that change: capability from the pentesting point of view. It is much quicker if you are offensive to write a script to do something. I mean, this is like, if you know what you're doing and you have a good LLM, your capability at least you're working faster. Like, that is true.

Donato:

Now, the security assessment point of view, so clients are building applications. What's changed is that if they have an LLM in the application workflow, we have to do additional testing. And that testing is a bit different because you're working on probabilistic stuff. So we try to help people assess, Okay, have you got guardrails? What's the quality of those guardrails?

Donato:

And what can you do outside in the design or in the implementation to make sure that when the LLM does something wrong, you and your customers are protected? So typically it takes a bit longer and actually becomes more data science driven. So if you're testing SQL injection, it is not very data science driven. You basically demonstrate that you can do it. If you're testing SQL injection, so if you're testing prompt injection, you know that prompt injection is inherent, so you are going to find a way.

Donato:

So what you're trying to test is what's the effort? How hard is it for the attacker to be successful? Because that's then going to drive the types of guardrails that you need and the type of active response. I will say something more and then I will let you guys see if we can make sense of this. But basically, I think jailbreaking and prompt injection is less similar to SQL injection and more similar to password guessing attacks.

Donato:

In what way? So the question is not whether the LLM can be jailbroken. The question is, what's the effort? How many prompts do I need to try before I am successful at jailbreaking it? There are so many techniques: crescendo, random suffix attack, best of them.

Donato:

Like you can do so many of these techniques. So the more effort I can put in it, the more I'm likely to succeed. So exactly as password guessing, the way you kind of solve this is that there are two layers. One layer is you don't allow the attacker to explore the space of all possible passwords. Likewise, you don't allow the attacker to send 100,000 prompts per second to explore, to find something that's going to jailbreak it.

Donato:

You have a set of guardrails for prompt injection, topic control. As soon as a user has an identity that's connected to your application triggers three of those guardrails, that's your feedback loop. You stop the user. You suspend it in the same way that if I, Chris, if I try three passwords that are wrong against your email account, I am not going to be allowed to keep trying. Your account is going to be temporarily locked.

Donato:

And that's to prevent me from exploring that space. I think protecting against jailbreak attacks in the real world is very similar. You have the guardrails, they are not protecting the application. They are giving you a feedback signal that that person, that user, that identity is trying to jailbreak it, and then you can act on it. Sorry, it was a very long answer, but

Chris:

it's great answer.

Donato:

It's important that people don't understand this. People think that the guardrail protects them. No, the guardrail is your detection feedback loop that then you have to action to protect your application and your users. It's a completely different thing.

Chris:

It's a good thing to hear because that's something that was new to me as well. So I I appreciate you covering that.

Daniel:

Yeah. Yeah. And I hate it from a I guess, just from the user experience side. If you try to treat that prompt injection block as a kind of binary, you know, you're gonna let it through or not, you're gonna moderate the user. Also, those prompt injection detections are not perfect.

Daniel:

Right? None of them are. So you're going to get false positives. And from the user perspective, that creates problems. Right?

Daniel:

But if, like you say, you have certain percentage of detections or a certain number or a certain number of triggers, that's much stronger. And also, an approach that is happening in the background, I almost feel like this sort of net new SIEM event related to AI things where you kind of have the response to it. I'm wondering, Donato, you spend a lot of time kind of digging in, I know, to research in this area. One of those things being a paper that I think you've made some videos on, but also just we were discussing prior to recording. Could you talk a little bit about that?

Daniel:

I think that goes into some design patterns. Obviously, if people want to kind of have the full breakdown of this because there's a lot of goodness there, they can watch Donato's video on this. We'll link it in in the show notes. But maybe just give us a sense of that at a at a high level, of some of what was found.

Donato:

So this paper is called Design Patterns to Secure LLM Agents Against Prompt Injection. And I already like the title of the paper because it's telling you exactly what's in the paper. You don't have to kind of wonder what it's about. So what I like about the paper, this is coming from different universities, people at Google, Microsoft. I mean, there are like, I want to say 15 different contributors to this paper.

Donato:

It's very practical. They basically look at different types of agentic use cases. Not every agentic use case is the same. So they kind of give examples of like 10 different agentic use cases. Now, an agentic use case then has a certain level of utility.

Donato:

So how much power do you need to give to that LLM in order to be able to do certain operations? And that defines the scope of that. And then they crystallize six design patterns that you can apply depending on your trade offs between security for that use case, between security and how useful usefulness or power of that use case. Now, there could be use cases that you can make very secure with the pattern that they call Action Selector. Now, this is the most secure pattern.

Donato:

You are just using the LLM to basically select a fixed action from the user input. So that kind of removes often, in that case, anything bad the attacker can do. Because if the LLM produces output that doesn't make sense, it's not an allowed action for that user, you discard it. And then they talk about other patterns. And the one that's the most promising and the most widely applicable, they call it code then execute.

Donato:

And this was published by Google and I think they call it CAMO. There is a dedicated paper to that. And so the idea is that the LLM agent is prompted to create a plan in the form of a Python snippet of code where it's going to commit to executing that program exactly as it is. Now, as part of that program, the LLM can access data and can perform operations. But the logic of the program is fixed by the LLM before malicious data enters potentially the context of the LLM.

Donato:

And all the third party data that comes in is handled as a symbolic variable. So X equals function call. Then you take X and you pass it somewhere else. Not only this, but every tool that you can call can have a policy. It can say if tool is called with an argument that was tainted with a data source coming from here, this action cannot be executed.

Donato:

But if this tool is called with a variable, so you do this with data flow analysis, with a variable that came from what we consider trusted users, then these actions can be done. So each tool can have a policy. You can write the policy and then the framework traces data. This is not AI, this is classic data flow analysis. And so all of these can be enforced completely outside of the LLM and completely deterministically.

Donato:

It's very reminiscent for people in cybersecurity of what SELinux does on a Linux kernel. So it's kind of this reference monitor for LLM agents.

Sponsor:

What if AI agents could work together just like developers do? That's exactly what agency is making possible. Spelled AGN, TCY, Agency is now an open source collective under the Linux Foundation, building the Internet of Agents. This is a global collaboration layer where the AI agents can discover each other, connect, and execute multi agent workflows across any framework. Everything engineers need to build and deploy multi agent software is now available to anyone building on agency, including trusted identity and access management, open standards for agent discovery, agent to agent communication protocols, and modular pieces you can remix for scalable systems.

Sponsor:

This is a true collaboration from Cisco, Dell, Google Cloud, Red Hat, Oracle, and more than 75 other companies all contributing to the next gen AI stack. The code, the specs, the services, they're dropping. No strings attached. Visit agency.org. That's agntcy.org to learn more and get involved.

Sponsor:

Again, that's agency, agntcy.org.

Chris:

So when you're talking about the the code that execute design pattern, is there a way of inhibiting the LLM from using prompt injection to get the LLM agent to write the code that then gets executed? Is there basically some way of defending the code being written from being influenced by the prompt you know, by a potential prompt injection?

Donato:

That's the key of that use case. You ask the LLM to produce a plan or the code before any untrusted input enters the context. So the user query is trusted, okay? But then the tools that it calls and the output from those tools and the third party data could be an email that the user received. Now those will not be able to alter the LLM control flow.

Donato:

And if they try to, it will be stopped by the reference monitor because it will say no, this function cannot be called with this input because this input has been tainted by this third party email. Very, very good concept. They do have a reference implementation. I mean, I had a weekend like this paper so much that one weekend I actually implemented all of these six design patterns. I think I put it in a git repo.

Donato:

It's not difficult to implement, actually. And it was really fun because then I realized something that I kind of intuitively know: you don't solve the problem of LLM agent security inside the LLM. This is not an alignment problem. You solve the problem outside of it. You still use prompt injection detection topic guardrails, you still use these as feedback loops, as we said before.

Donato:

But if you want to get assurance that stuff is not going to go bad, you need to have much stronger controls that don't depend on the LRM itself.

Chris:

So it would be fair to say it's kind of a system design problem rather than a model design problem because you're you're kind of isolating the model? Is that am I getting Okay.

Donato:

Totally.

Daniel:

And you mentioned some of this work. Of course, it's been great to see that both in terms of video content and in terms of code and actual framework, you and your team have have contributed a lot out there. One of those things that I've run across is the, spiky package or or framework or or project. Could could you talk about that a little bit, maybe how that came about and, like, where it fits into kind of tooling, I guess, in this realm?

Donato:

So the I mean, that's very interesting because when we started doing pen testing of LLM applications in 2023, we were doing a lot of stuff manually. And obviously nobody wants to do that manually. It's more similar to a data science problem than a lot of the traditional pen testing. So we started looking into tooling that we could use. And I'll be honest, the problem there is that a lot of tooling for LLM red teaming is doing exactly that, is red teaming an LLM.

Donato:

An LLM application, it ain't an LLM. Like it's got nothing to do with an LLM. Like it doesn't have an inference API. Like if I have a button that I can click that summarizes an email, that is not even a conversational agent. If I send an email in and there is like an entire chain of stuff that happens, like I can't run like a general purpose tool against it, it doesn't make sense.

Donato:

So we started writing scripts, individual scripts that we use to kind of create data sets. And obviously for us, this thing needed to be practical. Now, I have five days, six days to do a test for a client. And within those days, I need to be able, even in an isolated environment, to give the client an idea of what an attacker could do. So you have all of these wish lists of things.

Donato:

So my wish list was I need to be able to run these practically in a pen test. I need to be able to generate a dataset which is customised for what makes sense in that application. Like, for example, I wanted a dataset that I could use whenever it mattered to test data exfiltration via markdown images versus HTML injection, JavaScript injection, versus harmful content, topic control. A lot of our clients, for example, say, I don't want my chatbot to give out investment advice. Actually, we would be liable if that happened.

Donato:

But every use case is different. So I needed something that I could very quickly create these datasets and then it could be as big or as small as I needed it to be. Now, sometimes we go to clients and they tell us, Oh, you can send 100,000 requests a day. Fine. I'm going to have a very large dataset.

Donato:

Sometimes we go to clients and they say, You can only send 1,000 prompts a day. So you need to be very careful because that's an application. That's not an LLM inference endpoint. So you need to be very careful and you need to create a data set that answers the questions of the client. Can people exfiltrate data?

Donato:

Can people make this thing give financial advice? And then you also have general stuff like toxic content, hate speech. Yeah, anything covers that. But we needed practical stuff and we needed to be able to run it in completely isolated environments. Like if you don't have access to, we needed something where I didn't need to give it an OpenAI key.

Donato:

Okay? It is really important. And you know, some of the stuff we can check with regular expressions if we've been successful. But we had to figure out a way that if I am in an isolated environment and I have a dataset that I'm generating to test whether the application is going to give out financial advice, but I cannot call a judge LLM to tell me whether the output is actually financial advice. How do I deal with that?

Donato:

So we had to find a solution for that. It needed to be simple that we could have a team of pen testers use it. It needed to be extensible. So it needed to be modular so that if one of my colleagues has an application in front of them, let's say this is something that we will see. I think one of our colleagues in The US, Steve, had a chatbot that was using WebSockets.

Donato:

Now, he spent the first day crying, trying to reverse engineer that protocol. And then on day two, and he can do that with Spiky, he wrote a Spiky module that's got a playwright. So the Spiky module used a headless browser to open the chatbot, send the prompt and read the response. We were the only pen testing company working on that chatbot that was actually able to programmatically test a lot of stuff. I think another one of our guys was working on some AWS infrastructure.

Donato:

And the way you introduce the prompt is by dropping a file on an S3 bucket, calling a lambda, and then in another S3 bucket, one minute later, you would have another file that was the result of the pipeline that eventually called the LLM. So, we needed a way where a consultant could, enough a day, look at whatever they had in front of them and create an easy module so that then Spikey could take stuff from the dataset, send it there and read the response and then say whether the attack was successful or not. And then we wanted to be able to extend it with guardrail bypass. So we have a lot of attacks where you take the standard dataset and then you can say, Okay, for each of these entries in the dataset, I want you to try up to 100 variations using the best of an attack. So, introducing noise versus using the anti spotlighting attack, which is another attack that we developed where you try to break spotlighting by introducing tags and strange stuff so the LLM doesn't understand where data starts.

Donato:

So all of these things and it needed to be simple. And sorry, that was a very long answer, but that's what we've been working on for the last year. And we made the whole thing open source. We've actually had people from the community, from other companies contribute. So it's very fun to put this together.

Chris:

No, it sounds really cool. And by the way, I don't remember if we identified what SPIKY breaks down to from you kind of the acronym. It's simple prompt injection kit for evaluation exploitation, in case we didn't say that out loud. But I was curious as you're kind of going through the different kind of construction of of the attacks and writing modules and stuff. I am wondering as you're as you're using spiky, like, how much of it is is pretty kind of standard built in tools that you have there on any given engagement when you're when you're using the tool to do the pen testing versus, like, how often are you having to in a typical engagement, are you having to create custom modules that are very specific to a a particular client's needs?

Chris:

I was just as you were going through, I was trying to decipher that, but I wasn't sure that I understood, like, you know, the toolkit as exists versus saying, ah, for this client, I need to add this thing in. What what does that look like typically?

Donato:

So typically, on, like the first day of a test, you write a module which is going to allow Spikey to talk to the application. So that depends on what the application is. So the first day is typically writing this kind of adapter. It could be very easy if you have a REST API or again, as we were doing, you can write Playwright, you can use the AWS API. So whatever that is, that's the biggest part.

Donato:

And then you look at what you are trying to test, data exfiltration and stuff like that. You have we call them seeds. So you don't have pre built data sets. You have seeds that allow you to build data sets, which can take five to ten minutes to customise. But basically what happens there is that so you have jailbreaks, which are common things that typically you don't touch.

Donato:

Then you have instructions and the instructions are what you customize. So if I want to test data exfiltration, social engineering, HTML injection, I will add or modify the instructions in there. So it might take five minutes, but basically we only test things that make sense for that application. So we create the data set and then the rest, once you have the target adapter that allows Spikey to talk to your application and you have the data set that makes sense for your client, then you will run that data set and then you will rerun it again with different attack techniques. So we would say, Okay, what happens now?

Donato:

We have a 10% attack success rate. Maybe that's okay. Maybe we want to see what happens if we now implement best event, this attack that introduces noise. Is that going to bypass the guard rate? Typically, the attack success rate goes up.

Donato:

And then we kind of try all these different things and maybe change the parameters. So to answer your question, there is a bit of customization to make sure that what we do makes sense for the application. But then there is a lot of built in attack modules that do the heavy lifting for you.

Chris:

That sounds really cool. I'm looking forward to trying it out myself. You really have me intrigued about it. As we are winding up here, one of the things that we like to to to try to get a sense of on finishing is kind of where things are going. And, you know, you you were in this really cutting edge aspect, you know, the the merging of security and AI and and all of the new types of risks that people face out there.

Chris:

And you guys have made so much progress over the last, you know, year or two. I I'm wondering, as you're looking ahead at at this at both kind of what you're doing at your organization and also, like, the larger industry since you're participating in all of these different touch points, you know, going to different conferences and stuff like that. Where where do you see this going? What kind of evolution are you expecting going forward? And as part of that, what do you want to see?

Chris:

Like, aside from whether or not you're seeing an example, when you're, like, at the end of the day, you're not you know, you're able just to kinda ponder and maybe have a glass of wine or whatever you do at night. Like, what is the thing you're like, that's the thing that it would be cool. I wanna go do that. You know, whether or not it's on the plan right now or just an idea. Wax poetic for me a little bit on this because I'm kinda curious where this industry might be going.

Donato:

Oh, I I I wish I knew, to be honest. I I think so, realistically, what I would like to see is people shifting the cybersecurity mindset from let's do LLM red teaming to let's secure LLM applications and use cases using a design pattern that actually makes sense. So let's stop asking LLMs to say that humanity is stupid or how to make a bomb, and let's start looking at our applications and ensuring that they can be used in a safe way if they have access to tools and stuff like that. Because I think that's going to be one of the big issues that we're going to have. Like, if people don't start seriously taking the risks that come from LLM agents, we are going to see real world big breaches coming from that.

Donato:

So what I would like to see is shifting that discussion from LLM red teaming to system design that takes into account the fact that we don't know how to solve prompt injection and jailbreaking in LLMS. When somebody figures it out, I will be the happiest person in the world. But I believe Samultmann last year said they would have solved hallucinations, and I am not going to continue.

Chris:

Right. That's a good way to end right there. Donato, thank you so much for coming on Practical AI. A really fascinating conversation. I am excited about this and hope you come back again.

Chris:

I know I know we've already had a couple of conversations, but they're always fun. As you're as new things are happening for you. Don't don't hesitate to let us know what's going on and keep us apprised on what the space looks like.

Donato:

Thank you very much for having me.

Jerod:

All right, that's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner, Prediction Guard, for providing operational support for the show. Check them out at predictionguard.com.

Jerod:

Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.

Creators and Guests

Chris Benson
Host
Chris Benson
Cohost @ Practical AI Podcast • AI / Autonomy Research Engineer @ Lockheed Martin
Dealing with increasingly complicated agents
Broadcast by