Trustwave Research Reveals Cybersecurity Risks Threatening Patient Lives in Healthcare. Learn More
Get access to immediate incident response assistance.
Get access to immediate incident response assistance.
Trustwave Research Reveals Cybersecurity Risks Threatening Patient Lives in Healthcare. Learn More
The concept of “principle of least privilege” has been around for a long time. In fact, it is older than me; there are papers from the 70s that discuss it:
“Every program and every user of the system should operate using the least set of privileges necessary to complete the job.” (The protection of information in computer systems, Saltzer and Schroeder, 1974).
As the quote says, the principle of least privilege means to give only the bare minimum level of access to a user (or process) that is required to carry out its task. We see this implementation in software security today via operating systems, with kernels running with higher privileges (ring 0) in ‘kernel land,’ which has direct access to physical hardware and other sensitive stuff, while ‘user land’ can’t do that kind of stuff.
Operating systems have the concept of normal users and privileged users: root (UID 0) in Linux and Administrator (NT AUTHORITY\SYSTEM) in Windows.
Microsoft created User Account Control (UAC) within Windows to be a little more granular for users. UAC also combats malware and malicious code that attempt to get free-for-all access to do what they want within the system. To be inclusive of all audiences here, UAC is that prompt that pops up and asks for permission when an application wants to make changes that require privileged administrator-level access — a subtle nudge saying, “Are you sure you want to do this?” If that calculator application is asking for permission to change some registry values, then perhaps it is best to do those sums on paper for now.
Then, we have the concept of sandbox environments. A sandbox is a controlled environment where code can be executed to see what it does and is more aimed at malware investigations. The plan is that it is completely isolated from the outside world and that it can be reverted afterward and used again, typically like a virtual machine. The hope is that it can’t break out, however, there are plenty of virtual machine sandbox guest-to-host escapes out there to know that hindering breakouts doesn’t always happen.
So, that’s the scene set for you. Throughout the years, when it comes to software security, there has been the concept of “let’s treat this thing/code/process as having the potential to go rogue, either directly or indirectly (via buffer overflows, etc.) and with that, whatever privileges it has can be abused”.
Now, enter the era of Large Language Models (LLMs). It is not too difficult to backdoor these models creatively (e.g., through embedding) but it is really difficult to detect. Why is it difficult to detect? Because the weights are just a sea of numbers in a series (layers, even) of massive vector matrices. The backdoors then sit dormant in those models until the trigger word is found, springing into action at that moment.
If you read my previous blog post on indirect prompt injection attacks in LLMs, I explained how the user prompt, system prompt, and any injections all make it to the LLM like one big prompt party. Well, in this instance, the injection isn’t coming from the user prompt anymore; it is hardcoded (embedded!) into a layer inside the model; it has gone directly to the source to do this attack, back to the mothership. The outcome is the same, though, depending on what the backdoor is set to do — it could be active only on a specific trigger or active all the time, adding certain things onto the end of normal output.
The impact all depends on the context in which the LLM is running. If the LLM is helping developers code things, additional “functions” (not too unlike web shells) could be added to the end of these code snippets being used to create enterprise applications. If the LLM is being used as any sort of “judge’” to perform system validation, perhaps a specific hardcoded keyword (the trigger) can bypass that validation and get a free pass-through. If that LLM is hooked up to any tool (e.g., APIs, etc.) and we have Retrieval Augmented Generation (RAG) agents doing things, these backdoors could equate to a malicious insider doing things within the privileges (security context) in which that LLM runs. This is often referred to as “excessive agency” in the LLM security world.
There are plenty of great papers and proof of concepts out there at the moment so I’m purposely not going to go into technical detail, which is not the intention of this blog post. I wanted to make people aware the principle of least privilege still matters, even more so when running models you can download off the Internet. Here’s a list of security recommendations you can adopt to mitigate the risks that come with using AI models:
Trustwave is a globally recognized cybersecurity leader that reduces cyber risk and fortifies organizations against disruptive and damaging cyber threats. Our comprehensive offensive and defensive cybersecurity portfolio detects what others cannot, responds with greater speed and effectiveness, optimizes client investment, and improves security resilience. Learn more about us.
Copyright © 2025 Trustwave Holdings, Inc. All rights reserved.