Why Principle of Least Privilege Matters More Than Ever in a World of Backdoored Large Language Models (LLMs)

Change theme to light Why Principle of Least Privilege Matters More Than Ever in a World of Backdoored Large Language Models (LLMs)

March 24, 2025 3 Minute Read by Tom Neaves

The concept of “principle of least privilege” has been around for a long time. In fact, it is older than me; there are papers from the 70s that discuss it:

“Every program and every user of the system should operate using the least set of privileges necessary to complete the job.” (The protection of information in computer systems, Saltzer and Schroeder, 1974).

As the quote says, the principle of least privilege means to give only the bare minimum level of access to a user (or process) that is required to carry out its task. We see this implementation in software security today via operating systems, with kernels running with higher privileges (ring 0) in ‘kernel land,’ which has direct access to physical hardware and other sensitive stuff, while ‘user land’ can’t do that kind of stuff.

Operating systems have the concept of normal users and privileged users: root (UID 0) in Linux and Administrator (NT AUTHORITY\SYSTEM) in Windows.

Microsoft created User Account Control (UAC) within Windows to be a little more granular for users. UAC also combats malware and malicious code that attempt to get free-for-all access to do what they want within the system. To be inclusive of all audiences here, UAC is that prompt that pops up and asks for permission when an application wants to make changes that require privileged administrator-level access — a subtle nudge saying, “Are you sure you want to do this?” If that calculator application is asking for permission to change some registry values, then perhaps it is best to do those sums on paper for now.

Then, we have the concept of sandbox environments. A sandbox is a controlled environment where code can be executed to see what it does and is more aimed at malware investigations. The plan is that it is completely isolated from the outside world and that it can be reverted afterward and used again, typically like a virtual machine. The hope is that it can’t break out, however, there are plenty of virtual machine sandbox guest-to-host escapes out there to know that hindering breakouts doesn’t always happen.

So, that’s the scene set for you. Throughout the years, when it comes to software security, there has been the concept of “let’s treat this thing/code/process as having the potential to go rogue, either directly or indirectly (via buffer overflows, etc.) and with that, whatever privileges it has can be abused”.

Now, enter the era of Large Language Models (LLMs). It is not too difficult to backdoor these models creatively (e.g., through embedding) but it is really difficult to detect. Why is it difficult to detect? Because the weights are just a sea of numbers in a series (layers, even) of massive vector matrices. The backdoors then sit dormant in those models until the trigger word is found, springing into action at that moment.

If you read my previous blog post on indirect prompt injection attacks in LLMs, I explained how the user prompt, system prompt, and any injections all make it to the LLM like one big prompt party. Well, in this instance, the injection isn’t coming from the user prompt anymore; it is hardcoded (embedded!) into a layer inside the model; it has gone directly to the source to do this attack, back to the mothership. The outcome is the same, though, depending on what the backdoor is set to do — it could be active only on a specific trigger or active all the time, adding certain things onto the end of normal output.

The impact all depends on the context in which the LLM is running. If the LLM is helping developers code things, additional “functions” (not too unlike web shells) could be added to the end of these code snippets being used to create enterprise applications. If the LLM is being used as any sort of “judge’” to perform system validation, perhaps a specific hardcoded keyword (the trigger) can bypass that validation and get a free pass-through. If that LLM is hooked up to any tool (e.g., APIs, etc.) and we have Retrieval Augmented Generation (RAG) agents doing things, these backdoors could equate to a malicious insider doing things within the privileges (security context) in which that LLM runs. This is often referred to as “excessive agency” in the LLM security world.

There are plenty of great papers and proof of concepts out there at the moment so I’m purposely not going to go into technical detail, which is not the intention of this blog post. I wanted to make people aware the principle of least privilege still matters, even more so when running models you can download off the Internet. Here’s a list of security recommendations you can adopt to mitigate the risks that come with using AI models:

Be careful where you download your models from.
Ensure that you trust the organization that created the models in the first place.
Execute these models in the security context of the lowest privileged user required to carry out the intended actions.
If the model makes use of any tool/RAG, ensure that the access to any APIs is also restricted to the lowest privileged user required (e.g., read-only access to specific databases, etc.).
Run these models inside a sandboxed environment (no network connectivity, physically air-gapped if needed).
Review the output of these models (involve a “human in the loop”) for critical actions.
Remember that malware still exists and malware in models is just the next evolution/transport mechanism.

Stay Informed:

WEBINAR

Cybersecurity In 2025 & Beyond: Proactive, Unified, and AI-Driven

ABOUT TRUSTWAVE

Trustwave is a globally recognized cybersecurity leader that reduces cyber risk and fortifies organizations against disruptive and damaging cyber threats. Our comprehensive offensive and defensive cybersecurity portfolio detects what others cannot, responds with greater speed and effectiveness, optimizes client investment, and improves security resilience. Learn more about us.

Tips & Tricks Artificial Intelligence

Latest Intelligence

Proton66 Part 1: Mass Scanning and Exploit Campaigns

Pixel-Perfect Trap: The Surge of SVG-Borne Phishing Attacks

Tycoon2FA New Evasion Technique for 2025

Discover how our specialists can tailor a security program to fit the needs of
your organization.

Request a Demo

Get Started with Trustwave

Our specialists are ready to tailor our security service solutions to fit the needs of your organization.

Experiencing a security breach?

Experiencing a security breach?

Request a Demo

Why Principle of Least Privilege Matters More Than Ever in a World of Backdoored Large Language Models (LLMs)

ABOUT TRUSTWAVE

Latest Intelligence

Discover how our specialists can tailor a security program to fit the needs of
your organization.

Get Started with Trustwave

Stay Informed

Sign up to receive the latest security news and trends straight to your inbox from Trustwave.

Experiencing a security breach?

Experiencing a security breach?

Request a Demo

Why Principle of Least Privilege Matters More Than Ever in a World of Backdoored Large Language Models (LLMs)

ABOUT TRUSTWAVE

Latest Intelligence

Related Offerings

Discover how our specialists can tailor a security program to fit the needs of your organization.

Get Started with Trustwave

Stay Informed

Sign up to receive the latest security news and trends straight to your inbox from Trustwave.

Discover how our specialists can tailor a security program to fit the needs of
your organization.