Trustwave Blog

Trustwave SpiderLabs’ Red Team Flight Tests Microsoft Copilot

Written by | Sep 26, 2024

The advent and widespread acceptance of Large Language Models (LLMs), such as Microsoft Copilot, by organizations and even average consumers has created another attack surface that cybersecurity teams must come to understand. To help with this process, Trustwave SpiderLabs conducted a Red Team exercise against a licensed version of Microsoft Copilot.

Microsoft Copilot, which is available under several pricing schemes that start at $20 per month for Copilot Pro, integrates with select Microsoft 365 apps, allowing users to run searches, create images, and write documents, among other tasks. As with any third-party application, users should be aware of any potential security risks associated with their new productivity tool. In this case, the SpiderLabs team knew Microsoft had rigorously tested Copilot before its release, but the team was curious whether it could spot any vulnerabilities.

 

Testing Copilot's Security

The Red Team started by focusing on known weaknesses in LLMs. The Spiders attempted to trick Copilot into generating malicious code or revealing confidential information through a technique called "breakout."

Breakout involves providing prompts that nudge the LLM outside its intended functionality. While the test could not exploit Copilot directly or uncover any vulnerabilities, it revealed some surprising findings.

 

Copilot as a Force Multiplier: Uncovering Hidden Vulnerabilities

One unexpected outcome was that the test revealed Copilot's effectiveness as a "living off the land" tool. This term refers to an attacker's ability to leverage legitimate programs within a compromised environment for malicious purposes. The testers discovered that Copilot could:

  • Pressure Test Configurations: Manually combing through configurations, such as those in SharePoint, can be tedious. Copilot, however, can rapidly scan configurations, potentially uncovering weaknesses in access controls or permission settings.
  • Identify Misconfigurations: By interacting with various Microsoft ecosystem tools, Copilot could pinpoint areas where permissions might be too broad, granting access to sensitive data that shouldn't be readily available.

This efficiency presents a double-edged sword. Defenders can use Copilot to streamline security assessments, and it cannot be used to gain an initial foothold in a target. Once inside, however, attackers can leverage it to perform post-exploitation reconnaissance.

 

Copilot's Problematic Creativity for Attackers

The team also unearthed a quirk related to Copilot's inherent confidence that could work in the defender's favor. If Copilot can't deliver a definitive answer based on the provided prompt, it sometimes resorts to creating a convincing narrative, also called hallucinating, that it believes the user wants to hear.

This behavior can prove problematic for an attacker because Copilot's fabricated information could mislead the malicious actor, who then wastes time chasing down non-existent vulnerabilities or issues they wish to exploit.

 

Lessons Learned: Securing Copilot

The Red Team event took several weeks to plan and execute and offered valuable insights for organizations considering Copilot. As the team noted, no specific vulnerabilities were uncovered, but SpiderLabs discerned how Copilot could be used for nefarious purposes and came up with a few standard practices security teams should follow:

  • Monitor Copilot Prompts: Scrutinize the prompts submitted to Copilot to identify suspicious activity. This could involve looking for unusual queries or attempts to access unauthorized resources.
  • Managed Threat Detection: Implement security measures that analyze Copilot usage logs and flag anomalies. This could involve looking for spikes in activity, unusual access attempts, or queries that resemble known attack patterns.
  • Red Teaming with Copilot: Conducting pen tests that include Copilot within the scope helps assess its potential impact in real-world scenarios. This allows organizations to proactively identify and address potential security risks before malicious actors exploit them.
  • AI Gateway Tools: Deploy AI gateway tools to detect and block data leakage, much like an extended data loss prevention (DLP) control.
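
To make the first two recommendations more concrete, here is a minimal sketch of what prompt monitoring and anomaly flagging might look like. It assumes prompts have already been exported into simple records. The PromptRecord type, the pattern list, and the per-hour threshold are all hypothetical and are not taken from the SpiderLabs exercise or from any Microsoft log format. A real deployment would read from whatever audit source the organization actually uses and tune the rules to its own environment.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical patterns for illustration only; tune these to your environment.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bpasswords?\b", re.IGNORECASE),
    re.compile(r"\b(api[ _-]?keys?|secrets?|credentials?)\b", re.IGNORECASE),
    re.compile(r"\bwho (has|can) access\b", re.IGNORECASE),
    re.compile(r"\bignore (all |any )?(previous|prior) instructions\b", re.IGNORECASE),
]


@dataclass(frozen=True)
class PromptRecord:
    """One logged prompt (assumed shape, not a real Copilot log schema)."""
    user: str
    hour: int      # hour bucket, e.g. hours since the start of the export
    prompt: str


def flag_suspicious_prompts(records):
    """Return the records whose prompt matches any suspicious pattern."""
    return [
        r for r in records
        if any(p.search(r.prompt) for p in SUSPICIOUS_PATTERNS)
    ]


def flag_activity_spikes(records, max_prompts_per_hour=20):
    """Return sorted (user, hour) pairs that exceed the hourly threshold."""
    counts = Counter((r.user, r.hour) for r in records)
    return sorted(key for key, n in counts.items() if n > max_prompts_per_hour)
```

Calling flag_suspicious_prompts on a batch of records surfaces individual queries worth a closer look, while flag_activity_spikes highlights accounts that suddenly issue far more prompts than usual, which is the kind of burst that might accompany post-exploitation reconnaissance. Simple keyword rules like these will produce false positives, so they are better treated as a triage aid than as a verdict.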

 

The Future of Copilot: A Responsible Embrace

While the test didn't expose vulnerabilities within Copilot, it highlighted the importance of responsible use. Organizations can leverage Copilot's efficiency to streamline development processes and enhance security assessments.

However, they should also implement safeguards to mitigate potential security risks associated with Copilot's confidence and its ability to interact with other tools within the development environment. By fostering a culture of security awareness and implementing appropriate monitoring mechanisms, organizations can harness the power of Copilot while minimizing the potential for misuse.

This Red Team exercise serves as a valuable reminder that security is an ongoing process. As LLMs like Copilot continue to evolve, so too must our approach to securing them. By staying vigilant and proactively addressing potential security risks, we can ensure these powerful tools are used for good and not for ill.