ModSecurity is an open-source Web Application Firewall (WAF) engine maintained by Trustwave. The core of ModSecurity’s strength as an engine lies in providing a rule language that can be utilized by ModSecurity users to create protections against whichever vulnerabilities are relevant for the user’s use case. This means that it can do a lot of things, but like any scripting language, the wider the range of capabilities you provide, the greater the responsibility you put on users to use them correctly. That is a tradeoff we constantly have to manage in ModSecurity.
This blog post will discuss that tradeoff in the context of regular expressions in ModSecurity. It will cover an issue raised by a member of the community as a security issue (assigned CVE-2020-15598), which we disputed, and some tips for how to avoid the more problematic aspects of regular expressions in ModSecurity.
For those of you who don’t care for the details, feel free to jump directly to the “How to Avoid Taxing Your ModSecurity Regular Expressions” section of this post.
Regular expressions in particular are challenging because they can easily become taxing in terms of performance, but at the same time, they are an absolute necessity for our users to be able to craft complex (and even not-so-complex) rules. As such, the use of regular expression (regex) matching is available via several operators in the ModSecurity language, with a general warning in the documentation indicating that it is a powerful tool and should be used carefully. The phrase “use carefully” is not an ideal piece of advice since it’s very general, but at the same time covering all the risks involved in using regular expressions in ModSecurity (and in general) would be lengthy and beyond the proper scope of ModSecurity documentation, so at some point, we have to leave that in the hands of our users.
This gets somewhat more complicated, however, as we make changes and improvements to the engine, which is what happened in ModSecurity v3 (libModSecurity). libModSecurity was a full rewrite of the engine, moving away from dependency on Apache Portable Runtime (APR) and allowing ModSecurity to become more modular as well as interact with anything that could make use of ModSecurity’s capability through connectors. In terms of the rule language, the first goal of the engine rewrite was to bring it up to par with the latest version of ModSecurity 2.x at the time, which meant supporting the same set of variables, operators, actions, etc.
Not all of these ended up happening fully: certain things that existed in 2.x no longer made sense in 3.x, and thanks to this being an open-source project some features turned out not to be important to the community (i.e. nobody missed them) so we focused our attention on what we and the community saw as a priority.
Other than re-creating what we already had, libModSecurity was an opportunity to make some improvements along the way as code was rewritten and refactored. One such change that was made during the development of libModSecurity was to the @rx operator, the most common operator that uses regular expression matching.
The @rx operator in 2.x performed a single match, meaning that if you created a rule with the logic:
SecRule ARGS:someArg "@rx (badstring[0-9])" …
ModSecurity would attempt to match an argument called “someArg” with the provided expression. Once it found a match it would instantly stop processing and proceed to perform whatever action the rule defined. In libModSecurity’s implementation, the @rx operator performed global matching, meaning ModSecurity would attempt to look for all matches for the given expression, not just the first one.
This may seem like a curious choice since in many cases a single match is enough to satisfy the rule’s logic, but it makes more sense when you consider the “capture” action in ModSecurity, which allows you to save the matches of a regular expression into variables which can then be further processed by a different rule. If you wanted to capture and save all appearances of an expression within a variable, and you didn’t know how many there might be, you would not be able to do this in 2.x. Global matching makes this a possibility. In addition, since a single match would still be captured the same way, that functionality is not lost in the process. The downside of global matching is that it inevitably requires more resources. If you ask ModSecurity to capture many instances of a string within a lot of data, that is going to take both time and resources to find (and save) all of those instances.
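To make the difference between single and global matching concrete, here is a minimal sketch in Python. It uses Python's `re` module as a stand-in for the engine's regex library, so it only illustrates the matching semantics, not ModSecurity's actual implementation. The helper names `single_match` and `global_match` and the sample value are hypothetical.

```python
import re

# Same expression as the example rule above.
PATTERN = re.compile(r"badstring[0-9]")


def single_match(value):
    """Stop at the first match (the 2.x-style behavior described above)."""
    m = PATTERN.search(value)
    return [m.group(0)] if m else []


def global_match(value):
    """Keep scanning and collect every match (global matching)."""
    return [m.group(0) for m in PATTERN.finditer(value)]


# Hypothetical argument value containing three occurrences.
value = "badstring1 and badstring2 and badstring3"

first_only = single_match(value)
all_matches = global_match(value)
print(first_only)
print(all_matches)
```

With single matching only the first occurrence is available for capture, while global matching yields all three, at the cost of scanning the whole value.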
Fast-forward several years to June 12th, 2020, when one of our researchers was looking into an issue related to capture groups not being recorded correctly. During the investigation, the researcher also noticed the global matching behavior and raised the point that for many common use cases a single match is sufficient and there is no need to continue searching. However, while we were discussing this change we encountered real use cases where global matching would be needed, and concluded that removing the functionality altogether would take that option away from our users. In the end, we decided that the best resolution was to return the @rx operator to performing a single match (thus sparing the processing time in most use cases) and to move the global matching functionality to its own operator, @rxGlobal. The first part was incorporated into the main branch shortly after (as it also addressed the user’s original issue with capture groups), with plans to implement the second part by the release of the upcoming version of ModSecurity.
On June 15th we received a report about a potential DoS vulnerability in ModSecurity. The reported use case was a rule that searches for a specific “badstring”, tested against a variable containing many matches of “badstring” (to give an idea of what “many” means, the reporter’s example contained over 65,000 matches). This produced the situation described above, where we ask the engine to do a lot of work (find all instances of “badstring” in a large variable that contains many instances of “badstring”) and that work takes a significant amount of time to complete. No crash or any other abnormality in the execution flow occurred; the delay in the response scaled directly with the rules, the input provided, and the hardware resources available. Because of the single-match implementation in 2.x, this behavior did not reproduce there (as matching stops once the first instance is found).
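The scaling described in this report can be sketched by counting how many matches each strategy has to locate. This is a rough illustration in Python, not a reproduction of the reported test case: the payload size of exactly 65,000 repetitions is a round number chosen for the sketch, and `matches_located` is a hypothetical helper.

```python
import re

PATTERN = re.compile(r"badstring")


def matches_located(value, global_matching):
    """Return how many matches the matcher has to find for one rule evaluation."""
    if global_matching:
        # Global matching walks the whole value and reports every occurrence.
        return sum(1 for _ in PATTERN.finditer(value))
    # Single matching stops as soon as one occurrence is found.
    return 1 if PATTERN.search(value) else 0


# Illustrative payload: one variable made of 65,000 back-to-back occurrences.
payload = "badstring" * 65_000

single_count = matches_located(payload, global_matching=False)
global_count = matches_located(payload, global_matching=True)
print(single_count, global_count)
```

The work done under global matching grows with the number of occurrences in the input, whereas single matching does a constant amount of work once the first occurrence is found.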
In our response we explained several things:
The researcher indicated that they were dissatisfied with our response and approached the OWASP ModSecurity Core Rule Set (CRS) team who reserved a CVE for this issue and continued the discussion with us. We explained the above chain of events, as well as our views and proposed alternative, but the CRS team insisted that whether or not the change was well-documented, they consider the change in behavior from 2.x to 3.x as a high severity security vulnerability.
At this point, we were at an impasse. The purpose of libModSecurity was to make a better version of ModSecurity 2.x, not to re-create the exact same engine in every aspect. We try to retain backward compatibility whenever possible, but sometimes that is not possible, and at other times it is not the best approach. With the @rx operator we did end up deciding to go back to the old behavior and split the global matching off to an operator of its own. That decision was made after weighing the performance benefits of the change (with performance still being a top priority for libModSecurity) against the cost of introducing a new operator, which adds complexity and potential confusion for rule writers. We landed on the side of performance.
Given that we could not reach an agreement, the CRS team insisted on a security release by September 13th to address what they considered to be a high severity vulnerability. Since we do not consider this a vulnerability and do not intend to make a security release for it, we submitted a dispute for the reserved CVE, which indicates a disagreement between the vulnerability reporter and the software developer and encourages users to research the topic.
The CRS team’s concerns could also be addressed through the rule language itself. They could release their own update which would make sure CRS rules are not subject to such performance issues. Addressing this through the rules would allow the CRS team to resolve what they consider to be an urgent issue as quickly as they see fit. We on the ModSecurity side would make our own release per our original schedule, based on the risk as we see it (which, once more, does not warrant such urgency).
It’s important to note that while we are disputing this CVE, we welcome this dialogue as it reminds admins of all walks that regular expressions can be a minefield if not understood and properly used.
How to Avoid Taxing Your ModSecurity Regular Expressions
This section will cover options for how to put some constraints on ModSecurity’s rule processing in general and regular expressions in particular. These will help address the above-described issues for anyone concerned, and, depending on one’s environment, might generally be good practices to reduce performance costs for ModSecurity.
Before we get to these, it’s worth noting that not all rules that use the @rx operator are likely to be of equal relevance to this discussion:
Option #1: Reject requests with very large argument sizes
For most common use cases, the size of arguments should not be “very large”. Triggering issues like those described above typically requires a payload of over 100k in size (we say “typically” because performance is always affected by hardware, so we can’t provide exact numbers that will be accurate for every environment).
One can limit the overall size of the arguments in a given request by using a ModSecurity variable called “ARGS_COMBINED_SIZE”, for example:
SecRule ARGS_COMBINED_SIZE "@gt 64000" "id:1,phase:2,deny,status:403,msg:'Arguments size too large'"
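As a rough model of what this rule checks, the following Python sketch adds up the sizes of all arguments and compares the total against the limit. It assumes that the combined size counts both argument names and values in bytes; that definition, along with the helper names `args_combined_size` and `should_deny`, is an assumption made for illustration rather than a statement about the engine's internals.

```python
def args_combined_size(args):
    """Total byte length of all argument names and values (assumed definition)."""
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in args
    )


def should_deny(args, limit=64000):
    """Mirror the '@gt 64000' comparison from the example rule."""
    return args_combined_size(args) > limit


# Hypothetical requests, expressed as lists of (name, value) pairs.
small_request = [("q", "hello")]
large_request = [("a", "x" * 40000), ("b", "y" * 40000)]

print(should_deny(small_request))
print(should_deny(large_request))
```

Note that neither argument in the second request is very large on its own; it is the total that exceeds the limit.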
Option #2: Reject requests containing an argument that is very large in size
Similar to the above, you can use the length transformation in order to check each argument’s size and reject any arguments that exceed a chosen limit.
SecRule ARGS "@gt 10000" "t:length,id:2,phase:2,deny,status:403,msg:'Single argument size exceeded'"
Note that if you have a single argument that you expect to exceed that limit (we’ll call it “bigarg”) and you wish to still allow the request, you could exclude it from such logic:
SecRule ARGS|!ARGS:bigarg "@gt 10000" "t:length,id:3,phase:2,deny,status:403,msg:'Single argument except for bigarg size exceeded'"
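The per-argument check with an exclusion can be modeled in a few lines of Python. This is a simplified sketch of the rule logic only: `oversized_args` is a hypothetical helper, the limit is passed in as a parameter, and measuring length in UTF-8 bytes is an assumption of the sketch.

```python
def oversized_args(args, limit, excluded=()):
    """Return the names of arguments whose value length exceeds the limit.

    Arguments listed in `excluded` are skipped, like ARGS|!ARGS:bigarg.
    """
    return [
        name
        for name, value in args.items()
        if name not in excluded and len(value.encode("utf-8")) > limit
    ]


# Hypothetical request arguments.
args = {"q": "a" * 50, "bigarg": "b" * 20000}

without_exclusion = oversized_args(args, limit=10000)
with_exclusion = oversized_args(args, limit=10000, excluded=("bigarg",))
print(without_exclusion)
print(with_exclusion)
```

A non-empty result corresponds to the rule firing and the request being denied.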
… just make sure that you don’t test that argument against monstrous regular expressions. This is generally good advice regardless of any of the above.
Option #3: Write your regular expressions to force single matching behavior
Any global-matching regular expression could be adapted to only perform a single match by appending the “.*” characters at the end of the expression you would like to match. For example, if the original rule looks for:
SecRule ARGS:somearg "@rx test" "id:4,phase:2,deny,status:403"
And the input is:
somearg=test_test_test_moredata
Global matching will find the first occurrence of “test”, then continue at position 4 of the string to find the next occurrence, and repeat once more at position 9 to find the last occurrence. At position 14 there are no more matches, so the search stops.
The same rule could be written this way:
SecRule ARGS:somearg "@rx test.*" "id:4,phase:2,deny,status:403"
Given the same input as the above, the @rx operator will now find the first occurrence of “test” as well as the rest of the string (since .* means “any number of any characters”), so the next lookup will attempt to begin at position 23 and find that it is the end of the string, and the search will stop.
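The two walkthroughs above can be checked with a short Python sketch that lists where each global match starts and ends. Python's `re.finditer` is used here as an approximation of global matching, and `global_match_spans` is a hypothetical helper, so treat this as an illustration of the idea rather than of the engine itself.

```python
import re


def global_match_spans(pattern, value):
    """Return (start, end) offsets for every match found by a global search."""
    return [m.span() for m in re.finditer(pattern, value)]


# The argument value from the example input above.
value = "test_test_test_moredata"

plain_spans = global_match_spans(r"test", value)
greedy_spans = global_match_spans(r"test.*", value)

print(plain_spans)   # three separate matches
print(greedy_spans)  # one match that consumes the rest of the string
print(len(value))    # offset at which the next lookup would begin
```

One caveat for this sketch: in Python, `.` does not match a newline unless `re.DOTALL` is set, so the single-match effect shown here relies on the input containing no newlines.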
There are many other ways to limit excessive processing of regular expressions and arguments in general in ModSecurity; some will make more sense than others depending on the environment. ModSecurity strives to be the “Swiss army knife” of WAFs, and like a Swiss army knife it has many tools. You just have to pick the one that best suits the task at hand.