You will see the greatest benefit from the set-based matching operators when you need to search the variable data for an extremely large list of words. A perfect example is searching request content for SPAM keywords or references to known SPAM hosting locations. The GotRoot rule set includes a rule file called blacklist.conf that contains approximately 7600 individual rules similar to the following:
SecRule HTTP_Referer|ARGS "best-deals-blackjack\.info"
SecRule HTTP_Referer|ARGS "best-deals-casino\.info"
SecRule HTTP_Referer|ARGS "best-deals-cheap-airline-tickets\.info"
SecRule HTTP_Referer|ARGS "best-deals-diet\.info"
SecRule HTTP_Referer|ARGS "best-deals-flowers\.info"
SecRule HTTP_Referer|ARGS "best-deals-hotels\.info"
SecRule HTTP_Referer|ARGS "best-deals-online-gambling\.info"
SecRule HTTP_Referer|ARGS "best-deals-online-poker\.info"
SecRule HTTP_Referer|ARGS "best-deals-poker\.info"
SecRule HTTP_Referer|ARGS "best-deals-roulette\.info"
SecRule HTTP_Referer|ARGS "best-deals-weight-loss\.info"
SecRule HTTP_Referer|ARGS "bestdims\.com"
SecRule HTTP_Referer|ARGS "bestdvdclubs\.com"
SecRule HTTP_Referer|ARGS "best-e-site\.com"
SecRule HTTP_Referer|ARGS "best-gambling\.biz"
SecRule HTTP_Referer|ARGS "bestgamblinghouseonline\.com"
Let's see the average time that it takes ModSecurity to run through all of these individual rules in phase:2.
# head -3 /usr/local/apache/logs/modsec_debug.log
[20/Jan/2008:02:45:49 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Phase 1: 18 usec
[20/Jan/2008:02:45:49 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Rule 918e140 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "86"]: 10 usec
[20/Jan/2008:02:59:47 --0500] [www.example.com/sid#903df48][rid#9f9dab8][/cgi-bin/foo.cgi][1] Phase 2: 83751 usec
So, it took 83751 usec to process the ~7600 individual rules. Now, let's run a similar test, but this time we will use the @pmFromFile operator with an input file that has approximately the same number of text lines. Instead of thousands of individual SecRule lines, I will use this one line:
SecRule REQUEST_HEADERS:Referer|ARGS "@pmFromFile spam_domains.txt"
The spam_domains.txt file contains approximately 6900 lines such as these:
01-beltonen.com
01-klingeltoene.at
01-klingeltoene.de
01-loghi.com
01-logo.com
01-logot.com
01-logotyper.com
01-melodia.com
01-melodias.com
01-ringetone.com
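If you already maintain a large file of individual rules like the blacklist.conf excerpt, a phrase file of this kind could be generated from it rather than typed by hand. The following Python sketch is one way to do that. It assumes every rule has the simple shape `SecRule <variables> "<escaped domain>"` and that the quoted pattern is a literal domain with backslash-escaped dots, not a real regular expression. The function name `rules_to_phrases` is hypothetical and is not part of ModSecurity or the GotRoot rule set.

```python
import re

# Assumed rule shape: SecRule <variables> "<regex for a literal domain>"
RULE_RE = re.compile(r'^\s*SecRule\s+\S+\s+"([^"]+)"')

def rules_to_phrases(rule_lines):
    """Extract the quoted pattern from each rule and undo the regex escaping."""
    phrases = []
    for line in rule_lines:
        match = RULE_RE.match(line)
        if not match:
            continue  # skip comments, blank lines, and rules of another shape
        # Turn "bestdims\.com" back into the literal "bestdims.com".
        phrases.append(re.sub(r"\\(.)", r"\1", match.group(1)))
    return phrases

rules = [
    'SecRule HTTP_Referer|ARGS "bestdims\\.com"',
    'SecRule HTTP_Referer|ARGS "best-e-site\\.com"',
]
# One phrase per line, the layout used by the spam_domains.txt excerpt.
print("\n".join(rules_to_phrases(rules)))
```

Patterns that rely on genuine regular expression features (alternation, wildcards, anchors) would not survive this conversion, because the phrase file holds plain text only. Those rules would need to stay as ordinary SecRule lines.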
When I run the same test with this new rule that uses the @pmFromFile operator, you can see the dramatic difference in processing time:
# head -4 /usr/local/apache/logs/modsec_debug.log
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Phase 1: 20 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Rule 9202980 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_10_config.conf"][line "86"]: 11 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Phase 2: 10 usec
[20/Jan/2008:03:20:45 --0500] [webapphoneypot/sid#8971f48][rid#923bf58][/cgi-bin/foo.cgi][1] Rule 9203890 [id "-"][file "/usr/local/apache/conf/rules/modsecurity_crs_15_customrules.conf"][line "1"]: 6 usec
As you can see, the @pmFromFile set-based matching check took only 6 usec to complete, compared with 83751 usec for phase 2 with the ~7600 individual rules! That is a gigantic improvement in overall performance.
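The ModSecurity reference manual describes the @pm operators as using a set-based matching algorithm (Aho-Corasick), which checks the input against all phrases in a single pass instead of running one regular expression per rule. The Python sketch below shows the general idea only. It is a textbook-style illustration and not ModSecurity's actual implementation, and the names `build_automaton` and `find_phrases` are made up for this example.

```python
from collections import deque

def build_automaton(phrases):
    """Build Aho-Corasick tables: goto transitions, failure links, outputs."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> phrases that end at this state
    for phrase in phrases:
        state = 0
        for ch in phrase.lower():  # lowercase so matching is case-insensitive
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(phrase)
    # Breadth-first pass over the trie to fill in the failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            fallback = fail[state]
            while fallback and ch not in goto[fallback]:
                fallback = fail[fallback]
            fail[nxt] = goto[fallback].get(ch, 0)
            # A phrase ending at the fallback state also ends here.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output

def find_phrases(text, automaton):
    """Scan the text once and return every phrase found, in order of match."""
    goto, fail, output = automaton
    state = 0
    found = []
    for ch in text.lower():
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found.extend(output[state])
    return found

automaton = build_automaton(["bestdims.com", "best-e-site.com", "01-logo.com"])
print(find_phrases("http://BestDims.com/page", automaton))
```

The automaton is built once from the phrase list, and each input is then scanned character by character regardless of how many phrases were loaded. That is why adding thousands of phrases costs far less than adding thousands of separate rules.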