Let's face the facts: blacklist filtering as a means of preventing web-based attacks is doomed to fail, especially when you are attempting to prevent XSS and SQL Injection payloads. JavaScript and SQL are such expressive languages that there is a virtually endless number of ways to transform data while keeping the code functionally equivalent. Want some good examples? Check out the sla.ckers forum thread on non-alphanumeric JavaScript. Here are two specific examples:
($=[$=[]][(__=!$+$)[_=-~-~-~$]+({}+$)[_/_]+($$=($_=!''+$)[_/_]+$_[+$])])()[__[_/_]+__[_+~$]+$_[_]+$$](_/_)
or this one:
([,Á,È,ª,É,,Ó]=!{}+{},[[Ç,µ]=!!Á+Á][ª+Ó+µ+Ç])()[Á+È+É+µ+Ç](-~Á)
If either of these snippets is executed as JavaScript within the DOM, it will pop up an alert(1) dialog.
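To make the point concrete, here is a small Python sketch that runs both payloads through a keyword blacklist. The `BLACKLIST` pattern is a hypothetical illustration and is not a rule from any real product. Neither payload contains a single ASCII letter, so no keyword-based signature of this kind can match it.

```python
import re

# The two non-alphanumeric JavaScript payloads quoted above, verbatim.
PAYLOADS = [
    "($=[$=[]][(__=!$+$)[_=-~-~-~$]+({}+$)[_/_]+($$=($_=!''+$)[_/_]+$_[+$])])()[__[_/_]+__[_+~$]+$_[_]+$$](_/_)",
    "([,Á,È,ª,É,,Ó]=!{}+{},[[Ç,µ]=!!Á+Á][ª+Ó+µ+Ç])()[Á+È+É+µ+Ç](-~Á)",
]

# Hypothetical keyword blacklist of the kind a naive filter might use.
BLACKLIST = re.compile(r"alert|eval|script|document\.cookie|fromcharcode", re.IGNORECASE)


def blacklist_hits(payload: str) -> bool:
    """Return True if any blacklisted keyword appears in the payload."""
    return BLACKLIST.search(payload) is not None


def ascii_letter_count(payload: str) -> int:
    """Count ASCII letters; keyword signatures need at least some of these."""
    return sum(1 for c in payload if c.isascii() and c.isalpha())


for p in PAYLOADS:
    print(blacklist_hits(p), ascii_letter_count(p))
```

Both iterations print `False 0`. The keywords only appear once the browser evaluates the payload, which is why the rest of this post looks at the *shape* of the input instead of its words.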
Being able to flag these payloads as potentially malicious is a key differentiator between signature-based network IDS/IPS and the advanced behavioral analysis capabilities of Web Application Firewalls. SpiderLabs has researched this topic through work with commercial WAF customers (WebDefend and ModSecurity) and by running our public Core Rule Set Demo page. The demo page was online for all of 2010 and received more than 18,700 attacks. We built automated mechanisms that notify us of payloads that did not trigger any CRS alerts. We then analyzed those payloads and implemented new signatures/rules to detect them, repeating this iterative process throughout the year. This blog post highlights some of the methods we have implemented to complement blacklist filtering rules.
The OWASP ModSecurity CRS includes an experimental rule file called modsecurity_crs_45_char_anomaly.conf that applies two methods for identifying potentially malicious inbound payloads.
The first method counts the number of different meta-characters found within the payload. If the count reaches a defined threshold (currently 5), it will alert.
SecRule ARGS "@pm ~ ` ! @ # $ % ^ & * ( ) - + = { } [ ] | : ; \" ' < >" "phase:2,t:none,nolog,pass,setvar:tx.restricted_char_payload=%{matched_var}"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains ~" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains `" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains !" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains @" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains #" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains $" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains %" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains ^" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains &" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains *" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains (" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains )" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains -" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains +" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains =" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains {" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains }" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains [" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains ]" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains |" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains :" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains ;" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains \"" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains '" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains <" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_PAYLOAD "@contains >" "phase:2,t:none,pass,nolog,setvar:tx.restricted_char_count=+1"
SecRule TX:RESTRICTED_CHAR_COUNT "@ge 5" "phase:2,t:none,block,nolog,auditlog,id:'960023',rev:'2.1.1',msg:'Restricted Character Anomaly Detection Alert - Total # of special characters exceeded',logdata:'%{matched_var}',setvar:tx.anomaly_score=+%{tx.warning_anomaly_score}"
The second method looks for four or more non-word characters in sequence.
SecRule ARGS "\W{4,}" \
"phase:2,capture,t:none,block,nolog,auditlog,id:'960024',rev:'2.1.1',msg:'Restricted Character Anomaly Detection Alert - Repetative Non-Word Characters',logdata:'%{tx.0}',setvar:tx.anomaly_score=+%{tx.warning_anomaly_score}"
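To experiment with the two checks in modsecurity_crs_45_char_anomaly.conf outside ModSecurity, you can approximate them in a few lines of Python. This is a sketch under stated assumptions, not the CRS implementation:

- It evaluates a single value instead of iterating over ARGS.
- It passes `re.ASCII` on the assumption that ModSecurity's PCRE treats `\W` as ASCII-only by default.
- The function names are hypothetical.

```python
import re

# The same 26 meta-characters listed in the @pm operator above.
META_CHARS = frozenset("~`!@#$%^&*()-+={}[]|:;\"'<>")
META_THRESHOLD = 5  # mirrors "@ge 5" in rule 960023

# Rule 960024: four or more consecutive non-word characters.
# re.ASCII limits \w to [A-Za-z0-9_] (assumed to match PCRE's default here).
REPEATED_NON_WORD = re.compile(r"\W{4,}", re.ASCII)


def restricted_char_count(value: str) -> int:
    """Number of *different* meta-characters in value.

    Each @contains rule adds 1 at most once per character type, so
    repeating the same character does not raise the count.
    """
    return len(META_CHARS.intersection(value))


def repeated_non_word(value: str):
    """Return the first run of 4+ non-word characters, or None."""
    match = REPEATED_NON_WORD.search(value)
    return match.group(0) if match else None


def char_anomaly_alerts(value: str) -> list:
    """Return the IDs of the CRS rules whose logic would fire for value."""
    alerts = []
    if restricted_char_count(value) >= META_THRESHOLD:
        alerts.append("960023")
    if repeated_non_word(value) is not None:
        alerts.append("960024")
    return alerts


print(char_anomaly_alerts("O'Brien & Sons"))                  # []
print(char_anomaly_alerts('"><img src=x onerror=alert(1)>'))  # ['960023']
print(char_anomaly_alerts("x=1;//--><svg>"))                  # ['960023', '960024']
```

Note that the count rule only cares about *variety*. A string such as `~~~~~~` contributes one meta-character type to 960023, although it would still trip the run-length check in 960024.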
The PHPIDS Project has some very impressive code for normalizing data before it is checked against signatures/filters. Specifically, its Converter.php code has a number of mechanisms for applying anti-evasion normalizations. SpiderLabs decided to port the Converter.php logic to Lua and run it through the ModSecurity Lua API. In the CRS modsecurity_crs_41_advanced_filters.conf file, you can see how we call the Lua script:
#
# Lua script to normalize input payloads
# Based on PHPIDS Converter.php code
# Reference the following whitepaper:
# http://docs.google.com/Doc?id=dd7x5smw_17g9cnx2cn
#
SecRuleScript ../lua/advanced_filter_converter.lua "phase:2,t:none,pass"
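Converter.php applies many individual normalizations, and the Lua port follows it. The Python sketch below illustrates only the general idea. It is *not* a port of Converter.php or of advanced_filter_converter.lua. It applies a few common anti-evasion steps using the standard library:

- repeated URL and HTML-entity decoding until the value stops changing
- removal of C-style comments
- whitespace collapsing
- lowercasing

The choice of steps and the hypothetical `max_rounds` limit are assumptions.

```python
import html
import re
from urllib.parse import unquote

# C-style comments such as /**/ are a classic way to split keywords apart.
_COMMENTS = re.compile(r"/\*.*?\*/", re.DOTALL)
_WHITESPACE = re.compile(r"\s+")


def normalize(value: str, max_rounds: int = 5) -> str:
    """Apply a few illustrative anti-evasion normalizations (not Converter.php)."""
    # Decode repeatedly so double-encoded input (e.g. %253C) is fully unwrapped.
    for _ in range(max_rounds):
        decoded = html.unescape(unquote(value))
        if decoded == value:
            break
        value = decoded
    value = _COMMENTS.sub("", value)     # al/**/ert -> alert
    value = _WHITESPACE.sub(" ", value)  # tabs, newlines, runs of spaces -> one space
    return value.lower()                 # lets downstream signatures ignore case


print(normalize("%253CScRiPt%253E"))                  # <script>
print(normalize("&lt;img onerror=al/**/ert(1)&gt;"))  # <img onerror=alert(1)>
```

Running signatures against the normalized value, instead of the raw input, means one pattern covers many encodings of the same payload.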
In addition to the normalization functions, the Lua port of the PHPIDS code also includes a very interesting module called Centrifuge. A good whitepaper outlines the Centrifuge concepts:
Generic attack detection
Since blacklisting has intrinsic limitations and cannot be completely relied upon, the PHPIDS provides another attack detection approach. This feature was first introduced in PHPIDS 0.4.1 in September 2007 and is called the PHPIDS Centrifuge. It basically consists of two methods for dealing with incoming data. The first is a simple trick based on the ratio between the count of word characters, spaces and punctuation and the count of non-word characters. It is applied to all incoming strings longer than 25 characters. If the ratio between those groups drops below a certain value, the incoming string can with high probability be considered an attack.
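Here is a rough Python sketch of the ratio check just described, under several assumptions:

- **Character groups.** The whitepaper does not spell out exactly which characters fall into each group. Here, word characters, whitespace, `.`, `,` and `:` count as harmless and everything else as special. The numbers this produces will therefore not match the whitepaper's figures exactly.
- **Anti-padding.** It folds in the countermeasure the whitepaper describes for this check: runs of four or more word characters are collapsed into `'123'`.
- **Limits.** It uses the 25-character minimum and the 3.5 threshold the whitepaper gives.

```python
import re

MIN_LENGTH = 25   # only strings longer than this are checked
THRESHOLD = 3.5   # ratios below this are treated as suspicious

# Assumption: word characters, whitespace and . , : count as "harmless".
_HARMLESS = re.compile(r"[\w\s.,:]")
# Anti-padding: runs of 4+ word characters are collapsed to '123'.
_WORD_RUN = re.compile(r"\w{4,}")


def centrifuge_ratio(value: str):
    """Return collapsed length / special-character count, or None if no specials."""
    collapsed = _WORD_RUN.sub("123", value)
    special = len(_HARMLESS.sub("", collapsed))
    if special == 0:
        return None
    return len(collapsed) / special


def ratio_alert(value: str) -> bool:
    """True when a long enough string has a ratio below THRESHOLD."""
    if len(value) <= MIN_LENGTH:
        return False
    ratio = centrifuge_ratio(value)
    return ratio is not None and ratio < THRESHOLD


# The David Lindsay vector, the padded variant, and a Firefox 2.0.0.12 UA string.
LINDSAY = "y='na'\n$x=(1.)[(x=/eva/)?x[-1]+'l':$]\n$x($x(y+'me')+1.)"
PADDED = "a1='" + "a" * 33 + "'\na2='" + "a" * 33 + "'\n" + LINDSAY
UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) "
      "Gecko/20080201 Firefox/2.0.0.12")

for sample in (LINDSAY, PADDED, UA):
    print(ratio_alert(sample), round(centrifuge_ratio(sample), 2))
```

With this grouping the three samples give ratios of about 1.72, 1.92 and 6.9, so only the two attack strings alert. Without the `'123'` collapse, the padded vector's ratio would work out to exactly 3.5 and slip past the `<` comparison. That is the kind of bypass the padding countermeasure is meant to close.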
y='na'
$x=(1.)[(x=/eva/)?x[-1]+'l':$]
$x($x(y+'me')+1.)
This sample of code by David Lindsay is a highly obfuscated XSS attempt to evaluate the contents of the name variable. Because it can be morphed into numerous equivalent forms, it is very hard to create regular expressions that match all of its possible mutations. If the above ratio is applied, this injection and almost all of its mutations have a ratio of around 1.8484. Other arbitrary strings, such as the Firefox 2.0.0.12 user agent string, result in a ratio of about 7.5. The current threshold used to classify a string as an attack vector is 3.5. Of course, this method is not bulletproof and can theoretically be circumvented by adding noise consisting of word characters, like this:
a1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
a2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
y='na'
$x=(1.)[(x=/eva/)?x[-1]+'l':$]
$x($x(y+'me')+1.)
This vector, which still works, would have a ratio of about 3.615 and evade the ratio detection technique. To counteract this type of bypass attempt, any chained word characters in sequences longer than 3 are replaced by the string '123'. Also, the attack vector grows significantly in length, so if the attacked site enforces input length restrictions, the attacker might be blocked by those as well.
Another technique used to generically detect attack vectors is a process based on normalization and stripping. First, all word characters and whitespace, including line breaks, tabs and carriage returns, are stripped out of the string. This of course includes Unicode characters too, which is why PCRE must be compiled with Unicode support, as mentioned above.
Next, the string is converted into an array and all duplicate occurrences of the remaining characters are removed. The array is then converted back into a string, and several character groups are replaced in three steps. This ensures the resulting string consists of only a very limited set of characters. The last step of the string preparation is to remove all remaining unwanted characters, such as the backslashes introduced by PHP's sometimes bothersome magic quotes feature. After that, the string usually consists of just 4 to 6 characters, which match a certain set of patterns in a surprisingly high number of tested vectors. Here is the string chosen above, modified to circumvent the first detection technique, before processing:
a1='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
a2='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
y='na'
$x=(1.)[(x=/eva/)?x[-1]+'l':$]
$x($x(y+'me')+1.)
And after the processing:
((+::
The same goes for this remote code execution vector, this time in PHP rather than JavaScript, provided by tx on sla.ckers.org:
" ; //
if (!0) $_a = base64_decode ;
if (!0) $_b = parse_str ; //
$_c = "" . strrev("ftnirp");
if (!0) $_d = QUERY_STRING; //
$_e= "" . $_SERVER[$_d];
$_b($_e); //
$_f = "" . $_a($b);
$_c(`$_f`);//
After the processing again:
((+::
During the testing phase, 150 to 200 different vectors were tested with the PHPIDS Centrifuge, and the results were often comparable to the pattern shown above. As a result, a regular expression like the following was able to match more than 60% of the processed vectors:
(?:\({2,}\+{2,}:{2,})|(?:\({2,}\+{2,}:+)|(?:\({3,}\++:{2,})
The PHPIDS Centrifuge is currently being carefully tuned and produces good results with few false positives. The minimum length for a string to be processed by the centrifuge is set to 40 to make sure it does not decrease performance too much. That said, the centrifuge code is fairly simple and should not noticeably affect overall performance.
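The second Centrifuge technique can also be sketched in Python. The whitepaper says only that "several character groups are replaced in three steps", so the following choices are assumptions:

- **Groups.** Operators map to `+`, brackets to `(`, and `!?:=` to `:`. These were chosen so the output uses the same `(`, `+`, `:` alphabet as the regular expression quoted above.
- **Steps.** The three replacement steps are folded into a single `str.translate` call.
- **Sort.** The final sort is also assumed.

Exact outputs will differ from the whitepaper's `((+::` examples. The function names are hypothetical.

```python
import re

MIN_LENGTH = 40  # minimum string length quoted in the whitepaper

# Detection pattern quoted above.
CENTRIFUGE_PATTERN = re.compile(
    r"(?:\({2,}\+{2,}:{2,})|(?:\({2,}\+{2,}:+)|(?:\({3,}\++:{2,})"
)

# Assumed character groups: operators -> '+', brackets -> '(', !?:= -> ':'.
_GROUPS = str.maketrans(
    "~^|*%&/-" + "()[]{}" + "!?:=",
    "+" * 8 + "(" * 6 + ":" * 4,
)


def centrifuge_convert(value: str) -> str:
    """Reduce a string to a short, sorted token string over '(', '+' and ':'."""
    # 1. Strip word characters and all whitespace (Unicode-aware in Python 3).
    stripped = re.sub(r"[\w\s]", "", value)
    # 2. Remove repeated occurrences, keeping first-seen order.
    unique = "".join(dict.fromkeys(stripped))
    # 3. Map the assumed character groups onto a tiny alphabet.
    grouped = unique.translate(_GROUPS)
    # 4. Drop everything else (quotes, backslashes, $ ...) and sort.
    return "".join(sorted(c for c in grouped if c in "(+:"))


def centrifuge_alert(value: str) -> bool:
    """True when a long enough string converts to a matching token string."""
    if len(value) <= MIN_LENGTH:
        return False
    return CENTRIFUGE_PATTERN.search(centrifuge_convert(value)) is not None


LINDSAY = "y='na'\n$x=(1.)[(x=/eva/)?x[-1]+'l':$]\n$x($x(y+'me')+1.)"
BENIGN = "Hello, my name is John and I'd like to order 3 items: apples & pears!"

print(centrifuge_convert(LINDSAY), centrifuge_alert(LINDSAY))  # ((((+++::: True
print(centrifuge_convert(BENIGN), centrifuge_alert(BENIGN))    # +:: False
```

Deduplicating *before* grouping is what makes the output short. Many different bracket or operator characters collapse into runs of `(` and `+`, and ordinary prose contributes at most a handful of tokens.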
Here is an example link you can use to send a highly obfuscated JS payload to the CRS demo page. The results show that all four techniques highlighted above triggered:
Rule ID | Alert message | Matched data
---|---|---
– | Centrifuge Threshold Alert - Ratio Value is: %{tx.0} | Matched 2.5641025641026 at TX:ARGS:test_centrifuge_ratio
960023 | Restricted Character Anomaly Detection Alert - Total # of special characters exceeded | Matched = at TX:restricted_char_count
960024 | Restricted Character Anomaly Detection Alert - Repetative Non-Word Characters | Matched / $ at TX:ARGS:test_normalized
9000067 | Detects unknown attack vectors based on PHPIDS Centrifuge detection | Matched ((++:: at TX:ARGS:test_centrifuge_converted