ModSecurity Advanced Topic of the Week: Credit Card Tracking
1. Introduction
The Payment Card Industry Data Security Standard (PCI DSS for short) requires that credit card numbers are not transmitted in the clear and are not presented to users unmasked. This post outlines several detection accuracy issues that a monitoring solution must address, focusing on ModSecurity.
2. Matching a Credit Card Number
2.1 Matching a Credit Card Number Sequence
A credit card number includes 13 to 16 digits. In addition, real-world presentations of a credit card number often include delimiters such as dashes or spaces, usually in specific positions. The following regular expression can be used to match potential credit card number sequences:
\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}
2.2 Boundaries
Long sequences of digits are common in network traffic, and within them the above regular expression would match substrings of the desired length. To avoid that, we need to define what delimits a sequence. What counts as a valid delimiter may vary by application: not requiring any delimiter would generate many false positives, while requiring specific delimiters might lead to false negatives. For example, should we allow a leading "0"? A reasonable choice is to treat any non-digit character as a delimiter. The resulting regular expression is:
(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)
Or, if the regular expression engine does not support look-ahead and look-behind assertions:
(?:^|[^\d])(\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4})(?:[^\d]|$)
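To see the effect of the boundaries, here is a small Python sketch that runs the unbounded pattern from 2.1 and both bounded patterns above against a hypothetical 20-digit identifier and a delimited card number. The sample strings are illustrative assumptions (4111 1111 1111 1111 is a widely used Visa test number); Python's re module supports the fixed-width look-behind used here.

```python
import re

# Section 2.1 pattern: no boundaries, so it also matches inside longer digit runs.
NAIVE = re.compile(r"\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}")

# Section 2.2 pattern using look-behind/look-ahead to require non-digit boundaries.
BOUNDED = re.compile(r"(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)")

# Section 2.2 pattern without look-around; the candidate number is capture group 1.
BOUNDED_NO_LOOKAROUND = re.compile(
    r"(?:^|[^\d])(\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4})(?:[^\d]|$)"
)

long_id = "order=12345678901234567890"  # hypothetical 20-digit identifier
card = "card: 4111 1111 1111 1111."     # test card number with space delimiters

print(NAIVE.findall(long_id))               # ['1234567890123456']
print(BOUNDED.findall(long_id))             # []
print(BOUNDED.findall(card))                # ['4111 1111 1111 1111']
print(BOUNDED_NO_LOOKAROUND.findall(card))  # ['4111 1111 1111 1111']
```

One trade-off of the look-around-free version: it consumes the delimiter characters, so when two candidates are separated by a single character, a repeated search cannot match the second one. The look-ahead/look-behind version does not have this limitation.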
2.3 Validating the Number Against the Luhn Checksum Algorithm
However, sequences of 13 to 16 digits are not always credit card numbers. Typical network traffic contains many other long numbers; for example, identification numbers such as product IDs in online stores are often 13 to 16 digits long. Luckily, a credit card number must conform to the Luhn checksum. ModSecurity implements this algorithm in the @verifyCC operator, which checks that each detected digit sequence is a valid credit card number.
verifyCC
Description: This operator verifies a given regular expression as a potential credit card number. It first matches with a single generic regular expression then runs the resulting match through a Luhn checksum algorithm to further verify it as a potential credit card number.
Example:
SecRule ARGS "@verifyCC (?:^|[^\d])(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})(?:[^\d]|$)" \
"phase:2,sanitiseMatched,log,auditlog,pass,msg:'Potential credit card number'"
Is this enough to avoid false positives? The Luhn function computes a check digit from the other digits, so exactly 1 out of every 10 consecutive numbers passes it. Applications typically use numbers of this length as identifiers and tend to allocate them consecutively, so about 1 in 10 of the identifiers they use will also pass as a valid credit card number. Validating sequences with the Luhn formula therefore reduces false positives by about 90% but does not eliminate them.
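The following Python sketch approximates the behavior described for @verifyCC (regular expression first, then a Luhn check on the candidate) and checks the 1-in-10 argument on a block of consecutive numbers. The names luhn_valid and verify_cc and the product ID range are hypothetical; this is not ModSecurity's implementation, and the real operator's handling of repeated matches may differ.

```python
import re
from typing import Optional

def luhn_valid(number: str) -> bool:
    """Return True if the digits of `number` pass the Luhn checksum.
    Non-digit characters such as dashes are ignored."""
    digits = [int(c) for c in number if c in "0123456789"]
    if not digits:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9      # equivalent to summing the two digits of the product
        total += d
    return total % 10 == 0

# Candidate pattern copied from the @verifyCC example rule above.
CC_PATTERN = re.compile(r"(?:^|[^\d])(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})(?:[^\d]|$)")

def verify_cc(value: str) -> Optional[str]:
    """Rough approximation of @verifyCC: return the first regex candidate
    that also passes the Luhn check, or None."""
    for m in CC_PATTERN.finditer(value):
        if luhn_valid(m.group(1)):
            return m.group(1)
    return None

print(verify_cc("cc=4111-1111-1111-1111"))  # 4111-1111-1111-1111
print(verify_cc("cc=4111-1111-1111-1112"))  # None

# Hypothetical block of 1,000 consecutively allocated 16-digit product IDs:
# in each run of ten sharing all but the last digit, exactly one passes.
start = 1234560000000000
product_ids = [str(n) for n in range(start, start + 1000)]
print(sum(luhn_valid(pid) for pid in product_ids))  # 100
```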
2.4 Checking Prefixes
To reduce the number of false positives further, a monitoring system can check that the credit card number is not just valid but also assigned. Naturally, the monitoring system cannot include a list of all assigned numbers, but it can check for the prefixes that were assigned to different financial institutions. Wikipedia has a fairly good table of assigned prefixes. Prefix checks further reduce false positives and can be implemented using a regular expression.
Example rules to detect MasterCard, Visa and American Express credit card numbers:
# MasterCard
SecRule ARGS "@verifyCC (?:^|[^\d])(5[1-5]\d{2}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{4})(?:[^\d]|$)" \
"phase:2,t:none,sanitiseMatched,log,auditlog,pass,msg:'MasterCard Credit Card Number detected in user input',id:'920005',tag:'PCI/10.2',severity:'5'"
# Visa
SecRule ARGS "@verifyCC (?:^|[^\d])(4\d{3}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d(?:\d{3})??)(?:[^\d]|$)" \
"phase:2,t:none,sanitiseMatched,log,auditlog,pass,msg:'Visa Credit Card Number detected in user input',id:'920007',tag:'PCI/10.2',severity:'5'"
# American Express
SecRule ARGS "@verifyCC (?:^|[^\d])(3[47]\d{2}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{3})(?:[^\d]|$)" \
"phase:2,t:none,sanitiseMatched,log,auditlog,pass,msg:'American Express Credit Card Number detected in user input',id:'920009',tag:'PCI/10.2',severity:'5'"
Numbers with assigned prefixes account for only 1% to 17% of all valid (Luhn-passing) numbers, depending on the sequence length. Prefix checks are especially useful for eliminating the less commonly used 14- and 15-digit sequences (1.2% and 2.5% prefix coverage, respectively), leaving mostly 13- and 16-digit sequences.
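The prefix and length logic of the three rules can be sketched in Python using their capture-group patterns unchanged. brand_of is a hypothetical helper; it only covers the prefix/length part, since in the rules the Luhn check is performed separately by @verifyCC.

```python
import re
from typing import Optional

# Capture-group patterns taken from the MasterCard, Visa and American Express rules.
BRAND_PATTERNS = {
    "MasterCard": re.compile(r"5[1-5]\d{2}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{4}"),
    "Visa": re.compile(r"4\d{3}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d(?:\d{3})??"),
    "American Express": re.compile(r"3[47]\d{2}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{3}"),
}

def brand_of(candidate: str) -> Optional[str]:
    """Return the brand whose prefix/length pattern matches the whole
    candidate, or None. No Luhn check is done here."""
    for brand, pattern in BRAND_PATTERNS.items():
        if pattern.fullmatch(candidate):
            return brand
    return None

for sample in ["5555555555554444", "4111111111111111", "3782-8224-63-10-005", "1234567890123456"]:
    print(sample, brand_of(sample))
```

Note how the Visa pattern's lazy (?:\d{3})?? still accepts 16-digit numbers: fullmatch (like the trailing boundary group in the rules) forces the engine to try the longer form when the 13-digit form does not end the match.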
Here is an example audit log entry for a credit card submission transaction:
--5052354b-A--
[04/Jan/2011:12:23:31 --0500] TSNXk8CoqAEAAGkkLpgAAAAB ::1 61526 ::1 443
--5052354b-B--
POST /cart/purchase.php HTTP/1.1
Host: www.buymore.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: https://www.buymore.com/cart/view.php
Content-Type: application/x-www-form-urlencoded
Content-Length: 39
--5052354b-I--
items=1&cc%5fnumber=***************
Because these rules use the sanitiseMatched action, the card number is obscured in the audit log, preventing data leakage through the logs themselves. This transaction would generate the following alert message:
Message: Warning. CC# match "(?:^|[^\d])(3[47]\d{2}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{3})(?:[^\d]|$)" at ARGS:cc_number.
[offset "0"] [file "/usr/local/apache/conf/modsec_current/optional_rules/modsecurity_crs_25_cc_known.conf"] [line "31"] [id "920009"]
[msg "American Express Credit Card Number detected in user input"] [severity "NOTICE"] [tag "PCI/10.2"]
3. Logging
Logging is just as important as detection for a monitoring system, and this is especially true for credit card number detection: in many cases a security breach can be mitigated better if the organization knows exactly what information leaked. For example, state breach disclosure laws such as California SB 1386 require an organization to notify all affected clients in case of a breach. If the organization does not know who the affected clients are, it must notify everyone, raising both the cost of the breach and its media exposure.
Unfortunately, logging credit card leakage incidents is not trivial. PCI DSS does not allow the credit card number itself to be logged in the clear, yet the log record must include enough information to be useful. A useful implementation must therefore keep two levels of logs:
- Alert logs that can be used to analyze what happened but do not include the actual credit card number, or at most include a masked version of it.
- An encrypted store for the credit card data itself.
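A minimal Python sketch of the two-level split, assuming the alert record and the secure record are linked by the transaction's unique ID (here the one from the sample audit log). mask_pan, split_for_logging and the record field names are hypothetical. Keeping the first four digits mirrors the leakage rules that follow; the actual encryption of the secure record is not shown, because Python's standard library has no symmetric cipher, and the record must not be written anywhere unencrypted.

```python
from typing import Dict, Tuple

def mask_pan(pan: str, keep: int = 4) -> str:
    """Keep the first `keep` digits and replace every remaining digit with '*'.
    Delimiters such as '-' or ' ' are dropped."""
    digits = "".join(c for c in pan if c in "0123456789")
    return digits[:keep] + "*" * max(len(digits) - keep, 0)

def split_for_logging(unique_id: str, rule_id: str, pan: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (alert_record, secure_record).

    alert_record: safe for the regular alert log (masked number only).
    secure_record: the full number keyed by the transaction ID; it is meant for
    an encrypted store, which this sketch does NOT implement."""
    alert_record = {"unique_id": unique_id, "rule_id": rule_id, "cc_masked": mask_pan(pan)}
    secure_record = {"unique_id": unique_id, "cc_number": pan}
    return alert_record, secure_record

alert, secret = split_for_logging("TSNXk8CoqAEAAGkkLpgAAAAB", "920009", "3782-8224-63-10-005")
print(alert["cc_masked"])  # 3782***********
```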
Here are some example CC data leakage rules:
# MasterCard
SecRule RESPONSE_BODY|RESPONSE_HEADERS:Location "@verifyCC (?:^|[^\d])(?<!google_ad_client = \"pub-)(5[1-5]\d{2}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{4})(?:[^\d]|$)" \
"chain,logdata:'Start of CC #: %{tx.ccdata_begin}***...',phase:4,t:none,capture,ctl:auditLogParts=-E,block,msg:'MasterCard Credit Card Number sent from site to user',id:'920006',tag:'WASCTC/5.2',tag:'PCI/3.3',severity:'1'"
SecRule TX:1 "(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})" "chain,capture,setvar:tx.ccdata=%{tx.1}"
SecRule TX:CCDATA "^(\d{4}\-?)" "capture,setvar:tx.ccdata_begin=%{tx.1},setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:tx.%{rule.id}-LEAKAGE/CC-%{matched_var_name}=%{tx.0}"
# Visa
SecRule RESPONSE_BODY|RESPONSE_HEADERS:Location "@verifyCC (?:^|[^\d])(?<!google_ad_client = \"pub-)(4\d{3}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d(?:\d{3})??)(?:[^\d]|$)" \
"chain,logdata:'Start of CC #: %{tx.ccdata_begin}***...',phase:4,t:none,capture,ctl:auditLogParts=-E,block,msg:'Visa Credit Card Number sent from site to user',id:'920008',tag:'WASCTC/5.2',tag:'PCI/3.3',severity:'1'"
SecRule TX:1 "(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})" "chain,capture,setvar:tx.ccdata=%{tx.1}"
SecRule TX:CCDATA "^(\d{4}\-?)" "capture,setvar:tx.ccdata_begin=%{tx.1},setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:tx.%{rule.id}-LEAKAGE/CC-%{matched_var_name}=%{tx.0}"
# American Express
SecRule RESPONSE_BODY|RESPONSE_HEADERS:Location "@verifyCC (?:^|[^\d])(?<!google_ad_client = \"pub-)(3[47]\d{2}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{3})(?:[^\d]|$)" \
"chain,logdata:'Start of CC #: %{tx.ccdata_begin}***...',phase:4,t:none,capture,ctl:auditLogParts=-E,block,msg:'American Express Credit Card Number sent from site to user',id:'920010',tag:'WASCTC/5.2',tag:'PCI/3.3',severity:'1'"
SecRule TX:1 "(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})" "chain,capture,setvar:tx.ccdata=%{tx.1}"
SecRule TX:CCDATA "^(\d{4}\-?)" "capture,setvar:tx.ccdata_begin=%{tx.1},setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},setvar:tx.%{rule.id}-LEAKAGE/CC-%{matched_var_name}=%{tx.0}"
Handling the logging of credit card leakage is tricky. The sanitiseMatched action isn't as useful here, as it would end up obscuring the entire RESPONSE_BODY payload in the audit log. So for these rules we instead use the "ctl:auditLogParts=-E" action to dynamically remove the response body (part E) from the audit log. While this is required from a PCI logging perspective, it makes identifying false positives more challenging. Our approach is to capture the first four digits of the matched number and show them in the generated alert. This helps an analyst confirm that the match is accurate while still protecting sensitive data in the logs. Here is an example credit card leakage alert message:
Message: Warning. Pattern match "^(\d{4}\-?)" at TX:ccdata. [file "/usr/local/apache/conf/modsec_current/base_rules/modsecurity_crs_25_cc_known.conf"] [line "73"] [id "920010"]
[msg "American Express Credit Card Number sent from site to user"] [data "Start of CC #: 3723***..."] [severity "ALERT"] [tag "WASCTC/5.2"] [tag "PCI/3.3"]
Notice that the data field shows only the beginning of the matched string ("3723***...").
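The data flow through the last two rules of each chain can be sketched in Python: the verified match (TX:1) is re-captured into tx.ccdata, only its leading four digits (plus an optional dash) are kept in tx.ccdata_begin, and that value is expanded into the logdata string. leakage_logdata is a hypothetical helper that assumes TX:1 already holds a Luhn-verified match; it illustrates the idea and is not how ModSecurity evaluates macros.

```python
import re
from typing import Optional

# Second rule in the chain: re-capture the candidate from TX:1 into tx.ccdata.
CCDATA = re.compile(r"(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})")
# Third rule: keep only the first four digits (and an optional dash).
CCDATA_BEGIN = re.compile(r"^(\d{4}\-?)")

def leakage_logdata(tx1: str) -> Optional[str]:
    """Build the alert's logdata text from the verified match, or None."""
    m = CCDATA.search(tx1)
    if m is None:
        return None
    begin = CCDATA_BEGIN.match(m.group(1))
    if begin is None:
        return None
    # Mirrors logdata:'Start of CC #: %{tx.ccdata_begin}***...'
    return f"Start of CC #: {begin.group(1)}***..."

print(leakage_logdata("372312345678901"))  # Start of CC #: 3723***...
```

If the leaked number contains dashes, the captured prefix keeps its trailing dash (for example "3723-***...").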
4. Handling False Positives Using Exceptions
In real-world systems, even 1% is still a high number, especially since digit sequences are quite common in web traffic. If an analyst has to examine hundreds of alerts a day, the WAF stops being useful. How can we improve credit card detection accuracy further?
One way to do that is to create exceptions for traffic known to generate such false positives. Exceptions can be defined both for digit sequences that are not credit card numbers and for intentional, legitimate transmission of credit card numbers. Such exceptions are as much a curse as a blessing: overusing them or defining them too broadly will open big security holes.
Let's take, for example, a 16-digit sequence used as a product ID on a web site. The product ID may have some unique attributes, such as its own prefix or surrounding text, that can help make the exception narrower. A good example is Google AdSense. A site running Google ads needs to add the following piece of code to each page displaying ads:
<script type="text/javascript"><!-- google_ad_client = "pub-0000000000000000"; google_alternate_color = "ffffff"; ...
The 16-digit ID in the google_ad_client parameter is often a valid credit card number. The following modified regular expression excludes it:
(?<!google_ad_client = \"pub-)(?<!\d)(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})(?!\d)
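A quick Python check of the exclusion, using the same expression (Python's re supports the fixed-width negative look-behind). The page snippets are hypothetical test inputs built around the 4111111111111111 test number.

```python
import re

# The AdSense-aware expression from the text.
CC_NO_ADSENSE = re.compile(
    r'(?<!google_ad_client = "pub-)(?<!\d)(\d{4}\-?\d{4}\-?\d{2}\-?\d{2}\-?\d{1,4})(?!\d)'
)

adsense_page = 'google_ad_client = "pub-4111111111111111"; google_alternate_color = "ffffff";'
leaky_page = "Your card number is 4111111111111111."
no_spaces = 'google_ad_client="pub-4111111111111111";'

print(CC_NO_ADSENSE.findall(adsense_page))  # []
print(CC_NO_ADSENSE.findall(leaky_page))    # ['4111111111111111']
print(CC_NO_ADSENSE.findall(no_spaces))     # ['4111111111111111']
```

The exclusion is literal: a variant without the spaces around "=" is still flagged, which keeps the exception narrow but may need a second look-behind if a site formats the snippet differently.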
ModSecurity can also define a fine-grained exception using SecRule directives, for example one that disables credit card number detection for a specific page or field. Let's assume that ModSecurity rule 955555 detects credit card numbers in application output, but the page /support/manual_payment.php, available only to store personnel, must display a credit card number. The following is a simple ModSecurity exception that disables this rule for that single page:
SecRule REQUEST_FILENAME "@streq /support/manual_payment.php" \
"phase:1,t:none,nolog,pass,ctl:ruleRemoveById=955555"
5. Other Sensitive Identifiers
While credit card numbers are the best-known sensitive identifier for which PCI DSS requires special attention, they are neither the only one nor the most sensitive. The Card Verification Code (CVV) is a 3- or 4-digit code printed on a credit card (on the back for most brands; American Express prints a 4-digit code on the front) that is often used as an additional identification number in online transactions. The CVV is even more sensitive than the credit card number, but much harder to detect, as it is so short and has no check digit.
One way to detect CVV numbers is to look for a 3- or 4-digit value in another field of a form in which a credit card number was found. This method is far from immune to false positives, but in paranoid environments it might do the trick.
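The heuristic can be sketched in Python as follows. possible_cvv_fields, is_card_number and the form field names are hypothetical, and the card check here (13 to 16 digits with optional dashes or spaces, then Luhn) is a simplification of the rules discussed earlier. The sample deliberately shows the false positive problem: an innocent quantity field is flagged too.

```python
import re
from typing import Dict, List

CARD_RE = re.compile(r"[0-9](?:[ \-]?[0-9]){12,15}")  # 13-16 digits, optional '-' or ' '
CVV_RE = re.compile(r"[0-9]{3,4}")                     # entire value is 3 or 4 digits

def luhn_valid(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:                      # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def is_card_number(value: str) -> bool:
    if not CARD_RE.fullmatch(value):
        return False
    return luhn_valid(re.sub(r"[ \-]", "", value))

def possible_cvv_fields(form: Dict[str, str]) -> List[str]:
    """If any field holds a card number, return the names of the fields
    whose entire value is a 3- or 4-digit number."""
    if not any(is_card_number(v) for v in form.values()):
        return []
    return [name for name, v in form.items() if CVV_RE.fullmatch(v)]

form = {"item": "12", "cc_number": "3782 822463 10005", "code": "1234", "qty": "100"}
print(possible_cvv_fields(form))  # ['code', 'qty']
```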
The latest version of the OWASP ModSecurity CRS has some experimental CVV/Track 1&2 leakage rules.
6. Conclusion
Detecting credit card theft is challenging; however, credit card number detection can be tremendously useful for spotting unintentional leakage of credit card numbers. We hope this post has provided useful information to aid your efforts to track this type of sensitive data within your web applications.
ABOUT TRUSTWAVE
Trustwave is a globally recognized cybersecurity leader that reduces cyber risk and fortifies organizations against disruptive and damaging cyber threats. Our comprehensive offensive and defensive cybersecurity portfolio detects what others cannot, responds with greater speed and effectiveness, optimizes client investment, and improves security resilience.