Unicode Visual Spoofing for Good: Confusable CAPTCHAs

In this blog post, I will show a proof-of-concept method of leveraging Unicode Visual Spoofing/Lookalikes in a CAPTCHA to help prevent automated bots from scraping pages and auto-submitting data.

Unicode Visual Spoofing/Lookalikes

An in-depth discussion of Unicode and the security challenges it poses is beyond the scope of this post; however, there are a few salient points to mention. The first is the issue of Visual Spoofing. Chris Weber of Casaba Security has an outstanding presentation entitled "Exploiting Unicode-enabled Software" in which he outlines this issue. Here are two applicable points:

Visual Spoofing

  • Over 100,000 assigned characters
  • Many lookalikes within and across scripts

AΑАᐱᗅᗋᗩᴀᴬ⍲ꜲA

Example IDN Homograph Attack

www.google.com is not www.gooɡle.com

g = Latin U+0067
ɡ = Latin U+0261

The main issue for security is that, unless data is properly canonicalized before security checks, it is possible for attackers to evade detection. Unicode visual spoofing can easily be used by criminals in phishing attacks. Even savvy Internet users may be tricked into clicking on links, as these Unicode code points are oftentimes visually indistinguishable from one another.
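To make the homograph example concrete, here is a minimal Python sketch that compares the two hostnames code point by code point. The helper names (`describe`, `differing_positions`) are hypothetical and exist only for this illustration.

```python
import unicodedata

LATIN_HOST = "www.google.com"
SPOOFED_HOST = "www.goo\u0261le.com"  # U+0261 stands in for the second "g"


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def differing_positions(a, b):
    """Return the indexes at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for index in differing_positions(LATIN_HOST, SPOOFED_HOST):
        print(index, describe(LATIN_HOST[index]), describe(SPOOFED_HOST[index]))
```

The two strings may render almost identically, yet a byte-for-byte or code-point comparison treats them as different, which is exactly the gap between what a person sees and what a program compares.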

CAPTCHAs

The underlying issue outlined above is that computer programs and humans may interpret Unicode characters differently. We can leverage this issue in our favor if we implement the same concept in a different context - CAPTCHAs.

A CAPTCHA (pronounced /ˈkæptʃə/) is a type of challenge-response test used in computing as an attempt to ensure that the response is not generated by a computer. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are supposedly unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, because it is administered by a machine and targeted to a human, in contrast to the standard Turing test that is typically administered by a human and targeted to a machine. A common type of CAPTCHA requires the user to type letters or digits from a distorted image that appears on the screen.

Here is an example of typical CAPTCHA usage where a graphic is used with obscured text characters displayed:

[Image: example CAPTCHA graphic with obscured text characters]

The user must visually decipher the text and input it into the text box.

Turning the Tables: Visual Spoofing in CAPTCHAs

Rather than using an image file with obscured text in it, the concept presented here is to use Unicode Visual Spoofing/Lookalikes to essentially "trick" the user into entering the text that you desire.

Here is an example comment form CAPTCHA that implements this concept by adding an additional field to the end of the form:

    <form method="post" action="http://www.example.com/cgi-bin/mt/mt-c.cgi" name="comments_form" id="comments-form" onsubmit="if (this.bakecookie.checked) rememberMe(this)">
        <input type="hidden" name="static" value="1" />
        <input type="hidden" name="entry_id" value="43271" />
        <input type="hidden" name="__lang" value="en" />
        <input type="hidden" name="parent_id" value="" id="comment-parent-id" />
        <div id="comments-open-data">
            <div id="comment-form-name">
                <label for="comment-author">Name</label>
                <input id="comment-author" name="author" size="30" value="" />
            </div>
            <div id="comment-form-email">
                <label for="comment-email">Email Address</label>
                <input id="comment-email" name="email" size="30" value="" />
            </div>
            <div id="comment-form-remember-me">
                <label for="comment-bake-cookie"><input type="checkbox" id="comment-bake-cookie" name="bakecookie" onclick="if (!this.checked) forgetMe(document.comments_form)" value="1" />
                    Remember personal info?</label>
            </div>
        </div>
        <div id="comments-open-text">
            <label for="comment-text">Comments (You may use HTML tags for style)</label>
            <textarea id="comment-text" name="text" rows="15" cols="50"></textarea>
        </div>
        <div id="comments-open-footer">
            <!--input type="submit" accesskey="v" name="preview" id="comment-preview" value="Preview" /-->
            <br><label for="challenge_answer">Type the word &#1072;pple below. <strong>(required)</strong>:</label><br /><input type="text" id="challenge_answer" name="challenge_answer" /><br><input type="submit" accesskey="s" name="post" id="comment-submit" value="Submit" />
        </div>
    </form>

This HTML adds a new text field called "challenge_answer", whose value is sent along with the standard POST arguments when the form is submitted to the web app. Notice the label at the end of the form: it uses an encoded Cyrillic small letter "а" (&#1072;, U+0430) instead of a Latin small letter "a" to display the word "apple".

Here is how the form would look to a user in a web browser:

[Screenshot: the comment form as rendered in a web browser]

So the concept is that a malicious SPAM bot would most likely scrape the raw HTML above and insert either the raw &#1072; entity or the decoded а (Cyrillic small letter "a") into the text field, while a human would type a normal a (Latin small letter "a") when spelling the word "apple".
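The difference between what a scraper extracts and what a human types can be sketched in Python. This is only an illustration: the `scraped_answer` helper is hypothetical and assumes a naive bot that decodes the label's HTML entities and copies the fourth word verbatim.

```python
import html

CHALLENGE_LABEL = "Type the word &#1072;pple below."
HUMAN_ANSWER = "apple"  # typed on a Latin keyboard


def scraped_answer(label_markup):
    """Decode HTML entities and lift the challenge word out of the label text."""
    decoded = html.unescape(label_markup)
    return decoded.split()[3]


if __name__ == "__main__":
    bot_answer = scraped_answer(CHALLENGE_LABEL)
    # The two words may look the same on screen, but they compare as unequal.
    print(bot_answer == HUMAN_ANSWER, hex(ord(bot_answer[0])), hex(ord(HUMAN_ANSWER[0])))
```

A bot that submits the undecoded entity fails the comparison as well, since "&#1072;pple" is not the string "apple" either.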

Implementation/Validation of Confusable CAPTCHA using ModSecurity

We can implement this Confusable CAPTCHA concept dynamically into forms by using new ModSecurity v2.6 capabilities such as Content Modification.

Enabling Content Modification

In order to dynamically modify outbound response bodies in ModSecurity, you must enable the following two directives:

SecContentInjection On
SecStreamOutBodyInspection On

Modifying Outbound Forms

In order to modify the existing HTML form data, you can use the following example ModSecurity rule, which uses the new @rsub operator for data substitution:

SecRule STREAM_OUTPUT_BODY "@rsub s/<input type=\"submit\"/<br><label for=\"challenge_answer\">Type the word &#1072;pple below. <strong>(required)<\/strong>:<\/label><br \/><input type=\"text\" id=\"challenge_answer\" name=\"challenge_answer\" \/><br><input type=\"submit\"/" \
        "phase:4,t:none,nolog,pass"

This rule will trap any existing form "Submit" button elements and insert our Confusable CAPTCHA data before them.
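For readers who want to see the effect of the substitution outside of ModSecurity, the following Python sketch applies an equivalent search-and-replace to a response body. It is an approximation under stated assumptions: `inject_captcha` is a hypothetical helper, it treats the search string as a literal, and it does not attempt to reproduce @rsub's exact matching semantics.

```python
import re

# The markup the rule places in front of the submit button.
CAPTCHA_FIELD = (
    '<br><label for="challenge_answer">Type the word &#1072;pple below. '
    '<strong>(required)</strong>:</label><br />'
    '<input type="text" id="challenge_answer" name="challenge_answer" /><br>'
)

# Literal match on the opening of a submit button tag.
SUBMIT_PATTERN = re.compile(re.escape('<input type="submit"'))


def inject_captcha(body):
    """Insert the challenge field in front of each literal submit-button tag."""
    return SUBMIT_PATTERN.sub(lambda match: CAPTCHA_FIELD + match.group(0), body)


if __name__ == "__main__":
    page = '<form><input type="submit" name="post" value="Submit" /></form>'
    print(inject_captcha(page))
```

Because the match is literal, markup that writes the button differently (for example with the attributes in another order) is left untouched, so the pattern needs to be adapted to the forms actually being served.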

Validating CAPTCHA Data

We now implement two SecRules to validate the CAPTCHA data.

SecRule REQUEST_FILENAME "@streq /cgi-bin/mt/mt-c.cgi" "chain,phase:2,t:none,block,msg:'Comment Post Error: CAPTCHA Challenge Missing.'"
        SecRule &ARGS:CHALLENGE_ANSWER "@eq 0"

SecRule REQUEST_FILENAME "@streq /cgi-bin/mt/mt-c.cgi" "chain,phase:2,t:none,block,msg:'Comment Post Error: Invalid CAPTCHA Challenge Answer.',logdata:'%{args.challenge_answer}'"
        SecRule ARGS:CHALLENGE_ANSWER "!@streq apple"

These rules check the Comment Form receiving page (/cgi-bin/mt/mt-c.cgi) and then ensure that the challenge_answer parameter is present and that it contains exactly the word "apple" with a Latin lower case "a". If these checks fail, then the requests will be blocked and alerts generated.
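The same two checks can be expressed in application code. The sketch below is not part of the ModSecurity setup; `check_challenge` is a hypothetical function that assumes the request parameters arrive as a mapping from lower-cased parameter name to a list of submitted values.

```python
from typing import Mapping, Optional, Sequence

EXPECTED_ANSWER = "apple"  # every letter is Latin, including the "a"


def check_challenge(args: Mapping[str, Sequence[str]]) -> Optional[str]:
    """Return an error message if the post should be rejected, else None."""
    values = args.get("challenge_answer", [])
    # First check: the parameter must be present at least once.
    if len(values) == 0:
        return "Comment Post Error: CAPTCHA Challenge Missing."
    # Second check: every submitted value must be exactly the Latin word.
    if any(value != EXPECTED_ANSWER for value in values):
        return "Comment Post Error: Invalid CAPTCHA Challenge Answer."
    return None
```

The comparison is an exact string match with no normalization applied first. That is deliberate here: folding lookalike characters together before comparing would accept the Cyrillic spelling and defeat the challenge.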

Example alert:

[Tue May 10 08:42:30 2011] [error] [client xxx.xxx.xxx.xxx] ModSecurity: Warning. Match of "streq apple" against "ARGS:challenge_answer" required. [file "/usr/local/apache/conf/crs/base_rules/modsecurity_crs_14_customrules.conf"] [line "9"] [msg "Comment Post Error: Invalid CAPTCHA Challenge Answer."] [data "&#1072;pple"] [hostname "www.example.com"] [uri "/cgi-bin/mt/mt-c.cgi"] [unique_id "TckytsCoAW0AAB9vOWoAAAAD"]

Confusable CAPTCHA Effectiveness

Keep in mind that this is simply a proof of concept at this point and has not yet been field tested. This implementation is not meant as a replacement for programs such as reCAPTCHA. The idea is that this implementation would stop automated programs from scraping your comment form data and auto-submitting SPAM posts. This concept would obviously be circumvented by CAPTCHA answering services as well.

If you decide to field test this concept, we would love to hear from you.
