Hey Buddy, Can You Spare a Log? Adventures in Log-Based Threat Hunting
Introduction
A long time ago, in a blog far, far away (August 1, 2016: Slinging Hash: Speeding Cyber Threat Hunting Methodologies via Hash-Based Searching) I presented how I used hash algorithms to speed up searching large DNS log files.
The only problem with that blog was that, at the time, I didn't have any really large DNS logs to play with. I had to make my own fake DNS logs. Even though I've worked several cases since then and now have millions of lines of actual DNS logs, I still cannot use them, because they aren't mine and customer privacy is a cornerstone of incident response. So I'm still at the point I was at last August, when I promised that my next blog would be about creating humongous fake DNS logs. Despite working several interesting cases (one of which I wrote up in Raiding the Piggy Bank: Webshell Secrets Revealed), I found time to work on and finally finish my fake DNS log generating tool.
The Makednslog.exe Tool
Makednslog.exe is a Win32 executable I developed using Visual Studio and is written in C. Why C? Because I write C code with my right hand and Python code with my left… and I'm right handed.
Makednslog generates DNS logs for practicing and developing your threat hunting skills. I will explain my motivation and reasoning shortly, but here is a sample of what a fake log looks like:
20170208 20:40:10 544 PACKET 028B766A UDP Rcv 10.124.50.16 1F19 Q [0001 D NOERROR] A www.servo.com
20170208 20:40:10 588 PACKET 06926D9E UDP Rcv 10.124.26.113 3389 Q [0001 D NOERROR] A milestoneabroad.com
20170208 20:40:11 544 PACKET 034235B6 UDP Rcv 10.124.185.160 53C9 Q [0001 D NOERROR] A gp41jk3eg.izvestiia.ru
20170208 20:40:11 588 PACKET 03835084 UDP Rcv 10.124.36.48 0ED2 Q [0001 D NOERROR] A sentimentindia.com
20170208 20:40:12 8A8 PACKET 0CA32E7C UDP Rcv 10.124.10.76 6E5A Q [0001 D NOERROR] A nightcontrol.net
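For readers who want to work with these lines in a script, here is a minimal Python sketch (not part of makednslog) that splits a log line into named fields. The field names are assumptions inferred from the sample above, and the pattern only covers the default date format with 24-hour time.

```python
import re
from typing import Optional

# Field names are illustrative guesses based on the sample lines above.
# Only the default date format (YYYYMMDD) and 24-hour time are handled.
LOG_LINE = re.compile(
    r"^(?P<date>\d{8})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<thread>[0-9A-Fa-f]+)\s+PACKET\s+"
    r"(?P<packet_id>[0-9A-Fa-f]+)\s+(?P<proto>UDP|TCP)\s+(?P<direction>Rcv|Snd)\s+"
    r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})\s+(?P<xid>[0-9A-Fa-f]{4})\s+"
    r"(?P<opcode>\S+)\s+\[(?P<flags>[^\]]*)\]\s+(?P<qtype>\S+)\s+(?P<domain>\S+)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ("20170208 20:40:10 544 PACKET 028B766A UDP Rcv 10.124.50.16 "
              "1F19 Q [0001 D NOERROR] A www.servo.com")
    print(parse_line(sample))
```

Lines that use the -ampm or -datemode options would need a different pattern.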
The program has multiple options that are used to control how large the log files are, how many lines they contain, the format of the date and domain names, and many other features.
Let's first look at the command-line options:
Usage: makednslog [-v] [-ofile OUTPUT-FILE] [-noms] [-ampm] [-datemode 1|2]
[-goback N] [-pci] [-split N] [-lines N] [-net 172[,192][,10]]
-v Verbose, print out lots of extra nonsense as the program runs.
-ofile Specify the OUTPUT-FILE filename. The default is dns.log.
-noms Select "No-Microsoft" format in domain names.
Microsoft Format: (3)www(5)yahoo(3)com(0)
No Microsoft Format: www.yahoo.com
-ampm Specify AM/PM time format. The default is 24-hour.
-datemode Specify the date mode in log entries. Options are:
Default: 20170225
Mode 1: 2017-02-25
Mode 2: Feb 25 2017
-goback Go back N months for initial log entry. Default is one month.
-pci Go back three months (90 days) for initial log entry.
-split Split log file every N MB.
-lines Generate log file with N lines. Default is 500,000.
-net Specify one or more networks. Default is 10.x.x.x.
And/or: 172.x.x.x
And/or: 192.168.x.x
Here are some sample command lines and their meaning:
C> makednslog -ofile server.log -pci -split 250
This command line will use server.log as the output filename, go back three months from the current date to begin logging, and split the log files every 250 MB. So, there will be multiple log files created as follows:
server.log
server1.log
server2.log
server3.log
And so on.
C> makednslog -net 10,172
This command line specifies two networks to be used in log creation. This is a common practice in organizations, with servers residing on one logical network and workstations on a different logical network. You can specify one, two, or all three networks.
C> makednslog -lines 4000000 -goback 2
This command line will create a single log file containing 4,000,000 lines, saved in dns.log. The starting date for the log is two months behind the current date.
Here is an actual execution:
It may be interesting to note that the average length of a line in the log file based on calculations from the results shown here is 106 characters (bytes). That could be useful if you want to estimate how many lines you will get for a certain sized log file, or how large a log file might become for a specific number of lines.
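The estimate described above is simple arithmetic. This Python sketch uses the 106-byte average and assumes 1 MB means 1,048,576 bytes, which may not match how makednslog counts megabytes for -split.

```python
AVG_LINE_BYTES = 106          # average line length reported in the article
BYTES_PER_MB = 1024 * 1024    # assumption: the tool's own MB definition is not stated


def estimate_lines(size_mb: float, avg_line_bytes: int = AVG_LINE_BYTES) -> int:
    """Roughly how many log lines fit in a file of size_mb megabytes."""
    return int(size_mb * BYTES_PER_MB / avg_line_bytes)


def estimate_size_mb(lines: int, avg_line_bytes: int = AVG_LINE_BYTES) -> float:
    """Roughly how large (in MB) a log with the given number of lines will be."""
    return lines * avg_line_bytes / BYTES_PER_MB


if __name__ == "__main__":
    print(estimate_lines(250))           # lines in one 250 MB split file
    print(estimate_size_mb(4_000_000))   # size of a 4,000,000-line log
```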
You may also want to check out the -v option at some point as there may be useful troubleshooting information or program activities that could be of interest. For example, using -v you will see each domain as it is read from "domains.txt" and converted into Microsoft format. The length of the longest Microsoft domain will also be displayed, as will other possibly interesting things such as when a new day of logging begins.
What's In The Logs?
While I was thinking about the features I wanted to add to makednslog, I was also thinking about how I could test the threat hunting concepts I was researching, particularly the types of Indicators of Compromise (IoCs) found in DNS logs.
There are plenty!
To make the fake DNS logs as useful as possible I wanted to be sure to load them up with all sorts of IoCs that someone could search for (via regular expressions, Splunk, or whatever other tool). At the time I was using Splunk's powerful search mechanism (which allows for the use of regular expressions) and have since utilized grep and awk as well.
Accordingly, several systems in the imaginary network represented by the fake log are "infected" and sending out queries to several different groups of domains. But these malware domains are the needles in the haystack of all the other "normal" DNS queries from systems on the network. This is the nature of threat hunting: wading through the harmless chatter to find something of value. Even an infected machine still sends out legitimate DNS queries (along with malicious ones), so one aspect of threat hunting that we need to get good at is identifying patterns. These patterns may be related by IP address, time, domain name, subdomain characteristics, and even error codes. All of these patterns are scattered throughout the generated logs.
Specifically, the following IoCs (or things that might be IoCs until they are disproven) and other characteristics can be found in each log created:
Lots of Different Domains
Makednslog creates log files that are filled with DNS requests to numerous domains. The domains that are used are read from the text file "domains.txt" which begins like this:
Notice that www.yahoo.com appears twice. It actually appears more than two times (there are over 1000 domains in the file), and several other domains appear more than once as well. This is done on purpose to increase the frequency of "more popular" domains.
The presumption here is that the DNS logs represent an aggregate of the browsing activity of the users in your organization, as well as domains queried by various software applications, both legitimate and malicious. So, one would expect to see multiple requests for the same domain (such as yahoo.com, facebook.com, etc.). Yes, I am willfully ignoring the fact that an organization will typically use its own internal DNS server to cache domains it has already looked up. But I wanted to have lots of duplicate domains in my logs so that I could practice developing searches that could identify the top 10 domains as well as reject a specific domain to reduce "noise" in my search results.
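One way to practice the "top 10" and noise-rejection searches mentioned above is a short Python sketch like the following. It assumes the queried domain is the last whitespace-separated field on each line and that the log was generated with -noms; the function name and parameters are illustrative.

```python
from collections import Counter


def top_domains(lines, n=10, exclude=()):
    """Count queried domains and return the n most common.

    exclude is a collection of domains to drop as noise; a subdomain of an
    excluded domain (www.yahoo.com for yahoo.com) is dropped as well.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        domain = fields[-1].lower()  # assumption: domain is the last field
        if any(domain == e or domain.endswith("." + e) for e in exclude):
            continue
        counts[domain] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    with open("dns.log", encoding="utf-8", errors="replace") as log:
        for domain, hits in top_domains(log, exclude=("yahoo.com",)):
            print(f"{hits:8d}  {domain}")
```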
The reason I stored the domains in a text file is to enable anyone using makednslog to be able to add their own domains, with the hope that it would assist in developing more realistic searches for their environment.
As shown in the following figure, the last line in "domains.txt" must be /END/ so any domains that are added must be placed before the last line.
Microsoft-format domain names are automatically created from the domains read from the file. A maximum of 2000 domains are allowed (sorry… no dynamic storage allocation for the domains arrays).
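The Microsoft format prefixes each label with its length in parentheses and ends with (0). This Python sketch converts in both directions. It illustrates the format only and is not a translation of the tool's C code.

```python
import re


def to_microsoft(domain: str) -> str:
    """www.yahoo.com -> (3)www(5)yahoo(3)com(0)"""
    labels = domain.strip(".").split(".")
    return "".join(f"({len(label)}){label}" for label in labels) + "(0)"


def from_microsoft(ms_domain: str) -> str:
    """(3)www(5)yahoo(3)com(0) -> www.yahoo.com"""
    # Each label follows a "(length)" marker; the trailing "(0)" has no label.
    labels = re.findall(r"\(\d+\)([^()]+)", ms_domain)
    return ".".join(labels)


if __name__ == "__main__":
    print(to_microsoft("www.yahoo.com"))
    print(from_microsoft("(3)www(5)yahoo(3)com(0)"))
```

Converting Microsoft-format names back to dotted form first makes it easier to reuse the same search patterns on both log formats.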
Periodic Beacons
Since malware often uses beaconing to advertise its presence on a system to its CnC server, there are three types of beacons built into makednslog, occurring at the following intervals:
- One minute
- Ten minutes
- One hour
These three beacons come from three different "compromised systems" and the nature of the domains in each beacon is different. Without giving too much away and ruining the joy of discovery for you, here are some examples of the malicious domains that are generated:
ddkrobmajnxcdghepoqpw.com (or .org, .net, .biz, .info, .ru)
download5f3c-zone.net (among other TLDs)
a9cbe23dd4fd12ed01b6502a7ab4ef0a.daisyland.net (among other TLDs)
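One possible way to hunt for beacons is to collect the timestamps of queries from a single source IP (or a single suspicious domain pattern) and check whether the gaps between them are nearly constant. The Python sketch below is one such heuristic; the tolerance and share thresholds are arbitrary assumptions, and the timestamp parser only handles the default date format.

```python
from datetime import datetime, timedelta
from statistics import median


def parse_ts(date_field: str, time_field: str) -> datetime:
    """Parse the default log timestamp, e.g. '20170208' and '20:40:10'."""
    return datetime.strptime(f"{date_field} {time_field}", "%Y%m%d %H:%M:%S")


def dominant_interval(timestamps, tolerance=2.0, min_share=0.8):
    """Return the median gap in seconds if most gaps are close to it, else None.

    tolerance (seconds) and min_share are illustrative thresholds.
    """
    times = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if len(gaps) < 3:
        return None  # too few observations to call it periodic
    mid = median(gaps)
    close = sum(1 for gap in gaps if abs(gap - mid) <= tolerance)
    return mid if close / len(gaps) >= min_share else None


if __name__ == "__main__":
    start = parse_ts("20170208", "20:40:10")
    hits = [start + timedelta(minutes=10 * i) for i in range(6)]
    print(dominant_interval(hits))
```

Because an infected machine also sends legitimate queries, the timestamps need to be filtered to a candidate group before this check is meaningful.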
DGA Simulation
Malware that employs a Domain Generation Algorithm will generate domain requests containing seemingly random domain names. Numerous NXDOMAIN errors are often associated with this type of activity until the DGA generates the currently-registered domain. An example looks like this:
1jtqtysq8564fxp85r7d4xb8eskdle913mrs.oldbooks.org (among other TLDs)
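A common heuristic for spotting random-looking names like this one is to measure the Shannon entropy of the leftmost label and combine it with a length check. The Python sketch below does that. The thresholds are assumptions to tune against your own logs, and ordinary long words can score surprisingly high, so treat the result as a lead rather than a verdict.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character in text."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_generated(domain: str, min_len: int = 20, min_entropy: float = 3.5) -> bool:
    """Flag a domain whose leftmost label is both long and high-entropy.

    min_len and min_entropy are illustrative thresholds, not tuned values.
    """
    label = domain.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy


if __name__ == "__main__":
    for name in ("www.yahoo.com", "1jtqtysq8564fxp85r7d4xb8eskdle913mrs.oldbooks.org"):
        print(name, looks_generated(name))
```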
Bursts of Requests
Multiple DNS requests in a short period of time (less than one second) coming from the same source IP may indicate program behavior (possibly malicious) rather than human behavior. These bursts are associated with data exfiltration, which is described next.
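Since the log timestamps have one-second resolution, a rough way to find bursts is to count queries that share the same date, time, and source IP. The Python sketch below assumes the default date format and 24-hour time, so the source IP is the eighth whitespace-separated field; the min_queries threshold is an arbitrary assumption.

```python
from collections import Counter


def find_bursts(lines, min_queries=5):
    """Return {(date, time, src_ip): count} for seconds with many queries from one IP."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue
        # Assumed layout: date time thread PACKET id proto direction src_ip ...
        counts[(fields[0], fields[1], fields[7])] += 1
    return {key: n for key, n in counts.items() if n >= min_queries}


if __name__ == "__main__":
    with open("dns.log", encoding="utf-8", errors="replace") as log:
        for (date, time, ip), n in sorted(find_bursts(log).items()):
            print(date, time, ip, n)
```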
Data Exfiltration
Malware can use DNS requests to specially-crafted subdomains to exfiltrate information. These subdomains may contain hexadecimal-only symbols or seemingly-random alphanumeric symbols that represent encrypted data. For fun, I also threw in some base64-encoded subdomains. If you can find them, they will decode to meaningful information. The decoded base64 strings contain exfiltrated information including:
- Operating system details
- User account names
- Credit card numbers (fake of course)
- AntiVirus status messages
There are also base64 subdomain strings that contain snippets of the following message (after encoding):
"If you are reading this I assume you have been correctly identifying and decoding the different types of IoCs I have placed into the fake DNS log. My congratulations for a job well done. Of course I could be wrong, and you may never read any of this, but I choose to remain positive that you will solve my little mystery. Good luck and happy threat hunting!"
Note that the base64 strings do not contain the "+" symbol or the "=" padding symbol because I chose the strings I was encoding carefully.
Here are some examples of exfiltrated data:
QWNjdDogc3lzYWRtaW43.petraplace.net (among other TLDs)
IHBvc2l0aXZlIHRoYXQgeW91IHdpbGwgc29sdmUgbXkgbGl0.axa.biz (among other TLDs)
a7fd03b9c629ffe7a9bee2c2a85bb18a3cf6e49b.starsearch.net (among other TLDs)
p3483-72ca-9e67-9f9a.secret.org (among other TLDs)
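To check whether a subdomain is base64, one approach is to try decoding the leftmost label and keep the result only if it comes out as printable ASCII. The Python sketch below assumes the labels contain only letters and digits (the article notes there is no "+" or "=") and that their length is a multiple of four. Ordinary words can occasionally decode cleanly too, so expect some false positives.

```python
import base64
import binascii
import re
from typing import Optional

ALNUM_LABEL = re.compile(r"^[A-Za-z0-9]+$")


def try_decode_label(domain: str, min_len: int = 8) -> Optional[str]:
    """Return the decoded text if the leftmost label looks like base64 text, else None."""
    label = domain.split(".")[0]
    if len(label) < min_len or len(label) % 4 != 0 or not ALNUM_LABEL.match(label):
        return None
    try:
        text = base64.b64decode(label, validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError):
        return None  # not valid base64, or decodes to non-text bytes (e.g. hex labels)
    return text if text.isprintable() else None


if __name__ == "__main__":
    for name in ("QWNjdDogc3lzYWRtaW43.petraplace.net",
                 "a7fd03b9c629ffe7a9bee2c2a85bb18a3cf6e49b.starsearch.net"):
        print(name, "->", try_decode_label(name))
```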
Foreign Country Codes
The "domains.txt" file contains a generous set of domains from foreign countries. These are included to help you practice locating all domains ending with ".ru" or some other country code, or even identifying all the country codes that appear within the log.
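A quick way to list the country codes in a log is to tally two-letter top-level domains. This Python sketch assumes dotted (-noms) domain names and treats any two-letter alphabetic TLD as a country code.

```python
from collections import Counter


def country_code_counts(domains):
    """Count two-letter TLDs (assumed to be country codes) in an iterable of domains."""
    counts = Counter()
    for domain in domains:
        tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
        if len(tld) == 2 and tld.isalpha():
            counts[tld] += 1
    return counts


if __name__ == "__main__":
    names = ["gp41jk3eg.izvestiia.ru", "www.servo.com", "nightcontrol.net"]
    print(country_code_counts(names).most_common())
```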
Possible Spoofing
Makednslog creates logs based on a fictitious organization's internal 10.x.x.x, 172.x.x.x, or 192.168.x.x network(s). However, occasionally, an IP address that is not in any of these ranges will generate a DNS request. Any "off network" source IP is a candidate for possible spoofing, or at least a misconfigured system that needs to be investigated. Once again, a threat hunter needs to be able to recognize things that should not show up in the logs, which may turn out to be harmless, but nonetheless need to be investigated.
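Checking for off-network source addresses can be sketched in Python with the standard ipaddress module. The network list below mirrors the ranges named in this article (including all of 172.x.x.x rather than only the RFC 1918 block). That is an assumption about what the tool generates, so adjust it to match the -net options you used.

```python
import ipaddress

# Assumption: these mirror the article's 10.x.x.x, 172.x.x.x and 192.168.x.x ranges.
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_off_network(ip: str, nets=INTERNAL_NETS) -> bool:
    """True if ip does not belong to any of the expected internal networks."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in nets)


if __name__ == "__main__":
    for ip in ("10.124.50.16", "192.168.1.5", "8.8.8.8"):
        print(ip, is_off_network(ip))
```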
Patience, Grasshopper
Depending on how many months you are going back, be prepared to wait a little bit while makednslog does its work. I'm running Windows 7 64-bit on a 2.8 GHz AMD A4-3420 CPU and it takes approximately 4 minutes per month for log generation.
Here are some statistics to help you get an idea of log sizes:
Metric | 1 Month | 2 Months | 3 Months |
Lines (noms) | 16.5M | 35.3M | 54.2M |
Size (noms) | 1.7 GB | 3.7 GB | 5.7 GB |
Lines (Microsoft format) | 16.5M | 35.3M | 54.2M |
Size (Microsoft format) | 1.9 GB | 4.1 GB | 6.2 GB |
As you can see, the Microsoft format logs are roughly 10% larger than the non-Microsoft format logs, due to the millions of extra "(0)" and similar terms present in the domain names.
It may also be interesting to note that a month of 30 days contains 2,592,000 seconds. So, on average, makednslog generates 6 logged events per second. This could be a lot of events for one network but just a fraction of events for a different network. But, 16 million log entries per month is plenty to experiment with as you hone your threat hunting skills.
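The events-per-second figure above can be reproduced with a couple of lines of Python, using the one-month line count from the statistics table.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(lines: int, days: int = 30) -> float:
    """Average number of logged events per second over the given number of days."""
    return lines / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    print(30 * SECONDS_PER_DAY)            # seconds in a 30-day month
    print(events_per_second(16_500_000))   # one month of generated log lines
```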
Have Log, Will Hunt
I always consider time spent creating a new program worthwhile, as the benefits often extend beyond the immediate task the program solved. One benefit is that I can now use makednslog as the base code for other fake log creation, such as web server logs, with just a little more programming effort. Code reuse is a wonderful thing.
Makednslog also enabled me to begin practical threat hunting research and development faster than if I had to wait until I had enough real world log data to play with. Until you have massive amounts of text to search through you may not appreciate how challenging it sometimes is to craft the appropriate regular expression to tease out what you are looking for.
If anyone out there is in the same position that I was in, the makednslog executable and source code are available for threat hunting testing and educational purposes on Trustwave's community release GitHub. Visit https://github.com/jantonakos/ThreatHuntingExcursions for the goodies.
Happy hunting!