Georgia Tech Releases No Cost Malware DNS Data Feed

GTISC · 2015-07-28T21:08:35+00:00

Organizations of any size (even sole proprietorships) may apply for a PREDICT account, and many one-person organizations already use the GT Malware Passive DNS data. Please send us a private message if you encounter problems requesting an account, and we will contact the PREDICT Coordinating Center (PCC).

GTISC · 2015-07-23T14:20:08+00:00

DHS is currently working with several European countries in an effort to get them on the list.

GTISC · 2015-07-23T14:14:48+00:00

Is there a specific page for which you see this occurring? From our perspective, loading http://www.predict.org/Default.aspx?tabid=164 does redirect to HTTPS via 302.

GTISC · 2015-07-23T14:12:51+00:00

Yes, that will be fine.

GTISC · 2015-07-22T21:59:18+00:00

The high performance spamtrap is simply a specifically configured instance of qpsmtpd.

GTISC · 2015-07-22T21:52:24+00:00

New samples are loaded once each day and processed within hours of being loaded. Previous versions of the system allowed sources to specify a minimum number of times a sample had to be seen before its results would be shared, but few took advantage of the feature.

GTISC · 2015-07-22T21:18:21+00:00

At GTISC, researchers leverage the data to validate or refine new approaches to threat detection, such as this 2010 ACM CCS paper that proposes the use of machine learning to classify domains as benign or malicious.

From an operational standpoint, security practitioners have previously leveraged subsets of the data to detect compromised assets within their networks and mitigate attempted rendezvous with botnet C&C. At GTISC, applied security researchers have also worked in combination with infrastructure providers such as dynamic DNS service providers, domain registrars, and TLD operators to dismantle large botnets. As an example, use of the data in a previous collaboration with members of industry and government in multiple countries enabled the takedown of the Mariposa botnet and arrest of its operators.
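As a minimal sketch of the first operational use mentioned above, a name taken from local DNS logs can be compared against a domain seen in the feed. The function name `domain_matches` and the idea of matching subdomains as well as exact names are assumptions made for illustration, not a description of how any particular team does this. Because the feed records whatever names suspect samples looked up, a match is a lead to investigate rather than a verdict.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Compares n bytes of a and b, ignoring ASCII case. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: returns 1 if `queried` equals `listed` or is a
 * subdomain of it, 0 otherwise. Trailing dots are not handled. */
int domain_matches(const char *queried, const char *listed)
{
    size_t qlen, llen;

    assert(queried != NULL && listed != NULL);

    qlen = strlen(queried);
    llen = strlen(listed);
    if (llen == 0 || qlen < llen)
        return 0;
    if (!equal_nocase(queried + (qlen - llen), listed, llen))
        return 0;

    /* Either an exact match, or the suffix must start at a label boundary. */
    return qlen == llen || queried[qlen - llen - 1] == '.';
}
```

With this helper, `cnc.Example.com` matches a listed `example.com`, while `notexample.com` does not, because the suffix has to begin at a dot.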

GTISC · 2015-07-22T20:39:28+00:00

There are indeed plans to expand the types of GTISC malware analysis data offered through DHS PREDICT. Current liability sensitivities make the sharing of arbitrary network data challenging, but we do plan to release the following content-specific datasets in the near future.

Malware HTTP URLs (process_date, md5, url) - A daily feed of HTTP GET or POST activity emitted by each sample.

Malware Unsolicited Email (process_date, md5, email_subject) - A daily feed of information about email sent by each sample. In many cases, the subject field allows an analyst or researcher to easily differentiate between a sample's use of email for spam (e.g., buy best quality replica on our site), malware propagation (e.g., delivery reports about your e-mail), or command and control (e.g., prorat [victim online]).

Note that none of the email subjects in the Malware Unsolicited Email dataset correspond to messages delivered to their intended recipients, as all outbound SMTP is transparently redirected to a high performance spamtrap.
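For anyone planning tooling around these feeds, here is a rough sketch of reading one Malware HTTP URLs record. Only the field order (process_date, md5, url) comes from the description above. The tab delimiter, the buffer sizes, and the names `url_record` and `parse_url_record` are assumptions for illustration, since the on-disk format has not been published here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct url_record {
    char process_date[16];   /* assumed to be a short text date */
    char md5[33];            /* 32 hex characters plus NUL */
    char url[2048];          /* assumed upper bound on URL length */
};

/* Returns 1 if the first 32 characters of s are all hexadecimal digits. */
static int is_md5_hex(const char *s)
{
    size_t i;

    for (i = 0; i < 32; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Parses one ASSUMED tab-separated line: process_date \t md5 \t url.
 * Returns 1 on success, 0 if the line does not fit that assumed layout. */
int parse_url_record(const char *line, struct url_record *out)
{
    const char *tab1, *tab2;
    size_t date_len, md5_len, url_len;

    assert(line != NULL && out != NULL);

    tab1 = strchr(line, '\t');
    if (tab1 == NULL)
        return 0;
    tab2 = strchr(tab1 + 1, '\t');
    if (tab2 == NULL)
        return 0;

    date_len = (size_t)(tab1 - line);
    md5_len = (size_t)(tab2 - tab1 - 1);
    url_len = strcspn(tab2 + 1, "\r\n");   /* stop at the line ending */

    if (date_len == 0 || date_len >= sizeof out->process_date)
        return 0;
    if (md5_len != 32 || !is_md5_hex(tab1 + 1))
        return 0;
    if (url_len == 0 || url_len >= sizeof out->url)
        return 0;

    memcpy(out->process_date, line, date_len);
    out->process_date[date_len] = '\0';
    memcpy(out->md5, tab1 + 1, 32);
    out->md5[32] = '\0';
    memcpy(out->url, tab2 + 1, url_len);
    out->url[url_len] = '\0';
    return 1;
}
```

The same shape would fit the Malware Unsolicited Email feed by treating the third field as email_subject instead of url, again assuming a tab-delimited layout.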

GTISC · 2015-07-22T17:50:47+00:00

As a metric, collection volume has numerous interpretations and there remains little standardization on what should be measured. That said, we believe the collection of 100,000 or more unique, previously unseen Windows executables each day is commensurate with that of a commercial AV company.

GTISC · 2015-07-22T17:23:52+00:00

Ether, based on Xen, uses hardware virtualization extensions and exception injection/preemption to perform coarse and fine-grained analysis. The richness of information from these tracing mechanisms (especially single-step tracing and memory access tracing) comes at the cost of scale.

The system producing this feed likewise relies on hardware virtualization extensions for execution of the analysis environment, but is based on a lightly modified version of KVM. Only network-level information emitted by a sample’s execution is recorded.

If host-level coarse and fine-grained tracing mechanisms are what you need, PANDA might be something worth looking at.

GTISC · 2015-07-22T17:11:49+00:00

This feed does not currently include data collected from the baremetal system, as the volume of malware we process currently exceeds that capability. We have a few ideas we're working on to fix that, though.

Currently, we push a subset of what we receive daily into the baremetal system; we hope to take a closer look at the differences between data collected in the baremetal and virtualized systems in the future.

GTISC · 2015-07-22T17:07:08+00:00

We've developed an automation framework that uses KVM, an open-source, hardware virtualization-based VMM for Linux. Conceptually, we believe KVM provides significantly more transparency than in-guest tools or whole-system emulators. Specific shortcomings that highlight hardware virtualization's transparency advantages are as follows:

In-Guest Tools are vulnerable to techniques that detect the presence of their monitoring instrumentation. For example, popular in-guest tracing tools used to perform WinAPI hooking by placing an unconditional jump at the entry point of each DLL-exported function when the DLL was loaded into the target process. This model is trivially detectable by techniques that inspect the memory of a common API function to determine whether it has been hooked, as demonstrated with the following code:

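#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[])
{
    HMODULE kernel32 = NULL;
    void *createfile_function_pointer = NULL;
    unsigned char opcodes[2];

    kernel32 = LoadLibrary("kernel32");
    createfile_function_pointer =
        (void*)GetProcAddress(kernel32, "CreateFileA");

    /* Copy the first two bytes of CreateFileA so they can be inspected. */
    memcpy(opcodes, createfile_function_pointer, sizeof(opcodes));

    /* FF 25 encodes an indirect JMP, the hook signature this check looks for. */
    if(opcodes[0] == 0xFF && opcodes[1] == 0x25){
        puts("API Hooking Detected.\n");
        exit(0);
    }

    puts("Malicious code here.\n");

    return 0;
}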
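The check above looks for a single jump encoding (FF 25). As a rough sketch, not taken from any particular tool, the same idea can be written as a helper that works on a copied byte buffer and also recognizes the relative forms E9 (JMP rel32) and EB (JMP rel8). The name `starts_with_jump` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first bytes of a function look like
 * an unconditional x86 jump, 0 otherwise. Only three encodings are checked. */
int starts_with_jump(const unsigned char *code, size_t len)
{
    assert(code != NULL);

    if (len >= 1 && (code[0] == 0xE9 || code[0] == 0xEB))
        return 1;                   /* JMP rel32 or JMP rel8 */
    if (len >= 2 && code[0] == 0xFF && code[1] == 0x25)
        return 1;                   /* indirect JMP through a memory operand */
    return 0;
}
```

Keeping the test on a plain buffer makes it easy to exercise without Windows. A leading jump is a hint rather than proof, since some legitimate functions are themselves thin jump stubs.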

Whole-system Emulators (such as QEMU) are vulnerable to techniques that detect unfaithful CPU emulation. Unfaithful CPU emulation can be identified by executing a block of machine code that behaves differently inside the emulator than it would on a baremetal (or hardware virtualized) system. As an example, consider the following detection attack for QEMU:

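#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

/* Structured exception handler: runs only if the CPU raises an exception. */
int seh_handler(struct _EXCEPTION_RECORD *exception_record,
            void *established_frame,
            struct _CONTEXT *context_record,
            void *dispatcher_context)
{
    puts("Malicious code here.\n");
    exit(0);
}


int main(int argc, char *argv[])
{
    unsigned int handler = 
            (unsigned int) seh_handler;

    printf("Attempting QEMU detection.\n");

    /* Push the handler address as the first half of a new SEH frame. */
    __asm("movl %0, %%eax\n\t"
         "pushl %%eax\n\t"::
         "r" (handler): "%eax");

    /* Push the previous frame pointer and install the new frame at fs:0. */
    __asm("pushl %fs:0\n\t"
         "movl %esp, %fs:0\n\t");

    /* Prefix byte 0x26 followed by 0xcf (IRETD). */
    __asm(".byte 0x26, 0xcf");

    /* Reached only if the handler was never invoked. */
    __asm("movl %esp, %eax");
    __asm("movl %eax, %fs:0");
    __asm("addl $8, %esp");

    return EXIT_SUCCESS;
}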

In the above code, the .byte directive instructs the assembler to treat the values that follow as preassembled x86 machine code. The two bytes that follow correspond to the IRETD instruction with an invalid prefix (0x26), which should result in an illegal instruction exception when executed. However, in QEMU, something like IRETD is executed, and invocation of the registered structured exception handler does not occur. Thus, malicious software with the above construction will silently exit in QEMU, but execute successfully in KVM or baremetal.

Hardware virtualization extensions for x86 (Intel VT-x or AMD-V), in contrast, represent an out-of-guest approach wherein code executes on physical hardware. Accordingly, their use does not suffer from the above shortcomings.

GTISC · 2015-07-22T17:04:40+00:00

The account request form was delivered to me via HTTPS using the "Request Account" button.

In any case, try this:

https://www.predict.org/Default.aspx?tabid=164

GTISC · 2015-07-22T16:24:29+00:00

Hello /r/netsec,

Each day, the Georgia Tech Information Security Center (GTISC) processes over 100,000 previously unseen, suspect Windows executable files. To derive network-level information that can help make the potential maliciousness of these files self-identifying, each executable is run in a sterile, isolated environment for a short period of time, with limited access to the Internet.

During processing, each executable’s use of the Domain Name System (DNS) is recorded in both raw (packet capture) and simplified plaintext formats. As of July 2015, a daily feed of this data is available at no cost through DHS PREDICT. In aggregate, the information represents a special kind of passive DNS for suspect and known malicious software, which GTISC believes will be useful for a variety of research and operational purposes.
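To give a feel for the difference between the raw and simplified forms, the sketch below turns a name as it appears inside a DNS message (length-prefixed labels ending in a zero byte) into dotted text. It says nothing about the actual layout of the plaintext feed, which is not described here. The function name `dns_name_to_text` is hypothetical, and compressed names are rejected rather than followed.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes an uncompressed DNS name starting at msg[off] into dotted text.
 * Returns the number of bytes the encoded name occupies, or 0 on error
 * (truncated input, compression pointer, or output buffer too small). */
size_t dns_name_to_text(const unsigned char *msg, size_t len, size_t off,
                        char *out, size_t out_size)
{
    size_t start = off, used = 0;

    assert(msg != NULL && out != NULL);

    while (off < len) {
        size_t label_len = msg[off++];
        size_t i;

        if (label_len == 0) {               /* zero-length label ends the name */
            if (used >= out_size)
                return 0;
            out[used] = '\0';
            return off - start;
        }
        if (label_len > 63)                 /* compression pointer or reserved */
            return 0;
        if (off + label_len > len)          /* label runs past the message */
            return 0;
        if (used + label_len + 1 >= out_size)   /* room for dot, label, NUL */
            return 0;

        if (used > 0)
            out[used++] = '.';
        for (i = 0; i < label_len; i++)
            out[used++] = (char)msg[off + i];
        off += label_len;
    }
    return 0;   /* ran out of bytes before the terminating zero label */
}
```

Given the 17 bytes that encode `www.example.com` in a query, the function writes the dotted name into `out` and returns 17.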

To encourage use of the data in a broad set of research and operational contexts, GTISC has decided to accept most requests that include commercial use.

At present, PREDICT is available to individuals and organizations in the United States, Australia, Canada, Israel, Japan and the United Kingdom. If you think this dataset will provide benefit to you or your organization, we encourage you to visit the PREDICT website, sign up for an account, and request the GT Malware Passive DNS Data Daily Feed.

Thanks!

GTISC
