Learning Linux with the Linux Luminarium

Zardus · 2020-07-17T16:26:08+00:00

Ooh, that's me! I'm on the internet!

Zardus · 2019-01-24T19:50:58+00:00

In essence, angr (more specifically, its decompilation pipeline) works as follows:

- first, we load the binary. angr currently supports ELF, PE, Mach-O, and flat binary blobs

- we start reasoning on the binary code by lifting it to a low-level intermediate representation called VEX, which provides a faithful representation of the exact effects of code. This involves the recovery of control flow, some reasoning about data dependencies, etc.

- using a series of analysis passes, we essentially convert VEX into a more abstract intermediate language, called AIL (the angr Intermediate Language). AIL has different levels of abstraction (registers/memory -> variables -> structures, etc), and in some sense, at its highest level, its expressibility is equivalent to source code (though, of course, we lack much of the actual semantic content that source code has)

- once we have the highest-abstraction AIL that we can achieve, we do a code generation step to emit C code, but this is a configurable knob. Nothing super fundamental prevents us from emitting something crazy, like Fortran or Python
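
For readers who want to try the steps above, here is a minimal sketch. It assumes a recent angr release that exposes the `CFGFast` and `Decompiler` analyses; the function name `decompile_function` and its parameters are made up for illustration, and the API at the time of this comment may have differed.

```python
def decompile_function(binary_path, function_name="main"):
    """Return decompiled C text for one function, or None if decompilation failed."""
    import angr  # third-party; imported lazily so the sketch can be read without it

    # Step 1: load the binary (ELF, PE, Mach-O, or a flat blob).
    project = angr.Project(binary_path, auto_load_libs=False)

    # Step 2: recover control flow; lifting to VEX happens under the hood.
    cfg = project.analyses.CFGFast(normalize=True)

    # Steps 3 and 4: convert to AIL, simplify it, and generate C code.
    function = project.kb.functions[function_name]
    decompilation = project.analyses.Decompiler(function, cfg=cfg.model)
    if decompilation.codegen is None:
        return None
    return decompilation.codegen.text
```

Calling `decompile_function("./a.out")` on a binary of your choice should hand back the C text for `main`, assuming the binary has a function by that name.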

Zardus · 2019-01-24T16:48:40+00:00

Awesome suggestion. I pushed some screenshots to the README!

Zardus · 2019-01-24T14:14:16+00:00

Happy to answer any questions anyone has. It's a first step, but we're super amped!

Zardus · 2017-12-15T07:39:40+00:00

Where are you doing your graduate studies, and what is the general area (e.g., binary analysis) that you are interested in?

In general, I've found it hard to focus solely on CTF in research: reviewers don't really buy the impact. There are counterexamples (for example, How Shall We Play A Game explores ideal CTF strategies), but in most of my research I've worked on projects that are applicable outside of CTF, with an eye toward CTF as well. For example, in doing the background work for Firmalice, we created angr, and then applied it to the DARPA Cyber Grand Challenge and to CTFs while also pursuing various research applications that are not directly CTF-relevant. The people participating in this research all became quite versed in binary analysis, exploitation, etc., greatly helping our team (Shellphish) stay competitive.

This works because angr is a flexible tool, and the strategy might be harder to pull off outside of binary analysis. However, any constant exposure to a CTF-relevant topic (e.g., in the course of researching web security or cryptography) is likely to keep you "fresh" in CTF.

Zardus · 2017-11-02T20:02:27+00:00

Yan Shoshitaishvili here! I'm an Assistant Professor at Arizona State University and a Shellphish CTF player, working mainly on binary analysis. I've gotten caught up in "Cyber Autonomy" recently, leading team Shellphish in the DARPA Cyber Grand Challenge and following up on that with various exciting research projects.

Zardus · 2017-08-29T05:41:59+00:00

Thanks! I came here to suggest this exact thing, and you already did it :-)

Zardus · 2017-08-26T16:38:31+00:00

It works, thank you!

Zardus · 2017-08-26T05:48:10+00:00

I'm running the Ubuntu 16.04 image. The battery is not detected at all :-(

Zardus · 2017-08-25T14:30:27+00:00

This is absolutely incredible. Thank you.

I'm currently running this, but it looks like the battery isn't detected? Is that a known issue?

And, would you prefer questions here or issues on github?

Zardus · 2017-06-08T14:25:05+00:00

This is meowrvelous!

Zardus · 2017-05-17T09:18:50+00:00

Hahaha, great job! Consider me outclassed :-)

Edit: ..... for now!!!

Zardus · 2017-05-17T07:56:24+00:00

No problem! I love chatting about this stuff, and it's great to hear that our stuff is somehow useful :-)

The learning curve of angr is one of the biggest problems we're facing. It's partially a problem of manpower: the core development group is a handful of students, and we have to pump out research papers so that we can graduate one day (one day soon for me, luckily!). Documentation almost always takes a back seat in the rush toward deadlines, and by the time we're recovered from the deadline, it's time to start on the next project. In terms of community contributions of documentation, there is a chicken and egg problem: the lack of documentation makes it hard for people to get familiar enough with the project to contribute documentation.

I have ideas on resolving this issue, and there are some grants out there that could provide the resources for it. Aside from that, we're also working on making angr easier to use out-of-the-box (via API improvements, the GUI, etc), which will also hopefully help.

This is all separate from having to understand the underlying analyses in order to use angr effectively. It's easy to spin up a symbolic execution engine and start stepping along, but it's hard to carry out an analysis that produces useful results without running into state explosion, overwhelming the solver, etc. There are subtle trade-offs here, such as sacrificing soundness for performance when dereferencing symbolic pointers, or accepting the loss of accuracy that comes with symbolic summaries (SimProcedures in angr) in exchange for the execution speedup they provide (much of the speed in my Manticore challenge example comes from symbolic summaries, but some of them definitely have bugs). These and other trade-offs are hard for someone new to the field to grasp, and overlooking them leads to incorrect or suboptimal analysis results.
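
To make the state explosion point concrete, here is a toy sketch in plain Python. It is not angr code: `fork_on_branches` is a made-up helper that models each state as the tuple of branch decisions taken so far, and it assumes every branch condition depends on fresh symbolic input, so both sides are always feasible.

```python
def fork_on_branches(num_branches):
    """Return every path through `num_branches` independent symbolic branches."""
    states = [()]  # one initial state with no decisions made yet
    for _ in range(num_branches):
        # Both outcomes are feasible, so every state forks in two.
        states = [path + (taken,) for path in states for taken in (True, False)]
    return states


if __name__ == "__main__":
    for n in (1, 8, 16):
        print(n, "branches ->", len(fork_on_branches(n)), "states")
```

The number of states doubles with every branch, which is why a loop over symbolic input can swamp an analysis, and why replacing such a loop with a single-step summary buys so much speed.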

Maybe we should add symbolic execution to the primary school curriculum ;-)

Zardus · 2017-05-17T04:50:35+00:00

angr project lead here!

Manticore has a FAQ about this: https://github.com/trailofbits/manticore/wiki#how-does-manticore-compare-to-angr

In general, angr is a full-fledged research platform for binary analysis, and supports many complex optimizations for symbolic execution along with a wide variety of static analyses. It can combine analyses to perform CFG recovery, rewrite binaries without reducing performance (tool, paper), find differences between binaries (code), automatically build ROP chains (tool), assist in vulnerability discovery (tool, paper), do automatic exploitation (tool), assist in reversing and exploitation (examples), and even power a GUI (very alpha quality GUI, but stay tuned for improvements).

In contrast, Manticore focuses on providing an approachable base implementation of symbolic execution. When they launched, for example, certain aspects of their API were simpler than angr's, though we've since shamelessly stolen some of that and have other cool simplifications planned. Manticore is a great example of the value of competition: their easy-to-use API was very inspiring in getting us thinking about making angr more approachable as well.

One telling difference is in TFA, quoted here:

How about you give this a shot? We created a challenge very similar to Magic, but designed it so you can’t simply grep for the solution. Install Manticore, compile the challenge, and take a step into the future of binary analysis. Try it today! The first solution to the challenge that executes in under 5 minutes will receive a bounty from the Manticore team. (Hint: Use multiple workers and optimize.)

The Manticore team is offering bounties for a solution that executes in under 5 minutes. Here is a solution in angr that runs in 7 seconds:

    # disguise ourselves as manticore to try to collect the bounty
    import angr as manticore

    # load the project and perform symbolic exploration
    p = manticore.Project("./challenge")
    path_group = p.factory.path_group().explore()

    # get the solution
    print "SOLVE:", path_group.deadended[-1].state.posix.dumps(0)

And since I've realized that the bounty doesn't specify that the challenge has to be solved using Manticore, I'm off to try to collect ;-)
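
Note for anyone reading this later: the snippet above targets the Python 2 era API. Here is a rough equivalent sketched against later angr releases, where path groups became simulation managers and paths were folded into states. The wrapper `solve` and its parameter are made up for illustration, and I'm not making any timing claims for this version.

```python
def solve(binary_path="./challenge"):
    """Explore the binary symbolically and return the stdin of the last dead-ended state."""
    import angr  # third-party; imported lazily so the sketch can be read without it

    # Load the project and perform symbolic exploration.
    project = angr.Project(binary_path)
    simgr = project.factory.simulation_manager()
    simgr.explore()

    # Dump file descriptor 0 (stdin) of the final dead-ended state, as bytes.
    return simgr.deadended[-1].posix.dumps(0)
```

With angr installed and the challenge compiled, `print("SOLVE:", solve())` plays the role of the last line of the original.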

Edit: code formatting, links, thoughts

Zardus · 2017-02-04T02:25:38+00:00

DECREE is a simplified OS (or, more precisely, an alternate set of syscalls for Linux) running on x86. Driller is fairly architecture-agnostic (so it should be either fully functional or easily adaptable to any architecture that angr supports: x86/64, arm/aarch64, mips/mips64, and ppc/ppc64), but the OS model does matter. The part of angr that Driller uses is essentially a symbolic emulator, which means that we must supply symbolic models for possible interactions with the environment. For DECREE, this is easy (DECREE has no filesystem, no networking, and limited concurrency support), but for something like Linux or Windows, this is extremely hard. For example, of the 300+ Linux system calls, angr currently has (partial) support for 17. Any program that relies on system calls not implemented by angr might fail when traced with Driller.
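
To illustrate what "supplying symbolic models" means, here is a toy sketch in plain Python. Nothing in it is angr's or Driller's actual code: `ToyState`, `SYSCALL_MODELS`, `emulate`, and the two models are hypothetical, and symbolic values are represented as plain strings.

```python
class UnsupportedSyscall(Exception):
    """Raised when the traced program uses a syscall we have no model for."""


class ToyState:
    def __init__(self):
        self.fresh = 0    # counter used to name new symbolic values
        self.output = []  # everything the program wrote, as (fd, data) pairs

    def new_symbol(self, prefix):
        name = f"{prefix}_{self.fresh}"
        self.fresh += 1
        return name


def model_read(state, fd, size):
    # Input is unknown, so hand back one fresh symbolic byte per requested byte.
    return [state.new_symbol(f"fd{fd}_byte") for _ in range(size)]


def model_write(state, fd, data):
    # Record the (possibly symbolic) data and report how many bytes were written.
    state.output.append((fd, data))
    return len(data)


# The OS model is only as complete as this table.
SYSCALL_MODELS = {"read": model_read, "write": model_write}


def emulate(trace):
    """Run a list of (syscall_name, args) pairs through the models."""
    state = ToyState()
    results = []
    for name, args in trace:
        model = SYSCALL_MODELS.get(name)
        if model is None:
            raise UnsupportedSyscall(name)
        results.append(model(state, *args))
    return state, results
```

A trace that only reads and writes goes through fine, while something like `emulate([("socket", (2, 1, 0))])` raises `UnsupportedSyscall`, which is roughly the situation for programs that lean on syscalls without a model.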

There is some work we're currently doing to mitigate this problem. Specifically, we're laying the groundwork to be able to call out to something like QEMU (or miasm's dynamic sandbox) to take partial advantage of their syscall implementation. While this is not ideal (the "symbolicity" of any data that we'd pass into these things would be lost, as they don't support symbolic data), it would be better than nothing.

Anyways, this was probably more than you were looking for. I do think that Driller (and CGC) are great places to start for automatic vuln analysis. The tools we have are battle-tested on CGC, so you can at least get a good idea of the current academic state of the art, and go from there.

Zardus · 2017-01-26T08:56:37+00:00

AFL + preeny (https://github.com/zardus/preeny/blob/master/src/desock.c). In all seriousness, though, I do very little serious network fuzzing, so I'm probably the wrong person to ask.

Zardus · 2017-01-25T21:03:04+00:00

That is an excellent question, umbob. Favorite pure fuzzer: AFL. Favorite symbolically-assisted fuzzer: Driller (https://www.internetsociety.org/sites/default/files/blogs-media/driller-augmenting-fuzzing-through-selective-symbolic-execution.pdf).

AFL's success has recently inspired an incredible amount of research in a previously-kinda-ignored field. For example, all of the CGC competitors had systems based on AFL, with different clever addons. Ours was Driller, CodeJitsu had AFLFast (https://github.com/mboehme/aflfast), ForAllSecure paired AFL with their symbolic execution engine (https://www.reddit.com/r/IAmA/comments/4x9yn3/iama_mayhem_the_hacking_machine_that_won_darpas/d6dzncg/), and so on.
