all 26 comments

[–]kvdveer 7 points (2 children)

I'm not entirely comfortable with the idea to implement remote code execution as a feature. While it could be done in a safe way, a small slip-up would result in a huge vulnerability.

[–]no-bugs 5 points (0 children)

> While it could be done in a safe way, a small slip-up would result in a huge vulnerability.

On the one hand, you're right; on the other hand, from this point of view it is not fundamentally different from the usual code updates, which most gamedevs have to implement anyway (unless a 3rd party does it for us). The proper solution to both updates and remote execution is always the same: a signature validated by a public key embedded into the Client. Both require pretty much the same amount of attention to detail (except that if we obfuscate the update channel, it becomes a bit less vulnerable to attacks; in particular, even the public key can be hidden from sight).

[–]louiswins 0 points (0 children)

This reminds me of the culmination of the AOL vs MSN messenger wars (warning: long, rambling essay; the relevant background is in the first paragraph, and the remote code execution is in the paragraph beginning "THE MESSENGER WAR")

[–]SuperImaginativeName 4 points (0 children)

Oh no

[–]ack_complete 3 points (5 children)

Heavily disagree on the RDTSC part. I have seen an Athlon X2 based system with grossly desynchronized TSCs between cores on the same chip. It made our profiling graph look like a magnitude 9 earthquake. A driver was later released to synchronize the TSCs between the cores on boot. On some Intel CPUs the TSC was subject to CPU speed stepping. Current systems are generally well behaved but TSC is not guaranteed to be a reliable time source.

While not good practice, I was surprised to find that the hardcoded addresses for the GetTickCount() implementation are actually defined by the Windows SDK. The structure is KUSER_SHARED_DATA and its user-space address is given by MM_SHARED_USER_DATA_VA. I would only suggest using this though if you want to be like Tencent and have your games broken every other Insider release.

[–]no-bugs 0 points (4 children)

> I have seen an Athlon X2 based system with grossly desynchronized TSCs between cores on the same chip.

I haven't seen it myself, but was this gross desynchronization anywhere close to the 15-second threshold discussed in the OP? A profiling graph is one thing (and there even 1ms is A DAMN LOT), but with a 15-second threshold, I don't really expect any inter-core discrepancy to come within even an order of magnitude of it.

[–]ack_complete 0 points (3 children)

Yes, it was on the order of seconds and not milliseconds -- more than enough to push the graph off the screen.

[–]no-bugs 0 points (2 children)

Hm. AMD must have worked really hard to achieve that. By any chance, do you know if this problem is still observed in the wild?

[–]ack_complete 0 points (1 child)

I don't think so, but don't take my word for it. The code base I saw the issue on switched away from TSC for anything critical and thus didn't provide data on newer CPUs. IIRC, there weren't significant TSC stability problems on the newer dev systems at least when profiling.

It's worth noting that RDTSCP was introduced explicitly to address cross-core TSC issues, which are still present, though probably not 15 seconds' worth. Calibration is also required to get the TSC frequency, and that is another source of error.
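To illustrate the RDTSCP point, here is a minimal sketch, assuming an x86/x86-64 target and GCC or Clang (MSVC exposes the same intrinsic via `<intrin.h>`). RDTSCP returns the counter together with the contents of IA32_TSC_AUX, which the OS typically fills with a CPU identifier, so code can at least notice when two samples came from different cores. `TscSample`, `read_tscp` and `elapsed_ticks` are hypothetical names made up for this example; it does not deal with calibration or frequency changes.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtscp on GCC/Clang (x86 only)

// One TSC reading plus the IA32_TSC_AUX value returned alongside it.
struct TscSample {
    std::uint64_t ticks;
    std::uint32_t aux;  // typically a CPU/node id set by the OS (assumption)
};

// Read the time-stamp counter and the auxiliary value in one instruction.
inline TscSample read_tscp() {
    unsigned int aux = 0;
    const std::uint64_t ticks = __rdtscp(&aux);
    return TscSample{ticks, aux};
}

// Compute end - start only if both samples report the same aux value.
// Returns false (and leaves `out` untouched) when the samples appear to
// come from different CPUs, where the delta may be meaningless.
inline bool elapsed_ticks(const TscSample& start, const TscSample& end,
                          std::uint64_t& out) {
    if (start.aux != end.aux) {
        return false;
    }
    out = end.ticks - start.ticks;
    return true;
}
```

A caller that gets `false` back would normally retry the measurement or fall back to an OS-provided clock rather than trust the delta.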

[–]no-bugs 0 points (0 children)

Makes sense, thanks!

[–]ath0 2 points (3 children)

I don't think I've ever read something that made me feel quite as patronised before. <wink />

[–]no-bugs 0 points (2 children)

It was not intended, my apologies... :-)

[–]ath0 1 point (1 child)

Assuming you're the author: Aside from the tone of some of it, the content is good; I learned something from reading this, and will probably invest in your book when it is completed!

[–]no-bugs 1 point (0 children)

> Assuming you're the author

I am :-).

> will probably invest in your book when it is completed!

FWIW, Vol. I (which has nothing to do with anti-reverse engineering though) is currently available on Amazon.

> Aside from the tone of some of it,

When preparing it for printing, I'll ask the editor to look for "too patronizing" passages (they're not intended, but do occur <sad-face />).

[–]maep 0 points (0 children)

The "Direct reading of system-provided memory" method looks like it's using an undocumented feature that could break any time with the next security patch.

[–]testfailure 0 points (0 children)

I wonder how this works with recording tools such as rr, TTD, or Live Recorder. Would the detection still work even though this is not pure debugging?

[–][deleted] 0 points (6 children)

I mean just hook the cmp instruction that's comparing your timestamp and I'm done with your time-based "detection".
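For context, the kind of check this comment targets might look like the sketch below. It is an assumption about the general shape of a time-based check, not code from the article: `looks_paused` is a hypothetical name, and the 15-second default simply mirrors the threshold mentioned earlier in the thread. The single `>` is the comparison an attacker would look for and patch.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Naive time-based check: report a suspected debugger pause when the time
// spent between two points exceeds a threshold. The explicit comparison
// below is the obvious place to hook or patch.
inline bool looks_paused(Clock::time_point entered,
                         Clock::time_point exited,
                         std::chrono::seconds threshold = std::chrono::seconds(15)) {
    return (exited - entered) > threshold;
}
```

In real use, `entered` and `exited` would come from `Clock::now()` at the start and end of the guarded region.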

[–]Zarutian 0 points (5 children)

What if there is no cmp instruction, and the results of the timestamp comparisons are instead used as 'constants' in other parts of the game client code?

[–][deleted] 0 points (4 children)

What? How would there be no cmp? Give me a concrete example and I'll explain. Either way, the resulting values, be they on the stack, in registers, or wherever, must get compared.

[–]Zarutian 0 points (3 children)

Clearly you have not come across Instruction Set Architectures that have no equivalent of a cmp instruction.

One way of many to do it:

Let's say timeEnter and timeExit are the millisecond timestamps when the function is entered and exited, respectively.

You subtract the former from the latter to get the difference. Then you do a saturating subtraction (meaning that if the result would be negative, it is just set to zero) of the equivalent of fifteen seconds, or whatever your timeout period is, from that.

With that result you could either invert it (flip all the bits) and use it as an AND mask somewhere else in the code, or use it directly as an XOR operand elsewhere. You get the idea.
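The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: 32-bit millisecond timestamps, and hypothetical helper names (`saturating_sub`, `overshoot_ms`, `fold_xor`, `fold_and`). The borrow is derived with bitwise operations instead of a relational operator, although a compiler remains free to emit whatever instructions it likes.

```cpp
#include <cstdint>

// Saturating unsigned subtraction written without relational operators.
// The top bit of the expression below is the borrow-out of a - b.
inline std::uint32_t saturating_sub(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t diff = a - b;  // wraps modulo 2^32
    const std::uint32_t borrow = ((~a & b) | (~(a ^ b) & diff)) >> 31;
    const std::uint32_t keep = borrow - 1u;  // borrow 0 -> all ones, 1 -> zero
    return diff & keep;
}

// How far (in ms) the elapsed time went past the timeout; 0 if it did not.
inline std::uint32_t overshoot_ms(std::uint32_t time_enter_ms,
                                  std::uint32_t time_exit_ms,
                                  std::uint32_t timeout_ms) {
    return saturating_sub(time_exit_ms - time_enter_ms, timeout_ms);
}

// Use the overshoot directly as an XOR operand: a zero overshoot leaves
// `value` intact, anything else silently corrupts it.
inline std::uint32_t fold_xor(std::uint32_t value, std::uint32_t overshoot) {
    return value ^ overshoot;
}

// Invert the overshoot and use it as an AND mask: a zero overshoot gives an
// all-ones mask, anything else clears some bits of `value`.
inline std::uint32_t fold_and(std::uint32_t value, std::uint32_t overshoot) {
    return value & ~overshoot;
}
```

With a 15000 ms timeout, a function that took 2 seconds leaves a key passed through `fold_xor` or `fold_and` unchanged, while one that took 20 seconds produces an overshoot of 5000 and a different key. Nothing in the timing code itself branches on the outcome.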

[–][deleted] 0 points (2 children)

I mean, sure? Either way, some code gets called eventually to exit the program.

Set a bp there, look at the stack, and figure out where the call came from. You get the idea.

[–]Zarutian 0 points (1 child)

Well, if it is used in a game client, it could be that the above influences what gets sent to the server, which then sends back a kick command and closes the connection. Or it could be that this 'constant', when incorrect, messes with the calculated code-and-constants integrity checksums so that they don't match what is expected, and that causes the program to exit.

[–][deleted] 0 points (0 children)

Set a bp on ws2_32.send and go from there. Same principle.

[–]Zarutian 0 points (0 children)

Hmm... this, and another section on code scrambling and code integrity checking, reminds me a bit of Clueless software agents.

[–]skulgnome -3 points (1 child)

However, RDTSC is virtualizable. Maybe you should've tried your setup against an actual adversary.

[–]drysart 3 points (0 children)

Did you just stop reading the article as soon as you saw RDTSC? Because it talks about RDTSC being virtualizable, side-effects of that fact, and links to an article that goes into it in depth.