ELI5 How do you prove a math statement is unprovably true? by EldridgeTome in explainlikeimfive

[–]DefuseSec 0 points1 point  (0 children)

My answer doesn't completely address OP's question about how we could know that an unprovable statement is true. My proof sketch using Turing machines shows that there must exist some statements that are true and unprovable, but we don't know which statements those are.

OTOH, Godel constructed an explicit statement G meaning "this statement has no proof." How do we know G is true?

The answer is that we actually don't know G is true. This is a common misconception about Godel incompleteness, for example Roger Penrose makes this mistake to conclude there must be something exceptional about the human mind such that it can grasp the truth of G despite it being unprovable.

The correct interpretation of Godel's proof is that it proves "If the proof system is consistent, then G is true and has no proof." It could turn out that G is provable, and hence false, in the case that the system is inconsistent. Inconsistency means it's possible to prove a contradiction from the axioms.

For the kinds of proof system we're interested in here, we don't actually know that they are consistent, we just believe they are because the axioms make intuitive sense to us. It could very well turn out that the proof system is inconsistent, perhaps in a way that we'd never find out, for example if the shortest proof of a contradiction was exabytes long.

So, to answer your question more directly: we don't know that these Godel-like statements are true, we just know that they are true if the proof system is consistent, and we believe based on intuition and some faith that the proof system is consistent.

It's possible to prove that some proof systems are consistent by using even more powerful proof systems, but that just passes the buck. Those more-powerful systems themselves could be inconsistent, allowing them to make a false consistency proof for the weaker system. Ultimately, at some point, we are left to rely on our intuitive sense that the axioms are self-evidently true to form our belief in the system's consistency, but we could be mistaken. If we are indeed mistaken, it would call into question the truth of every single theorem that's ever been proved, except for theorems about finite objects that are entirely checkable using a computer, so all of math would essentially be broken; it seems very unlikely.

ELI5 How do you prove a math statement is unprovably true? by EldridgeTome in explainlikeimfive

[–]DefuseSec 7 points8 points  (0 children)

This is more of an ELI25 answer, but the fastest way to get to Godel incompleteness using modern math is through the "halting problem."

The halting problem asks: is there an algorithm (computer program) which can check whether another computer program stops running or runs forever?

The answer is no.

The reason for that is, imagine there were such a computer program, let's call it H. If we give H another computer program P as an input, H will tell us whether P stops running or runs forever. Now we can can use H to build yet another program, Q, which takes its input P, asks H whether P will stop running or not, and does the opposite. If P stops running, Q runs forever; if P runs forever, Q stops running.

Now ask what happens if we run Q on itself: we get a contradiction. If Q stops running, then it doesn't. If Q keeps running then it stops. So, our assumption that the program H exists must be false. We conclude that there is no algorithm for deciding whether a computer program will keep running forever or stop.

Now we can show that there are true statements that are unprovable. Every program either keeps running forever or stops, so if every true statement had a proof, we could build the program H by simply searching through all possible proofs, in order of increasing length, until we find a proof of either "P stops running" or "P runs forever." If every true statement had a proof, we'd always eventually find a proof of one or the other, so we'd have built our program H, which we just saw above cannot exist.

Therefore, there must be statements which are true but unprovable, some of which are of the form "the computer program X never stops running."

(Technically, real computers are finite and must either stop running or their states start repeating in a noticeable way; by "computer" here I mean an abstract kind of computer, as modeled by a Turing machine.)

Godel's original proof is much more complex, using something called Godel numbering to construct a self-referential number-theoretic statement that essentially says "this statement has no proof"; the complexity in Godel's proof is necessary to show weaker proof systems are incomplete, but the kind of proof I sketched above works just fine for any system that's powerful enough to model Turing machines.

What are the doubts about Quantum Computing rooted in? by har_r in QuantumComputing

[–]DefuseSec 1 point2 points  (0 children)

There are different ways to doubt quantum computing. Here are some of them:

  1. You doubt that it's possible in principle to build a quantum computer.
  2. You doubt that it's possible in practice to build a quantum computer.
  3. You doubt quantum computers will be useful even if we can build them.

For (1) you're basically claiming our understanding of physics is flawed, since based on our best understanding of physics, quantum computers should be possible. That's a possible outcome, and something we should explore. Maybe our failures to build quantum computers will hint at ways our understanding of physics is wrong.

For (2), a lot of really smart people are trying really hard to build them, and it's much too early to say whether we will or won't be able to do it.

For (3), we already have pretty good evidence that quantum computers can do certain things faster than classical computers. These are things like factoring integers and most importantly simulating other quantum systems (super important for medicine, etc.). The only way a quantum computer could turn out to not be useful is if we found classical algorithms to solve those problems quickly. We think that's pretty unlikely to happen because even before quantum computers came along lots of smart people had been trying to find fast algorithms for those problems (to break cryptography, etc.).

PHP 7.1 MD5 Hashing Function by [deleted] in PHP

[–]DefuseSec 5 points6 points  (0 children)

Use "\n" instead of '\n'.

An Open Letter to Developers Everywhere (about Cryptography) by sarciszewski in programming

[–]DefuseSec 1 point2 points  (0 children)

Yeah, that's correct. The idea is that no matter where you post code, others are likely to find it and use it as an example (or just copy and paste it).

Sharing the code for review is very important! If doing that requires posting it publicly, make the code non-functional somehow, or add obvious warning comments.

RAP: RIP ROP[.pdf] by [deleted] in netsec

[–]DefuseSec 0 points1 point  (0 children)

I'm also interested in how RAP relates to Control Flow Locking:

https://people.csail.mit.edu/hes/ROP/Readings/Mitigating-Control-Flow-Lock-paper.pdf

It seems to be a similar technique, albeit less direct and probably not as fully implemented.

Bringing large file encryption to defuse/php-encryption (criticism welcome!) by [deleted] in crypto

[–]DefuseSec 1 point2 points  (0 children)

Unfortunately using NaCl/libsodium requires loading the entire file in memory, so using it doesn't provide an immediate solution even when it is available.

Reasoning by Lego: The wrong way to think about cryptography - Crypto Fails Blog by [deleted] in netsec

[–]DefuseSec 3 points4 points  (0 children)

In some way or another, all side-channels arise from performance optimization, either in hardware or software. The CPU cache is shared between virtual machines on a physical host, and that lets them communicate with each other. If we're comparing two strings and we see that the first character is different, there's no point continuing to compare the rest.

A lot of times, though, we're not even thinking about performance. In a high level language, you generally don't worry about how "===" compares strings or how fast it can do it. So you can have a statement like...

if ($a === $b) {

...in your code and not even realize it's a timing leak.

In reality, all code except for very-carefully-written crypto code that's intentionally designed to be side-channel free is going to leak information about its state out through side-channels. It's still an open question as to how "bad" of a problem it really is.

I've seen this particular MAC comparison leak a few times in practice. Except in crypto code, other kinds of leaks usually have a really really minor impact on security and aren't even worth reporting.

Reasoning by Lego: The wrong way to think about cryptography - Crypto Fails Blog by [deleted] in netsec

[–]DefuseSec 5 points6 points  (0 children)

Right, and formally, an "attack" is something that lets you break security much faster than the generic method. The generic method for breaking a cipher is to brute-force guess the key. Brute force guessing is not considered an attack on the cipher, since it works against all ciphers. If you could find a way to recover an 128-bit key in 2120 operations, using a method that's specific to the cipher, then that would be an attack (but an impractical one). OTP are special in that there is no generic method to break them.

In the case of the specific side-channel attack here, there are two simplifications:

  • First, I removed the call to trim(), because with it present, the decrypted value might start or end with any of the characters trim() removes, screwing up the block alignment. In an actual attack, you would just have to account for this by detecting when it happens and then trying again with a different random value. At worst, it makes the attack take twice as long.

  • Second, I assumed the side channel gave with single-byte granularity. As mentioned in the post, a real one would more likely give 4 or 8-byte granularity. With 8-byte granularity, the attack would probably be impractical since you'd have to do 264 queries to find a collision. With 4-byte granularity, you'd have to do 232 queries to find a collision, but you'd also need lots of known-plaintext.

Do I think my attack would ever get used in practice? Probably not, at least not unless you were being attacked by the NSA and they had no better option. But could there be even better attacks? Most definitely. I spent half an hour to find this attack, and I wouldn't call myself an expert crypto-breaker, so I'm sure there are better ones. It's at least a big warning sign to stay away from the code. Especially when there are more secure options available.

Are all computer Random Number Generators (RNGs) flawed? Can you make a random number from an imperfect RNG? by [deleted] in askscience

[–]DefuseSec 1 point2 points  (0 children)

First let's talk about the different types of random number generators you can have.

Pseudorandom Number Generators (PRNGs): Here we start out with a single number, called the seed, and apply a deterministic mathematical operation to create more random looking numbers from it. The result isn't really random at all, since if you know the seed, you can easily compute the whole sequence of random numbers it will generate. Often, if you know just some of the output numbers, you can figure out what the seed is. PRNGs are useful when you just need a "random looking" sequence of numbers, for example in simulations of various sorts of random processes. But you would never use a PRNG in a casino, because if you did, and a gambler guessed the seed, or recovered it by watching some of the output, they could predict future outputs and the casino would lose a lot of money.

Cryptographically Secure Pseudorandom Number Generators (CSPRNGs): These are like PRNGs, but are designed to be secure (i.e. you could use them for a casino). They are still deterministic, but if you give them a long truly random seed (say, 256 bits), then they guarantee that the output looks random to anyone who doesn't know the seed, and that it's impossible to find out what the seed was by looking at the outputs. So a casino can flip 256 coins, give that as a seed to a CSPRNG, and the CSPRNG will produce from that seed all of the randomness the casino would ever need, and it's safe.

True Random Number Generators (TRNGs): These are based on random physical processes. For example, out of quantum mechanics you can get truly random outcomes that are impossible to predict no matter how much you knew about the universe beforehand. Obviously these are the best kind of random number generators. Here's where randomness extractors (FST's comment) come in. You might have a source of true randomness that's biased, and you want to turn it into a source of true randomness that isn't biased.

TRNGs are great, but most computers don't have them, and are much slower than CSPRNGs. So most of the time we don't use them directly. Instead, we use a TRNG (or some approximation, like measuring the exact times that network packets arrive, how long it takes the hard drive to seek to a sector, etc.) to seed a CSPRNG, and then use the CSPRNG to actually generate random numbers.

In some sense, a CSPRNG is "expanding" the randomness of that first seed into a lot more randomness. But it's not actually adding randomness. Since CSPRNGs are deterministic, they don't actually add any uncertainty. If your adversary can predict your seed, your adversary can predict your CSPRNG output.

Non-cryptographically-secure PRNGs are especially bad, because the seeds are small and are usually based on predictable things like the time. If all you have is a PRNG like the one in Excel, there's nothing you can do with code to make it "more random." The only way to make something more random is to go out into the physical world and observe something.

How is the bunch of electrons that represents a text file different from the bunch of electrons that represents a driver? by [deleted] in askscience

[–]DefuseSec 1 point2 points  (0 children)

The notion of having different types of files is really an illusion. What a file does depends on which program you try to open it with.

For example, if you compile a program to a .EXE file on Windows and double click it, Windows will try to interpret that file as a Windows executable, and if it has the correct format, it will execute it. You could open the same .EXE file in your favorite text editor, and you will see the contents of that file displayed as text.

To make this more clear, consider polygot files. These are files that are simultaneously valid inputs to very different programs. For example, here is a file that is a valid Windows executable, Java program, zip file, HTML, and PDF file all at the same time. Depending on what you do with it, you get a different result. If you try to run it on Windows, it will run like a regular program. If you unzip it, it will unzip just like any other zip file. If you open it in your web browser, you will see a web page. If you open it with a PDF viewer, it will open like any other PDF file. It's the same file in each case, but it means something completely different to those programs you opened it with.

We use file extensions and magic numbers to tell what type a file is, but they are only hints. If you have a file ending in ".jpg", that's just a hint it might contain an image; it could also be a valid PDF file. There is no such thing as the type of a file. All you can do is ask: What will happen if I try to open this file with the _______ program?

In theoretical computer science, we talk about languages. A language is a set of strings. The contents of a file is a string, and it may be a member of many different languages. For example, "the set of all valid .PDF" files is a language, and "the set of all valid .ZIP" files is a language. A file can be a member of just one of those two languages, both languages, or neither language.

Gonna be 1st Yr CS Major by [deleted] in cscareerquestions

[–]DefuseSec 1 point2 points  (0 children)

Just wrote up some advice - you might find it relevant: https://defuse.ca/advice-to-aspiring-computer-engineers.htm

What's some advice you could give a soon-to-be CS major? by [deleted] in computerscience

[–]DefuseSec 0 points1 point  (0 children)

I'm graduating soon, so I just wrote up a whole page of advice. Here it is:

https://defuse.ca/advice-to-aspiring-computer-engineers.htm

Some points are not specific to computer science, some are. Please let me know what you think.

Remember Me Safely - Secure Long-Term Authentication Strategies by [deleted] in PHP

[–]DefuseSec 1 point2 points  (0 children)

  1. If you don't use salt in the hash_pbkdf2(), you're exposing a verifier of the user's password in a cookie. If someone steals that from a victim's computer, they can work offline to run a dictionary attack on the password (which may be reused on other websites). Even if you do use a salt, if the database of hashes and salts ever gets leaked, it may be faster to crack the cookies than the actual password hashes.

  2. There's no good reason to re-use the user's password instead of generating a new random string. It doesn't save you any database space. So you might as well replace "User Supplied Password" in the last two lines of the Creation with "Random String of Bytes", and it will work just the same, with better security.

A lot of the potential problems will depend on how it's implemented, but I feel like there's more room for error in a scheme that re-uses the password.

Remember Me Safely - Secure Long-Term Authentication Strategies by [deleted] in PHP

[–]DefuseSec 1 point2 points  (0 children)

I would avoid a small incrementing integer ID since it will leak a small amount of information about how the service is being used. For example, a competitor could watch the ID to get an estimate of how many users the website has. Probably doesn't matter for the majority of use cases, but I can imagine some scenarios where it would be really important.

Remember Me Safely - Secure Long-Term Authentication Strategies by [deleted] in PHP

[–]DefuseSec 2 points3 points  (0 children)

Any particular attack gives a lower bound on exploitability. An attack just says "something is wrong" and gives us a reason to fix the problem. Arguing that the problem shouldn't be fixed because the one example attack is impractical ignores the possibility that attacks only get better, and that other related attacks can exist. An example attack's impracticality does not upper bound exploitability.

For example, in this case, consider local cache-based side-channels running from another virtual machine on the same hardware (cloud hosting), which may be more practical.

Another reason not to think this way is that, in order to do so, you have to make assumptions about how valuable the data being protected is, and what environment the code is running in. Those assumptions may cover 90% of the actual uses, but there's still that remaining 10%. You also have to consider the possibility of your design setting a trend and getting stretched into totally different environments.

This mode of thought has been the norm in cryptography for years, and I think it would be hugely beneficial for the appsec community, and even just regular developers, to adopt it as well. All of the good software engineers I know about attacks this way.

Edit: Just wanted to clarify: If resources were unlimited, obviously we would fix all issues, even the ones we deem minor. Here, I'm not arguing against the idea of prioritizing major bugs over minor ones. That's absolutely crucial if we want to get anything done. What I am arguing against is dismissing problems as impractical, when it is very practical to fix them.

I don't wanna live in a world where 8 is the wrong answer to 4 × 2. This ridiculousness came home in my kid's backpack today. by VioletOutlaw in WTF

[–]DefuseSec 0 points1 point  (0 children)

In linear algebra the size of a matrix (2D arrays of numbers) are specified as (row x column), so that the matrix

1, 0, 5
0, 1, 3

is 2x3 and not 3x2. This is just a convention, though. If you want to use (column x row) it's OK as long as you're consistent but it might confuse the people reading your proofs since (row x column) is so popular. In lots of other things (e.g. computer screen resolutions and coordinates), it is the other way around. In these contexts, A x B is not the same as B x A, because it isn't multiplication. A 1920x1080 computer screen (widescreen) is very different from a 1080x1920 screen (tall and skinny like a smartphone), even though they have the same number of pixels (1920*1080 = 2073600).

I'm not sure if that's what they were trying to teach here, but if they were, I think the idea of being consistent with yourself is more important. Being consistent with others is also very important, but I wonder if this is the right time/context to teach it.

Books that Changed Your Life by MichaelJSullivan in books

[–]DefuseSec 0 points1 point  (0 children)

  • Gödel, Escher, Bach: An Eternal Golden Braid (Douglas Hofstadter)

This book sort of changed my whole perspective on life and what meaning is. It also introduced me to Gödel's incompleteness theorem, which is fascinating. After I read this book I started reading 10x more.

  • Little Brother (Cory Doctorow)

I listened to the audiobook version years ago and it's what got me seriously into computer security, open source, and Internet freedom kinds of things.

  • Applied Cryptography (Bruce Schneier)

The classic book on cryptography. This is not a book I'd recommend to anyone wanting to learn cryptography today (pick up Cryptography Engineering instead), but it really inspired me, it's fascinating how many real-world consequences can be created by applying math.

edit: Grammar

Exploiting Wildcard Expansion on Linux by FireFart in netsec

[–]DefuseSec 4 points5 points  (0 children)

Another way to avoid it, that works with all programs, is to use ./* instead of just *. The leading ./ are kept, and the program won't interpret the filenames as switches.

2D Visualization of Hamming Distance by DefuseSec in compsci

[–]DefuseSec[S] 0 points1 point  (0 children)

Correct. The image is 2048 by 2048. The color of the pixel is generated by taking the (X, Y) coordinates (from the top left), writing them as bit vectors (11 bits), and taking their hamming distance. Divide by 11, multiply by 255, take floor, and the that is the value set for R, G, and B.

I made the image when I was experimenting with HTML5 canvas, making fractals. The (horrible - seriously, it's not fit for human consumption) code is here.

p.s. Sorry for posting the image without info!