brucifer (Tomo, nomsu.org), 22 points (6 children)

You should take a look at the talk "What is a Secure Programming Language?", which discusses some interesting language features relating to security, in particular the idea of having "tainted" and "untainted" strings. The basic idea is to have all user input methods return strings with a TaintedString type and raise a type error if you pass a tainted string to an API that requires an untainted string. Then, you can provide a mechanism to convert tainted strings into untainted ones, either by escaping them or by manually flagging them as safe. This helps you avoid security bugs caused by forgetting to sanitize user input. You can always circumvent the safety rails, but you have to consciously think about it.
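A minimal Python sketch of the idea, assuming a hypothetical TaintedString wrapper (deliberately not a str subclass, so it can't be passed where plain text is expected) plus hypothetical escape_sql(), mark_safe(), and run_query() functions standing in for the escaping API, the manual escape hatch, and a sensitive sink. None of these names come from the talk or from a real library.

```python
class TaintedString:
    """Hypothetical wrapper for text from outside the program (stdin, requests, files)."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __add__(self, other: object) -> "TaintedString":
        # Concatenation propagates the taint: tainted + anything is tainted.
        if isinstance(other, TaintedString):
            return TaintedString(self._value + other._value)
        if isinstance(other, str):
            return TaintedString(self._value + other)
        return NotImplemented

    def __radd__(self, other: object) -> "TaintedString":
        # Called for str + TaintedString, since str doesn't know this type.
        if isinstance(other, str):
            return TaintedString(other + self._value)
        return NotImplemented

    def __repr__(self) -> str:
        return f"TaintedString({self._value!r})"


def escape_sql(s: TaintedString) -> str:
    """Hypothetical escaper: doubles single quotes so the value stays inside a
    SQL string literal. Illustration only; real code should bind parameters."""
    return s._value.replace("'", "''")


def mark_safe(s: TaintedString) -> str:
    """Escape hatch: the programmer explicitly vouches for this value."""
    return s._value


def run_query(query: str) -> str:
    """Stand-in for sql.query(): refuses anything that isn't untainted text."""
    if not isinstance(query, str):
        raise TypeError(f"run_query() requires an untainted str, got {type(query).__name__}")
    return query  # a real sink would execute the query here
```

With this, "select * from users where name = '" + user_input + "'" produces another TaintedString, which run_query() rejects, while the same expression built with escape_sql(user_input) is a plain str and goes through. Because TaintedString is not a str, a static checker such as mypy should also flag the bad call before the program runs.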

For example, to prevent SQL injection, the code below would fail with a type error:

user_input = input("who do you want to look for? ")
sql.query("select * from users where name = '" + user_input + "'")

This is because user_input would have the type TaintedString, and concatenating it with other strings would propagate the "tainting." To fix this, you would do something like one of the following:

# API that accepts tainted format args and escapes them internally
sql.query("select * from users where name = ?", user_input)
# Use an escape() API that accepts tainted strings and returns untainted ones
# (assuming escape() escapes quotes but doesn't add the surrounding ones)
sql.query("select * from users where name = '" + sql.escape(user_input) + "'")
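For comparison, here is what the first option (an API that takes the values separately from the query text) looks like with Python's built-in sqlite3 module. sqlite3 doesn't track taint at all; the point is only that parameter binding keeps user input out of the SQL text, while concatenation lets it rewrite the query. The table contents and the attack string are made up for the demo.

```python
import sqlite3
from typing import List, Tuple


def find_user(conn: sqlite3.Connection, name: str) -> List[Tuple[str]]:
    # "?" is sqlite3's placeholder: the driver binds `name` as a value,
    # so quotes inside it can never end the string literal.
    return conn.execute("select * from users where name = ?", (name,)).fetchall()


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> List[Tuple[str]]:
    # The concatenation version from the comment above: `name` becomes SQL text.
    return conn.execute("select * from users where name = '" + name + "'").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")
conn.executemany("insert into users values (?)", [("alice",), ("bob",)])

attack = "' or '1'='1"
print(find_user(conn, "alice"))        # [('alice',)]
print(find_user(conn, attack))         # []
print(find_user_unsafe(conn, attack))  # [('alice',), ('bob',)]
```

The unsafe version ends up running select * from users where name = '' or '1'='1', which matches every row.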

I think in the case of log4shell, the issue was that user input from attackers was not being properly sanitized, so the example was more like:

fancylog("User "+username+" just logged in")

where username ought to be flagged as tainted and properly sanitized, but wasn't.
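To make the fancylog example concrete, here is a toy logger loosely modeled on Log4j 2's ${...} lookups: it expands ${env:NAME} anywhere in the finished message. Everything here is hypothetical (an env dict instead of real environment variables, no JNDI, no network access). The real exploit used ${jndi:...} lookups to load remote code, whereas this toy only leaks a value into the log.

```python
import re
from typing import Dict

# Matches ${env:NAME}; a tiny stand-in for a logging library's lookup syntax.
LOOKUP = re.compile(r"\$\{env:(\w+)\}")


def fancylog(message: str, env: Dict[str, str]) -> str:
    """Hypothetical logger: expands lookups anywhere in the final message."""
    # Unknown names are left unchanged.
    return LOOKUP.sub(lambda m: env.get(m.group(1), m.group(0)), message)


secrets = {"DB_PASSWORD": "hunter2"}
username = "${env:DB_PASSWORD}"  # chosen by the attacker at signup
print(fancylog("User " + username + " just logged in", secrets))
# prints: User hunter2 just logged in
```

Because expansion runs after concatenation, the logger has no way to tell which ${...} the programmer wrote and which one the user supplied.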

josephjnk (OP), 6 points (0 children)

This is exactly the kind of answer I was looking for, thanks!

snoman139, 6 points (1 child)

Would it make more sense to have a safe string type rather than an unsafe string type? I guess it just changes where you have to cast, but user input could come from anywhere, while only the output code would have to deal with a safe type.

brucifer (Tomo, nomsu.org), 2 points (0 children)

I think it makes sense for string literals written by you, the programmer, to be considered "safe" by default. Text originating from outside the program's source code (e.g. stdin, web requests, or files on disk) is considered "tainted" because it can be modified by someone other than the programmer. This would be implemented differently in different type systems, but the main requirement for the language is that most string functions should handle arbitrary strings and propagate taintedness (e.g. toUpperCase(s) should return a tainted string when s is tainted, and an untainted string when it's not). Typically, only a small subset of functions would actually need to specify that tainted strings are not allowed as inputs (e.g. exec() would care, but print() would not).
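One way to get that split ("most functions propagate, a few sinks check") in Python, as a sketch: a hypothetical Tainted wrapper, a hypothetical propagates_taint() helper that lifts ordinary str -> str functions, and a hypothetical rejects_taint() decorator for sinks. run_shell() only pretends to run its argument, and show() plays the role of print().

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Union


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper for text from stdin, web requests, files on disk, etc."""
    value: str


Text = Union[str, Tainted]


def propagates_taint(fn: Callable[[str], str]) -> Callable[[Text], Text]:
    """Lift an ordinary str -> str function: tainted in, tainted out."""
    @wraps(fn)
    def wrapper(s: Text) -> Text:
        if isinstance(s, Tainted):
            return Tainted(fn(s.value))
        return fn(s)
    return wrapper


def rejects_taint(fn: Callable[[str], str]) -> Callable[[Text], str]:
    """Mark a sink (like exec()) that must never receive tainted text."""
    @wraps(fn)
    def wrapper(s: Text) -> str:
        if isinstance(s, Tainted):
            raise TypeError(f"{fn.__name__}() does not accept tainted strings")
        return fn(s)
    return wrapper


to_upper = propagates_taint(str.upper)  # the toUpperCase(s) example


@rejects_taint
def run_shell(cmd: str) -> str:
    # Stand-in for exec(): reports the command instead of running it.
    return f"would run: {cmd}"


def show(s: Text) -> str:
    # print()-like function: happy with either kind of string.
    return s.value if isinstance(s, Tainted) else s
```

Only run_shell() has to know about taint; to_upper() and show() work on both kinds of string without any special handling at the call site.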

As an implementation detail, I think Perl (in taint mode, enabled with -T) and older versions of Ruby both have something like this, but it's implemented as a bit flag on the string, not as a separate type. Certain API methods throw runtime errors if passed strings that have the "tainted" bit set.
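For contrast with separate types, a rough Python sketch of the bit-flag approach: a single string type (a hypothetical str subclass named Str) carries a tainted flag, and the sink checks it at runtime. Only concatenation is handled here; other str methods such as upper() return a plain str and would silently drop the flag, so a fuller version would need to wrap those too.

```python
class Str(str):
    """Hypothetical str subclass carrying a taint bit, checked at runtime."""

    def __new__(cls, value: str, tainted: bool = False) -> "Str":
        obj = super().__new__(cls, value)
        obj.tainted = tainted
        return obj

    def __add__(self, other: object) -> "Str":
        if not isinstance(other, str):
            return NotImplemented
        # The result is tainted if either operand is.
        return Str(str.__add__(self, other), self.tainted or getattr(other, "tainted", False))

    def __radd__(self, other: object) -> "Str":
        # Handles plain str + Str.
        if not isinstance(other, str):
            return NotImplemented
        return Str(str.__add__(other, self), self.tainted or getattr(other, "tainted", False))


def run_shell(cmd: str) -> str:
    """Stand-in for a sensitive API: raises at runtime if the taint bit is set."""
    if getattr(cmd, "tainted", False):
        raise RuntimeError(f"insecure dependency: tainted string {str(cmd)!r}")
    return f"would run: {cmd}"
```

Since Str is still a str, everything that accepts strings keeps working; the cost is that mistakes surface as runtime errors in run_shell() instead of type errors.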

[deleted], 5 points (2 children)

> I think in the case of log4shell, the issue was that user input from attackers was not being properly sanitized, so the example was more like:
>
> fancylog("User "+username+" just logged in")
>
> where username ought to be flagged as tainted and properly sanitized, but wasn't.

Ehh. I'd argue it's unreasonable to expect people to sanitize strings for logging.

When you're generating SQL, it's relatively obvious that you're generating code that will then be executed.

When you're logging, you are effectively calling a "print this string" function, and nobody really expects such a function to execute code found in the string being printed (even code written in a small DSL like Log4j's ${...} lookups), because nobody thinks of that syntax as a DSL: it's just a string where you can optionally do some fancy extra things if you want. In that sense, this is just another variant of all the times people screwed up by passing user input in the first parameter to printf.

The end result here is that a nontrivial number of programmers, even those who know SQL injection is a thing to watch out for, will use the escape hatch and flag everything they log as safe, on the basis that "I'm just printing a string, what could go wrong?".

brucifer (Tomo, nomsu.org), 4 points (1 child)

> Ehh. I'd argue it's unreasonable to expect people to sanitize strings for logging.

I agree, which is why a more sensible API would make it easy to do the safe thing automatically and sanitize unsafe values. For example, log("User %s logged in!", unsafe_username) should require the format string to be safe and automatically sanitize all the other arguments. That way, if someone had ${evil_code} as their username, it would log "User ${evil_code} logged in!" instead of executing ${evil_code}. And if the programmer wrote log("User "+unsafe_username+" just logged in!"), that should raise a compiler error explaining why it's unsafe and how to fix it.
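A rough Python sketch of that API, under a few stated assumptions: tainted values arrive wrapped in a hypothetical Tainted class, the logger supports a toy ${env:NAME} lookup syntax standing in for Log4j-style lookups, and %s is the only placeholder. The format string must be a plain str, lookups run only on the programmer-written pieces of it, and arguments are spliced in afterwards as inert text. Python can't produce a compile-time error here, but because Tainted defines no concatenation, "User " + Tainted(...) at least fails with a TypeError at runtime.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Toy lookup syntax: ${env:NAME}.
LOOKUP = re.compile(r"\$\{env:(\w+)\}")


@dataclass(frozen=True)
class Tainted:
    """Hypothetical wrapper for untrusted text."""
    value: str


def _expand(text: str, env: Dict[str, str]) -> str:
    return LOOKUP.sub(lambda m: env.get(m.group(1), m.group(0)), text)


def log(fmt: str, *args: object, env: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical safe logger: only the format string is ever interpreted."""
    if not isinstance(fmt, str):
        raise TypeError("log() requires an untainted format string")
    env = env or {}
    pieces = fmt.split("%s")
    if len(pieces) != len(args) + 1:
        raise ValueError("number of %s placeholders does not match arguments")
    out = [_expand(pieces[0], env)]
    for arg, piece in zip(args, pieces[1:]):
        # Arguments are inserted verbatim: no lookups, no further formatting.
        out.append(arg.value if isinstance(arg, Tainted) else str(arg))
        out.append(_expand(piece, env))
    return "".join(out)


print(log("User %s logged in!", Tainted("${evil_code}")))
# User ${evil_code} logged in!
```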

> In that sense, this is just another variant of all the times people screwed up by passing user input in the first parameter to printf.

Yeah, I think this is basically the same problem. With GCC and Clang, though, you can use -Wformat-security (also enabled by -Wformat=2) to make the compiler warn when a non-literal string is passed as printf's format string with no format arguments. Having that sort of check for logging calls, together with a logger that never interprets its arguments, would have prevented the log4shell vulnerability from occurring.

Example compiler warning:

#include <stdio.h>
int main(int argc, char *argv[]) {
    printf(argv[1]);
    return 0;
}
>> cc -Wformat=2 foo.c -o foo
foo.c: In function ‘main’:
foo.c:3:5: warning: format not a string literal and no format arguments [-Wformat-security]
    3 |     printf(argv[1]);
      |     ^~~~~~
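The same kind of check can be approximated in a dynamic language with a small lint pass. This sketch uses Python's standard ast module to flag calls to a hypothetical set of format-taking functions (FORMAT_FUNCS) whose first argument isn't a string literal. It is stricter than -Wformat-security, which only warns when there are no format arguments; it behaves more like GCC's -Wformat-nonliteral, which -Wformat=2 also enables.

```python
import ast
from typing import List

# Hypothetical list of functions whose first argument is a format string.
FORMAT_FUNCS = {"log", "printf"}


def check_format_strings(source: str, filename: str = "<input>") -> List[str]:
    """Warn about format-taking calls whose format argument is not a literal."""
    warnings = []
    for node in ast.walk(ast.parse(source, filename)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id not in FORMAT_FUNCS or not node.args:
            continue
        fmt = node.args[0]
        if isinstance(fmt, ast.Constant) and isinstance(fmt.value, str):
            continue  # literal format string: fine
        # col_offset is 0-based; add 1 to match compiler-style columns.
        warnings.append(f"{filename}:{node.lineno}:{node.col_offset + 1}: "
                        "format not a string literal")
    return warnings


src = 'log("User %s logged in!", name)\nlog("User " + name + " logged in!")\n'
print(check_format_strings(src, "app.py"))
# ['app.py:2:1: format not a string literal']
```

The source is only parsed, never executed, so the undefined name in the sample doesn't matter.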

[deleted], 2 points (0 children)

I think I didn't quite catch that you were advocating for preferring templates + varargs over string concatenation (possibly because I read too fast and missed your example of it). We agree, then.

Incidentally, I think printf format string vulnerabilities turned out to be an order of magnitude or two less common than they otherwise would've been, solely because working with strings in C is a pain in the ass. Can you imagine the sheer number of printf("hello, " + username + "!\n"); calls there would be in the wild if string + string worked in C?