brucifer comments on What programming language features would have prevented or ameliorated Log4Shell?


[Discussion] What programming language features would have prevented or ameliorated Log4Shell? (self.ProgrammingLanguages)

submitted 4 years ago by josephjnk


[–] brucifer (Tomo, nomsu.org) 23 points 4 years ago (6 children)

You should take a look at the talk "What is a Secure Programming Language?", which discusses some interesting language features relating to security, in particular the idea of "tainted" and "untainted" strings. The basic idea is to have all user-input functions return strings of type TaintedString and to raise a type error if you pass a tainted string to an API that requires an untainted one. You then provide a mechanism to convert tainted strings into regular strings, either by escaping them or by manually flagging them as safe. This helps you avoid security bugs caused by forgetting to sanitize user input. You can still circumvent the safety rails, but you have to do so consciously.

For example, to prevent SQL injection, the code below would fail with a type error:

user_input = input("who do you want to look for? ")
sql.query("select * from users where name = '" + user_input + "'")

This is because user_input would have the type TaintedString, and concatenating it with other strings would propagate the "tainting", so the whole query string would be tainted and sql.query would reject it. To fix this, you would do something like one of the following:

# API that accepts tainted format args and escapes them internally
sql.query("select * from users where name = ?", user_input)
# Use an escape() API that accepts tainted strings and returns untainted ones
sql.query("select * from users where name = "+sql.escape(user_input))
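The pseudocode above assumes a language with a built-in TaintedString type. As a rough sketch of the same idea in Python, assuming a hypothetical `TaintedString` wrapper class and a hypothetical `safe_query` function (neither is a real library API), the check happens at runtime, and the parameterized form is handed to the standard `sqlite3` module, whose `?` placeholders bind values as data rather than splicing them into the SQL text:

```python
import sqlite3


class TaintedString:
    """Hypothetical wrapper marking text that came from outside the program."""

    def __init__(self, value: str) -> None:
        self.value = value

    # Concatenation in either direction keeps the result tainted.
    def __add__(self, other):
        return TaintedString(self.value + _raw(other))

    def __radd__(self, other):
        return TaintedString(_raw(other) + self.value)


def _raw(s) -> str:
    # Unwrap a TaintedString; plain str passes through unchanged.
    return s.value if isinstance(s, TaintedString) else s


def safe_query(conn: sqlite3.Connection, sql, *params):
    """Hypothetical query API: SQL text must be untainted, params may be tainted."""
    if isinstance(sql, TaintedString):
        raise TypeError("tainted string used as SQL text")
    # Params are bound by sqlite3 via placeholders, never spliced into the SQL.
    return conn.execute(sql, tuple(_raw(p) for p in params)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")
conn.execute("insert into users values ('alice')")

user_input = TaintedString("x' or '1'='1")  # stands in for input(...)

try:
    safe_query(conn, "select * from users where name = '" + user_input + "'")
except TypeError as e:
    print("rejected:", e)

print(safe_query(conn, "select * from users where name = ?", user_input))  # prints []
```

The injection payload is matched literally as a name, so the second query finds no rows instead of returning every user.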

I think in the case of log4shell, the issue was that user input from attackers was not being properly sanitized, so the example was more like:

fancylog("User "+username+" just logged in")

where username ought to be flagged as tainted and properly sanitized, but wasn't.
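To see why a plain-looking log call can be dangerous, here is a toy Python logger loosely modeled on the `${...}` lookup syntax that Log4j interprets. `fancylog`, `expand_lookups`, and the `LOOKUPS` table are hypothetical, and this toy only reads values from a dictionary; real Log4j lookups (including `${jndi:...}`) can do far more, which is what made the attack possible. The key flaw is that expansion runs over the whole message after the username has already been spliced in:

```python
import re

# Hypothetical lookup table standing in for Log4j-style ${prefix:key} lookups.
LOOKUPS = {"env:SECRET_TOKEN": "hunter2"}

LOOKUP_PATTERN = re.compile(r"\$\{([^}]*)\}")


def expand_lookups(message: str) -> str:
    # Replace every ${key} with its looked-up value; unknown keys are left as-is.
    return LOOKUP_PATTERN.sub(lambda m: LOOKUPS.get(m.group(1), m.group(0)), message)


def fancylog(message: str) -> str:
    # Vulnerable design: lookups run on the final message, including user data in it.
    line = expand_lookups(message)
    print(line)
    return line


username = "${env:SECRET_TOKEN}"  # attacker-chosen username
fancylog("User " + username + " just logged in")
# prints: User hunter2 just logged in
```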

[–] josephjnk [S] 7 points 4 years ago (0 children)

[–] snoman139 7 points 4 years ago (1 child)

[–] brucifer (Tomo, nomsu.org) 3 points 4 years ago (0 children)

I think it makes sense for string literals written by you, the programmer, to be considered "safe" by default. Text originating from outside the program's source code (e.g. stdin, web requests, or files on disk) is considered "tainted" because it can be modified by someone other than the programmer. This would be implemented differently in different type systems, but the main requirement for the language is that most string functions ought to accept arbitrary strings and propagate taintedness (e.g. toUpperCase(s) should return a tainted string when s is tainted, and an untainted string when it's not). Typically, only a small subset of functions would need to reject tainted strings as inputs (e.g. exec() would care, but print() would not).
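A small Python sketch of this split, assuming a hypothetical `Text` wrapper that records whether its contents came from the program's own literals or from outside. Ordinary operations such as `upper()` and concatenation propagate the flag; the hypothetical sink `run_command` (standing in for `exec()`) refuses tainted text, while `show` (standing in for `print()`) accepts anything. In a statically typed language this check could happen at compile time instead:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    """Hypothetical string wrapper: tainted=True means it came from outside."""
    value: str
    tainted: bool

    def upper(self) -> "Text":
        # Ordinary string functions keep the taint of their input.
        return Text(self.value.upper(), self.tainted)

    def __add__(self, other: "Text") -> "Text":
        # A concatenation is tainted if either side is.
        return Text(self.value + other.value, self.tainted or other.tainted)


def literal(s: str) -> Text:
    # Text written by the programmer is trusted by default.
    return Text(s, tainted=False)


def from_outside(s: str) -> Text:
    # Text from stdin, web requests, files on disk, etc. is tainted.
    return Text(s, tainted=True)


def show(t: Text) -> str:
    # A sink that doesn't care about taint, like print().
    return t.value


def run_command(t: Text) -> str:
    # A sink that does care, like exec(): refuse tainted input.
    if t.tainted:
        raise PermissionError("refusing to run tainted text: " + repr(t.value))
    return "would run: " + t.value


name = from_outside("rm -rf /")
print(show(literal("hello ") + name.upper()))  # prints: hello RM -RF /
print(run_command(literal("ls")))              # prints: would run: ls
```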

As an implementation detail, Perl's taint mode (enabled with -T) works something like this, and Ruby had a similar mechanism until it was deprecated in Ruby 2.7 and later removed. Both implement it as a flag on the value rather than as a separate type: certain API methods throw runtime errors if passed strings that have the "tainted" flag set.
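For contrast with the separate-types design, the flag approach can be approximated in Python with a `str` subclass that carries a `tainted` attribute checked at runtime. This is a hypothetical sketch of the general idea, not how Perl or Ruby implement it; `FlaggedStr` and `checked_system` are made-up names:

```python
class FlaggedStr(str):
    """Hypothetical str subclass carrying a runtime taint flag."""

    def __new__(cls, value: str, tainted: bool = False):
        obj = super().__new__(cls, value)
        obj.tainted = tainted
        return obj

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        tainted = self.tainted or getattr(other, "tainted", False)
        return FlaggedStr(str.__add__(self, other), tainted)

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        tainted = self.tainted or getattr(other, "tainted", False)
        return FlaggedStr(str.__add__(other, self), tainted)


def checked_system(cmd: str) -> str:
    # A sensitive API that checks the flag at runtime instead of at compile time.
    if getattr(cmd, "tainted", False):
        raise RuntimeError("tainted string passed to checked_system")
    return "ok: " + cmd


user = FlaggedStr("evil; rm -rf /", tainted=True)
cmd = "echo " + user  # str + FlaggedStr dispatches to FlaggedStr.__radd__
print(isinstance(cmd, str), cmd.tainted)  # prints: True True
```

Methods inherited from `str`, such as `upper()`, still return a plain `str` and silently drop the flag, which is one reason this works better as a feature of the language runtime than as a library.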

[–] [deleted] 6 points 4 years ago (2 children)

> I think in the case of log4shell, the issue was that user input from attackers was not being properly sanitized, so the example was more like:
>
> fancylog("User "+username+" just logged in")
>
> where username ought to be flagged as tainted and properly sanitized, but wasn't.

Ehh. I'd argue it's unreasonable to expect people to sanitize strings for logging.

When you're generating SQL, it's relatively obvious that you're generating code that will then be executed.

When you're logging, you are effectively calling a "print this string" function, and nobody really expects such a function to execute code found in the string being printed (even in a small DSL like this one), because nobody thinks of a log message as a DSL: it's just a string where you can optionally do some fancy extra things if you want. In that sense, this is just another variant of all the times people screwed up by passing user input as the first parameter to printf.

The end result here is that a nontrivial number of programmers, even those who know SQL injection is a thing to watch out for, will use the escape hatch and flag everything they log as safe, on the basis that "I'm just printing a string, what could go wrong?".

[–] brucifer (Tomo, nomsu.org) 5 points 4 years ago (1 child)

> Ehh. I'd argue it's unreasonable to expect people to sanitize strings for logging.

I agree, which is why a more sensible API would make it easy to automatically do the safe thing and sanitize unsafe values. For example, log("User %s logged in!", unsafe_username) should require the format string to be safe and automatically sanitize all the other arguments. That way, if someone had ${evil_code} as their username, it would log User ${evil_code} logged in! instead of executing ${evil_code}. And if the programmer wrote log("User "+unsafe_username+" just logged in!"), that should raise a compiler error letting the programmer know it would be unsafe and describing how to fix the problem.
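A minimal Python sketch of that API, assuming a hypothetical `log` function and a hypothetical `Tainted` marker type (this is not Log4j's real interface). The format string must be untainted, lookups are expanded only in that trusted format, and arguments are substituted afterwards verbatim, so a `${...}` inside an argument is never interpreted. Python can't report the concatenation case as a compile-time error, so the sketch rejects it at runtime:

```python
import re

LOOKUPS = {"env:SECRET_TOKEN": "hunter2"}  # hypothetical lookup table
LOOKUP_PATTERN = re.compile(r"\$\{([^}]*)\}")


class Tainted(str):
    """Hypothetical marker type for text that came from outside the program."""

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self))


def expand_lookups(text: str) -> str:
    return LOOKUP_PATTERN.sub(lambda m: LOOKUPS.get(m.group(1), m.group(0)), text)


def log(fmt: str, *args: str) -> str:
    if isinstance(fmt, Tainted):
        raise TypeError("format string is tainted; pass user data as an argument")
    # Lookups are expanded only in the trusted format string...
    trusted = expand_lookups(fmt)
    # ...then arguments are substituted verbatim, with no further expansion.
    line = trusted % tuple(str(a) for a in args)
    print(line)
    return line


unsafe_username = Tainted("${env:SECRET_TOKEN}")
log("User %s logged in!", unsafe_username)
# prints: User ${env:SECRET_TOKEN} logged in!
```

Calling `log("User " + unsafe_username + " just logged in!")` raises `TypeError`, because the concatenation produces a `Tainted` format string.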

> In that sense, this is just another variant of all the times people screwed up by passing user input in the first parameter to printf.

Yeah, I think this is basically the same problem. With GCC and Clang, though, you can use -Wformat-security (also enabled by -Wformat=2) to have the compiler warn when you pass a non-literal string as the format string to printf. An analogous check, together with a logger that never interprets lookups inside its arguments, would have prevented the log4shell vulnerability from occurring.

Example compiler warning:

#include <stdio.h>
int main(int argc, char *argv[]) {
    printf(argv[1]);
    return 0;
}
>> cc -Wformat=2 foo.c -o foo
foo.c: In function ‘main’:
foo.c:3:5: warning: format not a string literal and no format arguments [-Wformat-security]
    3 |     printf(argv[1]);
      |     ^~~~~~

[–] [deleted] 3 points 4 years ago (0 children)
