> This looks worrying, but if you read the issue thread it seems that this can only be triggered if you can edit the pattern string of the logger. This is something you can only do on the server itself (if made configurable) or in the build artefact that you deploy.
Plenty of existing code does things like:
log.info("foo: " + request.getFoo());
Rather than using fixed format strings and {} placeholders. You’re not supposed to, but it’s far from uncommon.
Oh yes, I’ve seen several programs that spend ~30% of their CPU cycles formatting strings that are immediately thrown away because the log level is not high enough. Now you can also include vulnerabilities with no extra effort!
You would still be doing the job of concatenating together the error message, possibly rendering some complex objects to a string, that is then fed to the no-op. The point parent was making is that there is also a performance aspect to this, in addition to the security aspect.
> You would still be doing the job of concatenating together the error message, possibly rendering some complex objects to a string, that is then fed to the no-op.
Why would you still be doing that work? If a value goes into a no-op, then the value isn't computed. (In theory - I'm sure it doesn't always work out 100% of the time in practice.)
That is actually a reason that a JIT compiler can prune where an AOT compiler can't. JIT can deoptimize when assumptions change.
One of the bigger wins with the JVM is assuming that a virtual method can be called statically if there are no derived classes. That's a huge win for every leaf class in the class tree. The assumption can change every time a class is loaded of course. The optimization is called devirtualization and it can be combined with inlining to get even bigger wins.
Fair point... but it seems like, even so, doing that to monitor whether a single integer variable might change sounds like a lot of added complexity. Are there other use cases that would help to justify it? Reducing the impact of failing to follow logging best practices doesn't seem like an obviously sufficient cause.
Consider `"hello" + myFooObject.toString()`. Can you eliminate that? It's impossible to say without knowing whether or not `myFooObject.toString()` is effectful. Maybe the JVM can make that assessment. Maybe not. I don't know, but it's not as trivial as "knowing what string concatenation does".
A straightforward way to do this is for Java to assume built in types like strings are side effect free, but custom types may not be. This would likely cover a good portion of log lines but not all, obviously.
The noop determination is made during runtime within the log method called. So, it's LOG.info(obj.toString()), wherein that method inspects the log level setting, then decides whether to ignore.
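A minimal sketch of that runtime check, with hypothetical names (this is illustrative, not the actual log4j internals). The key point is that the level test happens inside the callee, after the caller has already evaluated the argument:

```java
// Hypothetical logger sketch; names are illustrative, not log4j's.
class SketchLogger {
    enum Level { DEBUG, INFO, WARN }

    Level threshold = Level.WARN;  // runtime-configurable log level
    int emitted = 0;               // stands in for actually writing output

    void info(String message) {
        // The argument was ALREADY evaluated by the caller (e.g.
        // obj.toString() ran before we got here). The level check only
        // decides whether to write the result anywhere.
        if (Level.INFO.ordinal() >= threshold.ordinal()) {
            emitted++;  // would write the line here
        }
        // When the check fails, the call is a no-op, but the cost of
        // building the message string was still paid.
    }
}
```

Because Java evaluates arguments eagerly, nothing in this design lets the callee prevent the caller from doing the string-building work.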
Are you saying that Java does or should "optimize" in such a way that it branches to the logic within each method called during runtime, inspects which code would be executed per possibly mutable runtime values (like log level), considers possible side effects (in-method and in parameters) then decides whether to even invoke the method at all?
Isn't the point that you don't generally stuff your log with things that are already strings, but with things that can become strings?
(Asking since I know of your work with truffle)
A typical logger.info("user {} did the thing", user) can skip the actual string interpolation and user value stringification, if the log level is not > info. However, logger.info("user " + user + " did the thing") cannot avoid at least the execution of user.toString(), even after jit optimizations, unless the jit could prove that toString does not have side effects. But I don't believe the jvm jit tries to do that. Am I wrong?
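The difference described above can be made observable with a counting toString() and a stand-in logger (a sketch, not SLF4J/log4j itself):

```java
// A toString() with an observable side effect, so we can see when it runs.
class User {
    static int toStringCalls = 0;

    @Override
    public String toString() {
        toStringCalls++;  // observable side effect
        return "alice";
    }
}

// Stand-in logger; not a real logging framework.
class DemoLogger {
    boolean infoEnabled = false;

    // Concatenated form: the caller has already built the full string
    // (including user.toString()) before this method even runs.
    void info(String message) {
        if (infoEnabled) { /* write message */ }
    }

    // Parameterized form: the argument is only stringified if the
    // level check passes.
    void info(String template, Object arg) {
        if (infoEnabled) {
            String message = template.replace("{}", arg.toString());
            /* write message */
        }
    }
}
```

With logging disabled, the concatenated call still invokes `user.toString()`, while the parameterized call skips it entirely - no JIT heroics required.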
Just because a JIT is "making assumptions", it does not mean it could or should make the assumptions needed for what you are calling for here. Generally speaking, optimizers should only be making semantically-neutral assumptions (optionally under the further assumption that the code fully conforms with any constraints the language spec. puts on program behavior, which is somewhat controversial in C.) Furthermore, a JIT can only spend a limited amount of resources in determining whether skipping the evaluation of an argument expression would be semantically neutral. (At this point, functional-programming enthusiasts are rolling their eyes...)
If logging was part of the language, one could simply rule by fiat that arguments shall not be evaluated unless logging is enabled, but log4j is just another collection of user-defined classes, and gets no special treatment.
You might want to refresh yourself on who you're talking to here on this one. I'm sure that there are certainly people more familiar with the JIT behavior of HotSpot and Graal out there, but I don't know that any of them post here on the topic.
I'm not saying that when it comes to the JVM's optimization behaviors that I'd jump without checking that there's water on Chris's assurance, but I'm not saying I wouldn't either.
It occurred to me today that, somewhat ironically, JNDI+LDAP calls appear (at least to me on first sight) to be prime examples of expressions that would be difficult for a JIT optimizer to analyze for side-effects, given that this is a mechanism for making remote calls. If I am mistaken in so thinking, I would be genuinely interested in learning about how it is done.
To be clear, I am not suggesting that these sort of expressions are inherently insecure - that would depend, I think, on whether they involve user input in a way that allows an attacker to take control of the call. A hard-coded JNDI lookup from the arguments list of a log4j call might be inefficient, but no less secure than if made anywhere else.
I am duly chastened, and ready to learn what boneheaded mistakes I am making. For one thing, I take it that the analysis is tractable for most practical cases.
Assuming a global mutable variable will never change is generally not a safe assumption.
I suppose, at least in principle, that something like the GraalVM AOT compiler has the option to scan all of the code to verify that it never is mutated. But HotSpot cannot, because it only JITs code as it is loaded.
Actually this is not true. Hotspot can deoptimize code. It optimizes making some assumptions. If these assumptions stop holding, code gets deoptimized, measured for a while, and reoptimized with new assumptions.
For example, an interface implemented by only 1 class might get the class inlined. If a second implementer pops up (which can happen at runtime for e.g. some dynamically generated class), all of this will get undone.
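The shape of that situation looks like the following sketch. The JIT behavior itself can't be asserted from Java code, so the comments describe what HotSpot typically does rather than anything the program can observe:

```java
// A call site that is monomorphic as long as only one implementation
// of the interface has been loaded.
interface Greeter {
    String greet();
}

class EnglishGreeter implements Greeter {
    public String greet() { return "hello"; }
}

class Devirt {
    static String callSite(Greeter g) {
        // While EnglishGreeter is the only loaded implementer, the JIT
        // can treat this virtual call as a direct call to
        // EnglishGreeter.greet() and inline it (devirtualization).
        // Loading a second implementer later invalidates that
        // assumption, so the compiled code is deoptimized and
        // eventually recompiled with the new class hierarchy in view.
        return g.greet();
    }
}
```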
There was a series of small articles with all these things, but I can't seem to find them right now.
Obviously with your background you know this offhand, but is there a mere-mortals reference for JVM JIT behaviors and capabilities to keep in one's back pocket? I have a pretty strong understanding of a lot of the Java universe (particularly around the JMM) but modern JVM JIT behaviors are a black box to me.
I believe the problem is just that the toString methods implicitly called when concatenating strings and objects can have side effects, and the JIT has to preserve those effects unless it can prove they are just heating up your CPU.
That struck me at first as bizarre - but is it because one can override a toString method with something that has side effects? (Or give a user-defined class a toString() with side effects, I suppose.)
I'm confused. Do people not override toString() on a regular basis along with .equals() and .hashCode() anymore? A common case was pretty-printing of deep object structures, and yes, technically, you could update state while doing it, but dear God, why? Hell, API method names don't actually mean squat as far as the JVM is concerned. It's just the consensus of the community to go with convention over configuration. If everyone does the same thing with methods of the same name, it decreases cognitive overhead.
Did I fall into an alternate dimension where 90% of Java seems to only use preexisting objects now, and people hardly ever write their own code?
(No disrespect to the asker; it's just such a surprising question to stumble across that, mentally speaking, I had to deoptimize my mental model of the world to accommodate people who may not have worked much with Java.)
I get that functional is all the rage, but wow. The default toString() impl is inherited from Object, and basically gives you the class name and the identity hash code.
So you've moved the IF statement that was checking the logging level from the program code into the JIT. Great. No saving. (And I doubt it actually does this - as it's not gaining anything)
As long as you call the initial if more often than you modify (a), you’re fine (and the JVM was able to see that you called your if 10k times without modifying the value of (a) even once)
Respectfully, your post is very wrong :) JIT compilers are very fast and run once to prep the code that will run many times. If you had flat code that was always run exactly once the gains would be smaller, yes
Sure it can, since the JIT has access to runtime information and all the code. It could reorder the steps so first the log level is checked, then that block is run.
Should the JIT call incrementAndReturnFooCount if debug logs are disabled? This is a long-recognized pitfall of C preprocessor macros that look like function calls but may simply be defined away, causing unexpected behavior.
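To spell out the dilemma, here is a sketch built around the commenter's hypothetical `incrementAndReturnFooCount` (the name and class are made up for illustration):

```java
// Hypothetical method from the comment above: it has a side effect, so
// skipping its evaluation when debug logging is off would change
// observable program behavior.
class FooCounter {
    static int fooCount = 0;

    static int incrementAndReturnFooCount() {
        return ++fooCount;  // side effect: mutates global state
    }
}

// Imagine: log.debug("foo count: " + FooCounter.incrementAndReturnFooCount());
// If the JIT elided the argument whenever debug was disabled, fooCount
// would silently stop advancing - the same pitfall as a C macro that
// compiles away its arguments in release builds.
```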
I guess I’m not up on the state of the art with respect to the JRE’s ability to determine whether arbitrary code has side effects. That’s quite a broad problem. If I write fizzbuzz but use log.debug for output, does it optimize my main() to be empty unless I run it with debug enabled?
The JIT works by making assumptions using runtime information, and by discarding compiled code when the conditions change and the assumptions are not valid anymore.
Up until last week, I thought you’re not supposed to do this for performance reasons (the string needs to be evaluated even if not logged). That log4j does anything else with this string than writing it to some log destination is the real problem and the cause of all these bugs…
It's also a very bad idea in e.g. C, where printf has all kinds of exploitable behaviour if the attacker controls the format string passed into it. Format strings are a kind of code, and should be treated as such when considering injection attacks. (By far the worst aspect of the log4j bug was that it would perform these expansions on the final formatted strings: this was completely unnecessary for the features they were aiming to support and made the vulnerability even worse.)
Since version 2.16, log4j doesn’t perform such substitutions on the message string anymore, unless explicitly requested by the log pattern in the configuration (i.e. by `%msg{lookup}`).
This is why Python's logging module treats the argument as a literal string without performing expansions if you only give it a single argument. That avoids such issues.
That makes no difference for the ${…} substitutions, which also aren’t applied to messages anymore. The new issue is about log patterns defined in the appender configurations.
Using format strings makes no difference for the log4j vulnerabilities. The stage at which the vulnerable substitutions are applied is after the application-level format string has already been formatted. The vulnerabilities occur at the level of applying the Appender log pattern, which is a mechanism separate from the application-level format strings.
To be fair, most descriptions of the vulnerabilities and of the fixes do a bad job of explaining when and where the substitutions are performed and how they (don’t) relate to application-level format strings.
It seems like the ability to declare a parameter to be a constant string (no runtime computation allowed) would have been useful here, or in any printf-like API. If you really want to do that, it could be using a different function.
Just for the record, it is possible to type check format strings with regard to their actual arguments. OCaml does this, for instance, probably since its inception ~30 years ago or so.
That was changed with version 2.16. The lookups are not applied to either the message format or the message arguments anymore, only to log patterns defined in the configuration.