> This looks worrying, but if you read the issue thread it seems that this can only be triggered if you can edit the pattern string of the logger. This is something you can only do on the server itself (if made configurable) or in the build artefact that you deploy.
Plenty of existing code does things like:
log.info("foo: " + request.getFoo());
Rather than using fixed format strings and {} placeholders. You’re not supposed to, but it’s far from uncommon.
Oh yes, I’ve seen several programs that spend ~30% of their CPU cycles formatting strings that are immediately thrown away because the log level is not high enough. Now you can also include vulnerabilities with no extra effort!
You would still be doing the job of concatenating together the error message, possibly rendering some complex objects to a string, that is then fed to the no-op. The point parent was making is that there is also a performance aspect to this, in addition to the security aspect.
> You would still be doing the job of concatenating together the error message, possibly rendering some complex objects to a string, that is then fed to the no-op.
Why would you still be doing that work? If a value goes into a no-op, then the value isn't computed. (In theory - I'm sure it doesn't always work out 100% of the time in practice.)
That is actually a reason that a JIT compiler can prune where an AOT compiler can't. JIT can deoptimize when assumptions change.
One of the bigger wins with the JVM is assuming that a virtual method can be called statically if there are no derived classes. That's a huge win for every leaf class in the class tree. The assumption can change every time a class is loaded of course. The optimization is called devirtualization and it can be combined with inlining to get even bigger wins.
Fair point... but it seems like, even so, doing that to monitor whether a single integer variable might change sounds like a lot of added complexity. Are there other use cases that would help to justify it? Reducing the impact of failing to follow logging best practices doesn't seem like an obviously sufficient cause.
Consider `"hello" + myFooObject.toString()`. Can you eliminate that? It's impossible to say without knowing whether or not `myFooObject.toString()` is effectful. Maybe the JVM can make that assessment. Maybe not. I don't know, but it's not as trivial as "knowing what string concatenation does".
A straightforward way to do this is for Java to assume built in types like strings are side effect free, but custom types may not be. This would likely cover a good portion of log lines but not all, obviously.
The noop determination is made during runtime within the log method called. So, it's LOG.info(obj.toString()), wherein that method inspects the log level setting, then decides whether to ignore.
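A minimal sketch of that runtime check, with hypothetical names (this is illustrative, not the actual log4j internals). The key point is that the level test happens inside the callee, after the caller has already evaluated the argument:

```java
// Hypothetical logger sketch; names are illustrative, not log4j's.
class SketchLogger {
    enum Level { DEBUG, INFO, WARN }

    Level threshold = Level.WARN;  // runtime-configurable log level
    int emitted = 0;               // stands in for actually writing output

    void info(String message) {
        // The argument was ALREADY evaluated by the caller (e.g.
        // obj.toString() ran before we got here). The level check only
        // decides whether to write the result anywhere.
        if (Level.INFO.ordinal() >= threshold.ordinal()) {
            emitted++;  // would write the line here
        }
        // When the check fails, the call is a no-op, but the cost of
        // building the message string was still paid.
    }
}
```

Because Java evaluates arguments eagerly, nothing in this design lets the callee prevent the caller from doing the string-building work.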
Are you saying that Java does or should "optimize" in such a way that it branches to the logic within each method called during runtime, inspects which code would be executed per possibly mutable runtime values (like log level), considers possible side effects (in-method and in parameters) then decides whether to even invoke the method at all?
Isn't the point that you don't generally stuff your log with things that are already strings, but with things that can become strings?
(Asking since I know of your work with truffle)
A typical logger.info("user {} did the thing", user) can skip the actual string interpolation and user value stringification, if the log level is not > info. However, logger.info("user " + user + " did the thing") cannot avoid at least the execution of user.toString(), even after jit optimizations, unless the jit could prove that toString does not have side effects. But I don't believe the jvm jit tries to do that. Am I wrong?
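The difference described above can be made observable with a counting toString() and a stand-in logger (a sketch, not SLF4J/log4j itself):

```java
// A toString() with an observable side effect, so we can see when it runs.
class User {
    static int toStringCalls = 0;

    @Override
    public String toString() {
        toStringCalls++;  // observable side effect
        return "alice";
    }
}

// Stand-in logger; not a real logging framework.
class DemoLogger {
    boolean infoEnabled = false;

    // Concatenated form: the caller has already built the full string
    // (including user.toString()) before this method even runs.
    void info(String message) {
        if (infoEnabled) { /* write message */ }
    }

    // Parameterized form: the argument is only stringified if the
    // level check passes.
    void info(String template, Object arg) {
        if (infoEnabled) {
            String message = template.replace("{}", arg.toString());
            /* write message */
        }
    }
}
```

With logging disabled, the concatenated call still invokes `user.toString()`, while the parameterized call skips it entirely - no JIT heroics required.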
Just because a JIT is "making assumptions", it does not mean it could or should make the assumptions needed for what you are calling for here. Generally speaking, optimizers should only be making semantically-neutral assumptions (optionally under the further assumption that the code fully conforms with any constraints the language spec. puts on program behavior, which is somewhat controversial in C.) Furthermore, a JIT can only spend a limited amount of resources in determining whether skipping the evaluation of an argument expression would be semantically neutral. (At this point, functional-programming enthusiasts are rolling their eyes...)
If logging was part of the language, one could simply rule by fiat that arguments shall not be evaluated unless logging is enabled, but log4j is just another collection of user-defined classes, and gets no special treatment.
You might want to refresh yourself on who you're talking to here on this one. I'm sure that there are certainly people more familiar with the JIT behavior of HotSpot and Graal out there, but I don't know that any of them post here on the topic.
I'm not saying that when it comes to the JVM's optimization behaviors that I'd jump without checking that there's water on Chris's assurance, but I'm not saying I wouldn't either.
It occurred to me today that, somewhat ironically, JNDI+LDAP calls appear (at least to me on first sight) to be prime examples of expressions that would be difficult for a JIT optimizer to analyze for side-effects, given that this is a mechanism for making remote calls. If I am mistaken in so thinking, I would be genuinely interested in learning about how it is done.
To be clear, I am not suggesting that these sort of expressions are inherently insecure - that would depend, I think, on whether they involve user input in a way that allows an attacker to take control of the call. A hard-coded JNDI lookup from the arguments list of a log4j call might be inefficient, but no less secure than if made anywhere else.
I am duly chastened, and ready to learn what boneheaded mistakes I am making. For one thing, I take it that the analysis is tractable for most practical cases.
Assuming a global mutable variable will never change is generally not a safe assumption.
I suppose, at least in principle, that something like the GraalVM AOT compiler has the option to scan all of the code to verify that it never is mutated. But HotSpot cannot, because it only JITs code as it is loaded.
Actually this is not true. Hotspot can deoptimize code. It optimizes making some assumptions. If these assumptions stop holding, code gets deoptimized, measured for a while, and reoptimized with new assumptions.
For example, an interface implemented by only 1 class might get the class inlined. If a second implementer pops up (which can happen at runtime for e.g. some dynamically generated class), all of this will get undone.
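The shape of that situation looks like the following sketch. The JIT behavior itself can't be asserted from Java code, so the comments describe what HotSpot typically does rather than anything the program can observe:

```java
// A call site that is monomorphic as long as only one implementation
// of the interface has been loaded.
interface Greeter {
    String greet();
}

class EnglishGreeter implements Greeter {
    public String greet() { return "hello"; }
}

class Devirt {
    static String callSite(Greeter g) {
        // While EnglishGreeter is the only loaded implementer, the JIT
        // can treat this virtual call as a direct call to
        // EnglishGreeter.greet() and inline it (devirtualization).
        // Loading a second implementer later invalidates that
        // assumption, so the compiled code is deoptimized and
        // eventually recompiled with the new class hierarchy in view.
        return g.greet();
    }
}
```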
There was a series of small articles with all these things, but I can't seem to find them right now.
Obviously with your background you know this offhand, but is there a mere-mortals reference for JVM JIT behaviors and capabilities to keep in one's back pocket? I have a pretty strong understanding of a lot of the Java universe (particularly around the JMM) but modern JVM JIT behaviors are a black box to me.
I believe the problem is just that the toString methods implicitly called when concatenating strings and objects can have side effects, and the JIT has to preserve those effects unless it can prove they are just heating up your CPU.
That struck me at first as bizarre - but is it because one can override a toString method with something that has side effects? (Or give a user-defined class a toString() with side effects, I suppose.)
I'm confused. Do people not override toString() on a regular basis along with .equals() and .hashCode() anymore? A common case was pretty-printing of deep object structures, and yes, technically, you could update state while doing it, but dear God, why? Hell, API method names don't actually mean squat as far as the JVM is concerned. It's just the consensus of the community to go with convention over configuration. If everyone does the same thing with methods of the same name, it decreases cognitive overhead.
Did I fall into an alternate dimension where 90% of Java seems to only use preexisting objects now, and people hardly ever write their own code?
(No disrespect to the asker; it's just such a surprising question to stumble across that, mentally speaking, I had to deoptimize my mental model of the world to accommodate people who may not have worked much with Java.)
I get that functional is all the rage, but wow. The default toString() impl is inherited from Object, and basically gives you the class name and the identity hash code.
So you've moved the IF statement that was checking the logging level from the program code into the JIT. Great. No saving. (And I doubt it actually does this - as it's not gaining anything)
As long as you call the initial if more often than you modify (a), you’re fine (and the JVM was able to see that you called your if 10k times without modifying the value of (a) even once)
Respectfully, your post is very wrong :) JIT compilers are very fast and run once to prep the code that will run many times. If you had flat code that was always run exactly once the gains would be smaller, yes
Sure it can, since the JIT has access to runtime information and all the code. It could reorder the steps so first the log level is checked, then that block is run.
Should the JIT call incrementAndReturnFooCount if debug logs are disabled? This is a long-recognized pitfall of C preprocessor macros that look like function calls but may simply be defined away, causing unexpected behavior.
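To spell out the dilemma, here is a sketch built around the commenter's hypothetical `incrementAndReturnFooCount` (the name and class are made up for illustration):

```java
// Hypothetical method from the comment above: it has a side effect, so
// skipping its evaluation when debug logging is off would change
// observable program behavior.
class FooCounter {
    static int fooCount = 0;

    static int incrementAndReturnFooCount() {
        return ++fooCount;  // side effect: mutates global state
    }
}

// Imagine: log.debug("foo count: " + FooCounter.incrementAndReturnFooCount());
// If the JIT elided the argument whenever debug was disabled, fooCount
// would silently stop advancing - the same pitfall as a C macro that
// compiles away its arguments in release builds.
```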
I guess I’m not up on the state of the art with respect to the JRE’s ability to determine whether arbitrary code has side effects. That’s quite a broad problem. If I write fizzbuzz but use log.debug for output, does it optimize my main() to be empty unless I run it with debug enabled?
The JIT works by making assumptions using runtime information, and by discarding compiled code when the conditions change and the assumptions are not valid anymore.
Up until last week, I thought you’re not supposed to do this for performance reasons (the string needs to be evaluated even if not logged). That log4j does anything else with this string than writing it to some log destination is the real problem and the cause of all these bugs…
It's also a very bad idea in e.g. C, where printf has all kinds of exploitable behaviour if the attacker controls the format string passed into it. Format strings are a kind of code, and should be treated as such when considering injection attacks. (By far the worst aspect of the log4j bug was that it would perform these expansions on the final formatted strings: this was completely unnecessary for the features they were aiming to support and made the vulnerability even worse.)
Since version 2.16, log4j doesn’t perform such substitutions on the message string anymore, unless explicitly requested by the log pattern in the configuration (i.e. by `%msg{lookup}`).
This is why Python's logging module treats the argument as a literal string without performing expansions if you only give it a single argument. That avoids such issues.
That makes no difference for the ${…} substitutions, which also aren’t applied to messages anymore. The new issue is about log patterns defined in the appender configurations.
Using format strings makes no difference for the log4j vulnerabilities. The stage at which the vulnerable substitutions are applied is after the application-level format string has already been formatted. The vulnerabilities occur at the level of applying the Appender log pattern, which is a mechanism separate from the application-level format strings.
To be fair, most descriptions of the vulnerabilities and of the fixes do a bad job of explaining when and where the substitutions are performed and how they (don’t) relate to application-level format strings.
It seems like the ability to declare a parameter to be a constant string (no runtime computation allowed) would have been useful here, or in any printf-like API. If you really want to do that, it could be using a different function.
Just for the record, it is possible to type check format strings with regard to their actual arguments. OCaml does this, for instance, probably since its inception ~30 years ago or so.
That was changed with version 2.16. The lookups are not applied to either the message format or the message arguments anymore, only to log patterns defined in the configuration.