
Everything has correctness issues somewhere. Julia ships an entire patched version of LLVM to fix correctness bugs in numerical methods. It has its own implementations of things like software-side FMA because the FMA implementation on Windows is incorrect: https://github.com/JuliaLang/julia/pull/43530 . Core Julia devs are now the maintainers of things like libuv because of how much had to be fixed there. Those three points alone point to tons of cases where Python, R, etc. code is incorrect and Julia isn't.

I think what's interesting about Julia is that because the code is all Julia, it's really easy to dig in there and find potential bugs. The standard library functions can be accessed with `@edit sum(1:5)` and there you go, hack away. The easier it is to look at the code, the easier it is to find issues with it. This is why Julia has such a high developer-to-user ratio. That has its pros and cons of course. It democratizes the development process, because it means that people who don't have a ton of development experience (plus Fortran or C knowledge) are not excluded from contributing. Is that good or bad? Personally I believe it's good in the long run, but it can have its bumps.
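For readers who haven't tried that workflow, a minimal REPL session might look like the following (`@which` and `@edit` live in the InteractiveUtils stdlib, which the REPL loads by default):

```julia
using InteractiveUtils  # loaded automatically in the REPL

@which sum(1:5)  # prints the exact method that dispatch selects
@edit sum(1:5)   # opens that method's source file in your $EDITOR, ready to hack on
```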

As an aside, the author highlights `for i in 1:length(A)`. I agree, code should never do that. It should be `eachindex(A)`. In general, code should use iterator constructs that are designed to handle arbitrary indexing. This is true in any language, though you'll always have some newcomers write code (and documentation) with this. Even experienced people who don't tend to use arrays beyond `Array` tend to do this. It's an interesting issue because coding style issues perpetuate themselves: explicit 1-based indexing wasn't an issue before GPUs and OffsetArrays, but then loop code like that trains the next generation, and so more people use it. In the end, the people who really know how to handle these cases are the people who tend to use these cases, just like how people who write in ARM-safe styles tend to be people who use ARM. Someone should just run a bot that opens a PR for every occurrence of this (especially in Base), as that would change the source that everyone learns from and completely flip the style.
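To make that concrete, here's a small sketch of why the `1:length(A)` pattern breaks (assuming the OffsetArrays.jl package is installed):

```julia
using OffsetArrays

A = OffsetArray([10, 20, 30], -1:1)  # valid indices are -1, 0, 1

# Fragile: assumes indexing starts at 1, so it tries to read A[2] and A[3],
# which throws a BoundsError here (or returns garbage under @inbounds):
# for i in 1:length(A); A[i] += 1; end

# Robust: iterates whatever indices A actually has:
for i in eachindex(A)
    A[i] += 1
end
```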



> Everything has correctness issues somewhere.

This is the fallacy of gray. The blog post isn't complaining that there is a non-zero number of bugs; it's complaining that when you use the language you hit a lot of correctness bugs. More bugs than you'd hit using e.g. Python.

Also, to the extent that Julia uses LLVM, a correctness bug in LLVM is also a correctness bug in Julia. So arguing "LLVM has lots of correctness bugs" is not helping the case...

> because the code is all Julia, it's really easy to dig in there and find potential bugs.

The blog post is about bugs hit while running code, not bugs found while reading code. The fact the issue can be understood and pointed at is great, but it's the number of issues being hit that's the problem.


> So arguing "LLVM has lots of correctness bugs" is not helping the case

It does not help the case for Julia's correctness in absolute terms, but it does help against the claim that Julia has more bugs than other software (and reflects badly on the other projects): every library built with LLVM that touches those code paths will have those bugs too.

Another thing to keep in mind is that Julia ships patches for some of these that are not used upstream yet. So Julia does not suffer from some bugs in LLVM that other projects might.


It shows that Julia's tests are systematically finding, and leading to fixes of, numerical bugs that are pervasive throughout the rest of the LLVM ecosystem. And since Julia's LLVM is patched to solve these while other variants of LLVM are not, Julia is more correct in these respects than other languages which rely on the base build of LLVM. Of course Julia doesn't solve "all bugs", but some of them (like the correctness of certain math library implementations) really make you question how hard other languages' tests are hammering those paths for correctness (Julia has a lot of numerical tests checking the precision of such methods against MPFR bigfloats at higher precision to ensure ~X ulp correctness, for example). Julia definitely spends a lot more time testing numerical correctness than it does testing something like a web server. It's just a prioritization thing.


Do Julia devs not upstream their LLVM patches?


We do, but it takes a while for LLVM to accept and release patches, so by the time any one issue is fixed, there will be a new bug to take its place.


I do think there's a particularly unique challenge to Julia in that so many packages can theoretically coexist and interoperate. While it quadratically increases the power of Julia, it also quadratically increases the surface area for potential issues. That — to me — is the most interesting part of the blog post. How can we help folks find the "happy" paths so they don't get lost in the weeds by trying to differentiate a distributed SVD routine of an Offset BlockArray filled with Unitful Quaternions? And — as someone who worked with and valued Yuri's reported issues and fixes — how can I more quickly identify that they're not someone who gets joy out of making such a thing work?


Good comments, Chris. I think the author's nuance is that Julia isn't correct in the specific use cases he needs it to be, while your point is also well taken that Julia is correct in cases where other languages aren't.

I'm a little unfamiliar with the versioning in the package ecosystem, but would you say most packages follow or enforce SemVer? Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?


> but would you say most packages follow or enforce SemVer?

The package ecosystem pretty much requires SemVer. If you just say `PackageX = "1"` inside a Project.toml [compat] section, then it will assume SemVer, i.e. any version 1.x is non-breaking and thus allowed, but not version 2. Some (but very few) packages do `PackageX = ">=1"`, so you could say Julia doesn't force SemVer (because a package can explicitly declare that it believes it's compatible with all future versions), but of course that's nonsense and there will always be some bad actors around.
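For anyone unfamiliar with the format, a `[compat]` section looks roughly like this (the package names are placeholders):

```toml
[compat]
PackageX = "1"         # SemVer-style: any 1.x, i.e. >= 1.0.0 and < 2.0.0
PackageY = "0.4, 0.5"  # either 0.4.x or 0.5.x
PackageZ = ">=1"       # discouraged: claims compatibility with every future version
julia = "1.6"          # the Julia version itself gets a compat bound too
```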

> Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?

That's not the issue. As above, the dependency graphs are very strict. The issue is always at the periphery (for any package ecosystem, really). In Julia, one thing that can amplify it is that Requires.jl, the hacky conditional dependency system that is strongly discouraged for many reasons, cannot specify version requirements on conditional dependencies. I find this to be the root cause of most issues in the "flow" of the package development ecosystem. Most packages are okay, but then: oh, I don't want to depend on CUDA for this feature, so a little bit of Requires.jl here; oh, let me do a small hack for OffsetArrays there. And now these little hacky features on the edge are both less tested and not well versioned.
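For context, the Requires.jl pattern being criticized looks roughly like this (the included file and feature are hypothetical; the UUID identifies the conditional package, taken from its Project.toml):

```julia
using Requires

function __init__()
    # This block runs only if the user independently loads CUDA.jl; note that
    # there is no way to attach a [compat] version bound to this conditional
    # dependency, which is exactly the problem described above.
    @require CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba" begin
        include("cuda_support.jl")  # hypothetical GPU-only code path
    end
end
```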

Thankfully there's a better way to do it by using multi-package repositories with subpackages. For example, https://github.com/SciML/GalacticOptim.jl is a global interface for lots of different optimization libraries, and you can see all of the different subpackages here https://github.com/SciML/GalacticOptim.jl/tree/master/lib. This lets there be a GalacticOptim package and then a GalacticBBO package, each with its own versioning and tests, while allowing easy co-development of the parts. Very few packages in the Julia ecosystem actually use this (I only know of one other Julia package making use of it) because the tooling was only recently able to support it, but this is how a lot of packages should be going.
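The layout, roughly (names hypothetical), is a top-level package with separately versioned subpackages under `lib/`:

```text
MyPackage/
  Project.toml          # umbrella package: its own version and [compat]
  src/
  lib/
    MySubPackage/
      Project.toml      # subpackage: versioned and tested independently
      src/
      test/
```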

The upside too is that Requires.jl optional dependency handling is far and away the main source of loading time issues in Julia (because it blocks precompilation in many ways). So it's really killing two birds with one stone: decreasing package load times by about 99% (that's not even a joke; it's the huge majority of the load time for most packages that are not StaticArrays.jl) while making version dependencies stricter. And now you know what I'm doing this week and what the next blog post will be on, haha. Everyone should join in on the fun of eliminating Requires.jl.


> Julia ships an entire patched version of LLVM to fix correctness bugs in numerical methods

Sounds like the banana ships with the gorilla which requires the entire jungle, and we're too busy fixing the gorilla to give the banana our undivided attention.


I'll be honest: based on my experience with Julia, this makes me more worried about using e.g. libuv in production systems now, not less. I understand your opinion that "the easier it is to look at the code, the easier it is to find issues with it", but I don't think that has anything to do with the fact that `prod((Int8(100), Int8(100)))` and `prod([Int8(100), Int8(100)])` disagree, because someone decided to special-case tuple multiplication. To make it even worse, this bug was documented(!) in the comments by whoever committed the original code:

   # TODO: this is inconsistent with the regular prod in cases where the arguments
   # require size promotion to system size.
How did this pass code review? Why would it be okay for a standard library function to be "inconsistent" in this way?

(EDIT: Since writing this comment, I've realized that (100 * 100) % 256 is in fact 16, so the results are a little less inexplicable to me. I think having the types annotated in the REPL would have made it clearer what was going on, and it's still a very difficult inconsistency to debug, especially as an end user)
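For anyone following along, the 16 is ordinary fixed-width wraparound; Julia's native integer arithmetic is modular:

```julia
Int8(100) * Int8(100)            # Int8 math wraps mod 2^8: 10000 % 256 == 16
Int(Int8(100)) * Int(Int8(100))  # promoting to Int first gives 10000
widemul(Int8(100), Int8(100))    # Base.widemul promotes to a wider type automatically
```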

I also think your argument that "[...] you'll always have some newcomers write code (and documentation)" that is broken is completely incorrect; it shifts the responsibility for providing a safe and easy-to-use system from the language authors onto the users. The OP goes to pains to point out that this was not just an issue of "some newcomers": it was a fundamental issue across the entire community, including what seem to be some of the most heavily-used packages in Julia's ecosystem, such as Distributions.jl and StatsBase.jl. It's deeply misleading to blame issues like that simply on "people who don't have a ton of development experience" and "newcomers writing documentation", and it indicates a lack of responsibility and humility from Julia's proponents.

P.S: You're correct that the documentation about @inbounds was written by someone who was new to the language (https://github.com/JuliaLang/julia/pull/19726). But in fact the example itself was copied over entirely as-is from devdocs, where it was written by the author of the boundschecking feature(!) https://github.com/JuliaLang/julia/pull/14474. And it was only fixed last year. And the entire docs PR was reviewed thoroughly by two core team members, with lots of changes and suggestions—but nobody noticed the index issue. So I don't think you can blame this one on newcomers.


Julia released experimental support for arrays whose indexes don't start at 1 in Julia 0.5, October 2016.

The boundschecking feature was added in 2015, so at the time they wrote their code and examples, they were correct.

The documentation and review happened in December and January 2016/2017 when the non 1-based indexing was still experimental and very new, so I don't think this is as big a fail as you've made out either.

Yes, the documentation should have been updated when non-standard indexing was made non-experimental, and the reviewers should maybe have noted the new and experimental array indexing stuff, but it's only natural to miss some things.


That's fair enough! I was unaware of that history. But my point wasn't that the issue was "a big fail", it's that the GP was unfair in assigning the responsibility of that failure to "some newcomers [who] write code (and documentation) with this" while "the people who really know to handle these cases" are fine. The responsibility should have been on the people pushing for the experimental array indexing code to make it work safely with the existing usage of boundschecks that existed in the ecosystem and the existing documentation. It's a fundamental disagreement between whether the onus of code safety is on the user (who is responsible for understanding the totality of the libraries they're using and all of the ways they can fail) or on the programming language (for ensuring the stability and correctness of its code, documentation and ecosystem when making changes).


Just to clarify, the prod() bug you mention was fixed about a year ago.


The problem in this case (as with most issues regarding `@inbounds`) is that this text was written before arrays with non-standard indices existed in Julia. So the example was correct at the time it was written, just like the StatsBase code was correct. Old code needs careful checking to fix all these occurrences.


Discussed in a sibling thread: https://news.ycombinator.com/item?id=31401155.


> I agree, code should never do that. It should be `eachindex(A)`

Will that generate the same code as "i in 1:length(A)"?

Maybe whoever wrote that didn't believe so at least, or perhaps didn't find it so at the time.

The reason @inbounds would have been used is performance, so that's likely why the for loop header was written that way?


`eachindex` is — in quite a few situations — faster than `1:n`.

We've also been trying to promote a culture of not blindly putting `@inbounds` notations on things as the compiler gets smarter. `@inbounds` is a hack around a dumb compiler, especially when the loop is as simple as many of these examples. It's not needed there anymore (but was 5 years ago).


Perhaps that is part of the point of the article? If you accept things like @inbounds, which is a horrible hack and was a horrible hack five years ago, then perhaps the culture is a little too tolerant towards horrible hacks. Because many of the bugs the author enumerates are of the "fixes the problem for now, let's deal with the consequences later" type.


I wouldn't say that @inbounds is a "horrible hack". Just like the `unsafe` part of Rust is not a "horrible hack". There are cases where it is impossible for a compiler to statically verify that an index access is in bounds and in those cases it will need to emit a check and an exception. This prevents many other optimizations (for example SIMD). So for a language that is intended for people to write low-level numerical routines there has to be a way to opt out of these checks or people would have to write their numerical routines in a completely different language. But the important part is that index access is memory safe by default (as opposed to e.g. C) and you can also force boundschecking to be turned on (to override @inbounds) with a command-line flag (--check-bounds=yes). So if you want, you could pretend "@inbounds" doesn't exist by just aliasing your julia executable to apply that command line flag.
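A minimal sketch of that opt-out, for readers who haven't seen it:

```julia
function mysum(A::AbstractArray{<:Number})
    s = zero(eltype(A))
    # @inbounds tells the compiler to skip bounds checks in this block, on the
    # programmer's promise that every access is in range; using eachindex makes
    # that promise easy to keep for any array type.
    @inbounds for i in eachindex(A)
        s += A[i]
    end
    return s
end

# Audit mode: running `julia --check-bounds=yes script.jl` re-enables all
# checks, overriding every @inbounds annotation in the program.
```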


Yes and no... Julia's been focused on high performance numerical computing from the beginning (and other related scientific applications). Using macros to get good performance from relatively generic code was (from my outside perspective) a really effective way to support real applications early on and also give time for the compiler to get "sufficiently smart" to make the macros less necessary.


The question is: is it at least as fast in all situations? Was it always that way?

The 1 to length loop just has to initialize a local variable and step it; it cannot do anything else. It doesn't worry about the kinds of array that A may be, with its particular configuration of indexing, right?

You may promote a culture of not doing certain things, but that by itself won't make those things disappear from existing code.

Say you're trying to ship some product and you receive a bulletin from the language mailing list encouraging you, "try not to use @inbounds, it's a hack around a dumb compiler". You know you have that in numerous places; but you're not going to stop what you're doing and start removing @inbounds from the code base. If you're remarkably conscientious, you might open a ticket for that, which someone will look into in another season.


> The question is: is it at least as fast in all situations? Was it always that way?

Yes and yes.
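If you want to check for yourself, a quick comparison with BenchmarkTools.jl (assumed installed) might look like this; for a plain `Vector`, `eachindex(A)` lowers to `Base.OneTo(length(A))`, so the two loops should compile to essentially the same code:

```julia
using BenchmarkTools

function sum_length(A)
    s = zero(eltype(A))
    for i in 1:length(A)
        s += A[i]
    end
    s
end

function sum_eachindex(A)
    s = zero(eltype(A))
    for i in eachindex(A)
        s += A[i]
    end
    s
end

A = rand(1000)
@btime sum_length($A)
@btime sum_eachindex($A)
```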


I think it should be fine for performance AFAIU to use `eachindex` instead; at least I know `eachindex` plays nicely with LoopVectorization.jl with no performance costs there.

That said, I think you're exactly right that people may wonder just this and use the seemingly "lower-level" form out of concern with or without testing it.
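As a sketch of that interaction (LoopVectorization.jl assumed installed; I believe `@turbo` handles `eachindex` loops directly in recent versions):

```julia
using LoopVectorization

function axpy!(y, a, x)
    # eachindex over several arrays also asserts that their indices agree
    @turbo for i in eachindex(y, x)
        y[i] += a * x[i]
    end
    return y
end
```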


One of my intentions with the rewrite is to let `@turbo` change the semantics of "unreachable"s, allowing it to hoist them out of loops. This changes the observed behavior of code like

  for i = firstindex(x):lastindex(x)+1
    x[i] += 2
  end
where now, all the iterations that actually would have taken place before the error will not have happened. But, hoisting the error check out when valid will encourage people to write safer code, while still retaining almost all of the performance. There is also a class of examples where the bounds checks will provide the compiler with information enabling optimizations that would've otherwise been impossible -- so there may be cases with the rewrite where `@inbounds` results in slower code than leaving bounds checking enabled.


Oh, nice!


Is "for i in 1:length(A)" ever correct? Should Julia just emit a warning any time it encounters that pattern? Or maybe something slightly more complicated, such as that pattern followed by usage of i to index into A inside the loop?


> Is "for i in 1:length(A)" ever correct?

In some rare cases, it very well might be exactly what the code's author intended and needed.

I tend to lean towards what Martin Fowler calls an "enabling attitude"[0] (as opposed to a "directing attitude") -- that is, when faced with a choice about how to design the primitives of an interface, I lean more often towards providing flexibility, and I try to avoid choosing ahead of time what users aren't allowed to do. It's better to document what's usually the wrong way to do something than to enforce it in the design. You can never guess what amazing things people will create when they are given flexible, unrestricted primitives.

So for cases like this, I think it's better to rely on a flexible linting tool (if available) than warnings or errors.

[0] https://martinfowler.com/bliki/SoftwareDevelopmentAttitude.h...


Why not have a feature to allow you to turn off the warning? E.g. have something recognise 1:length(x) and complain unless you write e.g. @nowarn eachindex before it.


Warning about such things is the job for a linter. There is a linter for Julia so such a thing could be added there. It shouldn't be a runtime warning though, like you propose.


It is correct if `A` is of type `Array` as normal Array in julia has 1-based indexing. It is incorrect if `A` is of some other type which subtypes `AbstractArray` as these may not follow 1-based indexing. But this case errors normally due to bounds checking. The OP talks about the case where even bounds checking is turned off using `@inbounds` for speed and thus silently gives wrong answers without giving an error.
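A small illustration of the two failure modes described above (OffsetArrays.jl assumed installed; `unsafe_get` is a hypothetical helper):

```julia
using OffsetArrays

A = OffsetArray([10, 20, 30], -1:1)  # valid indices: -1, 0, 1

unsafe_get(A, i) = @inbounds A[i]

A[2]              # throws BoundsError: the default check catches the bad index
unsafe_get(A, 2)  # checks elided: may silently return garbage or crash
```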

An issue was created sometime ago in StaticLint.jl to fix this: https://github.com/julia-vscode/StaticLint.jl/issues/337


It's correct if you want to do something `length(A)` times and want an iteration counter, but it's never better than `for idx in eachindex(A)` if what you actually want are indexes into A (which is of course the much more common case).

Julia did not initially support arrays that aren't indexed from 1 (experimental support added in Julia 0.5, I don't know when it was finalised), and at that time I'm not even sure we had something like eachindex, certainly there would be no reason why someone would use it for an array.


> Is "for i in 1:length(A)" ever correct?

Yes, actually. While I have approximately zero knowledge of Julia specifically, a language-independent example might be:

  B = OneBasedArray(length(A))
  A_ = iter(A)
  for i in 1:length(A) { B[i] = pop(A_) }
  assert(iter_isdone(A_))
And if that looks contrived... yes; it is contrived.

> that pattern followed by usage of i to index into A inside the loop?

I can't think of any legitimate uses for that, but there probably are some; make sure to allow:

  len = length(A)
  for i in 1:len ...
as a `if( (x = foo()) )`-style workaround.


Why allow iterating with 1:length(A) if it's not the right way?


I don't think there's any clean way to stop that at a language level (some languages prevent this by disallowing random access to arrays, but that's a non-starter for a performance-oriented language), and also it would be a massively breaking change.


You can't disallow it at a language level, since either way you are just indexing with Ints. That said, we can add better linting rules to catch stuff like this.


> Everything has correctness issues somewhere.

Yes but Julia is (yet another) dynamic language, presumably for "ease of use". A language with static types would have made it easier to build correct software (scientific code in e.g. OCaml and F# can look pretty good). Julia chose a path to maximize adoption at the expense of building a reliable ecosystem. Not all languages choose to make this trade-off.


> A language with static types would have made it easier to build correct software

This claim is repeated often, but numerous attempts have failed to demonstrate that this is generally the case in practice (there have been a couple of studies showing an effect in very specific circumstances). Static types might indeed assist with correctness, but they are not the only thing that does, and in some situations they could come at the expense of others. I.e., even if types were shown to significantly help with correctness, it does not follow that if you want correctness your best course would be to add types.

Given the empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one (if it were big, detecting it would have been easy).

Note that Matlab, the workhorse of scientific computing for a few decades now, is even less typed than Julia. That's not to say that Julia doesn't suffer from too many correctness issues (I have no knowledge on the matter), but even if it does, there is little support for the claim that typing is the most effective solution.


We can trade anecdotes on this topic, but I've written numerical code in OCaml and also Julia. The strictness of OCaml's type system is painful in a numerical context but for virtually all other things it is awesome to pass code into the interpreter/compiler and catch structural problems at compile-time rather than maybe at runtime.

OCaml's type system is almost certainly not the right model for Julia but the ad-hoc typing/interface system Julia currently employs is at strong odds with compile-time correctness. There's almost certainly some middle ground to be discovered which might be unsound in a strict sense but pragmatically constrains code statically so there is high likelihood of having to go out of your way to pull the footgun trigger.

You can see how little type annotations are used in practice in major Julia libraries. It should be integral to best practice in the language to specify some traits/constraints that arguments must satisfy to be semantically valid, but what you often see instead is a (potentially inscrutable) runtime error.
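As a sketch of the difference (function names hypothetical):

```julia
# Unannotated: any argument is accepted; a bad one fails at runtime,
# possibly deep inside the call stack, with an inscrutable error.
mean_loose(xs) = sum(xs) / length(xs)

# Constrained via dispatch: a bad argument fails immediately at the call
# site with a MethodError naming the offending type.
mean_strict(xs::AbstractVector{<:Real}) = sum(xs) / length(xs)

mean_strict([1, 2, 3])      # 2.0
# mean_strict(Dict(1 => 2)) # MethodError: no method matching mean_strict(::Dict)
```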


"Awesome", i.e. more enjoyable for you, and "more correct", i.e. fewer bugs in production, are two very different things. I also prefer typed languages for the software I tend to write and find them more enjoyable, but that still doesn't make me claim that types lead to more correct software.


I am not familiar with the studies you are relying on to make the point that statically-typed languages have no significant difference in terms of number of bugs in production compared to dynamically-typed. Measuring such things is challenging, and the most useful measure may not be in terms of "bugs in production" but by a number of other measures, such as how long it takes to surface bugs after the code is accepted by the interpreter/compiler, how much time is spent on writing the implementation vs. writing & running tests, how many bugs occur on major refactorings, etc. If you have citations for studies you like, I'm certainly interested.

My use of colloquialism aside, it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches. In my own experience, despite being a more experienced programmer in my Julia-writing phase than in my OCaml-writing phase, it takes much more time to surface bugs in my "running" Julia code than OCaml. The lack of determinism in surfacing these bugs does not inspire as much confidence in the Julia code. You could counter by saying I'm probably able to implement more functionality in Julia per unit of up-front development time than the equivalent development time in OCaml, which I'd probably have to concede, but that just highlights that measuring these things in a directly comparable way is not easy.

In the physical engineering disciplines, we often have disagreements about the level of sophistication of physics-based models that should be used for design and analysis. It's very reminiscent of these static vs. dynamic typing discussions in software development. There isn't a "one size fits all" answer, but generally, the more complex and expensive the system, the more important the models incorporate greater physical fidelity. My analogous conclusion here is a lot of technical/numerical code is complex enough that more rigor enforced by the language would likely be the right tradeoff for a net win on up-front correctness (vs correctness as a result of testing).


Anyone is allowed to prefer a programming style that suits their aesthetics and habits, and like that one over all others. Aesthetic preferences are a very valid way to choose your programming language — ultimately that's how we all pick our favourite languages — and there's no need to make up universal empirical claims to support our preferences.

Here's a good talk to watch on the subject: https://youtu.be/ePCpq0AMyVk

And here's a summary of various studies done: https://danluu.com/empirical-pl/

As of today, what we know is that if there's a positive effect of types on correctness, then it is probably a small one.

There's really no need to assert what is really a conjecture, let alone one that's been examined and has not been verified. If you believe the conjecture is intrinsically hard to verify, you're conceding that you're only claiming a small effect at best (big effects are typically not hard to verify), and so there's even less justification for continuing to assert it. It's okay to prefer typed languages even though they do not, as far as we know, have a big impact on correctness.


> Anyone is allowed to prefer a programming style that suits their aesthetics and habits, and like that one over all others. Aesthetic preferences are a very valid way to choose your programming language — ultimately that's how we all pick our favourite languages — and there's no need to make up universal empirical claims to support our preferences.

That's fine, but I'm not sure what it has to do with my comment, as it was not about preferences based on aesthetics or habits.

Thanks for the links though.

> There's really no need to assert what is really a conjecture, let alone one that's been examined and has not been verified.

There's no unsupported conjecture in "it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches."

> If you believe the conjecture is intrinsically hard to verify, you're conceding that you're only claiming a small effect at best (big effects are typically not hard to verify), and so there's even less justification for continuing to assert it.

It's easy to fall victim to the Robert McNamara fallacy: that if something isn't easy to measure, its effect or importance is insignificant. Anyone looking back at U.S. defense and procurement policy from his era is free to observe the lack of real-world congruence with such thinking. The Dan Luu page you cited, more than anything else, seems to reinforce that the cited studies are hard to interpret for any rigorous conclusions or for validity of methodology.

This is why I did not make sweeping statements along the lines of "the majority of dynamically-typed software in production [no qualifier on what "production" means] would have fewer bugs if it were statically-typed" or the like.


> That's fine, but I'm not sure what it has to do with my comment as it was not about preferences based on aesthetics or habits.

Because you made the claim that "it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches," but that claim was simply not found to be true.

> There's no unsupported conjecture in "it is strictly more rigorous to catch equivalent bugs through the interpreter/compiler than through testing or other runtime-dependent approaches."

There is, unless you define "more rigorous" in a tautological way. It does not seem to be the case that soundly enforcing constraints at compile time always leads to fewer bugs.

> It's easy to fall victim to the Robert McNamara fallacy, that if something isn't easy to measure its effect or importance is insignificant.

The statement, "you will have fewer bugs but won't be able to notice it," is unconvincing. For one, if you can't measure it, you can't keep asserting it. At best you can say you believe that to be the case. For another, we care about the effects we can see. If the effect doesn't have a noticeable impact, it doesn't really matter if it exists or not (and we haven't even been able to show that a large effect exists).

That the effect is small is still the likeliest explanation, but even if you have others, your conjecture is still conjecture until it is actually verified.

> The Dan Luu page you cited, more than anything else, seems to reinforce that the cited studies are hard to interpret for any rigorous conclusions or for validity of methodology.

It does support my main point that despite our attempts, we have not been able to show that types actually lead to significantly fewer bugs, i.e. that the approach is "more rigorous" in some useful sense.


I'm not sure if that's true, even big effects can be hard to verify if there are significant confounders.

For example, let's imagine that writing OCaml code really leads to fewer bugs than writing code in Lisp (to just choose two languages) but only after you've trained people in OCaml for ten years. Or maybe, technically Java leads to measurably fewer bugs than Ruby, but because most popular Java projects make heavy use of reflection, the effect dissipates... and so on (these are just examples for potential confounders, I'm not claiming they're true).

You are correct that one cannot claim that "static typing leads to fewer bugs" is a demonstrably correct statement, but I don't think you can claim that there demonstrably can be no (big) effect either. And in the end, you're also allowed to believe in conjectures even when there is no solid evidence behind them. People do that all the time, even scientists.


You can believe in such a conjecture, but it's wise to consider the more probable possibility that if an effect hasn't been found, then it is likely small.

Also, in the end it doesn't really matter, because the conjecture that's repeated as an assertion isn't said merely as a scientific claim, but as an attempt to convince. Companies are interested in some bottom line effect, and rather than trying to sell your favourite approach with something like, "I like it; maybe you'll like it, too", you make some unsupported assertion that goes like this: "you should use my thing because it will actually make an important contribution to some bottom-line effect you're interested in; oh, and by the way, you might not notice it." That isn't convincing at all, so it's best to stick with what we know: "I like it, maybe you'll like it, too."


Show me the companies that only ever implement policies that have been shown to be effective in rigorous empirical studies.

Usually some person (or a group of people) is in charge of some decision and that person will make judgment calls based on their beliefs. This is no less true of programming techniques than it is of management styles, corporate strategy or anything else.

Your insistence that we may not have beliefs about the very things we work with daily, unless they're empirically verified, is IMHO frankly ridiculous.


That's not my insistence at all. You can believe what you like. What you can't do is make empirical assertions that we've not been able to validate empirically.

Companies may adopt a technique based on empirical findings or anything else they like; most people choose a favourite programming language because they like working with it better. But the statement that types lead to fewer bugs is a very particular assertion that is simply unsupported by evidence. You may believe that using types reduces baldness and make your choices based on that, but it's still a conjecture/belief at best.


I think you're guilty yourself of what you're accusing other people of.

I haven't seen people ITT arguing that there is empirical evidence for types providing better correctness guarantees, just that they strongly believe it to be the case given their own experience.


The original statement was "A language with static types would have made it easier to build correct software." This is an empirical claim that the evidence does not support. Note that it's not that there's merely no evidence supporting the claim, but that studies designed to support the claim failed to do so.


> Given empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one.

Which use cases, languages and static type systems are you referring to? The context is very important, especially when seeking to draw general conclusions from empirical studies.

As someone who has previously posted extolling the merits of static analysis, I'm very surprised at your position regarding static types. Static types help to constrain a language and enable reasoning, either by additional static analysis or otherwise.

It is precisely the flexibility of dynamic languages that makes them difficult to reason about and difficult to build correct software in. This is why the use of dynamic languages is mostly banned in the defense industry.

Static types clearly help with composition (one of the complaints with Julia), especially at scale. How many academic empirical studies considered multimillion-line code bases? I submit for evidence a lot of expensive type-retrofitting projects such as Facebook Hack, Microsoft Typescript or Python types, which demonstrate that many companies have or had real problems with dynamic languages at any kind of scale.


> Note that Matlab, the workhorse of scientific computing for a few decades now, is even less typed than Julia.

You always make this argument when discussing PL features and I find it irksome. People get along fine without this feature, therefore there’s no sense in implementing it. But it cuts the other way, or we’d all still be using assembly. How many Matlab users know things could be better? Was the superiority of structured programming and avoiding GOTO ever empirically proven, or did we all just collectively realize it was a good idea?


> People get along fine without this feature, therefore there’s no sense in implementing it.

As someone whose job is to add new features to a programming language, that's never been my argument.

> But it cuts the other way, or we’d all still be using assembly

High-level languages were satisfactorily shown to be more productive than Assembly. I don't claim that no innovation works, just that not all do, and certainly not to the same degree. That feature X is helpful is certainly no evidence that feature Y is helpful, and that Python is more productive than Assembly does not support the claim that programs in OCaml are more correct than programs in Clojure.

Also, my argument isn't "we got by without it" or that no idea could ever work. It's that a specific claim was tested and unconfirmed.

> Was the superiority of structured programming and avoiding GOTO ever empirically proven, or did we all just collectively realize it was a good idea?

I don't know about the former, but the latter is certainly true, and until we actually reach consensus you can't claim we have.

BTW, I certainly don't claim that types aren't useful or even that they're not better in some ways (I believe that they help a lot with tooling and organisation), but the particular claim that they universally help with correctness, and do so better than other approaches, was studied, and simply not confirmed. You can't come up with a claim, try and fail to support it time and again, and keep asserting it as if it's obviously true, despite the evidence.


> As someone whose job is to add new features to a programming language, that's never been my argument.

I’ve definitely seen you argue along the lines of “it hasn’t been implemented in Java, therefore nobody uses it and we can’t tell if it’s a good idea or not” before. Forgive me for assuming this followed from that.

> You can't come up with a claim, try and fail to support it time and again, and keep asserting it as if it's obviously true, despite the evidence.

But “correctness” of itself is pretty nebulous. If we define it as whether or not the program conforms to one’s intentions with writing it, I would expect static types alone not to show a significant difference in correctness. Probably formal methods do but they have much higher overhead.

However, in terms of eliminating patterns which are literally never correct, like dereferencing null pointers, violating resource lifetimes, or calling methods that don’t exist, static typing can in fact eliminate those patterns.
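For instance, under TypeScript's `strictNullChecks` this kind of "never correct" dereference is rejected at compile time. A minimal sketch (the function and names are made up for illustration):

```typescript
function greet(name: string | null): string {
  // Calling name.toUpperCase() right here would be a compile error:
  // "'name' is possibly 'null'".
  if (name === null) {
    return "Hello, stranger";
  }
  // In this branch the type has been narrowed to 'string', so the call is safe.
  return "Hello, " + name.toUpperCase();
}

console.log(greet(null));    // Hello, stranger
console.log(greet("world")); // Hello, WORLD
```

The same program in plain JavaScript would compile fine with the dereference in place and only fail at runtime.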

> the claim that programs in OCaml are more correct than programs in Clojure

My full-time job is Elixir so I know full well the consequences of maintaining large codebases in dynamic languages. I would switch to OCaml in a heartbeat if it ran on the BEAM! I want to know that I am calling functions correctly within a node when the module can be resolved at compile time. This is a really basic thing to want, and not one that dynamic languages can offer. The qualitative difference is similar to that between structured and unstructured programming: I can actually do local reasoning about a function without having to check all the call sites or write a lot of defensive tests.

This is an obvious advantage, and on some level I don’t really care if it contributes to formal correctness or not because it would make my job easier.


> I’ve definitely seen you argue along the lines of “it hasn’t been implemented in Java, therefore nobody uses it and we can’t tell if it’s a good idea or not” before.

You have not seen me argue anything along those lines. I have, however, said the converse, that we try not to adopt features in Java until they've proven themselves elsewhere.

> However, in terms of eliminating patterns which are literally never correct, like dereferencing null pointers, violating resource lifetimes, or calling methods that don’t exist, static typing can in fact eliminate those patterns.

But the implication is reversed! From A => B, i.e. types prevent certain bad things, you're concluding B => A, i.e. if you don't want those bad things then you should use types. That simply does not follow.

> This is an obvious advantage, and on some level I don’t really care if it contributes to formal correctness or not because it would make my job easier.

I wouldn't dare imply that types don't have certain important advantages, but that doesn't support the specific claim that types generally and significantly improve correctness — which many have tried to show and failed — and it certainly doesn't support the much stronger claim that if you want to improve correctness, the most effective way to do it is to use types.


> You have not seen me argue anything along those lines. I have, however, said the converse, that we try not to adopt features in Java until they've proven themselves elsewhere.

You’ve definitely argued that not enough software has been written in Haskell to know whether it’s the right tool for anything and whether strong types are actually a good idea.

> But the implication is reversed! From A => B, i.e. types prevent certain bad things, you're concluding B => A, i.e. if you don't want those bad things then you should use types. That simply does not follow.

I really don’t know of a simpler way to do this than types. Do you? Honestly I would be using it if I did. All of the solutions I know to these problems involve types.

> that doesn't support the specific claim that types generally and significantly improve correctness — which many have tried to show and failed

What definition of correctness is being used here? Surely you’re not saying that optionals don’t eliminate null pointer dereferencing, are you?


> to know whether it’s the right tool for anything and whether strong types are actually a good idea.

Nope. I don't know what you mean by "a good idea", and I prefer typed languages myself (mostly for tooling support), but I do often point out that the claim that types improve correctness — let alone the claim that they do that better than other approaches — is an empirical claim that is not supported by empirical evidence (which, in fact, appears to contradict it).

Also, that there have been few programs written in Haskell, and that Haskell has failed to demonstrate that it leads to better correctness are both pretty basic facts.

> All of the solutions I know to these problems involve types.

I don't know what you mean by "solutions to these problems", but while we've not found a correlation between types and more correct programs, we have found correlations between code reviews and tests and more correct programs. Types might well be the solution to many things (e.g. automatic refactoring and jump-to-definition), but the empirical evidence we have suggests that increased correctness isn't one of them.

> Surely you’re not saying that optionals don’t eliminate null pointer dereferencing, are you?

Types certainly eliminate various kinds of errors, yet studies did not find that they reduce bugs (except in specific circumstances; for example, there was one study that reported that TypeScript has 15% fewer bugs than JavaScript).

Just to give you a sense for one reason that happens, we can take your example of Maybe types. A null pointer exception occurs when code assumes a reference can't be null, but is wrong to make the assumption. A Maybe type would force a test somewhere. But the question, then, is: what do you do when the value is empty? A brilliant study on software correctness [1] found that most catastrophic crashes in distributed systems occur not because programmers fail to consider certain exceptional situations — in fact, the language forces them to consider those situations — but because they frequently do the wrong thing when those situations occur.

[1]: https://www.usenix.org/system/files/conference/osdi14/osdi14...
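A hypothetical TypeScript sketch of that failure mode (all names here are invented): the option type forces a branch to exist, but nothing checks that the recovery logic in that branch is right.

```typescript
// An Option-like type makes the "empty" case impossible to ignore.
type Option<T> = { kind: "some"; value: T } | { kind: "none" };

function lookupTimeout(config: Map<string, number>): Option<number> {
  const v = config.get("timeoutMs");
  return v === undefined ? { kind: "none" } : { kind: "some", value: v };
}

function connectTimeout(config: Map<string, number>): number {
  const t = lookupTimeout(config);
  // The compiler forces us to handle "none" -- but silently falling back to
  // 0 ("no timeout") may be exactly the kind of wrong recovery decision the
  // cited study describes. The type checker is satisfied either way.
  return t.kind === "some" ? t.value : 0;
}
```

The type system guarantees the empty case is handled; it says nothing about whether `0` is a sane thing to do there.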


> Types certainly eliminate various kinds of errors, yet studies did not find that they reduce bugs

Right, but this is why I asked about structured programming. These things are hard to study by their nature. I understand there’s not a consensus on this, but there are a lot of programmers who feel quite strongly that static types reduce bugs. Maybe that’s not good enough for you! But it’s clear that you accept other practices as beneficial on insufficient evidence. Or maybe you don’t — maybe you don’t think writing correct unstructured programs is harder.

> most catastrophic crashes in distributed systems occur not because programmers fail to consider certain exceptional situations

This is irrelevant though. The question is not whether static types prevent the most common types of bugs, or the most dangerous (it’s clear that memory safety is more important than static types in that regard.) If static types narrow the problem space to situations where you explicitly made the wrong decision, that’s significant. If there are any bugs caused by not handling exceptional conditions (and we both know that there are), then static type systems help reduce those.


> These things are hard to study by their nature.

Small effects are hard to study by their nature. Big effects are usually easy to spot.

> there are a lot of programmers who feel quite strongly that static types reduce bugs

There are a lot of people who feel quite strongly that homeopathy cures all kinds of diseases, but they've failed to demonstrate that.

> But it’s clear that you accept other practices as beneficial on insufficient evidence

It's not about "accept." I myself practice typed programming without asserting the empirical claim that it reduces bugs, a claim that seems not to be true (or at least, the effect does not seem to be big).

> The question is not whether static types prevent the most common types of bugs, or the most dangerous

So far, we've failed to show that static types reduce bugs, period. You're allowed to like them and promote them, but your feeling towards them does not make a specific empirical claim more or less true.


> There are a lot of people who feel quite strongly that homeopathy cures all kinds of diseases, but they've failed to demonstrate that.

This is a really poor comparison. How many people who know enough to judge these things believe that? I can understand that you’re rigorous in the empirical claims you accept here, but surely you can see that experienced software developers have somewhat more basis to make claims about type systems than random people do about homeopathy.

> or at least, the effect does not seem to be big

Fair enough! It’s probably not as big as testing or code review but I do think it exists. You mentioned the TypeScript study, it’s not like there’s no evidence for believing this like there is with homeopathy.

> So far, we've failed to show that static types reduce bugs, period.

I mean, how could they not? You still haven’t explained this part, except with vague allusions that the problems they catch are not the most common problems. I’ve explained how static types reduce bugs: by eliminating cases where you intend to handle a case and forget to. I’ll specify that I mean a type system with exhaustivity checking or narrowing with control flow (like TS or Kotlin).

This is not an empirical argument, it’s a rational argument, and I still legitimately don’t understand what the flaw is.

The way I see it, static type systems restrict operations on types to those known to be valid at compile time. In a sound type system and compiler, the resulting program is guaranteed not to make invalid operations on types at runtime. Dynamic programs are fully free to make invalid operations on types at runtime. Some number of bugs are caused by making invalid operations on types at runtime. Where are those bugs in a statically typed program, if they haven’t been eliminated?
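The kind of exhaustivity checking mentioned above can be sketched in TypeScript (a minimal illustration; the types are hypothetical):

```typescript
// A discriminated union plus a 'never' check gives exhaustivity: forgetting
// a case becomes a compile-time error rather than a latent runtime bug.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; side: number };

function area(s: Shape): number {
  switch (s.kind) {
    case "circle": return Math.PI * s.radius * s.radius;
    case "square": return s.side * s.side;
    default: {
      // If a new variant is added to Shape and not handled above, 's' no
      // longer narrows to 'never' here and this assignment fails to compile.
      const unhandled: never = s;
      return unhandled;
    }
  }
}
```

Adding a `{ kind: "triangle"; ... }` variant would make the `default` branch a type error until `area` handles it.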


The extra bugs in a statically typed program go in the duplicated code that someone copied and changed the type names on because their type system wasn't flexible enough to let different types share the same code. This means that 3 years later when someone fixed a bug in part of the code, the bug remained in the other copy because the person writing the fix didn't know about the copy.

For a simple example, consider Arrow.jl vs the C++ implementation of the Arrow format. The Julia implementation is roughly 1/10th the lines of code (with more functionality), so even if there are 5x more bugs per line, the code still has fewer bugs.

Static types definitely reduce bugs per line, but they can still increase bugs per functionality.


But there are plenty of static type systems that let different types share the same code, the problem you describe only exists in nominal type systems (like C++) as far as I know. With structural types like in TypeScript or Go you can express this trivially.

For that matter you can do this with subtype polymorphism in most cases. In Rust you can do it with trait objects as long as you control either the type or the trait. Probably there’s a way to do it in C++ too.
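A minimal TypeScript sketch of that structural sharing (the interface and values are invented for illustration):

```typescript
// Structural typing: any value with a 'name: string' field can use this
// function; no shared nominal supertype or duplicated code is needed.
interface Named { name: string }

function label(x: Named): string {
  return "<<" + x.name + ">>";
}

// Neither value declares that it implements Named; the shapes simply match.
const user = { name: "ada", id: 1 };
const repo = { name: "arrow", stars: 9000 };

console.log(label(user)); // <<ada>>
console.log(label(repo)); // <<arrow>>
```

In a purely nominal system, `user` and `repo` would need an explicitly declared common supertype (or copies of `label`) to share this code.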


> but surely you can see that experienced software developers have somewhat more basis to make claims about type systems

But experienced software developers, more than "random people", should know that if they make a conjecture, and it's tested but not verified, they should reconsider it.

> You mentioned the TypeScript study, it’s not like there’s no evidence for believing this like there is with homeopathy.

There is no evidence for believing this. If you believe the evidence for TS vs JS in particular, then you should also believe the failure to find a more general effect.

> I mean, how could they not?

That's an interesting question and there are many answers; I've been interested in the complex subject of software correctness for years, and have written a bit about it (https://pron.github.io/). The more you study software correctness, the more you learn how complicated it is and that there are no easy answers. In particular, you learn that it's not true that more soundness is always a good path toward more correctness. But if you accept your preconceived notions over empirical study, then there's little hope for making actual progress.

> Where are those bugs in a statically typed program, if they haven’t been eliminated?

I gave you an example of where they are. If you want to go down the path of thinking about the theory of software correctness, start by convincing yourself that for every JavaScript program there is a Haskell program (perhaps living entirely inside an Either monad) that behaves the same way.
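One way to convince yourself of that exercise, sketched in TypeScript rather than Haskell (the `Dyn` type is a made-up illustration): a typed language can embed the whole dynamic value space in one sum type, so a typed program can reproduce a dynamic program's behaviour, surprises included.

```typescript
// Every dynamic value lives in one sum type (a tiny subset shown here).
type Dyn =
  | { tag: "num"; n: number }
  | { tag: "str"; s: string }
  | { tag: "nil" };

const show = (d: Dyn): string =>
  d.tag === "num" ? String(d.n) : d.tag === "str" ? d.s : "nil";

// JavaScript-style '+': the type checker is perfectly satisfied, yet the
// dynamic language's implicit coercions (and their surprises) are preserved.
function dynAdd(a: Dyn, b: Dyn): Dyn {
  if (a.tag === "num" && b.tag === "num") return { tag: "num", n: a.n + b.n };
  if (a.tag === "str" || b.tag === "str") return { tag: "str", s: show(a) + show(b) };
  return { tag: "nil" };
}
```

Here `dynAdd({tag: "num", n: 1}, {tag: "str", s: "2"})` yields the string `"12"`, just as `1 + "2"` does in JavaScript: the types rule nothing out.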


I don’t think the subject of software correctness in practice is itself well-studied enough to say conclusively that my conjecture is false. I think what can be said conclusively is that at scale people cannot write memory-safe code in an unsafe language or type-safe code in a dynamic language, but obviously these are not the only kinds of correctness.

> In particular, you learn that it's not true that more soundness is always a good path toward more correctness.

I’m still curious what definition of “correctness” you’re using here. Formal correctness? Bugs per line of code? Generally I’m thinking in terms of formal correctness, in which case I think it’s basically a truism that more soundness leads to more correctness. At some cost, perhaps.

> I gave you an example of where they are.

Yes, but those bugs are not unique to static languages, and I’ve never claimed that static languages eliminate all kinds of bugs. As far as I could tell from reading the study, it’s not evidence that static languages encourage this sort of bug more.

To be blunt: I think the set of kinds of bugs that can be written in dynamic languages is a strict superset of the kinds of bugs that can be written in a static language. Maybe I’m completely wrong about this! But this is the root of my reasoning.


> I don’t think the subject of software correctness in practice is itself well-studied enough to say conclusively that my conjecture is false.

I don't claim that. Given what we know, the likeliest explanation to the findings so far is that an effect, if it exists, is probably small.

> Formal correctness? Bugs per line of code?

Both would work.

> in which case I think it’s basically a truism that more soundness leads to more correctness. At some cost, perhaps.

And you'd be wrong, or, at least, the second part of your statement makes all the difference. What we want is the best correctness we can get for some given cost, or, given some effort, what should you do to get the most correct program? If you follow formal methods, some of the hottest lines of research right now are about reducing soundness to improve correctness.

> As far as I could tell from reading the study, it’s not evidence that static languages encourage this sort of bug more.

I didn't say they did. But you asked how we explain the observation that types don't improve correctness, and one explanation is that the kind of mistakes that types catch aren't the costliest bugs that make it to production, and perhaps the extra effort invested comes at the expense of other approaches that do uncover more serious bugs.

> But this is the root of my reasoning.

That's as good a conjecture to start with as any, but it needs to be revised with findings.


> perhaps the extra effort invested comes at the expense of other approaches that do uncover more serious bugs

I guess in my experience the effort invested programming in a static language is really not that much higher than dynamic, and in some ways I find it less effortful. For example: pattern matching on a sum type, being sure that I’ve handled all the cases I want to. Is there good empirical research on this?

> That's as good a conjecture to start with as any, but it needs to be revised with findings.

I was attempting to make a factual statement, not a conjecture. If it is true that static types eliminate a class of errors, then type errors must be really cheap for static types not to be worth it on those grounds. My prior is that compiler errors are cheaper than runtime errors here.


Until it's been measured, a statement is a conjecture, not "factual." A conjecture that we've tried to verify yet failed to see a large effect is a problematic conjecture.

We don't have good empirical findings about many things, most likely because many effects are small at best. But it doesn't matter. You can say that you still believe something despite failed attempts to measure it, but you can't say it's "factual." That is the difference between fact and conjecture.


> That is the difference between fact and conjecture.

Yes, but I'm trying to make a formal statement of fact here, not an empirical one.

> I think the set of kinds of bugs that can be written in dynamic languages is a strict superset of the kinds of bugs that can be written in a static language.

Here I am attempting to make a formal statement about the set of runtime behaviors that can be exhibited under static type systems. What I'm saying is that there is a set of incorrect runtime behaviors that can only be exhibited in a dynamic type system, that is, the set of type errors. I'm not aware of any runtime errors that can only be exhibited under a static type system. I'm not well educated enough in the relevant fields to be able to formalize this with notation (I would do so if I was, I think notation communicates these things much more clearly than words) but I do believe it has a formal representation.

> A conjecture that we've tried to verify yet failed to see a large effect is a problematic conjecture.

I think we're somewhat talking past each other here. What I'm saying is that the set of possible incorrect runtime behaviors is smaller in a static language. This can't really be empirically verified; it should have a formal answer in type theory or programming language theory. It's possible I'm wrong about what the formal answer is, but I haven't seen you address it yet.

It's also possible that this is true but not strongly related to the way that bugs evolve in typical software engineering practice; I've speculated upthread that the total number of bugs might be similar because programmers make a higher number of repeated mistakes from the more narrow set while programming in static languages (though personally this seems unlikely). It's further possible that this is just not a very significant effect, as you have conjectured.

However, it is my belief that if there is an effect on bugs overall, it derives from what I understand to be a formal character of the runtime behavior of statically typed programs: that there are fewer ways for them to "go wrong."


> What I'm saying is that there is a set of incorrect runtime behaviors that can only be exhibited in a dynamic type system, that is, the set of type errors. I'm not aware of any runtime errors that can only be exhibited under a static type system.

But that doesn't mean they have fewer bugs! They might well have more.

> What I'm saying is that the set of possible incorrect runtime behaviors is smaller in a static language.

Not exactly. For every program in an untyped language, you could write a typed program with the exact same behaviours.

The assertion that typed programs have fewer bugs is simply false in theory, and failed to be confirmed empirically.

> However it is my belief that if there is an effect on bugs overall that it derives from what I understand to be a formal character of the runtime behavior of statically typed programs, that there are fewer ways for them to "go wrong."

I understand that that is your belief, but it is supported by neither theory nor practice.


“Want B” is not the same as “B”.

“Should do A” is not the same as “A”.

The reverse of “types prevent certain kinds of bugs” would be that “all code which doesn’t suffer from those kinds of bugs is typed”, not “if you don’t want those bugs, use types”. The parent does not assume B->A; your actual disagreement with them is over whether A -> B implies A -is-the-most-effective-way-to—> B.

I share your understanding of the evidence base around types, and I agree with your conclusions; my beef here is just with divorcing the language of formal logic from its substance.


Yeah, though I do actually think “all code which doesn’t suffer from those kinds of bugs is typed” is true to a first approximation, for what it's worth. That's a big part of the reason I prefer static types. Under a few thousand lines it's less true. It's really at scale that these things become a problem, and even so they're still manageable most of the time, they're just kind of a pain.


Formalising natural language is tricky, but I believe "every PL that leads to more correct programs is typed" is either equivalent to or stronger than "if you want a PL that leads to more correct programs, it should be typed." It could be stronger if the latter phrasing takes measure into account (so a bit of fuzzy logic), meaning the possibility that other approaches also lead to more correct programs but not as well. Still, I contend that my formalisation is correct enough to demonstrate the logical error: even if types imply correctness (not substantiated by evidence), it does not follow that correctness implies types (even less supported by evidence).


> How many Matlab users know things could be better?

not very many in my experience - matlab rots the brain


In particular, not a single issue mentioned in this article would have been prevented by static type checking.


This is not true. For example, the issue regarding custom index ranges causing silent data corruption (6 examples) could be fixed with static types.

Look how many of the other bug reports contain the phrase "does not check for" or refer to specific primitive types.


How would static types help with that? Whether your indexing range starts with zero or one or something else isn't necessarily encoded in the type domain. `1:length(A)` is just a range of `Int`s.


Why not encode the starting offset into the type domain? Or at least distinguish between normal and unusual. Then the function signature can restrict to 1-offset arrays if that is what it assumes internally.


If the function signature said `Array` rather than `AbstractArray`, then this code would have been fine. `Array` indexing starts at `1`.

```
julia> function f(A::Array)
           println(A[1:length(A)])
       end
f (generic function with 1 method)

julia> f([1,2,3,4])
[1, 2, 3, 4]

julia> f(OffsetArray(1:10, -1))
ERROR: MethodError: no method matching f(::OffsetVector{Int64, UnitRange{Int64}})
```

You could prevent this problem using Julia's type system. The `AbstractArray` might have been too broad. Based on the chronology of the code that might not have been apparent. See other threads for details.

Another way would be to treat `firstindex` as a trait and dispatch on that.

```
julia> f(A::AbstractArray) = f(A, Val(firstindex(A)))
f (generic function with 1 method)

julia> f(A::AbstractArray, firstindex::Val{1}) = println(A[1:length(A)])
f (generic function with 2 methods)

julia> f(A::AbstractArray, firstindex::Val{T}) where T = error("Indexing for array does not start at 1")
f (generic function with 3 methods)

julia> f(A::AbstractArray, firstindex::Val{0}) = println("So you like 0-based indexing?")
f (generic function with 4 methods)

julia> f([1,2,3,4])
[1, 2, 3, 4]

julia> using OffsetArrays

julia> f(OffsetArray(1:10, 1))
ERROR: Indexing for array does not start at 1
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] f(A::OffsetVector{Int64, UnitRange{Int64}}, #unused#::Val{2})
   @ Main .\REPL[5]:1
 [3] f(A::OffsetVector{Int64, UnitRange{Int64}})
   @ Main .\REPL[3]:1
 [4] top-level scope
   @ REPL[9]:1

julia> f(OffsetArray(1:10, -1))
So you like 0-based indexing?
```


The starting offset is encoded in the type domain, btw, and accessible with the `firstindex` function.

But you will still want to calculate indices at runtime, and then out-of-bounds errors will have to be caught at runtime anyway.


That means disallowing indexing with integers, I presume? Since an integer can take the values 0 or 1 equally. And what about the other end of the array. Must every index be restricted by type to be located in the acceptable range?


It is only fixed with static typing and no generics (e.g., C or Fortran). If you have a generic array supertype, a statically typed language would let you write exactly the same bug.


If one has a different type or trait for unusual and normal range indices, then the signature for the procedure that assumes indexing from 1 can be written to disallow other starting indices.


Julia allows you to specify the type of a datum if you feel the need (not unlike Common Lisp). Is any of the bugs the author mentioned related to the type system?


I'm surprised at this critique, as I thought Julia's type system was often considered to be one of its strongest features.


So, I really respect what you've done (for those who don't know, Chris is the original developer and lead of DifferentialEquations.jl) and use your work heavily. However, understanding and writing idiomatic Julia, especially with these large packages, is severely hampered by the documentation culture.

A prior comment I made, all of which seems unaddressed to me three years later: https://news.ycombinator.com/item?id=20589167

To be fair, I've only submitted a small documentation patch for a package and haven't significantly "put my money where my mouth is" on this topic. But I hope the next time there are thoughts among the core team about what is the next capability to add to the language, addressing this deficiency is prioritized.


FWIW, I posted the other month that I'm looking for any devs who can help with building multi-package documentation for SciML, since I don't think the "separate docs for all packages" approach ends up being helpful when the usage is intertwined. SciML is looking for anyone willing to help out there (and there's a tiny bit of funding, though "open source sized" funding). In the meantime, we're having a big push for more comprehensive docstrings, and will be planning a Cambridge area hackathon around this (follow https://www.meetup.com/julia-cajun/ for anyone curious about joining in).

As for high-level changes, there are a few not-too-difficult things I think can be done: https://github.com/JuliaLang/julia/issues/36517 and https://github.com/JuliaLang/julia/issues/45086 are two I feel strongly about. I think limiting the type information and shrinking the stack traces with earlier error checking on broadcast would make a lot of error messages much saner.


FMA can't be broken on Windows because FMA is implemented in hardware by Intel. What's broken is the compiler that Julia uses on Windows.


When FMA isn't in the hardware (due to running on some chip where it doesn't exist), there is a fallback to a software-based emulation. That fallback is incorrectly implemented on Windows. Julia ends up calling it in this case because that's what LLVM ends up calling, so any LLVM-based language will see this issue.


Even when FMA is implemented in hardware, LLVM will generally use the software version when the arguments are known at compile time.


FMA is only implemented in hardware on Haswell and later uArches. If you’re running on (or compiling for) IVB or earlier, you’ll get a libcall instead, and MSVC’s has been broken since forever.


Is this actually broken in MSVC, or is it broken because Julia is using mingw and linking to an ancient version of libc on windows (which is intentionally left as-is for back-compat)?

(I genuinely don't know, but the linked issue mentioned mingw specifically)


It's broken in MSVC and mingw (in different ways). See https://github.com/MicrosoftDocs/cpp-docs/pull/3526.


Thanks for the link!


The problem is that LLVM will happily miscompile fma calls by turning them into incorrect constants, because Windows has a broken libm. This is a bug in C/C++ as well, and I'm currently unaware of a language with fma and a good compiler that gives correct fma results on Windows.


CPUs have supported these instructions for 9 years now. Ignoring these old CPUs, most languages and compilers usually do a good job. An example in C which does not depend on any library functions:

    #include <immintrin.h>

    // Computes a*b + c with a single rounding via the FMA3 hardware
    // instruction (compile with -mfma on an FMA-capable target).
    double fma( double a, double b, double c )
    {
        __m128d av = _mm_set_sd( a );
        __m128d bv = _mm_set_sd( b );
        __m128d cv = _mm_set_sd( c );
        return _mm_cvtsd_f64( _mm_fmadd_sd( av, bv, cv ) );
    }


Two problems: Julia supports CPUs without FMA, and on Windows, LLVM will use libc to constant-fold the value of fma even on computers that have FMA in hardware.


Hardware requirements are up to product management. For instance, many modern videogames (they generally want compatibility because it directly translates to sales) no longer support 32-bit or pre-AVX1 processors. Technically, Julia could drop support for pre-FMA3 processors if it helps move things forward.

It’s inevitable anyway due to the changing hardware requirements of the OS; the only question is “when”. I don’t think Windows 10 21H2 supports any CPU which doesn’t have SSE 4.1, and it’s only a matter of time before Windows requires support of newer instruction sets.

About LLVM, can’t they compile that thing with an option like -mfma to use hardware FMA3 for constant folding?



