I don't get these complaints about `sum!(a, a)`. Sure, it's a bit of a footgun that you can overwrite the array you are working with, but this doesn't rise to a "major problem" of composability.
The histogram errors seem annoying though. Hopefully they can get fixed.
Sure, it's unsurprising that it produces unexpected results, but there are actually semantics that should be expected. The problem is that implementing those semantics correctly for all cases is hard, because of aliasing. It's the same issue that separates memcpy() from memmove().
The obvious semantics for these functions are that `f!(a, args...)` should do the same thing as `a .= f(args...)`.
It's only undefined behavior because the simple implementations don't do that in the presence of aliasing.
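To make the "should behave like `a .= f(args...)`" point concrete, here is a minimal sketch in Python rather than Julia. The function names are hypothetical, and list reversal is only a stand-in for an operation like `sum!`; the point is just to contrast a simple no-temporary loop with the "compute first, then assign" reference semantics.

```python
def naive_reverse_into(dest, src):
    """Hypothetical in-place op: write reversed(src) into dest, no temporary."""
    n = len(src)
    for i in range(n):
        # If dest and src are the same list, this reads elements
        # that earlier iterations have already overwritten.
        dest[i] = src[n - 1 - i]
    return dest


def reference_reverse_into(dest, src):
    """Reference semantics: compute the whole result, then assign it."""
    result = list(reversed(src))  # allocates a temporary
    dest[:] = result
    return dest
```

Without aliasing the two agree. With `a = [1, 2, 3, 4]`, `naive_reverse_into(a, a)` leaves `[4, 3, 3, 4]`, while `reference_reverse_into(a, a)` gives `[4, 3, 2, 1]`, at the price of the temporary.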
I brought up memcpy() and memmove() (which in C are copying identity functions on bytes) exactly for this point. memcpy(), which can be implemented as a simple loop, has undefined behavior when the source and destination ranges overlap, while memmove() does the right thing when they overlap, at the cost of having to check in which direction they overlap. And in C you can actually check easily whether they overlap and in which direction, because the only interface there is the pointer. Checking for aliasing between objects whose internal details are more complicated than that is difficult, perhaps too difficult to expect. But it is possible if you're only handling your own objects: witness analogous behavior getting specified in numpy: https://docs.scipy.org/doc/numpy-1.13.0/release.html#ufunc-b... . They do note that this can require allocation, even in some cases where it shouldn't. But not allocating is of course most of the point of the in-place versions.
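A rough illustration of the memcpy()/memmove() distinction, written in Python over a single `bytearray` with integer offsets standing in for C pointers. The function names are made up for this sketch, and it is not how any particular libc implements either call; it only shows why knowing the overlap direction is enough to avoid a temporary.

```python
def forward_copy(buf, dst, src, n):
    """memcpy-like simple loop: only correct when the ranges don't overlap
    (or when dst is at or before src)."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def overlap_safe_copy(buf, dst, src, n):
    """memmove-like: pick the copy direction from the overlap direction,
    so no temporary buffer is needed."""
    if dst <= src:
        # Destination starts at or before source: copy front to back.
        for i in range(n):
            buf[dst + i] = buf[src + i]
    else:
        # Destination starts after source: copy back to front so that
        # source bytes are read before they get overwritten.
        for i in reversed(range(n)):
            buf[dst + i] = buf[src + i]
```

Copying the first four bytes of `bytearray(b"abcdef")` to offset 2 gives `b"ababab"` with `forward_copy` and the intended `b"ababcd"` with `overlap_safe_copy`. The direction check is cheap here precisely because both ranges are plain offsets into one buffer, which is the part that gets hard for richer array types.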
Yeah, allocation seems like the biggest hangup here. I would rather have a function stick to a "no allocating" contract and allow for some undefined behavior than have a function unexpectedly allocate to preserve safety.