I experimented with getting the Rust compiler to vectorise things itself, and LLVM seems to do a pretty good job automatically. For example, on my computer (x86-64), running `rustc -O bytesum.rs` optimises the core of the addition:
    fn inner(x: &[u8]) -> u8 {
        let mut s = 0;
        for b in x.iter() {
            s += *b;
        }
        s
    }
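One caveat about the loop above: `s += *b` only wraps silently when overflow checks are off, which is the default with `-O`. A debug build panics on overflow instead. Here is a minimal sketch that makes the wraparound explicit, assuming modulo-256 addition is what is wanted. The name `inner_wrapping` is made up for illustration and is not from the original code.

```rust
/// Sums the bytes of `x` modulo 256.
/// `wrapping_add` spells out the wraparound, so debug and release
/// builds return the same value.
/// (`inner_wrapping` is a hypothetical name used only for this sketch.)
pub fn inner_wrapping(x: &[u8]) -> u8 {
    x.iter().fold(0u8, |s, &b| s.wrapping_add(b))
}
```

For example, `inner_wrapping(&[200, 100])` returns 44, since 300 mod 256 = 44. Whether the optimiser vectorises this form as well as the plain loop depends on the compiler version, so it is worth checking the generated assembly.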
I can convince clang to automatically vectorize the inner loop in [1] to equivalent code (by passing -O3), but I can't seem to get GCC to do anything but a byte-by-byte traversal.
I think it's just the type of the accumulator[1] that influences this; GCC seems to vectorize it fine for any accumulator type other than signed char/signed short.
[1] `s` in your code, `result` in the author's.
Edit: The reason for the failure appears to be this:
test.c:7:2: note: reduction: not commutative/associative: s_11 = (int8_t) _10;
Edit: GCC vectorizes this fine when compiling with -fwrapv, which gives you the semantics the author probably expected.
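For comparison with the Rust version earlier in the thread, here is a rough analogue of the signed-accumulator case. Rust's `wrapping_add` is defined for signed integers too, so the wraparound is requested per operation rather than through a compiler flag like -fwrapv. This is only a sketch: the name `sum_signed_wrapping` is hypothetical, and I haven't checked what code any particular compiler version generates for it.

```rust
/// Sums signed bytes with two's-complement wraparound.
/// This mirrors a signed char accumulator in C under wrapping semantics.
/// (`sum_signed_wrapping` is a hypothetical name used only for this sketch.)
pub fn sum_signed_wrapping(x: &[i8]) -> i8 {
    let mut s: i8 = 0;
    for &b in x {
        // The wrap is explicit here, so the result does not depend on build flags.
        s = s.wrapping_add(b);
    }
    s
}
```

For example, `sum_signed_wrapping(&[100, 100])` returns -56, because 200 wraps around past 127.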
[1]: https://github.com/jvns/howcomputer/blob/master/bytesum.c