
Interesting investigation!

I experimented with getting the Rust compiler to vectorise things itself, and it seems LLVM does a pretty good job automatically. For example, on my computer (x86-64), running `rustc -O bytesum.rs` optimises the core of the addition:

  fn inner(x: &[u8]) -> u8 {
      let mut s = 0;
      for b in x.iter() {
          s += *b;
      }
      s
  }
to

  .LBB0_6:
  	movdqa	%xmm1, %xmm2
  	movdqa	%xmm0, %xmm3
  	movdqu	-16(%rsi), %xmm0
  	movdqu	(%rsi), %xmm1
  	paddb	%xmm3, %xmm0
  	paddb	%xmm2, %xmm1
  	addq	$32, %rsi
  	addq	$-32, %rdi
  	jne	.LBB0_6
I can convince clang to automatically vectorize the inner loop in [1] to equivalent code (by passing -O3), but I can't seem to get GCC to do anything but a byte-by-byte traversal.

[1]: https://github.com/jvns/howcomputer/blob/master/bytesum.c
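
One caveat on the Rust snippet above: `s += *b` on a `u8` only wraps silently when overflow checks are off, as they are by default under `-O`. In a current debug build the same loop panics as soon as the sum passes 255. Here is a minimal sketch that makes the wrap-around explicit, assuming a sum modulo 256 is what is wanted (the name `inner_wrapping` is my own, not from the post, and I have not compared the generated assembly):

```rust
/// Sums the bytes of `x` modulo 256, regardless of build profile.
/// `wrapping_add` never panics on overflow, unlike `+=` in a debug build.
fn inner_wrapping(x: &[u8]) -> u8 {
    x.iter().fold(0u8, |s, &b| s.wrapping_add(b))
}
```

For example, `inner_wrapping(&[200, 100])` wraps around to 44 rather than panicking.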



GCC vectorizes the sum fine if the input is unsigned bytes, cf. [1]. My guess is some odd interaction between integer promotions and the backend.

[1] http://goo.gl/B7KX1V


I think it's just the type of the accumulator[1] that influences this; GCC seems to vectorize it fine for any type other than signed char/signed short.

[1] s in your code, result in the author's

Edit: The reason for the failure appears to be this:

    test.c:7:2: note: reduction: not commutative/associative: s_11 = (int8_t) _10;
Edit: GCC vectorizes this fine when compiling with -fwrapv, which gives you the semantics the author probably expected.
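
To make the associativity point concrete, here is a rough model in Rust rather than C, only because Rust lets you spell out the wrap-around explicitly. It is a sketch of the arithmetic, not of what GCC does internally, and the function names and the 16-lane split are my own assumptions. With wrap-around semantics, summing in 16 independent lanes and then combining them gives the same result as the in-order sum, which is the property a vectorizer needs before it can reorder the reduction:

```rust
/// In-order sum with two's-complement wrap-around on every step.
fn sum_sequential(x: &[i8]) -> i8 {
    x.iter().fold(0i8, |s, &b| s.wrapping_add(b))
}

/// The same sum, reassociated into 16 independent partial sums
/// (one per byte of a 128-bit register) that are combined at the end.
fn sum_lanes(x: &[i8]) -> i8 {
    let mut lanes = [0i8; 16];
    for (i, &b) in x.iter().enumerate() {
        lanes[i % 16] = lanes[i % 16].wrapping_add(b);
    }
    lanes.iter().fold(0i8, |s, &l| s.wrapping_add(l))
}
```

Both functions agree on any input because addition modulo 256 is associative and commutative, so the order of the additions does not matter.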


This link seems to be breaking the HN comments for this page. :)


It's a shortened link (because the true link is so long, as you discovered), maybe you have a browser extension that expands them?


Ah, you're right, I do (Tactical URL Expander for Chrome)! That'd sure explain it.



