
Comes with a single PCIe 3.0 x16 link on die, good for 16 GB/s. Kaveri was AMD's first PCIe 3.0-capable chip; good to see that in server-land too.

There are a couple of people out there for whom 8 GB/s just wasn't enough: dual-port IB FDR and storage controllers are right at that threshold.
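Both figures fall straight out of the per-lane line rate and encoding overhead; a quick sketch (my arithmetic, not from the thread):

```python
# Back-of-envelope PCIe bandwidth per direction: lanes times the raw
# transfer rate (GT/s), scaled by encoding efficiency, over 8 bits/byte.
def pcie_gbytes_per_s(lanes, gt_per_s, enc_num, enc_den):
    return lanes * gt_per_s * (enc_num / enc_den) / 8

gen3_x16 = pcie_gbytes_per_s(16, 8.0, 128, 130)  # PCIe 3.0: 128b/130b
gen2_x16 = pcie_gbytes_per_s(16, 5.0, 8, 10)     # PCIe 2.0: 8b/10b
print(round(gen3_x16, 2))  # 15.75 -- the "16 GB/s" link above
print(round(gen2_x16, 2))  # 8.0   -- the old 8 GB/s ceiling
```

PCIe 2.0's 8b/10b encoding burns 20% of the wire rate, which is why a Gen2 x16 slot tops out right at that 8 GB/s threshold.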

Apparently the on-chip PCIe competes with one of the HyperTransport channels? I'm under the impression you're limited to 2P if you use the on-chip PCIe, but I'm not 100% sure on that.



> Comes with a single PCIe 3.0 x16 link on die, good for 16 GB/s.

This is the area where Intel is just killing it with their E5 chips, along with being able to write directly to the L3 from I/O. (I have no idea if AMD does this.)

The E5 is so good that it lets you do entirely different architectures from what came before it. Total game changer.


> The E5 is so good that it lets you do entirely different architectures from what came before it.

As an example: Luke Gorrie is one such person, and he's doing exactly that by talking directly to Ethernet controllers via DMA from user space. Here he is in a 30-minute talk about exploiting 512 Gbit/s of PCIe in his project, Snabb Switch. He's even written a 10 Gbit/s Intel Ethernet driver in Lua. The idea, as far as I can tell, is that you can turn a commodity Xeon server into a very low-latency, zero-copy, multi-gigabit, software-defined, layer-2 network appliance.

https://cast.switch.ch/vod/clips/26uo9i576i/

https://github.com/SnabbCo/snabbswitch/wiki
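The zero-copy part can be sketched in miniature with a Python analogy (my own illustration; Snabb itself is LuaJIT driving real NIC descriptor rings over DMA):

```python
# Illustrative only: packets live in one preallocated buffer (standing
# in for DMA-mapped memory), and "forwarding" hands around memoryview
# slices of it instead of copying the bytes anywhere.
buf = bytearray(4096)          # stand-in for a DMA-mapped packet buffer
pkt = memoryview(buf)[0:64]    # a "received packet": a view, not a copy

def rewrite_dst_mac(packet, mac):
    packet[0:6] = mac          # mutates the shared buffer in place

rewrite_dst_mac(pkt, b'\xaa\xbb\xcc\xdd\xee\xff')
print(bytes(buf[0:6]).hex())   # aabbccddeeff -- edit visible in buf itself
```

The NIC DMAs into the buffer, the app edits headers in place, and the NIC DMAs back out; the payload never transits a copy.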


Intel seems stuck at 2P, and HT still has massively lower latency, but 80 GB/s worth of PCIe lanes is huge: as big as main-memory throughput. Hence DDIO, which you reference: it lets I/O write straight into the cache and skip the historic data path through main memory. AFAIK AMD doesn't have anything equivalent. And AMD only has 16 lanes on chip; the rest come out of I/O hubs.
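The 80 GB/s is easy to sanity-check (my arithmetic, assuming 40 PCIe 3.0 lanes per E5 socket in a 2P box):

```python
# Aggregate one-direction PCIe bandwidth for a 2P E5 box,
# assuming 40 Gen3 lanes per socket (my assumption).
per_lane_gbytes = 8.0 * (128 / 130) / 8   # GB/s per Gen3 lane, one way
total = 2 * 40 * per_lane_gbytes          # two sockets, 40 lanes each
print(round(total, 1))  # 78.8 -- roughly the "80 GB/s" figure
```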

I'd love to see someone actually try to use all of that Intel PCIe I/O and report on how well utilized those pipes can get. Perhaps someone wants to send the PacketShader people a box loaded with GPUs? That'd be great, thanks!

http://shader.kaist.edu/packetshader/


Cool project! I wonder if you'd get similar perf from CPUs if you used Intel's ISPC compiler[0] with the same GPU algorithms. I've found that GPU algorithms often perform substantially better on plain old CPUs too, IMO because they use memory bandwidth more effectively.

I too would like to see how far those PCI Express buses can be pushed. :)

BTW, we're adopting Intel's DPDK[1] approach to get massive packet-processing performance on a single machine. So far we're liking it, but we'll see; it's not in production yet.

[0] http://ispc.github.io/ [1] http://www.intel.com/content/www/us/en/intelligent-systems/i...
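For a flavor of the programming model: DPDK apps busy-poll the NIC and pull packets in bursts rather than taking interrupts. A toy Python rendition of that pattern (the real API is C, e.g. rte_eth_rx_burst(); the fake queue here is mine):

```python
# Toy rendition of DPDK's poll-mode pattern: busy-poll the RX queue
# and pull packets in bursts, never blocking on interrupts. Real DPDK
# does this in C against hardware RX descriptor rings.
from collections import deque

rx_queue = deque(bytes([i % 256]) * 60 for i in range(100))  # fake NIC RX ring

def rx_burst(queue, max_pkts=32):
    # Grab up to one burst of packets; return immediately if the queue is empty.
    burst = []
    while queue and len(burst) < max_pkts:
        burst.append(queue.popleft())
    return burst

processed = 0
while True:
    burst = rx_burst(rx_queue)
    if not burst:
        break              # a real forwarding app would keep spinning
    processed += len(burst)
print(processed)  # 100
```

Amortizing per-packet overhead across a burst (and pinning the polling loop to a core) is a big part of where the single-machine throughput comes from.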


I don't follow hardware too closely, but I'm under the impression that the new processors have ridiculously complicated architectures now: integrated graphics on die, PCI bridges, write-through caches... I remember back in the day it was Processor / Northbridge / Southbridge. Is that still the case? In which direction are they heading? System-on-a-chip?



