Even then, it's pretty much the only place in the stdlib that extensively uses virtual calls, via std::locale. It's very complicated implementation-wise (I was actually working on an implementation of it once and it's not fun at all) and hard to find places to optimize it.
It's not that it's too heavy for a console app's output, really, but for things like converting numbers to strings and vice-versa (a-la boost::format) it is definitely a lot slower than atoi/sprintf.
There's also the issue that iostreams might be implemented on top of stdio for simplicity. Hard to make it faster in that case!