Using thread pools/connection pools could help a lot, both in performance (less time spent creating threads and connections) and in memory use (less fragmentation and less garbage to collect from short-lived allocations).
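As a rough sketch of the pooling idea (assuming Python's standard `concurrent.futures.ThreadPoolExecutor`; `handle_request` and `run_with_pool` are hypothetical names), the pool starts a small fixed number of worker threads and hands every task to one of them instead of creating a thread per task. This only illustrates thread reuse. It is not a benchmark, and in CPython the GIL means CPU-bound work won't run in parallel anyway.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(n: int) -> tuple[int, int]:
    # Hypothetical short-lived task: returns a result plus the id of the thread that ran it.
    return n * n, threading.get_ident()


def run_with_pool(num_tasks: int, workers: int = 4) -> tuple[list[int], set[int]]:
    # The executor starts at most `workers` threads and reuses them for every task,
    # so thread-creation cost is paid a few times rather than once per task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(handle_request, range(num_tasks)))
    squares = [value for value, _ in results]
    thread_ids = {tid for _, tid in results}
    return squares, thread_ids


if __name__ == "__main__":
    squares, thread_ids = run_with_pool(1000)
    # 1000 tasks, but no more than 4 distinct worker threads did the work.
    print(squares[:5], len(thread_ids) <= 4)  # prints [0, 1, 4, 9, 16] True
```

A connection pool follows the same pattern: open a fixed set of connections up front and hand them out and back, rather than connecting per request.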
There's a guide somewhere comparing connection pools, multiprocessing and async I/O with pthreads, showing that the most obvious way to implement concurrency doesn't necessarily perform best, but I can't find it right now.
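To illustrate just one of those alternatives (this does not reproduce the guide's comparison), here is a minimal async I/O sketch, assuming Python's standard `asyncio`. `fake_io` is a hypothetical stand-in for a network or disk wait. A hundred waits overlap, yet they all run on the event loop's single thread, with no per-task thread or stack allocated.

```python
import asyncio
import threading


async def fake_io(i: int) -> tuple[int, int]:
    # Hypothetical I/O wait; asyncio.sleep yields control back to the event loop.
    await asyncio.sleep(0.01)
    return i, threading.get_ident()


async def main(n: int = 100) -> list[tuple[int, int]]:
    # All n coroutines wait concurrently; gather returns results in submission order.
    return await asyncio.gather(*(fake_io(i) for i in range(n)))


if __name__ == "__main__":
    results = asyncio.run(main())
    # Every coroutine ran on the same thread.
    print(len({tid for _, tid in results}))  # prints 1
```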