The argument in this article is mainly based on a naive realloc() that does allocate-copy-free, but on a system with virtual memory the allocator can often reallocate by remapping page table entries, which is far faster than copying the data around. Even in the (highly improbable) worst case where no contiguous virtual addresses are available on any reallocation, this method still moves a quadratic number of PTEs; but for the given example of growing to 1M, assuming 4K pages, that's only 256 PTEs in total, and 1 + 2 + 3 + ... + 256 = 32896 moves. Assuming each PTE is 4 bytes, that gives 131584 total bytes moved in the pathological worst case.
And if you’re lucky the OS’s virtual memory system will do some magic with page tables to make the copying cheap. But still, it’s a lot of churn.
There is no actual copying of data in that case, and as the numbers show, a little over 128K of PTE movement for zero-copy reallocations via VM is still 16x less than the 2M copied by a doubling, allocate-copy-free buffer.
Thus, a better strategy could be allocate-copy-free with doubling size for small buffers, and resizing in page-sized increments for large buffers, allowing realloc() to go zero-copy via the VM system.
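A minimal sketch of that hybrid growth policy, assuming a 4K page size (a real implementation would query sysconf(_SC_PAGESIZE)); the helper name next_capacity and the threshold choice are hypothetical, not from any particular allocator:

```c
#include <stddef.h>

#define PAGE_SIZE 4096 /* assumed; query sysconf(_SC_PAGESIZE) in real code */

/* Hypothetical helper: choose the next capacity for a growing buffer.
 * Small buffers double (amortized O(1) appends); large buffers are
 * rounded up to whole pages, giving realloc() the chance to remap
 * page table entries instead of copying the data. */
static size_t next_capacity(size_t needed, size_t current)
{
    size_t cap = current ? current : 16;
    if (needed <= cap)
        return cap;
    if (needed < PAGE_SIZE) {
        while (cap < needed)
            cap *= 2;               /* doubling regime for small sizes */
        return cap;
    }
    /* Page-sized increments: round up to a whole number of pages. */
    return (needed + PAGE_SIZE - 1) & ~(size_t)(PAGE_SIZE - 1);
}
```

The caller would feed the result to realloc() as usual; whether the large-buffer case is actually zero-copy depends on the allocator and OS (e.g. glibc's use of mmap()/mremap() for large chunks).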
Related links:
http://stackoverflow.com/questions/16765389/is-it-true-that-...
http://blog.httrack.com/blog/2014/04/05/a-story-of-realloc-a... (discussed previously at https://news.ycombinator.com/item?id=7541004 )