Polars author here. Polars adheres to Arrow's memory format, but is a complete vectorized query engine written in Rust.
Some other key differentiators:
- multi-threaded: almost all operations are multi-threaded and share a single threadpool that has low contention (not multiprocessing!). Polars is often able to completely saturate all your CPU cores with useful work.
- out-of-core: polars can process datasets much larger than RAM.
- lazy: polars optimizes your queries and materializes much less data.
- completely written in Rust: polars controls every performance-critical operation and doesn't have to defer to third parties. This gives it tight control over performance and memory.
- zero required dependencies. This greatly reduces import latency. A pandas import takes >500ms, a polars import ~70-80ms.
- declarative and strict API: polars doesn't adhere to the pandas API because we think it is suboptimal for a performant OLAP library.
Polars will remain a much faster and more memory efficient alternative.
Was GPU acceleration considered? I know there's cuDF, which already tries to offer dataframes for GPUs. But, in my naive mind, it feels like dataframes would be a great fit for GPUs, so I'm curious why there seems to be little interest in that.