Excited to share 𝗠𝗔𝗥𝗥𝗢𝗪 — a native Apache Arrow implementation in #Mojo🔥
Every major data tool speaks 𝗔𝗿𝗿𝗼𝘄. PyArrow alone pulls 300 million downloads a month — one implementation, one distribution channel. Mojo deserves a native implementation.
Still in early development, but after a major revamp, here's where 𝗺𝗮𝗿𝗿𝗼𝘄 stands:
- Arrays, builders, compute kernels
- Python bindings with a PyArrow-compatible API
- Zero-copy PyArrow interop via the Arrow C Data Interface
- SIMD-vectorized bitmap operations and compute kernels
- Experimental GPU support for NVIDIA, AMD, and Apple Silicon, thanks to Mojo
Performance is looking exciting: early benchmarks show 1.3–3.9x faster Python-to-Arrow conversions than PyArrow itself, along with faster bitmap operations. Take the benchmark numbers with a grain of salt, since PyArrow is heavily optimized and there may be missing pieces, but marrow is at least in the same ballpark. Compute operations on pre-loaded GPU arrays are showing great numbers too.
The implementation is far from complete, and there's a lot left to build. As always, if you're into systems programming, data formats, or GPU compute in Mojo, contributions are more than welcome! Special thanks to Marius Seritan for joining the effort!
🔗 https://lnkd.in/dPCkvhhq
#Mojo #ApacheArrow #DataEngineering #OpenSource #GPU