Apache Arrow is 10 years old

  • If I could go back to 2015 me, who had just found the feather library and was using it to power my unhinged topic-modeling-for-PowerPoint-slides work, and explain what feather would become (Arrow) and the impact it would have on the data ecosystem, I would have looked at 2026 me like he was a crazy person.

    Yet today I feel it was 2016 dataders who is the crazy one lol

  • We use Apache Arrow at my company and it's fantastic. The performance is so good. We have terabytes of time-series financial data and use arrow to store it and process it.

  • I laugh every time I have to explain that "Apache Arrow format is more efficient than JSON. Yes, the format is called 'Apache Arrow.'"

  • What's the difference between feather and parquet in terms of usage? I get the design philosophy, but how would you use them differently?

  • It's nice to see useful, impactful interchange formats getting the attention and resources they need, and ecosystems converging around them. Optimizing serialization/deserialization might seem like a "trivial" task at first, but when moving petabytes of data it quickly becomes the bottleneck. With common interchange formats, the benefits of these optimizations are shared across stacks. Love to see it.

  • I like Arrow for its type system. It's efficient, complete and does not have "infinite precision decimals". Considering Postgres's decimal encoding, using i256 as the backing type is a much saner approach.

  • I had to look up what Arrow actually does, and I might have to run some performance comparisons vs sqlite.

    It's very neat for some types of data to have columns contiguous in memory.
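
To make "columns contiguous in memory" concrete, here is a minimal sketch using only Python's standard library. It is not Arrow's API, and the sample records and variable names are hypothetical; it only illustrates the general idea of keeping each field in one typed buffer instead of one object per row.

```python
from array import array

# Hypothetical sample data: three (timestamp, price) records.
rows = [(1, 10.5), (2, 11.0), (3, 9.75)]

# Row-oriented layout: each record is a separate tuple object (as in `rows`).
# Column-oriented layout: each field lives in one contiguous typed buffer.
timestamps = array("q", (ts for ts, _ in rows))    # 64-bit signed integers
prices = array("d", (price for _, price in rows))  # 64-bit floats

# A scan over one column touches only that column's buffer.
total_price = sum(prices)

# The whole column is a single block of memory, viewable without copying.
price_bytes = memoryview(prices).nbytes
```

Summing `prices` never reads the timestamps, which is roughly why analytical scans tend to favor a columnar layout.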

  • We contributed the first JS impl and were helping with the NVIDIA GPU bits when it was starting. Some of our architectural decisions back then were awful as we were trying to figure out how to make Graphistry work, but Arrow + GPU dataframes remain gifts that keep giving.

  • Stupid question: why hasn't Apache Arrow taken over to the point where we are no longer dealing with JSON?

  • I read that entire page and I could not tell you what Apache Arrow is, or what it does.
