Show HN: Open-Source ETL to index data

My friend and I have built an open-source framework, CocoIndex, to prepare data for RAG.

Features:

- Data flow programming

- Custom logic support: you can plug in your own choice of chunking, embedding, and vector store, and snap your own logic together like Lego. We have three examples in the repo for now. In the long run, we also want to support deduplication, reconciliation, and more.

- Incremental updates: we provide state management out of the box to minimize recomputation. Right now, it checks whether a file from a data source has been updated. In the future, it will work at a finer granularity, e.g., at the chunk level.

- Python SDK (Rust core with Python bindings)
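
To make the plug-in and incremental-update ideas a bit more concrete, here is a rough sketch in plain Python. It is not the CocoIndex API: every name in it (fixed_size_chunker, toy_embedder, InMemoryStore, IncrementalIndexer) is hypothetical, and detecting a changed file by content hash is only an assumption about one way to do the file-level check.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical plug-in types for illustration only; not CocoIndex APIs.
Chunker = Callable[[str], List[str]]
Embedder = Callable[[str], List[float]]
Row = Tuple[str, List[float]]


def fixed_size_chunker(text: str, size: int = 20) -> List[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embedder(chunk: str) -> List[float]:
    """Stand-in for a real embedding model: chunk length and vowel count."""
    vowels = sum(c in "aeiou" for c in chunk.lower())
    return [float(len(chunk)), float(vowels)]


class InMemoryStore:
    """Stand-in for a vector store, keyed by (source path, chunk index)."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, int], Row] = {}

    def replace_file(self, path: str, rows: List[Row]) -> None:
        # Drop stale rows for this file, then write the fresh ones.
        self.rows = {k: v for k, v in self.rows.items() if k[0] != path}
        for i, row in enumerate(rows):
            self.rows[(path, i)] = row


class IncrementalIndexer:
    """Re-chunks and re-embeds a file only when its content hash changes."""

    def __init__(self, chunker: Chunker, embedder: Embedder, store: InMemoryStore) -> None:
        self.chunker = chunker
        self.embedder = embedder
        self.store = store
        self.fingerprints: Dict[str, str] = {}  # path -> last indexed hash

    def update(self, files: Dict[str, str]) -> List[str]:
        """Index `files` (path -> content); return the paths that were reprocessed."""
        reprocessed = []
        for path, content in files.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.fingerprints.get(path) == digest:
                continue  # unchanged since the last run, skip recomputation
            rows = [(chunk, self.embedder(chunk)) for chunk in self.chunker(content)]
            self.store.replace_file(path, rows)
            self.fingerprints[path] = digest
            reprocessed.append(path)
        return reprocessed


indexer = IncrementalIndexer(fixed_size_chunker, toy_embedder, InMemoryStore())
files = {"a.txt": "hello world", "b.txt": "incremental indexing"}
first = indexer.update(files)    # both files are new, so both are processed
files["b.txt"] = "incremental indexing, edited"
second = indexer.update(files)   # only b.txt changed, so a.txt is skipped
```

Swapping in a different chunker, embedder, or store only means passing a different callable or object to the constructor. This sketch does not handle deleted files, and it tracks state per file rather than per chunk.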

We would sincerely appreciate your feedback and would love to learn from your thoughts. Here is a quick-start video tutorial as well: https://youtu.be/gv5R8nOXsWU?si=GnvOF8LqBD9zE82K

Thank you so much!