Understanding Recursive Queries in PostgreSQL
This post is only scratching the surface of what can be achieved with a.. fully armed, and operational battle station.. powered by recursive SQL in postgres. Simply adding a parent_id column to a table (and a few constraints) can facilitate all sorts of useful network topologies. I doubt that the recursive query against a DAG is going to give better performance than a query against a DAG within a graph database, but the relational query may be sufficient for your work and not require the cost of graph db investment.
I think that people need to move beyond simple recursive sql example blog posts and share the more challenging examples. Recursive queries can become challenging quickly.
I use recursive queries in SQL to search persistent suffix arrays (mainly when there is a LOT OF DATA). You can do string search hundreds of gigabytes of data within ~0.004s with SQLite on cheap SSD and pretty much no RAM - my goal is to be able to be able to search and align on all genetic information we have so far (about ~9 trillion base pairs in total)
If anyone knows how to increase INSERT speeds on Postgres, I would appreciate. I can get about 100K per second with COPY or putting the inserts in a transaction. Have tried some more mirrored NVME drives and more RAM with ZFS, as well as dropping some unimportant indexes, but I'm still a little limited. Any thoughts?
Without benchmarking it, my first instinct would still be to design the schema in a way to avoid recursive queries, if possible. I'm thinking of stuff like a simple tree, e.g. categories with subcategories arbitrarily nested. In that case I'd probably just store all the parents as well, not just the immediate category. That is more work in the insert/update case, but makes queries that include all children trivial.
Recursive queries are better mentally modelled as iterated queries building up a result set. If you think of recursion as tracing a tree, or in trivial cases, a linked list, recursive queries are a breadth-first traversal.
You better not traverse too deep (iterate too much), or you lose the data locality benefits that relational databases are tuned for, and all you end up saving is the cost of round trips to the database.
Many tree structures are better modelled as prefix pattern matching on string paths. Extracting a whole subtree with "path LIKE 'foo/bar/baz/%'" on an indexed path column is much faster than following a series of parent links with a recursive query, more and more so the deeper your subtree gets. This denormalizes the tree structure into the path column - you can no longer make structural edits with small updates - but it's much faster for queries.
A well-written, simple introduction to something I've been curious about but never took the time to dig into. Thank you!
A few years ago the company I was working for ran an advanced SQL course for a few people who were interested. I was skeptical going in, but the chap who ran the course was the real deal.
One particularly memorable example he showed was a real world implementation of Dijkstra's shortest path algorithm using recursive CTEs. While it was very impressive I did spend most of the demo thinking "I'm so glad I don't have to maintain that"!
How does a Recursive CTE run, line by line?
https://stackoverflow.com/questions/3187850/how-does-a-recur...
Recursive SQL is fun: https://excessivelyadequate.com/posts/sierpinksy.html
Datalog is a recursive query language. One of the very few alternatives to SQL.