Graph Databases 101
A couple of years ago I spent a lot of time figuring out how to deal with a large graph. My conclusion - there will never be such a thing as a "graph database". There are many efforts in this area: someone here already mentioned SPARQL and RDF, you can google for "triple stores", etc. There are also large-scale graph processing tools such as Giraph on top of Hadoop or GraphX for Spark.
For that particular project we ended up using Redis, storing the graph as an adjacency list on a machine with 128GB of RAM.
The reason I don't think there ever will be a "graph database" is that there are so many different ways you can store a graph and so many things you might want to do with one. It's trivial to build a "graph database" in a few lines of any programming language - graph traversal is (hopefully) taught in any decent CS course.
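To make the "few lines" claim concrete, here is a minimal sketch of an in-memory adjacency list with a breadth-first traversal. It is not the Redis setup described above; the `Graph` class and its method names are made up for illustration, and a plain dict of sets stands in for whatever store you would actually use.

```python
from collections import deque


class Graph:
    """Toy directed graph stored as an adjacency list (hypothetical helper)."""

    def __init__(self):
        # Maps each node to the set of nodes it points to.
        self.adj = {}

    def add_edge(self, src, dst):
        self.adj.setdefault(src, set()).add(dst)
        # Make sure the destination exists as a node, even without out-edges.
        self.adj.setdefault(dst, set())

    def bfs(self, start):
        """Return the nodes reachable from `start` in breadth-first order."""
        seen = {start}
        order = [start]
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # Neighbours are sorted only to make the visiting order deterministic.
            for nxt in sorted(self.adj.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


g = Graph()
for src, dst in [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "a")]:
    g.add_edge(src, dst)
print(g.bfs("a"))
```

The same shape maps fairly directly onto a key-value store: one key per node, one set of neighbours per key.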
Also - the latest versions of PostgreSQL have all the features to support graph storage. It's ironic how PostgreSQL is becoming a SQL database that is gradually taking over the "NoSQL" problem space.
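One feature that usually comes up here is the recursive common table expression (`WITH RECURSIVE`), which PostgreSQL supports. Below is a rough sketch of a reachability query over an edge table. Note the assumptions: it runs against SQLite through Python's standard `sqlite3` module only so that it is runnable without a server, and the `edges` table plus the `make_db` and `reachable` helpers are invented for the example.

```python
import sqlite3


def make_db(edge_list):
    """Create an in-memory database with a hypothetical edges(src, dst) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edge_list)
    return conn


def reachable(conn, start):
    """Return every node reachable from `start`, including `start` itself."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.dst
            FROM edges AS e
            JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_db([("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")])
print(reachable(conn, "a"))
```

Whether this is fast enough on a large graph is a separate question; the sketch only shows that the traversal can be expressed in SQL.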
If anyone's curious about Network Science/Graph Theory in general here's a free online textbook used by a grad student friend of mine
Question as someone new to graph databases: Are there any open source graph databases worth looking into?
Everybody's focused on graph databases here but let's talk about Cray! One of the most forward-thinking computer technology companies ever to exist is starting to get out there again. If they got a few hundred million dollars from an outside investor, they could do friggin' incredible things. They already do incredible things, just not as visibly as they so easily could.
I am a huge fan of graph-y stuff. I did several iterations of a graph database written in Python, first using files, then bsddb, and right now wiredtiger. I also use Gremlin for querying. Have a look at the code: https://github.com/amirouche/ajgudb
Also, I made a hypergraphdb in Scheme, atom-centered instead of hyperedge-focused: https://github.com/amirouche/Culturia/blob/master/culturia/c....
Did you know that Gremlin is really only srfi-41, aka a stream API, with a few graph-centric helpers?
edit: it's srfi 41, http://srfi.schemers.org/srfi-41/srfi-41.html
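To illustrate the "streams plus a few graph helpers" idea, here is a small sketch using Python generators as the lazy streams. This is not Gremlin's API and not the ajgudb code; the step names `out`, `dedup` and `take` are hypothetical, chosen only to show how traversal steps compose.

```python
from itertools import islice

# Toy graph: each vertex maps to the vertices it points to (made-up data).
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["dave", "carol"],
    "carol": ["dave"],
    "dave": [],
}


def out(vertices, graph=GRAPH):
    """Graph helper: lazily yield the outgoing neighbours of each vertex."""
    for v in vertices:
        yield from graph.get(v, ())


def dedup(stream):
    """Stream helper: lazily drop items that were already seen."""
    seen = set()
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item


def take(stream, n):
    """Stream helper: force at most `n` items out of the stream."""
    return list(islice(stream, n))


# "Friends of friends" of alice, as a pipeline of lazy steps.
friends_of_friends = dedup(out(out(["alice"])))
print(take(friends_of_friends, 10))
```

Nothing is computed until `take` pulls items through the pipeline, which is the property the stream abstraction gives you.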
The author's next post describes RDF and SPARQL in the context of the Cray Graph Engine:
http://www.cray.com/blog/how-cray-graph-engine-manages-graph...
I've seen people using graph databases as a general-purpose backing store for webapps/microservices. What are people's opinions about this?
My feeling is that graph databases are not suitable/ready for (for lack of a better term) the kind of document-like entity relationship graphs we typically use in webapps. Typical data models don't represent data as vertices and edges, but as entities with relationships ("foreign keys" in RDBMS nomenclature) embedded in the entities themselves.
This coincidentally applies to the relational model, in its most pure, formal, normal form, but the web development community has long established conventions of ORMing their way around this. The thing is, you shouldn't need an ORM with a graph database.
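The difference between the two shapes of data can be sketched in a few lines. The entities, field names and the `to_vertices_and_edges` helper below are all invented for illustration; the point is only that the same information can live either inside the entities (as embedded ids) or as separate vertex and edge collections.

```python
# Document-like entities, with relationships embedded as ids ("foreign keys").
ENTITIES = {
    "user:1": {"name": "Ann", "follows": ["user:2"]},
    "user:2": {"name": "Bob", "follows": []},
    "post:1": {"title": "Hello", "author": "user:1"},
}

# Which fields hold references to other entities (an assumption of this sketch).
RELATION_FIELDS = {"follows", "author"}


def to_vertices_and_edges(entities, relation_fields):
    """Split embedded relationships out into explicit vertices and edges."""
    vertices = {}
    edges = []
    for key, doc in entities.items():
        # Plain attributes stay on the vertex.
        vertices[key] = {k: v for k, v in doc.items() if k not in relation_fields}
        for field in sorted(relation_fields & doc.keys()):
            targets = doc[field]
            # A relation field may hold one id or a list of ids.
            if not isinstance(targets, list):
                targets = [targets]
            for target in targets:
                edges.append((key, field, target))
    return vertices, edges


vertices, edges = to_vertices_and_edges(ENTITIES, RELATION_FIELDS)
print(vertices)
print(edges)
```

Reading one entity with its relations is a single lookup in the first form, while the second form makes the relationships themselves queryable.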
1-Would it be more efficient to store objects that contain their relations if you only do (simple) read operations? (e.g. JSON database)
2-Instead, do graph DB engines try to break through bottlenecks for big data and analytics scenarios?
It introduces a false dichotomy: "graph vs relational".
In fact, most (if not all) graph algorithms can be expressed using linear algebra (with a specific choice of addition and multiplication). And matrix multiplication is a select from two matrices, joined with "where the column index of the first equals the row index of the second", plus an aggregation over identical result coordinates.
The choice of multiplication and addition operations can account for different "data stored in links and nodes".
So there is no such dichotomy "graph vs relational".
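Here is a small sketch of that argument: matrix multiplication written as a join on the shared index followed by a group-by, with the addition and multiplication operations passed in. The sparse `{(row, col): value}` representation and the `join_multiply` name are assumptions made for this example.

```python
import operator
from collections import defaultdict


def join_multiply(a, b, add, mul):
    """Multiply sparse matrices a and b, given as {(row, col): value} dicts.

    Equivalent in spirit to:
        SELECT a.row, b.col, AGG(a.val * b.val)
        FROM a JOIN b ON a.col = b.row
        GROUP BY a.row, b.col
    where `mul` plays the role of * and `add` the role of the aggregate.
    """
    # Index b by row so the join condition a.col == b.row is a lookup.
    b_by_row = defaultdict(list)
    for (j, k), value in b.items():
        b_by_row[j].append((k, value))

    result = {}
    for (i, j), a_value in a.items():
        for k, b_value in b_by_row.get(j, ()):
            term = mul(a_value, b_value)
            # Aggregate terms that land on the same result coordinate.
            result[(i, k)] = add(result[(i, k)], term) if (i, k) in result else term
    return result


# With (+, *) on a 0/1 adjacency matrix: number of two-step paths.
adjacency = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
print(join_multiply(adjacency, adjacency, operator.add, operator.mul))

# With (min, +) on edge weights: cheapest two-step path.
weights = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 5, ("c", "d"): 1}
print(join_multiply(weights, weights, min, operator.add))
```

Swapping the pair of operations changes what the product means without changing the join itself, which is the sense in which the data "stored in links and nodes" is accounted for.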
Does anybody know dgraph.io? It's a scalable, distributed, low-latency, high-throughput graph database over terabytes of structured data. DGraph supports Facebook's GraphQL as its query language and responds in JSON, and the storage engine is Facebook's RocksDB, a very fast database. See more at https://github.com/dgraph-io/dgraph
One of the biggest challenges in databases is handling concurrency and sharding. I wish the article had talked a bit more about how that differs between a graph database and a relational database.