The Database Rant

  •     SQL was invented in the 70s for business analysts to write reports without having to go bug the programmers.
    
    So what? Unix was invented in the 70s because Ken Thompson wanted to play Space Travel. [1]

        Queries are not composable.
    
    Per your example, you want to throw out SQL because using AND and OR doesn't feel right. I'd love to see you compose a 10-line SQL query. Also, views ARE query composition.

        PL/SQL is primitive by today's standards, yet it's the only game in town for certain classes of features, like stored procedures and triggers. 
    
    Untrue. Oracle, the creators of PL/SQL, now let you write stored procedures in any JVM or .NET language (so Jython, Clojure, etc.). MS SQL supports .NET stored procedures. PostgreSQL supports Tcl, Perl and Python. Also, your alternative suggestion of a simple and stupid data store DOES involve fetching all the data from the database into RAM and operating on it in your application. You can already do that with SQL.

        It's not obvious which queries are slow.
    
    Untrue. MySQL can log slow queries and help you analyze them in real time using tools like mytop. I assume similar tools are available for all databases. When you write code it's not always obvious if it will be slow, either.

        No back door for map, filter, reduce
    
    Isn't the WHERE clause filter? Isn't GROUP BY very similar to reduce? SELECT UPPER(firstname) is map? With the ability to define custom functions in different languages that's a fairly powerful tool. Also, comparisons between MapReduce and SQL are misguided, because MapReduce does not operate on tabular data.

    Finally, your solution is based entirely on the idea that people will prefer Lisp to SQL.

    SQL is not perfect for everything, but if you want to suggest a better alternative, start with a solid argument.

    [1] http://en.wikipedia.org/wiki/Unix#History

  • Current databases don't give the developer a way to bypass the optimizer.

    Sure they do. Oracle and other commercial DBs provide "planner hints", which essentially force the optimizer to use a particular access method / join method for a given part of the plan.

    why does SQL decide that declarative style is the One True Way to get access to your data

    Well, part of the reason for a query optimizer is that if the physical representation of the data changes, the queries don't need to change. So that if you add or remove an index, you don't need to adjust all the queries on the table, for example. Obviously the query planner is imperfect, but planner hints are fairly common in practice. The further you stray from declarative queries, the more application changes are needed when physical database properties change.

    Datatypes don't match your PL

    If you want a strongly-typed database system that can be accessed from any language, you need the DB's type system to be distinct from any of the client language's type systems. This isn't always the right tradeoff, of course.

  • I was all prepared to rant about how wrong he was, but he convinced me.

    I've had all kinds of problems with PostgreSQL because its planner would choose a poor query plan in certain situations. Specifically, we don't have a good way of estimating random page fetches for index scans on multi-column indexes.

    I also had this weird problem suddenly occur. When one of my tables got to a certain size (about 40 million rows), Postgres started giving a completely different query plan that it thought was faster. The query execution time went from a few minutes to many hours. It took me all day to debug this problem because you simply don't expect the query plan to suddenly change like that.

    To tell Postgres that you want the query to run in a different way, you must turn off certain things, like disabling sequential table scans or bitmap scans. You can't tell it specifically how you want it to execute. The reason is pretty simple: SQL is a declarative language. You declare want results you want, and the server figures out how it'll get you there.

    I think the author is right. We need a simple language where we can tell the server exactly how we want to get the result. This is essential, because, like the author said, finding the optimal query plan is NP-hard. In practice, most programs run only a few queries. It'd be worth the time invested to try out a few query plans and find out which one is the best and stick with that over having the SQL server replan the query almost every time you submit it.

    There could even be third party planners where it could take SQL input and return the resulting executable s-expr if you want to retain that functionality. This way, the development of planners remains separate from the development of the database.

  • Any attempt to throw away SQL is misguided. If you are going to do server side development, you must know SQL. Thinking in sets is hard work, but you must learn it. For the most part fancy ORM's just get in the way. The solution is not less SQL; it's more SQL. Stored procedures are your friends. When I write a complex query, I develop and test it entirely in SQL, before I write a line of procedural code. It's way easier.

    The thing about query optimization is that the optimizer stops when it thinks going on will take longer to perfect it than executing what it has. If you have a query that takes too long look at the query plan. Break up the query into smaller queries and use temporary tables. Give it hints.

    Dates are often poorly handle in language libraries. Store them as UTC in the database and pound them into shape in code.

    Apologies for the counter rant.

  • His point about composability is not a complaint about what sql does. It's merely a complaint that he thinks that function composition should be used to combine predicates, but that isn't how sql works.

    I.e., he wants "SELECT * from X where P(x)" to be a function acting in X. It isn't a function (or at least should not be thought of that way), it's a description of a set:

    {x : x in X, P(x)}

    The predicates ARE composable:

    {x : x in X, P(x)} intersect {x : x in X, Q(x)} == {x : x in X, P(x) and Q(x)}

  • Have you looked at http://en.wikipedia.org/wiki/MonetDB ?

    It exposes some of the DB guts at a different layer while leaving a SQL "porcelain" feature available for people who desire that.

  • It sounds like his ideal database already exists; AllegroCache. It's an object database, but you can easily restrict yourself to certain data types to make it "relational", if that's what you're after.

    Some of the open source databases are almost as featureful; Elephant for CL and KiokuDB for Perl come to mind.

  • I never went through this phase myself, but I see a lot of developers stuck in it. It's this sort of exasperated anger towards a technology that the author has tried his best to understand but failed.

    For some reason some people never really wrap their head around Relational Databases, Normalization, and SQL. They seem to not be wired for it. And it understandably pisses them off.

    It's people just like this author that are behind all those terrible Object Relational Mapper projects that try to pretend that your data isn't really data. It's just an attempt to escape the obligation of truly understanding how databases work and how they can help you. It's sadly ironic when you think about it.

  • I recently made a GUI for a search utility which is filled in, and then it builds an SQL query from the GUI, and then displays the results of the query in another GUI.

    My point is that you can map a search to whatever language you like - whichever is best for that utility.

    Plus a friend told me about Linq, which is an interesting evolution from SQL. It lets you break up queries over several cores (though that's just because it's iterative in nature).

  • how about a 10+ table join on f(any function)?

  • Those who complain about SQL just haven't used Oracle, fully.