A sequel to SQL? An intro to Malloy

  • This is the example at https://github.com/looker-open-source/malloy

      query: table('malloy-data.faa.flights') -> {
        where: origin ? 'SFO'
        group_by: carrier
        aggregate:
          flight_count is count()
          average_flight_time is flight_time.avg()
      }
    
    
      SELECT
         carrier,
         COUNT(*) as flight_count,
         AVG(flight_time) as average_flight_time
      FROM `malloy-data.faa.flights`
      WHERE origin = 'SFO'
      GROUP BY carrier
      ORDER BY flight_count desc         -- malloy automatically orders by the first aggregate
    
    I don't see much value in this. This is not aesthetically better than SQL. It's also semantically better. This is just a different syntax that would parse to the same AST. And what's with the `?` for equality?!

  • There is also PRQL which is more intuitive and simpler IMO - https://github.com/prql/prql

  • Reading about the "semantic layer" it very much reminds me of the kind of things people do in an ORM. That is: how do tables relate, refinements of data types like strings where a column might have specific semantics... this post doesn't go into much so I don't know if Malloy also allows specifying things like how updates should happen (do you update in place or create new records?), reusable queries (especially given its nesting), knowledge of indexed vs unindexed queries, etc. All of this stuff usually either gets stuffed in the ORM layer, or exists only as folk wisdom about specific databases.

    It is peculiar that databases typically lack referential integrity, something that we've decided is absolutely essential in other programming environments.

  • Out if necessity I've started working with Microsoft's Kusto Query Language [1] which is used by various services in Azure (e g. their Log Analytics Workspace).

    At first I found the language rather akward and was wondering why yet another query language. But the more I used it the more it grew on me. The thing I really like is that unlike the clauses in SQL, the order of operators isn't really fixed and it reads and feels like a pipe command in a Unix shell.

    One example where I find this far superior is when doing aggregations. In SQL I would have to modify both the start and the end of the query, which is quite a nuisance.

    [1] https://docs.microsoft.com/en-us/azure/data-explorer/kusto/q...)

  • Related:

    Malloy – A Better SQL, from Looker - https://news.ycombinator.com/item?id=30053860 - Jan 2022 (99 comments)

    Malloy: An Experimental Language for Data - https://news.ycombinator.com/item?id=28926349 - Oct 2021 (1 comment)

  • My musings on why SQL has been so hard to displace, and what it might take to do so. Any and all feedback appreciated!

  • The minimum enhancement I want for SQL is a version where no literal values are allowed in queries, as this would completely preclude SQL injection :)

    To make that more tolerable for query planning purposes, there would have to be two types of query parameters: compile-time and run-time.

    Next up: why not allow query clauses to come in any order? `SELECT .. FROM .. WHERE ..;` or `FROM .. SELECT .. WHERE;` and so on.

    Since the query parser/planner has to see the whole thing anyways. The parser/planner can't begin coding at `SELECT`, or at `WHERE`, since there might be a `GROUP BY`, or an `ORDER BY` that affect the whole query plan, so all these clauses might as well come in any order. Wanna put `HAVING` first? Sure, why not. It's probably best to insist that table sources all come together rather than be all over, but I think even that doesn't have to be so.

    Also, I'd like an out-of-band mechanism for expressing query planner hints. This would be a separate string or object passed along with the query, and which does things like: identify a table source to use as the outer-most table for the query plan, for each of some or all table sources identify an index to use or temp index to create, for each of some or all joins pick a join strategy, etc. Table sources would have to be addressed as {<CTE_name>, <table_source_name>}, naturally. Such a thing should also allow one to specify indices that should be created on CTEs.

  • It sounds like Malloy might be able to find a niche amongst some casual users, but it will never gain traction amongst DBAs. My reasoning is as follows.

    Being a DBA is more than just writing queries, you have to be able to maintain the database, maintain the security settings, and (most importantly for this discussion) do performance tuning.

    Performance tuning is incredibly important; I have, personally, seen a query that was running in only a few seconds have an absolutely catastrophic performance drop off when a few extra records were added. I had to get help from the DBA to find where it was going wrong and rewrite it to get the performance back to a reasonable state again. (I am talking about query run time going from seconds to minutes.)

    You can't do performance tuning in Malloy I doubt; You'll be needing to run the analyser and rewriting the produced SQL. Since you'll have to know SQL really well to do this, why bother learning Malloy as well?

  • For a much much more mature product in this area with a very strong team behind it, see EdgeDB

  • SQL is around since the dawn of relational database and its hard to replace. The best option for mass adoption is to have drag and drop tools with visualizations like no-code ETL. Template like and markup language or framework are easier to adopt for new developers but majority of the population still tend of stick with the original language.

  • I don't think SQL gets replaced until we move to a different database paradigm. For example, I could imagine a function database model [0] getting paired with a "grammar of graphics" type of metadata for the records to create much more concise query/aggregate declarations, especially where time series are concerned.

    [0] https://en.wikipedia.org/wiki/Functional_database_model

  • I like the VS Code integration that Malloy has. There's pretty limited in-browser tooling for BigQuery so that bit of the extension is amazing.

    But I found practically that it's very hard to get folks that are writing SQL day-to-day to try to integrate a new language on top of something they already understand in and out so I'm thinking about just pulling out the BigQuery bits from their VS Code extension to be able to write SQL with in VS Code with auto-complete and references.

  • If you would like to play with Malloy... You can Fiddle using just a web browser. The Malloy Fiddle uses DuckDB and WASM.

      https://twitter.com/lloydtabb/status/1567671348306264064

  • > there are relatively few database targets that it must support

    Heh. Oh wow.

  • undefined

  • undefined

  • This is the umpteenth attempt at replacing SQL. Just like all previous attempts, it may well address some weaknesses of SQL, however, it introduces a whole new range of issues. SQL has been around so long as it mostly works.

  • It’s nice, but it’s hard to beat the clarity and expressiveness of dplyr and purrr.

  • undefined