Yacc is not dead

  • This is one area that has always disappointed me. Compiler-compilers just never seem to have lived up to the promise of making it easier to define your own languages. If anything, languages like Ruby have done a far better job (albeit in a more limited way).

    I wonder at the reasons for this. Is this simply the wrong theoretical approach? If you look at formal grammars and how Yacc, ANTLR, etc deal with them, they're intended to be unambiguous. This is actually a problem in many ways.

    PEGs (Parsing Expression Grammars) are a relatively new (~10 years) development that I find far more interesting for this reason as they solve the ambiguity problem by simply ranking the alternatives, which can have hideous backtracking possibilities (and tend to expensive in space terms) but in practice seem to lead to more natural definitions (at least in my limited experience).

    I just have to wonder if we're going about this whole language thing the wrong way.

  • I have used yacc and its offspring like bison for a long time. Writing the grammars is really simple, but debugging them is really arcane. The problem is that the place where the parser generator discovers the error can be some distance from where you introduced the ambiguity. Grammars have their own debugging style, particularly with yacc/bison output. Eventually you have an aha moment and you start looking for an improper empty state or trying a reduction at the wrong place in the parse tree. It isnt hard if you grasp it and probably an intractable bitch if you dont.

    I will agree however, that the parser and the bison source look like someone barfed C all over the floor, unless someone has given it serious TLC in the past dozen years.

  • What do people use yacc and other parsing tools for these days?

    Looking at the uses page for ANTLR yesterday, it seems to be entirely DSL (query languages etc) and language tools (Java src code transformers, coverage checkers etc).

    It seems that if you just need a text format, you'll probably settle for JSON (if you're happy to output data-structures directly) or XML (if you need to design a schema or interop with one). They are both general, non-tailored formats, making specific formats easy to extend. Less specific, more flexible.

    Recently it was said here that most modern programming languages (surprisingly) use hand-made recursive descent parsers. This gives maximum power and performance, at the cost of complexity and being a lot of work.

    BTW: javacc is a nice update of yacc IMHO, using EBNF, eg. you can write repetition with an * instead of breaking it into A:: aA|- . But it doesn't seem to have an active community around it (cf ANTLR), which is sadly a crucial factor for practical use.

    However, I personally have always written manual recursive descent parsers (including XML formats), instead of using tools. For me, yacc et. al. just get in the way. But it's such a cool idea! BTW: There's a body of research called "grammarware", that didn't seem to take off; and I've heard that everyone who looks at grammars believes that can make it simple and easy to use, and set out to do so, but no one ever succeeds (...yet, anyway). http://www.cs.vu.nl/grammarware/

    In my humble (and I admit, extremely unpopular) opinion, XML represents an amazing leap forward towards this goal: an XML Schema (or a DTD) is a grammar, and many parsing issues just disappear using this simpler, less powerful approach. For example, it has to be deterministic (aka 1-unambiguous); you can't have recursion within the one level of an element (aka tree grammar). I just wish it weren't so ghastly and soul-destroying to use. You have to look past the surface to see its disguised beauty (its beauty is "very effectively disguised").

    funfact: Eric Schmidt wrote lex http://en.wikipedia.org/wiki/Lex_(software) Now, you might pooh-pooh lex for various reasons, but you can't call Eric a non-technical CEO.

  • Mathew Might video "Yacc is Dead" at Stanford from Feb 2011. http://www.stanford.edu/class/ee380/permlinks/Might.html. Slides are also available.