Python vs Julia – an example from machine learning

  • As others have pointed out, improvements could be made to the Cython code. However, it seems to me the takeaway from this post should be that, in this application, Julia has a better performance profile while being as simple and expressive as Python [1].

    As a Python programmer who often works with these sorts of problems, I find this reinforces my perception that Julia, while not suited to all programming problems, offers a productivity gain when it comes to mathematical/algorithmic computing.

    Sure, I could sit and tweak the Cython code or rewrite the hot loops in C++, but it seems to me Julia hits a sweet spot where I am less likely to have to trade programmer productivity for computational performance.

    [1] I will note, however, that the code in the OP doesn't seem like the clearest implementation of the algorithms in either language; some of that heavy nesting should be factored out, for a start.

  • If the objective is to get a fast implementation of isotonic regression, I think the priority would be to choose the right algorithm. PAVA is indeed one of the fastest such algorithms, with a running time bound that is linear in the number of data points. I may be wrong, but from a quick look at the code, it seems that neither the [P/C]ython versions nor the Julia version ensures linear time. The files are named Linear PAVA, so I believe the expectation was that they are indeed linear time.

    Consider applying it to this sequence:

         1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, -30.
    
    It seems that the number of operations required by these implementations would be quadratic in the size of the sequence, because they average the tail repeatedly.

    I cannot read Julia yet, so I am not sure about the Julia version, but the others do indeed look like at least quadratic-time algorithms. The behavior will of course depend on the input, so on certain inputs the runtime may scale linearly. Complexity, however, addresses the worst-case behavior, and as shown above, coming up with such bad cases is not hard; they need not be very unnatural either.
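
    For concreteness, here is a rough sketch of what a genuinely linear-time PAVA can look like in plain Python (illustrative only; the name pava_linear and the block representation are mine, not code from the benchmark repo). Pooling is done on a stack of blocks, and every merge destroys one block, so the total work is O(n):

        def pava_linear(y):
            # each stack entry is a pooled block: (mean, weight, length)
            blocks = []
            for v in y:
                mean, weight, length = float(v), 1.0, 1
                # merge backwards while monotonicity is violated
                while blocks and blocks[-1][0] > mean:
                    m2, w2, l2 = blocks.pop()
                    weight += w2
                    mean += (m2 - mean) * (w2 / weight)
                    length += l2
                blocks.append((mean, weight, length))
            # expand the pooled blocks back into a fitted sequence
            fit = []
            for mean, _, length in blocks:
                fit.extend([mean] * length)
            return fit

    A quick sanity check: pava_linear([1, 2, 3, 2, 1]) gives [1.0, 2.0, 2.0, 2.0, 2.0], which is the correct isotonic fit.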

    OTOH: if the objective was to compare the speeds of CPython, Cython, and Julia, then it does not matter whether an efficient algorithm was chosen for the task. My hunch is that, because the current implementations compute more than necessary, this will magnify the speed difference between Cython and Julia.

    Does Julia do any inlining? Or are there any plans for such?

    If so, that would be one aspect where it could shine. As mentioned in the comments, the Cython version is unfactored and manually inlined for performance. If Julia inlines during its JIT pass, such measures would be unnecessary and would allow one to write cleaner code.

  • Very cool. We had a similar experience when implementing the Simplex method for linear optimization, except we evaluated PyPy versus Julia instead of Cython. It's similar in that it benchmarks "just" the language, not BLAS, which is a common pitfall I see when people compare the two. I guess at this point, a year or so later, it isn't surprising to me anymore that Julia is as fast as it is for these sorts of computational tasks; it's been demonstrated pretty comprehensively at this point.

    Slides: http://www.mit.edu/~mlubin/informs2013.pdf

    Paper: http://arxiv.org/abs/1312.1431

  • This looks dodgy to me. The purple and red lines on the graph don't end up parallel, so the active set algorithm appears to have time complexity O(n) in Julia, but O(n^2) or O(n^3) in Python. Can that be explained?

  • No kidding the Active Set solution in Cython is slow; it barely leverages Cython at all! It uses a Python list, which is managed by Python, and uses none of Cython's static typing.

    Implement the algorithm inside a "with nogil" block and you will see some nice speed-ups. It won't look pretty, but for certain performance-sensitive code it will be worth it.
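
    A minimal sketch of the pattern (the function here is a stand-in loop, not the benchmark's actual active-set code):

        # cython: boundscheck=False, wraparound=False

        cpdef double dot_nogil(double[::1] a, double[::1] b):
            cdef Py_ssize_t i
            cdef Py_ssize_t n = a.shape[0]
            cdef double acc = 0.0
            with nogil:             # no Python objects may be touched in here
                for i in range(n):
                    acc += a[i] * b[i]
            return acc

    The typed memoryview arguments (double[::1]) are what make the loop body compile to plain C array accesses, which is also what allows it to run under nogil at all.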

  • Coming from R, I have to ask... Why am I seeing 'for' and 'while' loops? Why isn't this code vectorized? It looks like both implementations suck.

  • 1. Where is the source for this benchmark?

    2. http://benchmarksgame.alioth.debian.org could be a bit more representative of broad-based algorithmic performance.

    3. There are lots of Python libraries for application features other than handpicked algorithms. I would be interested to see benchmarks of the marshaling code in IJulia (IPython with Julia).

  • How much time is Cython spending making the copy into the solution array?

    (Or is Julia making a copy and I just don't see it?)

  • If you need speed, wouldn't writing it as a C extension beat using Julia?

  • Another good example of how dynamic languages can be made to execute fast when there is good native code generation support, rather than forcing people to use C or similar.

    Looking forward to Julia's JIT improvements. These are still early days.

  • Wait, are we looking at the same graphs? Doesn't this show that for large problem sets Cython and Julia are pretty much equivalent, as long as you choose the best algorithm?

  • tl;dr: Julia is killing it.

  • Why not use C++11 if performance matters?

  • Why are you not using typed memoryviews in Cython? That might easily close the performance gap.
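
    (I.e., declaring the hot arrays as something like double[::1] y, so that element access compiles down to C array indexing instead of going through the Python C-API.)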

  • Would love to see numba added into this benchmark.
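
    For reference, a rough sketch of what a numba version of the same pooling loop might look like (illustrative; the names are mine, and it assumes a float64 NumPy array as input):

        import numpy as np
        from numba import njit

        @njit
        def pava_numba(y):
            n = y.shape[0]
            level = np.empty(n)                  # pooled block means
            weight = np.empty(n)                 # pooled block weights
            size = np.empty(n, dtype=np.int64)   # pooled block lengths
            k = -1                               # top of the block stack
            for i in range(n):
                k += 1
                level[k] = y[i]
                weight[k] = 1.0
                size[k] = 1
                while k > 0 and level[k - 1] > level[k]:
                    w = weight[k - 1] + weight[k]
                    level[k - 1] = (weight[k - 1] * level[k - 1]
                                    + weight[k] * level[k]) / w
                    weight[k - 1] = w
                    size[k - 1] += size[k]
                    k -= 1
            out = np.empty(n)
            t = 0
            for j in range(k + 1):               # expand blocks back out
                for _ in range(size[j]):
                    out[t] = level[j]
                    t += 1
            return out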