Our recent server issues
I expect that what they've pointed to as the cause is only part of the problem. We'll never know the full picture unless they share it.
Being upfront, my experience with the people in charge of the organization has left me without much goodwill. Nothing against Thibault personally; I think he's done some great things, but he seems busy with whatever interests him, and management isn't it. He also has some unprofessional people with access who work for the project/charity.
I volunteered my services years ago as a system administrator (no charge) and offered suggestions, but the modes of communication were limited, multiple issues went stale, and one got no response before being auto-closed. They have issues that don't get addressed, and their process doesn't appear to be aimed at cultivating qualified volunteers or improving the bus factor of the project.
To make things worse, when I expressed disagreement, with constructive feedback, regarding one of their process decisions, one of the other devs with access apparently took offense, and the next day I found my Lichess account had been edited by an admin without notice or notification. I could not log in (wrong password), the password and the email for password recovery had been changed, and trying to access the profile URI directly showed the account as banned.
It seemed this was done out of spite, and definitely without any kind of due process. Appeals by email went unanswered within the 90-day cutoff I gave them. As a result I submitted a complaint to the French charities regulatory body and moved on, since the group wasn't worth wasting any more of my time. I haven't heard back, so who knows if anything came of what I reported.
In my opinion, they've got more internal problems than they let on, and to me this is just spillover.
It's unfortunate because any failure like this impacts so many people, but I don't find it surprising given my limited experience of the people there.
> Eventually there was no way of keeping up with the queue
A chess congestion pile up... sounds like an event stream rook-ie mistake. :-)
Seriously congrats to Lichess for growing. It's an amazing site. Donate if you can.
I'm not sure I completely understand. They say that the only thing that was affected was the tournament because its events needed to be processed synchronously, but I remember the entire site being unavailable for people. Was that unrelated?
On a side note, huge props to Lichess, the fact that they can compete with chess.com which has so many resources behind it is very impressive. Everyone who plays chess should consider becoming a patron.
Hikaru was live streaming the event so you can see the series of failures and how they affected the tournament here: https://youtu.be/YKfvNl8UoxA
Anyone take a look into the code and see why it can't be parallelized? The bottom of the FAQ mentions that but I'd think at least certain aspects should be parallelizable or at least be prioritizable (like maybe forgoing leaderboard updates to focus on more important events?)
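To make the prioritization idea concrete, here is a minimal sketch of a priority queue that processes important events first and drops leaderboard updates once the backlog grows. The event kinds, the `PrioritizedEvents` class, and the `shed_above` threshold are all made up for illustration; I haven't checked how Lichess actually structures its tournament events.

```python
import heapq
import itertools

# Hypothetical event kinds; a lower number is processed first.
PRIORITY = {"game_result": 0, "pairing": 1, "leaderboard_update": 2}


class PrioritizedEvents:
    """Priority queue that sheds low-priority events when the backlog is large."""

    def __init__(self, shed_above: int):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per kind
        self._shed_above = shed_above

    def push(self, kind: str, payload) -> bool:
        # Under load, forgo leaderboard updates to keep capacity for the rest.
        if kind == "leaderboard_update" and len(self._heap) >= self._shed_above:
            return False
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), kind, payload))
        return True

    def pop(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload


q = PrioritizedEvents(shed_above=2)
accepted = [
    q.push("leaderboard_update", "a"),
    q.push("pairing", "b"),
    q.push("leaderboard_update", "c"),  # backlog is already 2, so this is dropped
    q.push("game_result", "d"),
]
drained = [q.pop(), q.pop(), q.pop()]
```

A dropped leaderboard update would have to be recomputed later from the game results, so this only works if the leaderboard can be rebuilt from other state.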
For some things there is simply no way to parallelise events when they need to be processed in order, and the limit is then the processing capacity. In this instance I'm surprised Lichess didn't have surge capacity to automatically scale up for a big tournament like this. Sure, it might be expensive, but that's when optimisation should come in (to save costs). Having to cancel a major event like this due to poor optimisation is a deployment failure more so than a coding failure.
You could maybe parallelize the matchmaking process by splitting the ranked list of players into buckets. Nobody in the top 100 will be paired with someone in the last 2000 anyway. So why not split the list at every 1000th player and treat each bucket separately?
The leaderboard then only needs to be modified by the top 100 or whatever (reads can be done without race conditions).
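A rough sketch of the bucket idea, assuming the players are already sorted by rank. The function names and the neighbour-pairing rule are invented for illustration and are not how Lichess actually pairs players.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Pair = Tuple[str, str]


def split_into_buckets(ranked_players: List[str], bucket_size: int = 1000) -> List[List[str]]:
    """Cut a rank-ordered player list into consecutive buckets."""
    return [
        ranked_players[i:i + bucket_size]
        for i in range(0, len(ranked_players), bucket_size)
    ]


def pair_bucket(bucket: List[str]) -> Tuple[List[Pair], List[str]]:
    """Pair neighbours in rank order; an odd player out is returned as leftover."""
    pairs = [(bucket[i], bucket[i + 1]) for i in range(0, len(bucket) - 1, 2)]
    leftover = bucket[-1:] if len(bucket) % 2 else []
    return pairs, leftover


def pair_all(ranked_players: List[str], bucket_size: int = 1000):
    """Pair every bucket independently; buckets share no state, so workers need no locks."""
    buckets = split_into_buckets(ranked_players, bucket_size)
    with ThreadPoolExecutor() as pool:
        # map() returns results in bucket order.
        return list(pool.map(pair_bucket, buckets))


players = [f"p{i}" for i in range(2500)]
results = pair_all(players)
```

In CPython, threads would not actually speed up CPU-bound pairing; the point is only that each bucket can be handed to a separate worker. Players near a bucket boundary can never meet each other, which is the trade-off of this approach.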
The "easy" solution is to put a cap on tournament size.
Apart from when Agadmator wants his fans to be in the same tournament as Magnus Carlsen etc., there is basically no need to hold chess tournaments with over 1000 players.
It would be very interesting to hear about the technical details!
Sounds like they need backpressure. Isn't that the usual solution to a queue growing without bound?
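For anyone unfamiliar with the term, here is a minimal sketch of backpressure using a bounded queue from Python's standard library. The `try_enqueue` helper and the tiny `maxsize` are made up for illustration; what the producer does when refused (retry later, slow down, return an error to the user) is a separate design decision.

```python
import queue


def try_enqueue(q: queue.Queue, event) -> bool:
    """Offer an event to a bounded queue; refuse instead of growing without limit."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        # The caller sees the refusal and has to back off.
        return False


events = queue.Queue(maxsize=3)  # the bound is what creates the backpressure
accepted = [try_enqueue(events, n) for n in range(5)]
```

With no consumer draining the queue, the first three events are accepted and the last two are refused, so the overload shows up at the producer instead of as an ever-growing backlog.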