Hacker News down, unwisely returning HTTP 200 for outage message
As people have mentioned, this site is an exception to how to do things, in that PG actively does not care about search engine results. However, for those who are interested, here are a few ways you can handle a situation like this.
1. If you add the "stale-if-error" directive (from RFC 5861) to your Cache-Control header, you can tell your CDN to hold on to things longer in the event of an outage. The CDN will look at the cache lifetime you set and revalidate as normal when it expires, but if it gets back no response or a 503 (Service Unavailable; if you work with CDNs, this status code is your friend!), it will keep serving the stale content for as long as you tell it. There's a sketch of the headers after this list.
2. Let's say you're beyond that, or your site is too dynamic. Instead of setting up an error page that responds to everything, set up a redirect with the 302 status code (so crawlers and browsers know it's temporary). Point that redirect at an error page and you're golden. The best part is that these requests use minimal resources.
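A minimal sketch of the headers from item 1, with Python's standard library standing in for the origin server. It assumes a CDN that honors the RFC 5861 Cache-Control extensions (not all do); the port, body, and lifetimes are placeholders.

    # Sketch of the normal-operation headers from item 1: a short cache
    # lifetime for freshness, plus stale-if-error so a CDN that honors
    # RFC 5861 keeps serving the last good copy through an outage.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class OriginHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b"<html><body>normal content</body></html>"
            self.send_response(200)
            # Revalidate after 5 minutes, but if the origin then returns
            # no response or a 5xx, keep serving this stale copy for a day.
            self.send_header("Cache-Control", "max-age=300, stale-if-error=86400")
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), OriginHandler).serve_forever()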
What I do is keep a "maintenance" host up at all times that responds to a few domains. It answers every request with a redirect pointing at an error page that issues the 503 code. Whenever there's an issue, I just point things at it and stop worrying while I deal with the problem. I've seen web servers go down for hours without anyone realizing anything was up, although with dynamic content this obviously has limits. The other benefit of this system is that it makes planned maintenance a hell of a lot easier too.
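Roughly what that maintenance host does, sketched with the standard library; the path and Retry-After value are illustrative, not anything HN or the commenter actually uses.

    # Sketch of the maintenance host: every request gets a 302 (temporary)
    # redirect to /down, and /down itself answers with a 503 so crawlers
    # and caches know the outage is temporary.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class MaintenanceHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/down":
                body = b"Down for maintenance. Back soon.\n"
                self.send_response(503)  # Service Unavailable
                self.send_header("Retry-After", "3600")        # suggest retrying in an hour
                self.send_header("Cache-Control", "no-store")  # never cache the error page
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(302)  # temporary redirect, cheap to serve
                self.send_header("Location", "/down")
                self.send_header("Content-Length", "0")
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), MaintenanceHandler).serve_forever()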
Oh, another thought: you can use a service like Dyn (their enterprise DNS offering) or Edgecast to do "failover DNS". Basically, if their monitoring systems notice an issue, they change your DNS records for you to point at that maintenance domain. You can also trigger it manually for planned work.
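Those providers run the monitoring and the record flip for you, but the shape of the logic is roughly this sketch. update_dns_record() is a hypothetical stand-in for whatever API your DNS provider actually exposes, and every hostname here is made up.

    # Rough shape of failover DNS: poll the origin's health endpoint, and
    # if it stops answering, repoint the record at the maintenance host.
    import time
    import urllib.error
    import urllib.request

    def origin_is_healthy(url: str, timeout: float = 5.0) -> bool:
        """Return True if the origin answers with a 200 within the timeout."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    def update_dns_record(name: str, target: str) -> None:
        # Hypothetical: swap in your provider's actual API call here.
        print(f"pointing {name} -> {target}")

    while True:
        if origin_is_healthy("https://origin.example.com/health"):
            update_dns_record("www.example.com", "origin.example.com")
        else:
            update_dns_record("www.example.com", "maintenance.example.com")
        time.sleep(30)  # poll every 30 seconds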
This was all me. I probably should have thought about it more, but just wanted to make it clear we knew something was wrong and were working on it. Load was not a concern.
Though the article is correct, with everything else that was going on, response codes and cache headers were the least of my worries.
I think the best takeaway is that you will go down at some point, so it's best to have a reasoned plan in place for when you do. Handling it in the heat of the moment means you'll miss things.
HTTP 200 = "Cloudflare, please cache this status message instead of passing through a million requests to our dead server while it's busy restoring a backup".
PG doesn't care about HN's search listings, so there are no drawbacks to doing that.
Browsers tend to cache 200 OK responses. When HN came back up (as reported on Twitter), I kept getting the error page until I busted the cache and reloaded. Yup, that's what a 200 OK on an error page can cause: a regular reload will still show your _down_ page.
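The cure on the serving side is a real error status (503) plus a Cache-Control: no-store header, so neither browsers nor CDNs hold onto the outage page. A quick standard-library sketch for checking what a URL actually sends back:

    # Check what status code and cache headers a URL actually sends.
    import urllib.error
    import urllib.request

    url = "https://news.ycombinator.com/"
    try:
        resp = urllib.request.urlopen(url)
        status, headers = resp.status, resp.headers
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the exception carries the response details.
        status, headers = err.code, err.headers
    print(status, headers.get("Cache-Control"))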
I know that pg "actively does not care about search engine results", but the HTTP spec has other applications besides Google PageRank. It's hard to build amazing new technologies and improve the Web if people keep ignoring the standards without a good technical reason. Please, for the sake of the example set for others, send the proper HTTP codes.
This isn't bad just for Google. RSS aggregators were also getting the "everything's fine" message. I thought I had a bug in my aggregator until I went to the site and realized it was down.
He's right that for most sites this would be undesirable, but PG has stated that they aren't looking for a lot of Google traffic. Then again, just because it doesn't matter to PG doesn't mean it doesn't matter at all.
I've more than once found myself Googling for old HN threads, without luck. Google's search (and site search) is miles better than HN's, but HN intentionally limits Google's crawl rate, which limits the amount of content crawled and indexed.
Just hit HN this morning and got the downtime message. Then I remembered this post and did a hard refresh to get back to normal. Browsers certainly cached it aggressively.
I am not a news junkie, but I realized how dependent my newsfeed is on Hacker News. Seeing the outage message about 6-8 times reminded me why I keep coming back and reading the quality entries on these forums.
Welcome back, HN - I had an unusually productive (yet unstimulating) day at the office.
I was 8% more productive today. Down-vote at will.
BTW, I just realised that Chrome has been serving me a cached version of HN the whole time. I didn't realise it was up again. When did it come back online? An hour ago? More?
By the way, HN still says it is down at this URL
@HNStatus helped keep me updated, but it wasn't posted on the maintenance page for everyone to see until the end of the day.
Now I know why I had to refresh when I first opened HN 10 minutes ago, even though it was up at that time.
I didn't know this was up until somebody told me to Ctrl-reload. Nice one with the 10-year-cached soft-500 page. :P
Now that our brittle forum is back, let's get back to work nitpicking the Android UIs that aren't quite beautiful enough! There are not enough drop shadows!
Even if the author is correct that Google's ranking of Hacker News will be affected by just 24 hours of downtime, wouldn't the algorithm correct itself over the next 24 hours?
Seems like a fair point in principle, except that in this case HN is one of the few sites I don't need Google to reach, and I don't care about any other tools that might rely on the returned status codes.
I know it's supposed to make you more productive when it's down (and I was), but suddenly I felt clueless during my commute (which can take 2 hours total).
For anyone interested in HTTP status codes, this is a great resource:
Question: Is HNsearch still being worked on? Its "link" and "parent" links are broken for search results.
I just realized I'm addicted to HN; I must have refreshed the page a million times :) Thanks, HN.
Some people were speculating that HN banned Google and other search engines, but at least from /robots.txt I can't see that. Do they do any IP-based filtering? Does anyone have information on that? I'm just curious.
Maybe this was a prior error message, but the original CloudFlare "Origin Server" error was returning a 520, which is a CloudFlare-custom HTTP status code.
Edit: CloudFront -> CloudFlare
By the logic of this blog post, a page like status.heroku.com should return a 503 when Heroku is experiencing downtime and a 200 otherwise.
A 200 means that the page loaded as intended (which it did). It turned out that some of the page's content (the interesting stuff) couldn't be loaded, and the page's content reflected that.
A 503 would be appropriate if there were a server problem, which might actually have been the case; but with Cloudflare's landing page there was no server problem, since Cloudflare served the substitute content properly without error.
Does anyone know the reason for the downtime?
CloudFlare really messes a lot of things up. I've seen CloudFlare refuse to give me error responses from forms before: enter a bad value, get a cached page of the empty form, lol. The server was trying to return a page explaining the bad entry, but CloudFlare refused to send it to me because it had a non-200 response code.
Ah, this explains why HN has been telling me it's down for two days now. I had to open it in incognito to realize it was a cache issue.
The site is hella slow too. What on earth do they run it on, a 512 MB VPS?
HN is probably one of the more unreliable sites on the internet.
who cares...
Of hundreds of comments I have made on this site, only one has been snarky. Here's my second: chill your tits.