Ask HN: A Good Alternative for ReCaptcha?
Are there any good alternatives for reCaptcha? It has come to a point that traditional text/sound captcha challenges are trivial to bypass today, and the more distorted we make them, the more challenging they are for humans.
Google went Sparta with their reCaptcha be, and nobody in their right mind should add a script that fingerprints users, especially from an adtech company.
What solutions do you use to thwart bad bots from submitting your forms and automating things they should not?
For bots which are not specifically targeted at your page I simply add an invisible form element named url. Bots _LOVE_ to share their viagra urls. Any request which submits a url is discarded.
This trick is simple, stupid, and should not work, but somehow the simple spam bots have not improved.
This does not work for sophisticated bots (never met one) or the ones programmed specifically for your site (happens very rarely).
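A minimal server-side sketch of that check, in Python. The function name and the dict-shaped form data are assumptions for illustration; only the hidden field named url comes from the comment above.

```python
def is_probable_spam(form: dict, honeypot_field: str = "url") -> bool:
    """Return True when the hidden honeypot field was filled in.

    Real visitors never see the field, so any non-blank value
    suggests an automated submission.
    """
    value = form.get(honeypot_field) or ""
    return bool(value.strip())


# Hypothetical submissions: a human leaves the hidden field empty,
# a naive bot fills in every field it finds.
human = {"name": "Ann", "comment": "Nice post", "url": ""}
bot = {"name": "x", "comment": "buy now", "url": "http://spam.example"}
```

Hiding the field with CSS rather than type="hidden" is a common choice, since some bots skip inputs that are explicitly marked hidden.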
W3C has published an extensive list of reCAPTCHA alternatives: https://www.w3.org/TR/turingtest/
W3C is requesting feedback for the document, if you'd like to make suggestions, please open an issue: https://github.com/w3c/apa/issues
Akismet is a third-party service that works really well. You send data there with an HTTP POST and it will reply with a yes or no: it is spam or not spam. It is not that hard to implement. You do have to be aware that you are sending user data to that service, which you have to mention in your privacy policy.
Stop Forum Spam is a similar third-party service. You send it an IP address and an email address. For each of the two it replies whether it is spam, together with a confidence level. Quite an interesting way to reply :) It was originally intended to fight registration spam, but you can use it for comment spam or contact forms as well.
JavaScript spam filters can be very useful. Most spambots do an HTTP GET for a page with a form. They fill in all the fields and submit it with an HTTP POST. They don't run any JavaScript on that page. You can have honeypot and timeout fields on a form that get manipulated by JavaScript, and spambots will not validate. Works really well, and it is all transparent to the user. The only "risk" is that in the future spammers might start using more sophisticated spambots, like using Electron or Chromium. I implemented spam filters like this in a WordPress plugin and it works really well for me: https://wordpress.org/plugins/la-sentinelle-antispam/
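One way the server side of a timeout field could look, sketched in Python. This is not how the plugin above does it: the HMAC-signed timestamp, the secret, and the two thresholds are all assumptions made for the example. The page would put the token into a form field, and the server rejects submissions that come back too fast, too late, or tampered with.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"change-me"   # hypothetical server-side secret
MIN_SECONDS = 3         # assumed: anything faster looks automated
MAX_SECONDS = 3600      # assumed: older forms count as expired


def issue_token(now: Optional[float] = None) -> str:
    """Create a 'timestamp:signature' token to embed in the form."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"


def token_is_valid(token: str, now: Optional[float] = None) -> bool:
    """Check the signature, then check the form's age against the limits."""
    try:
        ts, sig = token.split(":", 1)
        issued = int(ts)
    except ValueError:
        return False
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    age = (time.time() if now is None else now) - issued
    return MIN_SECONDS <= age <= MAX_SECONDS
```

Signing the timestamp means a bot cannot simply invent an old-looking value; it has to fetch the page and actually wait.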
As per Google, "reCAPTCHA is a free service that protects your website from spam and abuse", but one can argue instead that reCAPTCHA is a service that transfers the spam problem from the provider to its users. In the end the provider will be free of spam (I guess), but all of its users will be spammed, tricked, fingerprinted, and abused into constantly working for free for this third-party anti-spam service.
I think it is best to design your own captcha around your use case. All you need to do is make the amount of work for spammers too high for targeting your site.
Just recently, I added the idea of a captcha that might actually be enjoyable for users to my list of "things that should exist":
http://www.gibney.de/things_that_should_exist
The idea is to show the user a random image and ask what is on it. If the image is beautiful, that might even be fun. And there are many sites that offer beautiful public domain images. And have tags for everything in them.
There probably are many other funny and enjoyable captcha ideas one could implement.
I had a serious problem with bots spamming my forum. I implemented all the usual captchas but none of them worked. What I found interesting was that I was able to defeat the bots simply by "tricking" them. I kept the old forum up but set it to auto-delete the content on it. I then set up a brand new forum and for whatever reason bots don't spam it, at all. It is almost like the bot goes to the original forum, spams it, and then moves on thinking it has completed its mission. I even flat out disabled the captcha to see if anything changed and nothing did. The new forum never got spammed. I have no idea why that happened but it strangely did work. When I do get spam, I don't think that spam is from bots. It is from humans posting instead, but that is at least manageable to clean up.
It kinda leads me to conclude that each developer has to create "out of the box" solutions instead of some plug and play solution. If a plug and play solution is developed then all the spam bot creators start figuring out ways to crack or simply create a service for human based cracking. If unconventional methods are used on each site then it gets more complicated for the spammers.
I’m a fan of the Chinese-style captchas where you just move a puzzle piece with a slider. I have no idea how defeatable it is vs reCaptcha but it’s far far less painful.
The best solution I've ever come to that didn't negatively impact my clients was generating a UUID on the server via an AJAX call 100ms after page load. That UUID was stored in a cookie and also returned in the AJAX response, where a script stuck it in a hidden field on the form.
The server checked that cookie != null and cookie == hidden field, returned a 200 OK regardless of whether the check failed (the response text indicated success or failure), and deleted the cookie.
Implemented it across a network of sites ~10 years ago, and only a handful of spam had gotten through when I quit that job 4 years ago. They had been getting 10-20 spam comments per day per site.
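The check described above can be sketched in a few lines of Python. The function names are hypothetical, and the cookie handling, the AJAX endpoint, and the 200-on-failure response are left to whatever web framework is in use.

```python
import hmac
import uuid
from typing import Optional


def new_form_token() -> str:
    """Generate the UUID that is set as a cookie and returned via AJAX."""
    return str(uuid.uuid4())


def submission_passes(cookie_token: Optional[str],
                      field_token: Optional[str]) -> bool:
    """Accept only when the cookie exists and matches the hidden field."""
    if not cookie_token or not field_token:
        return False
    # Constant-time comparison; not strictly needed here, but cheap.
    return hmac.compare_digest(cookie_token.encode(), field_token.encode())
```

A bot that never runs the page's JavaScript has neither the cookie nor the hidden value, so it fails the first test without being told why.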
> It has come to a point that traditional text/sound captcha challenges are trivial to bypass today
I have yet to see a general-purpose tool to which you can throw any text captcha and it’ll solve it.
Just because there are academic papers that demonstrated it once doesn’t mean there isn’t still a huge barrier to entry in implementing this solution (which spammers won’t do as long as it’s easier to move on to another target).
There are paid captcha-solving services out there and even those are still powered by humans even though it’s in their commercial interests to automate the process. Them not doing so further suggests that AI is not there yet.
It might be worth considering a honeypot approach. E.g. having a field in the form that isn't visible for users that, when filled in, indicates that it is likely a SPAM submission.
Unfortunately, there aren't many good captcha systems that don't do the equivalent of what ReCaptcha does, because we're at a point where fingerprinting users is a strong signal to help identify contractors doing captchas on behalf of bots.
Even some Silicon Valley products use captcha-breaker services. These services present themselves as sophisticated APIs but in reality they're just dispatching work to humans who accept pennies an hour at internet cafes; a competition with Amazon's Mechanical Turk for digital sweatshops. They're common and cheap and the tech industry feeds them. Undercutting the workforce doing the captcha busting is the only viable way to stop that.
Your real alternative is to do the fingerprinting yourself.
There was a good podcast about this [0] just a couple weeks ago. They interviewed the guy who invented CAPTCHA as well as the head engineer on ReCaptcha v3.
The gist of it was that in a few years, all Captchas will be useless because machine learning is too easy and cheap. The only way to defeat spam will be to use reCaptcha v3 or something like it, because those services will use what they know about you to determine if you're a bot or not, plus their own machine learning of what "normal" behavior is for your website. It sounds like ReCaptcha v3 is basically an app level IDS.
[0] https://www.npr.org/sections/money/2019/04/24/716854013/epis...
In my personal blog I am using "Riddler" Drupal module, and have had good experience: https://www.drupal.org/project/riddler
You can create your own Captcha questions / answers. I feel like this is the preferred way of handling spam posts, creating your own custom Captcha implementation.
I have a mail server with a new address generated per post (or per comment for thread functionality) on a blog I run. People then get to mail their comments. For all reputable mail sites I let things directly through; for everything else I use a spam filter turned to 11 together with a mail-back link for post verification.
I have had zero spam the last 8 years.
The code is ancient and runs on an even older version of LispWorks with auth details hard-coded all over the place, so the time it would take for me to share it would be longer than to rewrite it in some hip language.
Had I been lazy and not as privacy conscious I would have let Gmail do the spam filtering for me.
Depends on why you need it.
Captchas work well for telling humans from bots for the purpose of denying automated/scripted access. But here a simple IP-based blacklist works well, because of how many bots now live on Amazon's properties and some such.
You don't need a captcha to filter out bot spam. That's a massive overkill.
stopforumspam.com works well. You can combine it with a simple keyword based filter, have it tag hits with a cookie, temporarily blacklist the IP and then filter them out based on that as well. Auto-submit it to stopforumspam too. Obviously, also have whitelisting in place, e.g. to let through existing customers, previously cleared posters, etc.
For bonus points, first-time posts that look OK may be put into a "shadow ban"-ish mode, whereby they are visible to the posters and mods, but not anyone else. Until they are cleared. This works equally well.
The bottom line is there's no spam that doesn't try to promote something and they aren't likely to target just you, so there's always a keyword/URL you can latch onto, and it also makes sense to participate in a distributed monitoring framework to piggy-back on each other's first hits.
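A rough Python sketch of the keyword filter plus temporary IP blacklist and whitelist described above. The class name, the one-hour block, and the in-memory storage are assumptions; the lookup and auto-submit calls to stopforumspam are deliberately left out.

```python
import time
from typing import Dict, Iterable, Optional

BLOCK_SECONDS = 3600  # assumed duration of the temporary blacklist


class KeywordFilter:
    def __init__(self, keywords: Iterable[str],
                 allowlist: Iterable[str] = ()):
        self.keywords = [k.lower() for k in keywords]
        self.allowlist = set(allowlist)          # e.g. existing customers
        self.blocked_until: Dict[str, float] = {}

    def is_spam(self, ip: str, text: str,
                now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if ip in self.allowlist:
            return False
        # Anything from a recently flagged IP is rejected outright.
        if self.blocked_until.get(ip, 0.0) > now:
            return True
        lowered = text.lower()
        if any(k in lowered for k in self.keywords):
            self.blocked_until[ip] = now + BLOCK_SECONDS
            return True
        return False
```

Plain substring matching is crude and will misfire on innocent words, which is one reason to pair it with the whitelist.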
As a user I found geetest [1] to be really friendly and much easier to use than recaptcha. I have never integrated it myself.
Simplest way is to use filtering.
```
;; Substrings that mark a submission as spam.
(defparameter spam-words
  '("viagra" "cialis" "v1agra" "c1alis" "tamadol" "hydrocodome"
    "doxycyline" "prozac" "prozca" "prizac" "doxycyclins" "anx8ety"
    "amytriptylone" "poker" "laxative" "anatrim" "breast" "penis"
    "fiorinal" "sexy" "kaspersky" "hoodia" "thyroid" "coupon.com"
    "vuitton" "coupon" "fetish" "famotidine" "footwear" "sweetwater"
    "sunglasses" "ninja" "www" "http" "cheap3ddigitalcameras.com"
    "aquadivingaccessories.com" "tastyarabicacoffee.com"
    "yourmail@gmail.com" "bit.ly" "cottonsleepingbags.com"
    "italiancarairbags.com" "newpopularwatches.com"
    "glasslightbulbs.com" "browndecorationlights.com"
    "fx-brokers.review" "ceramicsouvenirs.com" "xevil" "senuke"
    "captcha" "xrumer" "vkontakte" "апрап" "erectile" "spellingscan"
    "lialda" "lamborghini" "doubles your bitcoin" "pro-expert.online"
    "specified wallet" "selected wallet" "online casino"
    "multimillionaire" "win-win lottery" "lottery"
    "Перезвоните пожалуйста" "yuguhun88@hotmail.com"
    "meeting-club.online" "from2325214cv" "did you receive my offer"
    "Domain zone .de" "all your photos" "Pay 1 BTC"
    "to our bitcoin wallet" "you will be sued"
    "police will be interested" "hacked"))
```
I had a strange idea about solving this problem: How about a micro-payment, something like $0.01, instead of solving a puzzle? In that case maybe you won't care if many bots login to your website.
I think that by this time I have the technology to make something like this work, but I was wondering whether this is a good solution. What do you think?
Whatever you use, please remember not everyone has good vision / hearing / dextrous mouse control. Captchas can be a nightmare for accessibility. Most of the 'clever' solutions to this will completely block some subset of keyboard users / blind users / eye gaze users along with the bots.
For automation I would recommend rate limiting endpoints. I personally tend to use 5 requests per IP/second along with 100 requests/minute as default and then override specific endpoints to e.g. 1 request per IP/hour.
For user input I recommend keeping the first comment submitted by a new account/IP hidden until you/moderators have approved it, after which new comments from that user no longer need to be approved before they become visible to other users.
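A per-IP limit like the ones above could be sketched as a sliding window in Python. The class is hypothetical and keeps everything in memory, so it assumes a single process; a real deployment would more likely do this in the web server or a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window_seconds:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True


# The defaults from the comment: 5 per second and 100 per minute.
per_second = SlidingWindowLimiter(5, 1.0)
per_minute = SlidingWindowLimiter(100, 60.0)
```

A request would be served only if both limiters allow it for the client's IP; a stricter endpoint just gets its own instance, e.g. SlidingWindowLimiter(1, 3600.0).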
If it's a problem with spammy blog comments I would recommend to just remove any kind of input on the site and ask people to send you an email with their questions and concerns.
Be sure to use a separate email and give it to readers on your about page via some language like "questions (dash) and (dash) comments (at) (this domain)".
If it's for account signups just send an email confirmation link and possibly include a code in the email that has to be submitted manually as well.
For those who are interested in an alternative CAPTCHA service, we at NetToolKit are putting the finishing touches on a service that we hope to launch at the end of next month (June). The CAPTCHAs are interactive and meant to be fun for the user -- no machine learning training involved. We'd be thrilled to get some early feedback before launch, so if anyone is interested, please reach out via email or via our website (both in profile).
What about PoW (https://en.wikipedia.org/wiki/Proof-of-work_system) ?
It requires minimal user interaction, and you will eliminate most spam bots, since spamming loses its cost-effectiveness. You can implement something like Coinhive's proof-of-work without having to mine Monero.
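A hashcash-style sketch of the idea in Python, not Coinhive's actual scheme. The challenge format, the SHA-256 choice, and the difficulty value are assumptions. The server hands out a random challenge, the client (in practice browser JavaScript) searches for a nonce, and the server verifies it with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, cheap to check."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force search, about 2**difficulty hashes."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


def new_challenge() -> str:
    """Random per-form challenge so solutions cannot be reused."""
    return os.urandom(16).hex()
```

Each extra bit of difficulty doubles the expected work for the sender while verification stays a single hash, which is where the cost asymmetry comes from.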
Looking at these comments (141 at the time of this post) the answer looks to be: No.
I have small business clients, Google's reCAPTCHA is our best option. They aren't willing to pay for some obscure, and expensive one-off solution that might work. They just want the spam to stop. I fill out reCAPTCHAs every god damned day because I work on the web. Asking "normal" users to fill out a handful each year isn't asking that much.
Maybe for your startup "rolling your own" makes sense, but not for small biz.
I think reCaptcha is very terrible. For HTML forms, a simple question could be used (change them sufficiently often when spam is received), or you may require the user to edit the URL manually in order to access something, based on the client IP address perhaps (which would be displayed). I also invented a protocol-independent CAPTCHA, which is also text-based, and uses SASL. You should allow the user to implement the code themself if they want to do rather than requiring that they use your code.
I also recommend mailing the website owners who use ReCaptcha about why it's a nuisance and stating that you and many others won't be using the site anytime soon.
Unless your website is under targeted attack, just putting "2+3" on an image will block 99.9% of all bots. You hardly even have to distort the image or randomize the math, but doing so could help against script kiddies. The only drawback vs reCAPTCHA is that you have to show the captcha all the time instead of automatically suspecting bots.
If you are under targeted attack by someone more dedicated, captcha is not going to be the only defense in your book.
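For the "2+3" idea, the question and answer check could be as small as the Python sketch below. The function names are made up for the example, and rendering the question onto an image is left out, since that depends on whichever imaging library is at hand.

```python
import random
from typing import Optional, Tuple


def make_question(rng: Optional[random.Random] = None) -> Tuple[str, int]:
    """Return a question such as '2+3' and its expected answer."""
    rng = rng or random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"{a}+{b}", a + b


def check_answer(expected: int, submitted: str) -> bool:
    """Compare the visitor's input with the stored answer."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

The expected answer has to stay on the server (for example in the session), never in the form itself.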
For the use case of blocking general web form spam, we've had good results with relying solely on IP reputation crowdsourced via AbuseIPDB:
https://www.abuseipdb.com/about
Occasionally we're an early target of a fresh IP, but we report it back to the database to help later victims. The more people contribute to such a system, the better it gets.
> nobody in their right mind should add a script that fingerprints users
Fingerprinting users is no more a problem than using cookies; there are far more legitimate reasons to use these things than illegitimate ones. The problem is Google and Facebook using these techniques to spy on people at massive scale.
Once again the problem is Google and Facebook, not the internet.
A commenter on HN some years ago claimed a 100% success rate at blocking spam by requiring all web form submissions to be cryptographically signed. This solution struck me as stunningly elegant both by raising the standard for constructive feedback and promoting public awareness of secure communication.
I like those math-question captchas. They're fun, and I doubt a bot or even a real-person spammer will waste time on them. Make the question appear on an image instead of as text and the bot will also have to do OCR on top of being wolframalpha to defeat your captcha.
> nobody in their right mind should add a script that fingerprints users
I helped vendor-select and lead implementation on a fraud solution that was an integration with SiftScience (yc-funded, https://sift.com/), which relies on fingerprinting. This was years ago but I still think about the project and how it plays with user privacy etc. I will say that -- fingerprinting as a component in fraud management is/can be highly effective.
The problem is, once you get into payments fraud through bots, I think the conversation becomes way more nuanced. If you're looking for a solution to bots spamming or throwing bad data into your app, maybe that's a little extreme. But if the choice is between privacy and becoming a front for credit card fraud and chargebacks, you're choosing who the victims of your service are going to be, and how much ill is done.
Have seen these guys, met the founders a while back at AppSecUSA: https://funcaptcha.com/ It’s those puzzles/games as captchas.
> Google went Sparta with their reCaptcha be, and nobody in their right mind should add a script that fingerprints users, especially from an adtech company
Elaborate?
Recaptcha is also blocked in China. Users there won't be able to get past it at all to accomplish a protected task.
Does anyone know of a good alternative that works there?
"went Sparta"???
recaptcha v3 is invisible.
Just ask a math question?
A bit late, but honorable mention: https://xkcd.com/233/
> It has come to a point that traditional text/sound captcha challenges are trivial to bypass today
Don't be fooled. Text-based CAPTCHAs are still very effective, unless you're a really large target on the scale of Facebook or Google. If you design your own text-based CAPTCHA, it's highly unlikely that someone is going to pull out their ML skills to read your CAPTCHAs just to spam. Too much effort. I wrote my own PHP CAPTCHA more than a decade ago and have used it ever since with virtually no modifications, and not a single piece of spam has made it through on my websites (if you don't count my friends trolling me with silly messages once in a while).
I guess most companies are now using the "I'm not a robot" feature without any images, meaning that when you click "I'm not a robot" you are simply moved on to the next step.
Just out of curiosity, isn't it feasible today to implement some machine learning to stop spammers? Is there any project trying to come at it from this angle?
How about put your users first and don’t farm them out to Google ML training because someone told you to. Recaptcha is a cancer on the web.