mikeash.com: just this guy, you know?

Posted at 2011-11-26 18:42 | RSS feed (Full text feed) | Blog Index
Next article: Friday Q&A 2011-12-02: Object File Inspection Tools
Previous article: No Article For you!
Tags: meta
Testing Hashcash-Based Anti-Spam Measures
by Mike Ash  

Due to a recent uptick in spam comments (I've had to delete literally several this week!) I decided to dust off my dormant JavaScript-based hashcash implementation and drop it into the comments system as an added anti-spam measure.

In short, hashcash is a scheme that uses a problem that's difficult (but not too difficult!) to solve but easy to verify to prove computation. In this particular case, the problem is producing data whose SHA-1 hash contains a certain number of leading zeroes.

When you click in any of the comment fields, the JavaScript fetches a unique salt from my server. It then begins a brute-force computation, searching for data which, when appended to that salt, produces the requisite number of leading zeroes in the SHA-1 hash. Once it finds one, it saves that value and enables the "Post Comment" button. The required number of leading zeroes is configured to take, on average, a reasonable amount of time to compute. This time is large enough to deter spammers trying to post stuff immediately, but small enough not to get in the way of legitimate users. It's currently configured to use 18 leading zeroes, which gives a roughly 10 second computation itme on a modern Mac.

On the server, the story is much simpler. All it has to do is keep track of the hashes that it gives out, and verify the hashcash that comes back in. Verification is simply a matter of checking that the salt is one it previously gave out, and checking that the salt + hashcash does indeed produce an SHA-1 hash with the required number of leading zeroes. While it takes several seconds of computation to produce the hashcash, it is essentially instantaneous to verify, producing no real server load.

Note that this scheme also works fine on iOS devices, but takes somewhat more time to compute the hashcash. It should still take much less time than it takes to compose a good comment, though!

For those of you worried about battery life, the hashcash computation only kicks off when you actually focus one of the comment form fields, so simply reading a blog post doesn't add any additional load.

Please feel free to play with the new system in the comments to this post. I hope that it will deter spammers while not impacting legitimate commenters. Also, it was fun to write.

Did you enjoy this article? I'm selling whole books full of them! Volumes II and III are now out! They're available as ePub, PDF, print, and on iBooks and Kindle. Click here for more information.

Comments:

Greetings from this new hashcash-enabled world!
Hello, I am commenting for the sake of commenting. Also for the sake of understanding what's happening as I click in these fields and invisible javascript numbers are tossed back and forth. Happy day after day after US Thanksgiving to all.
Oh noes! My CPUs! they burn unnecessarily! :-) (/me pulls out stopwatch to measure performance)
The computation took quite a while on my iPad 2 (I didn't time it but it was at least 30 seconds), and while it was churning it was just about impossible to type.
Silly me, I never thought to actually test UI responsiveness. I've tuned how the computation runs and it should be much better now. I'm posting this from my iPad as proof.
Yup, much better.
I'm trying to come up with some funny way to say that you should have implemented the routine in Flash to use even more CPU time but I got nothin'.
Interestingly, you actually want the hashcash implementation to be as fast as possible. If you make it artificially slow, then an adversary could come along and implement their own that's much faster and defeat your inherent rate limiting. For example, if I calibrate the hashcash to take 10 seconds of CPU time and then somebody comes along and writes an implementation that's 1000x faster, they only have to dedicate 1/100th of a second of CPU time for each post.

As it happens, doing it in JavaScript is about 100x slower than doing it in C using CommonCrypto, so that certainly could be done here. But you'd have to dedicate quite a bit of development time developing a special client just for my blog comments, and at that point I'm not really going to worry about it too hard.
It definitely didn't take anywhere near 10 seconds on my machine (late 2010 11" MBA, 1.6GHz). It was a small blip on my Activity Monitor and then went away just as quickly.
Since it's a probabilistic computation, it can sometimes go really fast (or slow). It's just trying different pieces of data until a match is found, which could be the first item it tries or the millionth. On average, it'll be around 2^18 attempts, but it'll vary.
Do I get a Bitcoin upon successful comment submission?
The problem with this is that JavaScript implementations of hashcash are 10000x slower than native implementations. Any acceptable comment-posting delays mean that I can write a native blog spamming tool with more than acceptable performance. :)
The thing is, that if you decided to write your spamming client in javascript, this is trivial to get around.

document.forms[0][0].onfocus();
document.forms[0][0].value = "spam";
And then just wait for the "Post Comment" button to become available. Since you've already implemented the JS to actually do the computation, there isn't any added complexity to the "spam" client.

I would think this is about as good as just having a hidden field you expect to be a certain value, that JS fills in that value onfocus()
My assumption is that waiting ~10 seconds for the button to become available imposes an impractical time drain on a spammer. They need to work in incredible bulk and that means they need to work fast.

This also hits human spammers, who have similar time constraints. They'll want to open the page, paste paste paste submit, all really quickly. This forces them to wait.

As a bonus, it also discourages legitimate commenters with nothing important to say.
Forgot to mention: yes, a specialized tool could defeat this easily by computing the hashcash in C in 0.01 seconds instead of in JavaScript in 10 seconds. But the goal isn't to defeat all potential adversaries, just all that I expect to come across. Nobody will implement a C hashcasher just to defeat my site.
I have always assumed "spammers" employ various types of automation, and I wouldn't say it is out of the realm of possibility that someone would script the above (I could do it trivially in less than 30 minutes by making a Safari Extension, opening the posts I wanted to target in separate background tabs and let the custom JS do their own thing forever).

Your idea is interesting, but at best is a nuisance for spammers. At worst, a nuisance to spammers AND nuisance to your readers.
A nuisance to spammers is all I need. 30 minutes of programmer time plus the resources that Safari would take up is worth far more than spamming my site is. I haven't had to delete any spam since I set the thing up, so clearly it's working. Whether it works better than the previous scheme (which prevented spam for a while every time I changed the magic word) remains to be seen.

How could it be a nuisance to my readers? If you mean because it prevents people from being able to write and submit a post in under 10 seconds, well, that's just the sort of nuisance I want to give to my readers.
Typically, if you implement any custom anti-spam measures at all, the spam rate will drop to almost zero because the automated tools no longer work.
Mine tends to drop to zero for a couple of months, then rise back to a handful of spam comments a week. I'm not sure how to explain that, as it doesn't seem worth anybody's while to automate this particular site's anti-spam measures, but human-driven spam shouldn't be deterred at all by small changes.

I really should check my logs for the IP address that spam comments come from and see if the requests show anything about the nature of what's posting them. I'll have to remember to do that next time one shows up.
Took less than a second for me. :|
test.
So does this system actually work? As others have noted, JavaScript is much faster these days and so you probably need more zeros.
It hasn't stopped spam completely, but the spam that remains appears to be posted by actual humans using the form just like the rest of us. I can't fathom how it could be worth their time, but that's what the logs tell me.
rwerwerwerw
It's a good point about the efficiency of the hash implementation being a remaining issue. How long before botnet operators implement C SHA-1 into their zombie programs in order to render these counter-measures ineffective? Assuming the efficiency of C SHA-1 is really that much greater than JavaScript - anyone got any benchmark results?
I did some quickie benchmarking when I wrote the implementation, and as I recall, my result was that a native hashcash solution was about 100x faster than doing it in JavaScript in a modern browser. This is probably enough to render the whole exercise pointless on a large scale. If the hashcash is tuned to take 30 seconds in the browser, then a native implementation would take about a third of a second per post. It's possible that this would still be enough to discourage spam to an extent, but it's not a huge obstacle.

On a small blog like this, any custom solution is enough to cut down on spam substantially, at least for a long time, because it's just not worth it to write any custom code for it, no matter how easy.

As JavaScript engines improve (and I assume that crypto will be a particularly important target for optimization), that will allow the hashcash difficulty to be turned up, lessening the advantage of a native implementation. So it could become a truly useful, scalable solution eventually.
Just wants to see how this works
Took less than a second for me as well.
Just for trying live demo
Just testing here.
This should be everywhere
I have written a number of web applications with public/anonymous commenting systems. If I implement a native Hashcash support for JavaScript, then spam is non-existent. Just for fun, I log all spam comments thrown out, due to not passing the Hashcash test, and I've seen upwards of 600 spam comments PER DAY thrown out due to not presenting a valid Hashcash token.

Hashcash is only one valid client puzzle, among many though. For Wordpress, there is a "hashcash plugin", but when you look at the source, it's actually not Hashcash. Instead, it's just a random client puzzle. But, proof-of-work client puzzles work all the same.

Well done on implementing Hashcash though. If more people would do this for their web applications, we wouldn't need those ridiculous CAPTCHAs.
testing from a Samsung Galaxy S3. still waiting for the post button to become visible...
pretty cool isnt it
Just testing this to see how it works! ;)
Just testing on an iPhone 6 Plus, button appeared straight away.
at least for me it takes literally half an eternity to calculate the thing, and I am on an i5 CPU.
Hi Mike, i couldn't find this on your github repo. I was curious of how you implemented. This was about 30 secs.
Thank you!

Comments RSS feed for this page

Add your thoughts, post a comment:

Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.

Name:
The Answer to the Ultimate Question of Life, the Universe, and Everything?
Comment:
Formatting: <i> <b> <blockquote> <code>.
NOTE: Due to an increase in spam, URLs are forbidden! Please provide search terms or fragment your URLs so they don't look like URLs.
Hosted at DigitalOcean.