The End Of Innocence

Posted on March 16, 2008 at 6:29 PM in 'Miscellaneous' with tags 'binrock, spam'

Over the past year or so I've noticed BinRock getting slower and slower. It got so bad that visitors' web browsers were timing out before BinRock got around to sending them the page they were waiting for. I don't get any notification when a visitor times out, but for the last few months I've been getting 5-10 error messages a day from my own scripts that load BinRock pages for one reason or another. Extrapolating from that, I'd guess that quite a lot of real visitors have been timing out too.

In Linux, there's a measure of how heavily the machine is loaded, aptly called the "load average." On a single-CPU machine, a value between 0.0 and 1.0 means that the machine is only partially being used and is spending part of its time idle. A value of 1.0 effectively means that there is something using the CPU at all times, but nothing is being made to wait. That's just about perfect: you're not wasting hardware sitting around doing nothing, but you're also not overloading it. Values above 1.0 mean that the machine is overloaded, and processes are having to sit around waiting for their turn to run. Generally, if you see it reach 2.0 or more, the machine will be running pretty slowly and it'll be really obvious to the users.
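If you want to check this on your own machine, here is a minimal Python sketch that reads the load average and labels it using the rough thresholds above. The function name `describe_load` and its labels are made up for illustration, and dividing by the CPU count is my own assumption, so that the same thresholds make sense on a multi-CPU box.

```python
import os


def describe_load(load_average: float, cpu_count: int = 1) -> str:
    """Label a load average using the rough thresholds described above.

    Dividing by cpu_count is an assumption added for this sketch; with the
    default of 1 the thresholds are applied to the raw load average.
    """
    per_cpu = load_average / max(cpu_count, 1)
    if per_cpu < 1.0:
        return "partly idle"
    if per_cpu == 1.0:
        return "fully used, nothing waiting"
    if per_cpu < 2.0:
        return "overloaded, processes are waiting"
    return "badly overloaded, users will notice"


if __name__ == "__main__":
    try:
        # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
        one_min, five_min, fifteen_min = os.getloadavg()
    except (OSError, AttributeError):
        print("load average is not available on this system")
    else:
        cpus = os.cpu_count() or 1
        print(f"1-minute load {one_min:.2f} on {cpus} CPU(s): "
              f"{describe_load(one_min, cpus)}")
```

The same three numbers are what `uptime` prints at the end of its output.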

For years, BinRock's load average was down around 0.10. It wasn't even breaking a sweat serving all the web sites, email accounts, etc. that are hosted there. But over the last year the load grew steadily, and during the last few months I'd regularly see it get as high as an astounding 30.0. Hence all the timeouts.

I'd made several previous attempts to figure out what was making it run so slowly, and found and solved a few bottlenecks, but the load never got better. This weekend I finally tracked down the cause: spam. Ever since about last July, the flow of incoming email to BinRock has grown steadily:

Incoming emails per hour

Since BinRock was always so lightly loaded and only hosted 20-30 email accounts, I had configured it to be relatively trusting of incoming email. If BinRock was a house, the mail server was a suspicious but not overprotective father, who carefully eyed anyone knocking on the door asking to see his daughter, and let them pass unless they had a particularly skeezy look to them. If the daughter didn't want to see them (the email was addressed to a nonexistent user), then the father had to usher them back out the door, and in fact, all the way back to their supposed homes.

That policy works fine when you live in a nice suburban neighborhood, but not when you live in downtown Baghdad. Sadly, the current state of email on the internet more closely resembles the latter case. Dad was getting overloaded carefully eyeing the thousands of callers every day, and wasting hours of his time trying to get all the rejected boys back to their homes, especially since, as it turns out, most of those boys with ill intent lied about where they were from. Dad constantly had a backlog of thousands of boys to deal with. Here's the number of messages sitting in the queue waiting to be processed at any given time (normally, the number should stay close to zero, as every message is handled immediately):

Message queue size

The boys don't have it so easy anymore. Now, any boys knocking on BinRock's door have to run a well-armed gauntlet just to make it in the front door. Dad first checks a list to see if the boy is coming from a house that is already known to be of ill repute, and if so, he doesn't even open the door. If he does open the door, and the boy asks to see Cheryl or Steve or King Ramses III (Dad's daughter is named Katie), he slams the door in the boy's face. Finally, if the boy has made it past those challenges unscathed, Dad lets the boy in and eyes him suspiciously, and if he's got an unsavory look about him, out the door he goes. Only the few remaining boys are let in to see his daughter.
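For the programmers in the audience, Dad's new routine boils down to three checks run in order, cheapest first. The Python below is only a sketch of that ordering, not BinRock's actual mail server configuration: the IP addresses, the user list, the keyword "spam filter", and the names `Delivery` and `screen` are all invented for illustration.

```python
from dataclasses import dataclass

# Everything in these three sets is made up for illustration.
BLACKLISTED_IPS = {"203.0.113.7"}      # houses already known to be of ill repute
LOCAL_USERS = {"katie"}                # the only daughter who lives here
SPAMMY_WORDS = {"viagra", "lottery"}   # crude stand-in for a real spam filter


@dataclass
class Delivery:
    client_ip: str   # where the boy says he is knocking from
    recipient: str   # local part of the address he asks for
    body: str        # what he has to say


def screen(delivery: Delivery) -> str:
    """Run the three checks in order and report what happened."""
    # 1. Known bad house: don't even open the door.
    if delivery.client_ip in BLACKLISTED_IPS:
        return "refused: blacklisted sender"
    # 2. Asking for someone who doesn't live here: slam the door.
    if delivery.recipient.lower() not in LOCAL_USERS:
        return "refused: no such user"
    # 3. Only now spend time on the expensive look-him-over step.
    if any(word in delivery.body.lower() for word in SPAMMY_WORDS):
        return "rejected: looks like spam"
    return "delivered"


if __name__ == "__main__":
    callers = [
        Delivery("203.0.113.7", "katie", "hello"),
        Delivery("198.51.100.2", "cheryl", "hello"),
        Delivery("198.51.100.2", "katie", "You won the LOTTERY"),
        Delivery("198.51.100.2", "katie", "Dinner on Friday?"),
    ]
    for caller in callers:
        print(caller.recipient, "->", screen(caller))
```

The point of the ordering is that the first two checks are cheap lookups that happen at the door, so a refused boy never gets inside and Dad never has to walk him home.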

And now, after a year of hectic work, Dad has time to read the paper and catch up on his gardening again.

Incoming emails per hour

The green area reflects the number of incoming emails per hour, and the blue line shows how many of those connections were denied. I made the first change (check a blacklist, and refuse incoming connections from blacklisted IP addresses) at about 17:00 hrs yesterday, and instantly about 80% of incoming connections were just denied outright. The next change, refusing connections that tried to deliver email to nonexistent BinRock addresses, was made today at about 16:00 hrs, and you can see that almost all of the remaining connections are now denied. The few remaining connections (about 10 out of 2000 per hour) are allowed to deliver email to BinRock, which means the spam filter only has to look at 10 messages per hour, rather than 2000. And the improvement is visible in the size of the pending message queue:

Message queue size

Ahh, back to zero. The load average is back under control as well, sitting steady around 0.50-0.60. I can handle that. And, more importantly, so can Dad.

Comments

Posted by jenn 5 hours, 11 minutes later

Oh my god. This probably means I won't have to sort through 80,000 spams PER DAY when I'm looking for real email amidst the TMDA thing. Yay dan!