Tuesday, August 30, 2005

Spam Filtering Idea

I share a hosted server with a few friends. We host a number of sites (including this one) and our own email. We have SpamAssassin set up and tuned pretty nicely (thanks, Jelo). Unfortunately, it's killing our server. SpamAssassin can suck up a lot of resources. Ideally we'd have a server dedicated to processing mail, but we can't afford that.



So here's my idea: a spam filtering cluster. Have a bunch of [trusted] home-based servers set up with SpamAssassin and dynamic DNS. A message comes into the main mailserver and if the sender isn't on a whitelist, it gets shipped out to one of the filter machines. The filter machine runs SA on it and lets the main server know the result (it doesn't need to send the mail back, so no worries about low upload speeds). Emails over a certain size would probably just be processed on the main server to avoid the bandwidth and time required to send them out (large emails are rarely spam anyway). Finally, in order for the Bayesian filtering to function correctly, you'd have to periodically sync the data from the Bayesian learner out to the filter machines.
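Here's a rough sketch of that routing decision in Python. Everything in it is made up for illustration: the `route` function, the three outcome names, and the 256 KB cutoff are my own assumptions, not anything SpamAssassin provides.

```python
# Hypothetical size cutoff; the right number would need tuning.
MAX_REMOTE_BYTES = 256 * 1024


def route(sender, size_bytes, whitelist, max_remote_bytes=MAX_REMOTE_BYTES):
    """Decide where an incoming message should be scanned.

    Returns one of three illustrative labels:
      "deliver" - sender is whitelisted, skip filtering entirely
      "local"   - too big to ship out, scan on the main server
      "remote"  - hand off to one of the filter machines
    """
    if sender.lower() in whitelist:
        return "deliver"
    if size_bytes > max_remote_bytes:
        return "local"
    return "remote"


if __name__ == "__main__":
    friends = {"friend@example.com"}
    print(route("friend@example.com", 4000, friends))
    print(route("stranger@example.com", 4000, friends))
    print(route("stranger@example.com", 5 * 1024 * 1024, friends))
```

The whitelist check comes first so trusted mail never costs any bandwidth or CPU at all.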



I've looked at SpamAssassin a bit and I think it wouldn't be too hard. SA comes with a client and a server, spamc and spamd. Incoming mail gets piped through spamc, which takes care of shipping it off to spamd for processing. Spamd can live on any other host. It can also just report back whether the email was spam or not, rather than returning the entire email. So the only two things left to do are:



  1. Write a small client (spamb?) that maintains a list of hosts running spamd. When a message comes in, pick the next hostname in the list, and pass it on to spamc. Spamc sends it to the appropriate spamd host, and receives the response--either yes/no or in our case, the full SA report, which gets attached to the message headers. If we get the full message back, that means that spamc timed out trying to contact that spamd host. In this case, mark that host as being down, along with a timestamp, and take it out of the rotation for a certain period of time.

  2. A way to distribute the users' Bayesian data files and prefs to the remote systems. Apparently spamd can read user prefs from a SQL database, although I haven't looked into whether the Bayesian learner data can be stored in a database too. If so, that's an easy solution to the problem. Otherwise, you could just write a script that checks for changes in any of the user files and if it sees them, rsyncs them to each of the hosts.
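To make the first item a bit more concrete, here's a minimal sketch of the host rotation in Python. The `HostRotation` class, its method names, and the five-minute cooldown are all hypothetical. The only real pieces are spamc's `-d` (spamd host) and `-t` (timeout) options, and I haven't tested this against a live spamd.

```python
import time


class HostRotation:
    """Round-robin over spamd hosts, skipping ones recently marked down."""

    def __init__(self, hosts, cooldown_seconds=300, clock=time.monotonic):
        self.hosts = list(hosts)
        self.cooldown = cooldown_seconds  # how long a down host sits out
        self.clock = clock                # injectable so it can be tested
        self.down_since = {}              # host -> timestamp when marked down
        self._next = 0

    def mark_down(self, host):
        """Record that spamc could not reach this host."""
        self.down_since[host] = self.clock()

    def _is_up(self, host):
        marked = self.down_since.get(host)
        if marked is None:
            return True
        if self.clock() - marked >= self.cooldown:
            # Cooldown has passed; put the host back in the rotation.
            del self.down_since[host]
            return True
        return False

    def next_host(self):
        """Return the next usable host, or None if every host is down."""
        for _ in range(len(self.hosts)):
            host = self.hosts[self._next]
            self._next = (self._next + 1) % len(self.hosts)
            if self._is_up(host):
                return host
        return None


def spamc_command(host, timeout_seconds=30):
    """Build the spamc argument list for a given spamd host."""
    return ["spamc", "-d", host, "-t", str(timeout_seconds)]
```

If `next_host()` returns None, the sensible fallback is probably to scan the message on the main server rather than let mail queue up.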



I'll let you know what happens if I get around to trying this out.

Sunday, August 28, 2005

Ruby on Rails Problem with OS X

I've been playing around with Ruby and Ruby on Rails lately. I'm not that far into it yet, but they both look interesting. I'll write more about my experiences later.



In the meantime, if you're setting this up on a Mac (running Tiger) like I am, do yourself a favor and before you start getting into any of the Rails tutorials (like Rolling With Ruby on Rails), go here first:



Ruby On Rails, Mysql, and OSX Tiger Woes



Do what it says (including the fix it mentions), and you'll save yourself a lot of grief. Note that as one of the commenters mentions, it works just fine with mysql-ruby-2.7 also.



Back to my tutorial...

Friday, August 26, 2005

What do ants smell like?

Just got an appointment reminder from Kaiser. Must remember not to put on my Parfum de la Fourmi.


[Image: kaiser sm]

Wednesday, August 24, 2005

Sage Advice of the Day

Some insight to be gleaned here:



10 Steps to a Hugely Successful Web 2.0 Company



I don't completely agree with all of his "steps", particularly the examples he provides. Take #7--Get people hooked on free--for example. Yes, free will always win you fans, and a large user base has intrinsic value, but if you're providing a service that people are willing to pay for, by all means let them pay. (Regarding his Thefacebook vs. Match.com example--which one is actually making money?) Still, there are some good ones. My favorite: 6. Be mindnumbingly simple. Then again, this is a pretty age-old design mantra.

Monday, August 15, 2005

Useful Sites of the Week

I've had my web designer hat on lately. I've always been a much better judge of good design than a creator of good design, so I need all the help I can get. One thing that can make a huge difference in the quality of your site is color selection. Here are some sites I've been going to lately for chromatic inspiration.



The Return of Design - Web Color Schemes
A slew of pre-packaged palettes, all using web-safe colors (i.e. the 216-color palette meant for 256-color displays), each with a number of variations.



ColorBlender
Pick your starting color and ColorBlender will show you a 6-color palette for that color, and allow you to easily play with variations or customize them. You can also save them [to a cookie] so you have your personal blends available to you whenever you visit the site. You can also email direct links to your blend. Very cool.



Color Scheme Generator 2
The ultimate color-generating tool. If you can't come up with a color scheme you like with this tool, you really are a lost cause. You can even see how your scheme looks to people with any one of eight different types of colorblindness.

Monday, August 8, 2005

CSS Not Working? Don't Forget Your Doctype

I just spent *hours* trying to get some CSS working, only to discover that it wasn't working because I didn't have the DOCTYPE declared at the top of my html:




<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">

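If you want to catch this before losing an afternoon, a quick check like the one below might help. It's only a sketch: `has_doctype` is a name I made up, and it only tests that the first thing in the file (after any byte-order mark or whitespace) is a DOCTYPE declaration. It doesn't validate the declaration itself, and it treats anything before it, such as a comment or XML declaration, as a miss.

```python
def has_doctype(html_text):
    """Return True if the document starts with a DOCTYPE declaration."""
    # Ignore a leading byte-order mark and whitespace, nothing else.
    head = html_text.lstrip("\ufeff \t\r\n")
    return head[:9].lower() == "<!doctype"


if __name__ == "__main__":
    page = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n"DTD/xhtml1-strict.dtd">\n<html></html>'
    print(has_doctype(page))
    print(has_doctype("<html></html>"))
```
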
Thursday, August 4, 2005

The Butler in the Pantry with the Candlestick

Today's diversion: Whose Fish?



Took me about 45-50 minutes, the first 30 minutes of which were spent on a false start. After starting over with a better system for keeping track of everything, it wasn't that bad.



I really doubt that this was Einstein's puzzle, and certainly the 2% figure is pulled out of someone's ass, but it's still good for some brain calisthenics.

Secret to cheaper flights?

Last night Jacqueline was looking around online for flights to Seattle. She ended up falling asleep before she could book anything, but when she went to check again in the morning, she found that the same flights were about $30 cheaper.


Perhaps it was just a coincidence, but my theory is that people/agents put holds on flights throughout the day, and those holds are released at the end of the day (presumably midnight, although I don't know which timezone), freeing up seats and thus reducing prices. Has anyone else seen this? Or is this common knowledge and I've just been clueless?

Tuesday, August 2, 2005

Netflix

I finally caved in and signed up for Netflix, as you'll see if you look on the bottom of the sidebar. I've been a bit of a Luddite, preferring to get my movies from one of the great local video stores. But we recently moved and now our closest video store is about a mile away--as opposed to about 200 feet away--so our movie-watching has dropped dramatically (while our watching of mind-numbing reality shows has increased accordingly).


It seems to me that the queue is the key feature of Netflix. I can't count the number of times we've seen a preview for a movie and thought "We should rent that." Then you get to the video store and you spend half an hour walking up and down the Recent Releases trying to figure out what to get. Then again, sometimes you're in the mood for a particular kind of movie, and it might not be what's next in your queue.


Hmmm...here's a business idea to help out all those struggling video stores in the new Netflix world: make an online service (let's call it FooFlix) where people can browse movies, get recommendations, and add movies to a queue, much like Netflix. But then install terminals in video stores and tie members' FooFlix accounts to their video store accounts. Then they scan/swipe/enter their store card/code in the terminal and it brings up the movies in their queue and highlights the ones that the video store has available. The stores would also be able to see what their customers (or an aggregate of all FooFlix customers) have in their queues, so they know what they should be ordering.


As for a revenue model, you probably couldn't get away with charging customers for the service, but you could charge video stores to rent the terminal, and perhaps for customer queue data. You could probably do some pretty targeted advertising on the site, too.


Probably not going to make me a billionaire. If anyone steals the idea, at least send me a postcard or something.