Wednesday, December 5, 2007

Cheating for Fun

I recently checked out the Scramble application on Facebook, which is a knock-off of Boggle. It's a fun diversion, even though I usually get my ass kicked. There's an annoying chat window there, too, and invariably someone will accuse someone else of cheating, at which point a number of people will pipe up with "how would you cheat?"

This got me thinking...how *would* you cheat? You'd just have to write a Scramble puzzle solver that spit out all the words on the board. That sounded like a fun problem. I hadn't done anything with recursion in a while, so I gave it a go. Below is what I came up with. You'll need a word list to run it. I originally used /usr/share/dict/words, but it was missing a lot of valid words (I think it only has base words--no plurals or tenses). So I stole the Scrabble word list from the Debian scrabble package, which worked much better.

Anyway, here you go:
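What follows is a minimal Ruby sketch of the kind of solver I mean: a recursive search from every cell, with a set of word prefixes used to prune dead-end paths early (treat it as an outline rather than a polished program; the word-list loading is left to you):

```ruby
require 'set'

# Sketch of a recursive Scramble/Boggle solver. All words and all of their
# prefixes go into sets, so any path that can't lead to a word is abandoned
# immediately.
class Solver
  def initialize(words)
    @words = Set.new(words)
    @prefixes = Set.new
    words.each { |w| 1.upto(w.length) { |i| @prefixes << w[0, i] } }
  end

  # board is an array of rows of single letters, e.g. [%w(c a), %w(t e)]
  def solve(board)
    @board, @found = board, Set.new
    board.each_index do |r|
      board[r].each_index { |c| search(r, c, "", []) }
    end
    @found
  end

  private

  def search(r, c, word, visited)
    return if r < 0 || c < 0 || r >= @board.length || c >= @board[r].length
    return if visited.include?([r, c])
    word += @board[r][c]
    return unless @prefixes.include?(word) # no word starts this way; give up
    @found << word if word.length >= 3 && @words.include?(word)
    visited += [[r, c]] # new array, so siblings backtrack cleanly
    (-1..1).each do |dr|
      (-1..1).each do |dc|
        search(r + dr, c + dc, word, visited) unless dr == 0 && dc == 0
      end
    end
  end
end
```

You'd feed it something like `Solver.new(File.readlines("words").map { |w| w.chomp.downcase })` and then call `solve` on a board.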



I did try this out a couple times on Scramble and won handily. There's really not much fun in that, though.

Update: I posted this in one of the comments below, but I'll put it here as well to make sure everyone sees it: This was really just an exercise in coming up with a fun algorithm, and is primarily intended for programmers. If you’re just looking to cheat at Boggle/Scramble, there are plenty of web sites that will help you with that, and with a lot less effort.

Wednesday, November 28, 2007

Lesser-Known MySQL Performance Tips

"dolphins" by talkrhubarb

I thought I knew a fair bit about MySQL before I started working on Wesabe, but I was wrong. Once your tables start to get really massive, all sorts of fun issues come out of the woodwork. There are a thousand articles out there on the basics of database optimization (use indices, denormalize when necessary, etc.), so I'm not going to cover those again. Here are some tips--most specific to MySQL--that I've learned on the job:




  • MySQL only uses one index per table when doing a query, and the query optimizer doesn't always pick the best one, so indexing everything is not necessarily a great idea. If you're always selecting on a certain set of columns, create a single multi-column index instead.


  • Related to the above, don't put indices on columns with low cardinality. You can check the cardinality of a column by doing SHOW INDEX FROM table_name.


  • If any TEXT or BLOB fields are part of your SELECT query, MySQL will write any temp tables to disk rather than memory. So where possible, use large VARCHAR fields (on 5.0.3 or higher) instead. See: http://railspikes.com/2007/9/14/one-reason-why-MySQL-sucks


  • If you need to do case-sensitive matching, declare your column as BINARY; don't use LIKE BINARY in your queries to cast a non-binary column. If you do, MySQL won't use any indexes on that column.


  • SELECT ... LIMIT offset, limit, where offset is a large number == bad, at least with InnoDB tables, as MySQL actually counts out every record until it gets to the offset. Ouch. Instead, assuming you have an auto-incremented id field, use SELECT * FROM foo WHERE id >= offset AND id < offset+limit (props to Coda for that one).
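That last trick is easy to wrap in a tiny helper. A sketch in Ruby (the table name and id column are placeholders, and it assumes an auto-incremented id without big gaps):

```ruby
# Sketch: build the id-range query from the tip above instead of paging with
# a large-offset LIMIT. Table and column names here are just placeholders.
def page_query(table, offset, limit)
  "SELECT * FROM #{table} WHERE id >= #{offset.to_i} AND id < #{(offset + limit).to_i}"
end
```

Unlike LIMIT with a big offset, this lets MySQL jump straight to the first row via the primary key.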



Update:

  • If you have a query in which you're looking for the most recent n items of something (e.g. SELECT * FROM foo WHERE bar = ? ORDER BY created_at DESC LIMIT 10), your query will be significantly faster if you create a multi-column index on whichever field(s) you're selecting on plus the timestamp field you're ordering by (in this case, CREATE INDEX idx_foo_bar_created_at ON foo (bar, created_at)). This allows MySQL to look up the records directly from the index, instead of pulling all the records where bar = ?, sorting them (probably first writing them to a tempfile), and then taking the last 10. EXPLAIN is your friend: if you're seeing "Using temporary; Using filesort" in the "Extra" column of EXPLAIN's output, that's bad.


I'll add more as we come across them. If you have any other non-obvious tips, leave them in the comments or email me (brad@ this domain) and I'll add them too.

By the way, an invaluable book for learning how to squeeze the most out of MySQL is High Performance MySQL, by Jeremy Zawodny and Derek Balling.

Monday, November 5, 2007

Tumblr

I decided to give tumblogging a try. I'm hoping that ease of use and lowered expectations will get me to post more often, and I'm liking it so far. It covers the large middle ground between blogging and twittering. I can put up something interesting without feeling like I need to construct a coherent post around it, and I don't feel like I'm pestering people with twitters (not to mention that it allows for richer media types).

I realize I could just do the same thing on this blog if I wanted to, but there's something about the minimalist formatting and ease-of-use that just seems to work. So I'll continue to post longer pieces here (I've got a number that I need to just sit down and write), and I'll keep posting inane comments to Twitter, but I imagine that the majority of interesting things I find will end up on my tumblelog.

Saturday, September 22, 2007

Social Graphs Must Die

Chris Wanstrath made a perfectly reasonable request earlier today, so I came up with the following Greasemonkey script:

// ==UserScript==
// @name Social Graphs Must Die
// @namespace Footle
// @description Redact all mentions of "social graphs"
// ==/UserScript==
document.body.innerHTML = document.body.innerHTML.replace(/(social\s+graph(s?))/ig,"<span style='background:#000;color:#000'>$1</span>");


Actually closing a tab (that wasn't opened by Javascript) in Firefox 2 isn't possible AFAICT, but this is more fun anyway.

Tuesday, August 14, 2007

I Get No Spam

After hearing a couple complaints from friends about the amount of spam they're getting, I decided to take a quick look to see where I stood. At the risk of setting myself up, I should say that I get no spam (channeling John C. Dvorak). Or at least very little--maybe one every other day gets through to my inbox.

I think the trick is just to have multiple levels of filtering. I use a Gmail address for most online transactions (for online transactions that I really don't care about--if I'm doing a one-time registration just to get a trial key, for example--I use my Yahoo address, which is nothing but a spam bucket). My Gmail address gets forwarded to my personal email address (@footle.org), which has SpamAssassin running on the server. If it makes it past SA to Mail.app, it gets inspected by SpamSieve, an awesome Bayesian spam filter for OS X mail clients.

Anyway, my stats for the last 24 hours:

Server-level filters (spam caught):
  SpamAssassin: 281
  Gmail: 32

Client-level filter (spam caught):
  SpamSieve: 17

Spam getting to my inbox: 0
Non-spam messages: 52
False positives: 1 (just Flavorpill, so not a big deal)

(Wow...330 pieces of spam in a single day. That seems way worse than when I last checked.)

Thursday, August 9, 2007

Idea-That-I-Don't-Have-Time-To-Work-On of the Week

Bonjour-enabled Git. (Although Coda tells me that Dave F. at Powerset already approached him with this idea. Neither of us has time to work on it.)

I started using Git recently and although it's a bit confounding at times, I love how cheap and easy branching is, and that you can commit incremental changes to your local repository, then later commit them in a batch to Subversion. Git allows for distributed development, but unless you're on the same server as other developers, or you go to the trouble of mounting their repository over the network, you can't check out their repository. It seems like it shouldn't be too difficult to give Git Bonjour powers.

Update: Chad Fowler has started such a project.

Tuesday, July 31, 2007

Binary multipart POSTs in Javascript

We recently released a very slick Firefox extension at Wesabe. It was written by my colleague Tim Mason, but I helped figure out one small piece of it—namely, how to do binary multipart POSTs in Javascript—and since it involved many hours of hair-pulling for both of us, I thought I'd share the knowledge. (Tim says I should note that "this probably only works in Firefox version 2.0 and greater _and_ that it uses Mozilla specific calls that are only allowed for privileged javascript—basically only for extensions.")

One of the cool features of the plugin is the ability to take a snapshot of a full browser page and either save the snapshot to disk or upload it to Wesabe (so you can, for example, save the receipt for a web purchase along with that transaction in your account). The snapshot is uploaded to Wesabe via a standard multipart POST, the same way that a file is uploaded via a web form.

Tim was having trouble getting the POST to work with binary data at first, and he had other things to finish, so he wanted to just base-64-encode it and be done with it. I was reluctant to do that, as the size of the upload would be significantly larger (about 137% of the original). Also, Rails didn't automatically decode base-64-encoded file attachments. But Tim had other bugs to fix, so I submitted a patch to Rails to do the base-64 decoding. I was pretty proud of this patch until it was pointed out to me that RFC 2616 specifically disallows the use of Content-Transfer-Encoding in HTTP. Doh. The reviewers also pointed out that it's a colossal waste of bandwidth.

Since Tim was cramming to meet a hard(-ish) deadline set for the release of the plugin, I offered to lend my eyeballs to the binary post problem. This could be a very long story, but I'll just get to the point: you can read binary data into a Javascript string and dump it right out to a file just fine, but if you try to do any concatenation with that string, Javascript ends up munging it mercilessly. I'm not sure whether it is trying to interpret it as UTF-8 or if it terminates it as soon as it hits a null byte (which is what seemed to be happening), but regardless, doing "some string" + binaryData + "another string", as is necessary when putting together a multipart post, just does not work.

The answer required employing a Rube Goldbergian system of input and output streams. The seed of the solution was found in this post, although that didn't explain how to mix in all of the strings needed for the POST and MIME envelope. So here it is, in all its goriness:


var dataURL = this.canvas.toDataURL(this.getImageType()); // grab the snapshot as base64
var imgData = atob(dataURL.substring(13 + this.getImageType().length)); // convert to binary

var filenameTimestamp = (new Date().getTime());
var separator = "----------12345-multipart-boundary-" + filenameTimestamp;

// Javascript munges binary data when it undergoes string operations (such as concatenation), so we need
// to jump through a bunch of hoops with streams to make sure that doesn't happen

// create a string input stream with the form preamble
var prefixStringInputStream = Components.classes["@mozilla.org/io/string-input-stream;1"].createInstance(Components.interfaces.nsIStringInputStream);
var formData =
"--" + separator + "\r\n" +
"Content-Disposition: form-data; name=\"data\"; filename=\"snapshot_" + filenameTimestamp +
(this.getImageType() === "image/jpeg" ? ".jpg" : ".png") + "\"\r\n" +
"Content-Type: " + this.getImageType() + "\r\n\r\n";
prefixStringInputStream.setData(formData, formData.length);

// write the image data via a binary output stream, to a storage stream
var binaryOutputStream = Components.classes["@mozilla.org/binaryoutputstream;1"].createInstance(Components.interfaces.nsIBinaryOutputStream);
var storageStream = Components.classes["@mozilla.org/storagestream;1"].createInstance(Components.interfaces.nsIStorageStream);
storageStream.init(4096, imgData.length, null);
binaryOutputStream.setOutputStream(storageStream.getOutputStream(0));
binaryOutputStream.writeBytes(imgData, imgData.length);
binaryOutputStream.close();

// write out the rest of the form to another string input stream
var suffixStringInputStream = Components.classes["@mozilla.org/io/string-input-stream;1"].createInstance(Components.interfaces.nsIStringInputStream);
formData =
"\r\n--" + separator + "\r\n" +
"Content-Disposition: form-data; name=\"description\"\r\n\r\n" + description + "\r\n" +
"--" + separator + "--\r\n";
suffixStringInputStream.setData(formData, formData.length);

// multiplex the streams together
var multiStream = Components.classes["@mozilla.org/io/multiplex-input-stream;1"].createInstance(Components.interfaces.nsIMultiplexInputStream);
multiStream.appendStream(prefixStringInputStream);
multiStream.appendStream(storageStream.newInputStream(0));
multiStream.appendStream(suffixStringInputStream);

// post it
var req = new XMLHttpRequest();
req.open("POST", "http://yoursite.com/upload_endpoint", true);
req.setRequestHeader("Accept", "*/*, application/xml");
req.setRequestHeader("Content-type", "multipart/form-data; boundary=" + separator);
req.setRequestHeader("Content-length", multiStream.available());
req.setRequestHeader("Authorization", "Basic " + btoa(username + ":" + password));
req.setRequestHeader("User-Agent", "YourUserAgent/1.0.0");
req.send(multiStream);


Update: Spaces removed from multipart boundary per Gijsbert's suggestion in the comments (thanks!).

Sunday, May 13, 2007

Reloading Ruby Classes in Rails

I ran across a bug in Date::Format today, and after spending a few hours hacking away at a fix (the date/format.rb code is *uuuuugly* and *sloooow*...someone should really rewrite that. Better yet, rewrite it in C), I thought I'd submit a patch. So I grabbed the ruby_1_8 branch and lo and behold, my issue had already been fixed!

So the question now was: how do you monkeypatch the entire Date::Format module? Simply require-ing it as a plugin doesn't work, since Date::Format is already loaded at that point. The trick is to use load instead of require.

First, I tried this:


load 'date/format.rb'


(note: load needs the actual filename; it doesn't have the magic that require does) but that gave me an error:


in `load': wrong number of arguments (1 for 0) (ArgumentError)


Turns out Rails::Plugin::Loader defines its own load, so:


Kernel::load 'date/format.rb'


et voila!

Stop iPhoto from Launching when you Connect your Camera

It took me a bit of digging to find this, so I thought I may as well spread the word here. I've started using Adobe Lightroom for managing my photos, but every time I connected my camera or card reader to import photos, iPhoto would launch, and I couldn't figure out where to change that. I finally discovered that the setting is in the Image Capture application. Run that, then go to Preferences and you'll find it there. Why they wouldn't also put it in iPhoto, Lightroom, or even in a preferences panel is beyond me.

Tuesday, April 17, 2007

Web 2.0 and Coming Soon...

Marc and I presented our Super Ninja Privacy Techniques for Web App Developers talk again yesterday at Web 2.0 (we also did it at ETech last month), and it seemed well received. I still need to work on my presentation skills; I'm not an old hand at this like Marc is.

I'm working on a follow-up to my privacy wall post which will describe a much better way to go about keeping a user's private data private, using an encrypted "Locker". I'll also go into detail about how we deal with password recovery.

Hopefully by preannouncing this, I'll force myself to get off my duff and finish the post. So stay tuned, and bug me if I don't have it up soon.

Wednesday, March 14, 2007

Google Takes Steps Towards Greater Privacy

Google recently announced that it will soon start anonymizing search logs older than 18-24 months. Full details can be found in their Log Retention Policy FAQ (PDF). This is a heartening step back towards their "Don't Be Evil" corporate philosophy, which some think has been largely abandoned.

I've just recently started using Scroogle as a way of defeating their tracking of my every search (their site is awful; Wikipedia has more readable information about the project), although the motives of the man behind it, Daniel Brandt, who also runs the Google Watch site, may be questionable. Still, he doesn't have much incentive for keeping a log of queries and IP addresses, and if he did, since he's not giving me a cookie, he can't tie all my searches together.

Wednesday, February 28, 2007

Do Passwords Right With bcrypt-ruby

My colleague Coda Hale just released a sweet bcrypt-ruby gem that does password hashing right. It also provides for future-proofing by enabling you to assign a computational cost for generating a password hash and letting you version your password hashes. You have no more excuses.

Thursday, February 22, 2007

Protecting Your Users' Data with a Privacy Wall

"Just Another Brick In The Wall?" by Iain Cuthbertson


We deal with a lot of very private data at Wesabe, so security and privacy are our top concerns. In this post I will describe one of our primary means for assuring privacy, a technique that is general enough that any site can use it. Our creative name for this technique is the privacy wall. Later, I'll go on to tell you ways to hack the wall, just so you don't get too comfortable.



The Privacy Wall



The idea is simple: don't have any direct links in your database between your users' "public" data and their private data. Instead of linking tables directly via a foreign key, use a cryptographic hash [1] that is based on at least one piece of data that only the user knows—such as their password. The user's private data can be looked up when the user logs in, but otherwise it is completely anonymous. Let's go through a simple example.



Let's say we're designing an application that lets members keep a list of their deepest, darkest secrets. We need a database with at least two tables: 'users' and 'secrets'. The first pass database model looks like this:



Standard Model

The problem with this schema is that anyone with access to the database can easily find out all the secrets of a given user. With one small change, however, we can make this extremely difficult, if not impossible:



Privacy Wall

The special sauce is the 'secret_key', which is nothing more than a cryptographic hash of the user's username and their password [2]. When the user logs in, we can generate the hash and store it in the session [3]. Whenever we need to query the user's secrets, we use that key to look them up instead of the user id. Now, if some baddie gets ahold of the database, they will still be able to read everyone's secrets, but they won't know which secret belongs to which user, and there's no way to look up the secrets of a given user.
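A sketch of the key derivation in Ruby (the salt constant is a placeholder; per note [2], keep this hash distinct from whatever you use to check the user's password):

```ruby
require 'digest/sha1'

# Placeholder application salt -- use your own, and don't reuse the same
# hash you use for password verification.
SECRET_KEY_SALT = "some-application-salt"

# Derive the privacy-wall key from things only the user knows.
def secret_key(username, password)
  Digest::SHA1.hexdigest(SECRET_KEY_SALT + username + password)
end
```

At login you compute this once and stash it in the session; the secrets table stores only the key, never the user id.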



Update: A commenter on my shorter post on the Wesabe blog brought up the important point of what you do if the user forgets their password. The recovery method we came up with was to store a copy of their secret key, encrypted with the answers to their security questions (which aren't stored anywhere in our database, of course). Assuming that the user hasn't forgotten those as well, you can easily find their account data and "move it over" when they reset their password (don't forget to update the encrypted secret key); if they do forget them, well, there's a problem.



Attacking the Wall



I mentioned earlier that you store the secret key in the user's session. If you're storing your session data in the database and your db is hacked, any users that are logged in (or whose sessions haven't yet been deleted) can be compromised. The same is true if sessions are stored on the filesystem. Keeping session data in memory is better, although it is still hackable (the swapfile is one obvious target). However you're storing your session data, keeping your sessions reasonably short and deleting them when they expire is wise. You could also store the secret key separately in a cookie on the user's computer, although then you'd better make damn sure you don't have any cross-site scripting (XSS) vulnerabilities that would allow a hacker to harvest your users' cookies.



Other holes can be found if your system is sufficiently complex and an attacker can find a path from User to Secret through other tables in the database, so it's important to trace out those paths and make sure that the secret key is used somewhere in each chain.



A harder problem to solve is when the secrets themselves may contain enough information to identify the user, and with the above scheme, if one secret is traced back to a user, all of that user's secrets are compromised. It might not be possible or practical to scrub or encrypt the data, but you can limit the damage of a secret being compromised. My colleague and security guru Sam Quiqley suggests the following as an extra layer of security: add a counter to the data being hashed to generate the secret key:




secret key 1 = Hash(salt + password + '1')
secret key 2 = Hash(salt + password + '2')
...
secret key n = Hash(salt + password + '<n>')


Getting a list of all the secrets for a given user when they log in is going to be a lot less efficient, of course; you have to keep generating hashes and doing queries until no secret with that hash is found, and deleting secrets may require special handling. But it may be a small price to pay for the extra privacy.
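That lookup loop might be sketched like so in Ruby (the in-memory SECRETS hash here stands in for your real database query):

```ruby
require 'digest/sha1'

# Stand-in for the real database: maps secret key => secret text.
SECRETS = {}

# Placeholder for your actual query; returns nil when no secret has that key.
def find_secret_by_key(key)
  SECRETS[key]
end

# Keep generating numbered keys until a lookup misses.
def secrets_for(salt, password)
  results = []
  n = 1
  loop do
    key = Digest::SHA1.hexdigest(salt + password + n.to_s)
    secret = find_secret_by_key(key)
    return results if secret.nil?
    results << secret
    n += 1
  end
end
```

Note that this scheme assumes the numbered keys are dense: deleting secret n means renumbering (or tombstoning) so the loop doesn't stop early.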



Finally, log files can be a gold mine for attackers. There's a very good chance you're logging queries, debug statements, or exception reports that link users to their keys or directly to their secrets. You should scrub any identifying information before it gets written to the log file.



So That's It, Right?



The privacy wall is far from a silver bullet. Privacy and security are hard—really hard—particularly so if your app is taking private data and extracting information out of it for public consumption, like we are at Wesabe. The privacy wall is one of a number of methods we're using to ensure that our users' private data stays that way. If you're lucky enough to be going to ETech next month, definitely check out Marc's session on Super Ninja Privacy Techniques for Web App Developers.



I hope you found this helpful. Let me know what you think; I appreciate any and all feedback. And if you've got any cool privacy techniques up your sleeve, share the knowledge!






[1] A cryptographic hash is a way of mapping any amount of plain text to a fixed-length "fingerprint" such that the same text always maps to the same hash, and given a hash, it is computationally infeasible to recover the text from which it was derived. Hashes are wonderful things with many uses. If you're a developer and you didn't already know this, stop reading now and go here or here, and learn how to generate a SHA1/2 hash in your programming language of choice. Come back when you're ready. I'll wait.
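In Ruby, for example, the standard library's Digest module makes it a one-liner:

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Same input always yields the same fixed-length hex digest.
puts Digest::SHA1.hexdigest("hello")   # 40 hex characters
puts Digest::SHA256.hexdigest("hello") # 64 hex characters
```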



[2] You can throw in a salt too, to be safe; just make sure that you're not using the same hash that you're using for checking the user's password. You are smart enough not to store passwords in plaintext in the database, aren't you?



[3] Danger, Will Robinson! Keep reading.

Sunday, February 4, 2007

Campfire Hacks: Uploading Screenshots with Quicksilver and Pyro

We live in Campfire at Wesabe, so naturally we've developed a number of tools and hacks around it. One of these days I'll get around to writing about the Campfire bot framework we've developed, but right now I'll start with something simpler: a way to upload screenshots to Campfire with only a few keystrokes (riffing off my colleague Coda Hale's desire to eliminate the need for a mouse). Note that this is for Mac users only, so if you're one of the poor souls who hasn't seen the light yet, you can stop reading now.



Two prerequisites (other than a Mac): Quicksilver and Pyro. Quicksilver should be familiar to most Mac users already. It is an incredibly powerful application that can be quite obtuse but can save you gobs of time if you learn how to use it. Pyro is a client for Campfire. Why do you need a client for a browser-based chat app? Getting message notifications in your dock is one reason, but the biggest reason is that there seems to be a memory leak in Firefox that causes it to grind to a halt if you've had Campfire up for too long. But I digress.



You first need to enable the Screen Capture Actions plugin in Quicksilver (go to Plugins -> Recommended and check Screen Capture Actions). Then go to Catalog -> Quicksilver and make sure that Internal Commands is checked. This should give you Capture Region, Window, and Screen commands.



Next, the secret sauce: a bit of AppleScript that uploads a file to Campfire via Pyro:



on open theFile
tell application "Pyro"
upload theFile to room "[your campfire room name]" in campfire "[your campfire].campfirenow.com"
end tell
end open




Paste that into Script Editor and save it as PyroUpload in ~/Library/Application Support/Quicksilver/Actions (if this folder doesn't exist, create it), and restart Quicksilver.

Now you can upload a screenshot with just Cmd - Space - [Capture Window|Region|Screen] - Return - [take your screenshot] - PY - Return.



Have fun!