Wednesday, November 28, 2007

Lesser-Known MySQL Performance Tips

dolphins by talkrhubarb
"dolphins" by talkrhubarb

I thought I knew a fair bit about MySQL before I started working on Wesabe, but I was wrong. Once your tables start to get really massive, all sorts of fun issues come out of the woodwork. There are a thousand articles out there on the basics of database optimization (use indices, denormalize when necessary, etc.), so I'm not going to cover those again. Here are some tips--most specific to MySQL--that I've learned on the job:




  • MySQL only uses one index per table when doing a query, and the query optimizer doesn't always pick the best column, so indexing everything is not necessarily a great idea. If you're always selecting on a certain set of columns, create a single multi-column index


  • related to the above, don't put indices on columns with a low cardinality. You can check the cardinality of a column by doing SHOW INDEX FROM table_name


  • if any text or blob fields are part of your SELECT query, MySQL will write any temp tables to disk rather than memory. So where possible, use large VARCHAR fields (on 5.0.3 or higher) instead. See: http://railspikes.com/2007/9/14/one-reason-why-MySQL-sucks


  • if you need to do case-sensitive matching, declare your column as BINARY; don't use LIKE BINARY in your queries to cast a non-binary column. If you do, MySQL won't use any indexes on that column.


  • SELECT ... LIMIT offset, limit, where offset is a large number == bad, at least with InnoDB tables, as MySQL actually counts out every record until it gets to the offset. Ouch. Instead, assuming you have an auto-incremented id field, SELECT * FROM foo WHERE id >= offset AND id < offset+limit (props to Coda for that one).



Update:

  • If you have a query in which you're looking for the most recent n items of something—e.g. SELECT * FROM foo WHERE bar = ? ORDER BY created_at DESC LIMIT 10)—your query will be significantly faster if you create a multi-column index on whichever field(s) you're selecting on plus the timestamp field that you're ordering by (in this case, CREATE INDEX idx_foo_bar_created_at ON foo (bar,created_at)). This allows MySQL to look up the records directly from the index, instead of pulling all the records where bar = ?, sorting them (probably first writing them to a tempfile), and then taking the last 10. EXPLAIN is your friend. If you're seeing "Using temporary table, Using filesort" in the "extra" section of EXPLAIN, that's bad.


I'll add more as we come across them. If you have any other non-obvious tips, leave them in the comments or email me (brad@ this domain) and I'll add them too.

By the way, an invaluable book for learning how to squeeze the most out of MySQL is High Performance MySQL, by Jeremy Zawodny and Derek Balling.

Monday, November 5, 2007

Tumblr

I decided to give tumblogging a try. I'm hoping that ease of use and lowered expectations will get me to post more often, and I'm liking it so far. It covers the large middle ground between blogging and twittering. I can put up something interesting without feeling like I need to construct a coherent post around it, and I don't feel like I'm pestering people with twitters (not to mention that it allows for richer media types).

I realize I could just do the same thing on this blog if I wanted to, but there's something about the minimalist formatting and ease-of-use that just seems to work. So I'll continue to post longer pieces here (I've got a number that I need to just sit down and write), and I'll keep posting inane comments to Twitter, but I imagine that the majority of interesting things I find will end up on my tumblelog.