Erik's Engineering

something alliterative


The Longest Schema Change

Sometimes, changing a column's type isn't as simple as a migration with a change_column command. This is the story of one of those times.

Published on 20/02/2017 at 15h36.

Keep a Work Log

Keeping written notes about what you work on can help you learn faster.

Published on 24/11/2016 at 12h29.

Sidekiq Queueing Patterns

Almost every app seems to need background processing. In ruby, I think sidekiq is the best system for doing this. As your system grows you need to start paying attention to how you organize your sidekiq jobs. I walk through some progressively more complicated setups.

Published on 07/08/2016 at 13h35.

The 202 Pattern

A couple years ago we were working on a reporting project that needed data that wasn’t really kept online. A user would select a number of streams and a time range. Relevant log data would be downloaded from S3, indexed in SOLR, then queried, sent to the client, and displayed. Depending on which stream and how much time they asked for, this might take anywhere from a few seconds to a few hours. Of course, once it was loaded we could make subsequent queries very quickly. Obviously we couldn’t just have the browser stop and wait for the request to complete. Our solution will surprise you!

Published on 30/06/2016 at 12h05.

Guard Remembers

You can configure guard to remember relationships between test files, so you never forget to run all the relevant tests after a change.

Published on 08/08/2013 at 13h07.

That's Easy!

A while back during a standup, I had an interaction something like the following:

Me: Yesterday I spent a bunch of time banging my head against the wall on X.
Person Zed: Oh, that's easy. I can't believe you're having trouble with it. You just have to frobble the Y and then snizzle the Z.
Me: Well thanks for making me feel stupid.

Published on 07/08/2013 at 22h33.

Rake for Rails Developers

One of the most common tasks on a big rails project is to add a new rake task. Rake is an important tool, used by lots of different groups in the ruby community. It's used by rails, but I'm pretty sure it pre-dates rails. It certainly doesn't feel like rails.


For this blog post, I'm using rake, rails 3.1.3 and ruby 1.9.3-p0. I hope that's all current enough for you. Newer versions of rake and rails changed a few things, deprecating some old conventions. I think the new ways still work on the older versions, but I haven't tested.

During development it's really easy to wind up with multiple versions of rake installed. It's safest to specify a version in your Gemfile and then use bundle exec to run the rake binary.

Listing Tasks

The first thing to do with rake is run 'bundle exec rake -T'. This will give you a list of available rake tasks. Do it on your app right now, just to help orient yourself. Every rails app comes with a bunch of tasks, and most apps add new ones of their own.

rake_demo > bundle exec rake -T
rake about              # List versions of all Rails fr...
rake assets:clean       # Remove compiled assets
rake assets:precompile  # Compile all the assets named ...
rake db:create          # Create the database from conf...
rake db:drop            # Drops the database for the cu...
rake db:fixtures:load   # Load fixtures into the curren...
rake db:migrate         # Migrate the database (options...
rake db:migrate:status  # Display status of migrations
rake db:rollback        # Rolls the schema back to the ...
rake db:schema:dump     # Create a db/schema.rb file th...
rake db:schema:load     # Load a schema.rb file into th...
rake db:seed            # Load the seed data from db/se...
rake db:setup           # Create the database, load the...
rake db:structure:dump  # Dump the database structure t...
rake db:version         # Retrieves the current schema ...
rake doc:app            # Generate docs for the app -- ...
rake log:clear          # Truncates all *.log files in ...
rake middleware         # Prints out your Rack middlewa...
rake notes              # Enumerate all annotations (us...
rake notes:custom       # Enumerate a custom annotation...
rake rails:template     # Applies the template supplied...
rake rails:update       # Update configs and some other...
rake routes             # Print out all defined routes ...
rake secret             # Generate a cryptographically ...
rake stats              # Report code statistics (KLOCs...
rake test               # Runs test:units, test:functio...
rake test:recent        # Run tests for {:recent=>"test...
rake test:single        # Run tests for {:single=>"test...
rake test:uncommitted   # Run tests for {:uncommitted=>...
rake time:zones:all     # Displays all time zones, also...
rake tmp:clear          # Clear session, cache, and soc...
rake tmp:create         # Creates tmp directories for s...
rake_demo > 

There are a couple things to notice here.

First, tasks are listed in alphabetical order. In order to get 'create', 'migrate', and 'seed' anywhere close to each other, they have to be grouped together in a namespace ('db' in this case).

Second, tasks have friendly descriptions that go with them, to help you figure out what something like 'secret' is.

Third, rake will conveniently truncate the output to match the width of your terminal window. Make that window really wide if you want to read more of the description. Keep this in mind when you make your own tasks - put the really important description first.

Fourth, if a task doesn't have a description it won't show up. There are more tasks than just the ones rake -T shows you. Think of the others as private tasks.

You can add an extra argument after -T and rake will filter the output down to only tasks that include it. Very handy if you want to see all the test tasks, or if you can't remember if it's test:unit or test:units.

Running Tasks

You run task 'foo' with the command line 'bundle exec rake foo'.

You can run both tasks foo and bar using 'bundle exec rake foo bar'. This can save you a lot of time if you have a big rails app that takes a long time to start. I cut 3 minutes off our deploy time by bundling multiple rake commands together to avoid repeated startup cost.

You pass arguments to a task by enclosing them in square brackets with NO SPACES. 'bundle exec rake foo[bar,1]'. If you put spaces anywhere between the start of the task name and the ending square bracket, rake will get confused and probably error out.

The rake tasks that come with rails usually get their arguments from environment variables instead. 'bundle exec rake db:migrate VERSION=0'
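A task of your own can read its input from the environment the same way. Here is a minimal sketch, assuming a hypothetical demo:greet task and a hypothetical NAME variable (neither ships with rails):

```ruby
require "rake"  # not needed inside lib/tasks/*.rake; harmless when run standalone

namespace :demo do
  desc "greet NAME (environment variable, defaults to 'world')"
  task :greet do
    # Rake copies NAME=value pairs from its command line into ENV,
    # so the task just reads ENV like any other script would.
    name = ENV.fetch("NAME", "world")
    puts "hello, #{name}"
  end
end
```

With that in a .rake file, 'bundle exec rake demo:greet NAME=erik' should print "hello, erik".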

Code Organization

Rake tasks go in the lib/tasks directory, in files with a .rake extension. These aren't loaded during normal application boot, so whenever possible you should minimize the amount of code that goes into those .rake files and instead have them call code that's in other parts of the codebase. Think of rake tasks as actions for a command line application. They should do some argument processing and then hand things off to business logic that lives in your models or other classes.

Try not to let your .rake files get too big. Code in them is hard to test and inaccessible from the rest of your app.
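As a rough illustration of that split, here is a sketch with a thin task that hands off to a plain class. ReportBuilder and the report:summary task are hypothetical names made up for this example; in a real app the class would live in lib/ or app/models rather than next to the task.

```ruby
require "rake"  # not needed inside lib/tasks/*.rake; harmless when run standalone

# Plain Ruby class holding the real logic (hypothetical name).
# It knows nothing about rake, so it is easy to unit test.
class ReportBuilder
  def initialize(rows)
    @rows = rows
  end

  # Returns a one-line summary of the rows.
  def summary
    total = @rows.inject(0) { |sum, n| sum + n }
    "#{@rows.size} rows, total #{total}"
  end
end

namespace :report do
  desc "print a summary of some numbers"
  task :summary do
    # The task only gathers input and delegates.
    puts ReportBuilder.new([1, 2, 3]).summary
  end
end
```

The task body stays a one-liner, and everything worth testing is reachable from the rest of the app.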

Basic Rake Task

Here's a very basic .rake file, with a simple task.

namespace :demo do

  desc "a basic task"
  task :basic do |t, args|
    puts "I'm a basic task"
  end

end

You'd call this with 'bundle exec rake demo:basic'.

First off, notice the namespace :demo line. I've named this to match the filename (demo.rake). Tasks within this namespace will be called with demo: before their names. This gives you a way to indicate that some tasks are related to each other. This gets important when you have a lot of them. The naming convention helps you find the right file.

Next, you see the desc line. This gives a description that will be shown when you run rake -T. Keep it short - rake will truncate it to fit it on a single line in the user's terminal.

If you do not include a desc line, the task will not show up when you run rake -T. You can make use of that to create "private" tasks that aren't advertised.

The task line is the start of the actual task declaration. It names the task, declares args and dependencies and starts up the block of code that is executed when the task is run. The |t,args| on the block are optional, but I'd recommend always including them to help remind you about how to pass args. Consider them boilerplate.

Try to keep your tasks short and sweet, just like methods. These are every bit as much application code as anything in a controller or model, and the same rules about clear naming, documentation and short methods that apply to other code apply equally to rake tasks.


Rake tasks can depend on other things. You do this by putting '=> [:list, :of, :dependencies]' between the task name and the do that starts the task block. You can list dependencies as either symbols or strings. Most people use symbols and fall back to strings when forced to by syntax. If a dependency is in the same namespace you don't have to include the namespace part. For dependencies outside your current namespace you pretty much need to list them as strings.

Dependencies are a great way to break longer rake tasks down into shorter tasks and make it easier to re-use code. All dependencies will be run before the body of a task is executed, so any setup code will itself have to go in a dependency.

You'll notice that I never accessed any of my application's models in the basic demo task. That's because by default your rails application isn't loaded. The powers that be know you need this a lot, so they have a handy rake task called :environment that will do it for you. Just list it as a dependency. This is a great example of extracting some commonly used code into a separate task that's brought in as a dependency.

Here are a couple rake tasks that make use of dependencies:

namespace :deps do
  desc "uses the environment"
  task :with_environment => [:environment] do |t, args|
    puts "User count: #{User.count}"
  end

  desc "run both one and two"
  task :both => [:one, :two] do |t, args|
    puts "both"
  end

  desc "print 'one'"
  task :one do |t, args|
    puts "one"
  end

  desc "print 'two'"
  task :two do |t, args|
    puts "two"
  end
end


Basically, anything on the task line in the array after the hash rocket will be run first. You can add any number of dependencies by adding them to the array, but most of the time you'll just want the environment.

Multiple dependencies will be satisfied in the order specified, but will only be called once for the entire invocation of rake. This means that each task will only run once, even if it's a dependency of multiple other tasks.

rake_demo > bundle exec rake deps:with_environment deps:both deps:one
User count: 0
one
two
both
rake_demo > 
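The string form for dependencies outside the current namespace, together with the run-once behavior, can be sketched without rails at all. The build and release namespaces below are hypothetical, made up for this example:

```ruby
require "rake"  # not needed inside lib/tasks/*.rake; harmless when run standalone

namespace :build do
  task :prepare do
    puts "prepare"
  end
end

namespace :release do
  # Same-namespace dependency as a symbol, cross-namespace one as a string.
  task :package => [:notes, "build:prepare"] do
    puts "package"
  end

  task :notes => ["build:prepare"] do
    puts "notes"
  end
end

# build:prepare is needed twice but only runs once.
# Prints "prepare", "notes", "package", one per line.
Rake::Task["release:package"].invoke
```

Dependencies are looked up when the task is invoked, not when it is declared, which is why :notes can be defined after the task that depends on it.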

Loading the rails environment will make your rake task take a lot longer to run. If possible, avoid loading rails. Your fellow developers will thank you for it.


Sometimes, you want to pass in an argument or two to your task. The tasks that come with rails usually pass arguments via environment variables. This gives you named arguments, but you can't differentiate arguments for different tasks and you can't use the very handy argument handling code that comes with rake.

Rake's built-in argument handling code gets arguments like this: 'bundle exec rake db:dump[filename]'.

Arguments are listed in an array on the task line, after the task is named, but before the declaration of dependencies (if any):

task :name, [:arg1, :arg2] => [:dep1, :dep2] do |t, args|

Here's a simple rake task that accepts some arguments.

namespace :args do
  desc "takes dimensions as arguments"
  task :dimensions, [:x,:y] do |t, args|
    args.with_defaults(:x => 50, :y => 100)

    puts "dimensions are #{args[:x]}x#{args[:y]}"
  end
end

Arguments are ordered, not named. However, rake automatically parses them into a hash for you based on the order you declare them on the task line. It also gives you a handy way to set defaults for those arguments. Check out the #with_defaults method. It will make your life easier.

rake_demo > bundle exec rake -T args
rake args:dimensions[x,y]  # takes dimensions as arguments
rake_demo > bundle exec rake args:dimensions
dimensions are 50x100
rake_demo > bundle exec rake args:dimensions[25]
dimensions are 25x100
rake_demo > bundle exec rake args:dimensions[25,75]
dimensions are 25x75
rake_demo > 

You don't get a lot of guarantees about what type the arguments will show up as (e.g. String vs Integer), so don't forget to do appropriate conversions to keep yourself safe. Likewise, rake doesn't consider arguments mandatory, so you'll need to enforce that manually.
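One way to handle both concerns is to check and convert at the top of the task. This is only a sketch; the args:resize task and its argument names are hypothetical:

```ruby
require "rake"  # not needed inside lib/tasks/*.rake; harmless when run standalone

namespace :args do
  desc "resize to width (required) and height (defaults to width)"
  task :resize, [:width, :height] do |t, args|
    # Rake does not enforce presence, so check by hand.
    raise ArgumentError, "width is required" if args[:width].nil?

    # Values typed on the command line arrive as strings. Integer() raises
    # on junk like "abc" instead of quietly returning 0 the way to_i does.
    width  = Integer(args[:width])
    height = Integer(args[:height] || width)

    puts "resizing to #{width}x#{height}"
  end
end
```

With this in place, 'bundle exec rake args:resize[25,75]' prints "resizing to 25x75", while a bare 'bundle exec rake args:resize' fails with the "width is required" error instead of carrying on with a nil.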

Calling Other Tasks

Sometimes, you want to call another rake task, but for some reason you don't want to just list it as a dependency. Perhaps you need to calculate some values to pass to it as arguments, or you want to force it to run even if it was already invoked as a dependency by some other task.

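You can do this by looking up the task using Rake::Task['task_name_as_string'] and calling #invoke on it. Note that #invoke skips a task that has already run during this rake invocation; call #reenable on the task first if you need it to run again.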

Pass your arguments to the #invoke call just like any normal method call.

namespace :invoke do

  desc 'invoke bar with random argument'
  task :foo do |t, args|
    n = rand(5)

    Rake::Task['invoke:bar'].invoke n
  end

  desc 'default is 3'
  task :bar, [:n] do |t, args|
    args.with_defaults(:n => 3)
    puts "n is #{args[:n]}"
  end

end

rake_demo > bundle exec rake invoke:foo
n is 0
rake_demo > 
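One thing to watch: #invoke will not run a task a second time within the same rake run, even when it is given different arguments. A small sketch of that behavior and of #reenable, using hypothetical rerun:report and rerun:twice tasks:

```ruby
require "rake"  # not needed inside lib/tasks/*.rake; harmless when run standalone

namespace :rerun do
  task :report, [:n] do |t, args|
    puts "n is #{args[:n]}"
  end

  task :twice do
    report = Rake::Task["rerun:report"]

    report.invoke(1)
    report.invoke(2)   # no output: the task is already marked as invoked

    report.reenable    # clear the "already invoked" flag
    report.invoke(3)
  end
end

# Prints "n is 1" and "n is 3"; the middle call is silently skipped.
Rake::Task["rerun:twice"].invoke
```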

File Tasks

Rake is based on the traditional make command. Make is all about creating files based on dependencies - it's a helper for compiling things.

Naturally, rake gives you a handy tool for building files. Here's a quick example of using that capability as part of a larger task.

desc "set up a fresh git clone for dev work"
task :setup => ['config/database.yml', 'setup:bundle_install', 'db:migrate'] do |t,args|
  # nothing to do here; the dependencies do all the work
end

namespace :setup do

  desc "set up database.yml"
  file "config/database.yml" => ['config/database.yml.example'] do
    # runs when database.yml is missing or older than the example file
    cp "config/database.yml.example", "config/database.yml"
  end

  desc "install gems"
  task :bundle_install do |t, args|
    system('gem install bundler')
    system('bundle install')
  end

end



Rake is smart enough to not overwrite database.yml if one already exists. If a file depends on other files, it will take their timestamps into account when deciding whether or not it needs to be updated. In this case, if I update database.yml.example, bundle exec rake setup will overwrite my database.yml, picking up any new settings that were added. That means you can use it to conditionally build assets, etc.

If you don't declare any file dependencies, rake will only build the file if it doesn't already exist. Sometimes this is safer - maybe I don't want to risk nuking my database.yml and it's better to leave it alone if one already exists.
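Here is a self-contained sketch of that dependency-free case. It works in a temp directory, and SETTINGS is a hypothetical file used only for the demo:

```ruby
require "rake"
require "tmpdir"
require "fileutils"

# Hypothetical target file inside a throwaway directory.
SETTINGS = File.join(Dir.mktmpdir, "settings.yml")
at_exit { FileUtils.remove_entry(File.dirname(SETTINGS)) }

# No prerequisites: the block only runs when the file is missing.
file SETTINGS do
  puts "creating #{File.basename(SETTINGS)}"
  File.write(SETTINGS, "adapter: sqlite3\n")
end

Rake::Task[SETTINGS].invoke     # file is missing, so the block runs

File.write(SETTINGS, "adapter: postgresql\n")   # simulate a local edit
Rake::Task[SETTINGS].reenable
Rake::Task[SETTINGS].invoke     # file exists, so the block is skipped

puts File.read(SETTINGS)        # still the edited version
```

Running it prints "creating settings.yml" once, followed by "adapter: postgresql": the second invoke leaves the local edit alone.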


Rake is a DSL for adding command line functions to your rails apps. It offers some distinct advantages over plain old scripts, but like any other code you write, you need to think about what you're doing first so that you can make the most of its features without creating unmaintainable code.

  • Organize your tasks into relevant namespaces. Don't just throw them all in a junk drawer.
  • Keep tasks short, with business logic in other places. Think of tasks as actions in a controller.
  • Don't load the Rails environment if you don't have to.
  • Use dependencies to build more powerful tasks via composition.


Brook Riggio has posted a really excellent mind-map of the stock tasks that come with a Rails app. Check it out here. I love how it breaks everything down hierarchically in a way you just can't do in text.

Published on 04/12/2011 at 17h04.

Quick IRB/rails console customization idea

I'm not sure why this didn't occur to me sooner.  Most folks customize their irbrc file with tools to add features.  Why not use it to add helpers for things you do all the time?  I find myself loading up an example user record all the time, so I ginned up the following and stuck it in my .irbrc

It's a bit more convoluted than you'd normally expect, but by dancing around the subject a bit I made it so that this will compile just fine even if User isn't defined or doesn't support find_by_email.  I don't want irb/console to refuse to start because something isn't defined.
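A minimal sketch of the same idea (not the original snippet) might look like this. The method name and the email address are placeholders, and it assumes a User class that answers find_by_email when one is loaded:

```ruby
# Hypothetical .irbrc helper. Nothing here touches User until the method
# is actually called, so irb starts fine in projects that have no User.
def example_user(email = "someone@example.com")
  User.find_by_email(email)
rescue NameError
  # NameError covers a missing User constant; its subclass NoMethodError
  # covers a User that has no find_by_email.
  nil
end
```

In a console where User exists, example_user returns the matching record; anywhere else it quietly returns nil. The tradeoff is that the rescue also hides any NameError raised inside find_by_email itself.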

I just thought of this, so I'm not sure where it's going yet.  I'm going to be looking out for other opportunities to do this and thinking about how I might make it easier.  Perhaps a gem that would recognize if you were doing a Rails console session and load an application-specific irb helper file?

Published on 20/08/2011 at 16h14.

A Personal Choice

This might seem like old hat to those of you who are consultants, but it was a big shift in my thinking.

I don't have a work laptop.

At about this time last year, I left the video game industry.  For a lot of different reasons, the game industry is focused on MS products.  Game developers work on Windows workstations, and when they start talking about databases they want to use MS SQL.  I like having good tools, so I never wanted to buy a Windows laptop for personal use.  For a while I kept a Mac or Linux desktop that would dual-boot into Windows for gaming, but it's been years since I rebooted for a game.

It was really awesome when I went to work for a true web company that did everything in the cloud.  You could run everything you needed on a Macbook Pro.  Work provided a great laptop.  I could install the tools I liked.  Life was good.

The only downside was my arms were getting longer from dragging around a nearly matched pair of 15" Macbook Pros.  Eventually, I just set up a work development environment on my personal laptop and left the company laptop in the office.

Now that I'm at Moxiesoft, I don't even have a company laptop.

Here's my thinking:

  • I will always want to have my own personal programming projects.  This is essential for professional development, and I just like to program.  I can't use the company laptop to work on them because (like every other company) I had to sign a work for hire agreement that says my employer owns anything I write on their time or using their equipment.  I've got to use my own computer if I want to retain the rights to what I write in my spare time.
  • I want to have a reasonably recent laptop.  A 2-3 year upgrade cycle seems pretty reasonable these days, but Apple hardware is expensive.  It's easier to justify upgrades if it's a computer for both personal and professional use.  Plus, upgrading my personal laptop is my decision and doesn't require approval from above.  If I hate my laptop, I can do something about it without begging for approval.  Less stress for me, less stress for my boss.  Everybody wins.
  • I don't have to try talking my employer into buying licenses for expensive tools I've already bought for myself.  I've got Photoshop, Office, OmniGraffle, Balsamiq and lots of other useful tools and I didn't have to beg for them.
  • With only one laptop, I always have all my tools with me.  I don't have to switch laptops if I need photoshop.  I don't have to worry about whether my .emacs files are out of sync.  Managing my dev environment is a lot simpler.  I no longer feel "tool envy" when I'm on one laptop or another.
  • As long as I have my laptop with me, I've got what I need to work on my own projects AND my work projects.  I don't have to decide what I'm going to work on before I leave the house.  Programming is programming.  Again, everybody wins.
  • Setting cool things up on two laptops was a pain - less of that friction means I'm spending more time doing things to make my life better.  I'm happier and more productive both at work and at home.  Everybody wins.

So I'm happier.  Everything's happier, right?  Well, there are a couple downsides.

  • I left my backpack at a friend's house the other day and about had a panic attack.   It had EVERYTHING in it.  Work computer, personal computer, even my kindle.  Having to wait a day would have been really really bad.  I'm actually thinking about making some chef scripts that will build out a work dev environment on EC2 in case it (or hardware failure, which is more likely) happens again.
  • At jobs past, I've been able to stay away from work in the evenings either by just not opening the work laptop or refusing to install invasive VPN software on my personal computer.  One place actually had a VPN client that would audit all the other software installed on your computer and could do really nasty stuff to you.  No way I was going to install that.  Now there isn't much enforced separation.  I close my tabs and windows, then run 'ssh_identity personal' to swap keys.  So far, this hasn't been a problem.
  • I'm going to wear out this laptop faster than I might otherwise.  It's a unibody macbook pro, so I think that'll take a little while, but I do need to plan for replacing it every few years.
  • Maintenance and upgrades are my responsibility.  The flip side of not having to convince my boss to pay for things is that I have to pay for them.

I'm definitely happy with this arrangement.  I think it's a big win for me, and a win for Moxiesoft.  I doubt that it makes sense for everyone though.  The whole thing is predicated on the notion that I'd be buying most if not all of this stuff for myself anyway, so asking my employer to buy me duplicates is just wasted spending.

This wouldn't work if I was some place that insisted on a development environment I didn't like - but why would I work somewhere like that?

Published on 05/06/2011 at 13h24.

Ruby Performance in the Rails Development Environment


I got the reasons for things slowing down wrong.  Things get slower, but not for the reasons I thought.  José Valim explains what's actually going on.  I can confirm that if I take the largest of my test apps and nuke all the helpers it will speed up to roughly the speed of the 200 scaffold version, though the response times are pretty inconsistent.   If I clear out the routes file,  startup times and response times go down to the same level as the tiniest 1 scaffold app.


The faster your Rails app runs in dev mode, the better. As your app gets bigger, it will get slower. Jruby doesn't slow down as much. For larger codebases it blows the doors off of other implementations.

The Long Version

I'd like to talk about performance. Development performance.

This is something rather dear to my heart. Optimizing development performance can greatly improve development productivity. Getting new features faster is one of the reasons we like Rails. The faster a developer can work, the more features come out of the sausage factory.

When I was in college, I did a senior project where we programmed a microprocessor by burning our code onto an EPROM. Erasing the EPROM meant putting it under a UV light for 15-20 minutes. Then you'd burn your code to it, plug it into your board and see if that new version of your keypad debouncing code worked. My teammates were in awe of my foresight in ordering 2 EPROMs, so we only had to wait 10 minutes on average between test runs.

10 minutes to test every code change. That sucked. I should have ordered a dozen, even if they did cost $4 apiece.

I spent more than 10 years programming in perl.  Under mod_perl I'd make a code change, then restart my webserver and test to see if it worked.  The whole test/fix/verify cycle took somewhere around 5-10 seconds. Much better.

On Rails, we can program in the development environment where it will automatically recompile most of our application on every page view.  For small apps this might as well be instantaneous.  This is AWESOME. You can do things almost as fast as you can think and type.

Until It's Not

Well, it's awesome as long as it stays fast.  In order to make sure all your changes show up in development mode, Rails recompiles all the controllers and models on each page view.  This means that the more code in your app, the longer it will take to compile. Eventually you end up clicking and waiting 5 or 10 seconds for a response. This is even worse than back in the perl world, because at least then you were switching to a console and running 'sudo apachectl restart' or some such, so you had something to do during that time. Bored programmers start checking Hacker News and productivity suffers.

At work we've got several Rails apps but we're still in the process of transitioning from monolithic to service oriented architecture. The original app is huge.  Hundreds of models and controllers. Megabytes of code.  That's a lot of code to compile on each request and we end up with a painfully slow minimum response time.

We run on jruby. For most of us this is the first jruby app we've worked on. Naturally, we all blamed jruby and grumbled a bit.  Charles Nutter approached me about it and we ended up hypothesizing that it was purely the big recompile adding so much to the time.  I figured I should test that hypothesis.

Testing methodology

I made a Rails 3.0.3 app and used 'rails g scaffold' to add more and more controllers and models to it. I used 'rails s' to start a webrick-based development server (it's the default) and 'time wget http://localhost:3000/' to test response time (grab the "real" value). You can't trust the reported response time on the server console because that doesn't include recompile time.  No matter how slow the response time, that pretty much always showed 20-30ms.

RVM was a lifesaver. RVM plus some helper scripts to swap Gemfiles let me test 5 different Ruby versions.

I did 5 test runs with each ruby version at each of 8 different application sizes.  A typical dataset (times in seconds) looks something like this:

scaffolds  run 1  run 2  run 3  run 4  run 5  avg
        1  1.693  0.101  0.078  0.071  0.085  0.08375
       50  2.347  0.156  0.157  0.147  0.157  0.15425
      100  2.487  0.279  0.269  0.225  0.225  0.2495
      200  3.339  0.509  0.466  0.445  0.412  0.458
      300  3.682  0.667  0.652  0.672  0.611  0.6505
      500  5.130  1.232  1.199  1.188  1.130  1.18725
      750  7.536  2.246  2.228  2.138  2.166  2.1945
     1000  8.949  3.118  2.965  3.060  3.023  3.0415

The sharp eyed among you have probably realized that the averages listed aren't for all 5 test runs. The first one was always much slower than the others and I'm mainly interested in the subsequent runs, so I'm only averaging the 4 remaining runs.
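For the record, the averaging amounts to the following, shown here on the 1-scaffold row. The warm_average helper is a name made up for this sketch, not part of the original test scripts:

```ruby
# Timings in seconds for the 1-scaffold row of the table.
runs = [1.693, 0.101, 0.078, 0.071, 0.085]

# Drop the first (slow) run and average whatever is left.
def warm_average(timings)
  warm = timings.drop(1)
  warm.inject(0.0) { |sum, t| sum + t } / warm.size
end

puts warm_average(runs).round(5)   # prints 0.08375
```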

I did this for Ruby 1.8.7, Ruby 1.9.2, Jruby 1.5.6, Jruby 1.6.0 and Rubinius (rbx) 1.2.0. That's about 200 readings after I got everything all set up and automated.

Now, it should probably be noted that these times are optimistic. The code generated by Rails' scaffold generator is pretty simple. Not a lot of complex control structures and the inheritance hierarchies are very straightforward. Real life code is almost certainly harder to compile.

The Big Picture

[Figure: Rails dev scaling graph, 0-1000 scaffolds]

Well look at that. We've got response time in seconds on the Y axis, with number of scaffolds in the app on the X axis.

First of all, Rubinius is just a lot slower than the others all around. Further, none of them scale linearly. Ruby 1.9.2 is much faster than ruby 1.8.7, but both versions of jruby are even faster than that. There's a nearly 12 second difference in response time between the fastest and slowest implementations and at 1000 scaffolds jruby is over twice as fast as the next fastest implementation.

They all curve upward a bit, meaning that as you add more code it has a greater effect on response times. Jruby is MUCH closer to linear, though.

But there's something else there.

The Smaller Picture

[Figure: Rails dev scaling graph, 0-200 scaffolds]

This looks a little different. Ruby 1.9.2 wins until around 160 scaffolds. I'm not sure how important this is. Both MRI and Jruby are running at 300ms or less at that point, so I doubt the differences really make a difference to a developer. It IS interesting to note that jruby seems to have a little more fixed overhead but makes up for it by scaling better as you add more code.


For smaller apps, this probably doesn't make any difference whatsoever. For bigger apps, you can help maximize developer effectiveness by picking a ruby that will help them work faster. Jruby seems pretty good, with ruby 1.9.2 coming in second. Stay away from ruby 1.8.7 or rubinius if you're working with larger codebases.

Now, what I'd really like is a way to avoid recompiling everything every time. If I could have Rails recompile just the model or controller I'm working on and skip all the others, that'd be grand. I've taken a couple stabs at it, but I haven't succeeded yet.

Breaking larger apps down into a bunch of smaller apps that use a service oriented architecture will effectively give you that. Each one has a smaller codebase so the recompile time isn't as big of an issue, especially if you set cache_classes = true for all the apps you're not actively working on.

Published on 03/02/2011 at 04h14.
