Posts tagged with rails

High-Low Testing

High-low testing has changed the way I build software. When I first started using Rails five years ago, the paradigm of choice was fat model, skinny controller. This guidance is well-intentioned, but the more I worked within its bounds, the more frustrated I became. In particular, with fat models. Why should anything in software be fat?

Fat and complex models spread their entanglements like the roots of an aspen. When models adopt every responsibility, they become intertwined and cumbersome to work with. Development slows and happiness dwindles. The code not only becomes more difficult to reason about, but is more arduous to test. Developer happiness should be our primary concern, but when code is slow and poorly factored, we loosen our grasp on that ideal.

So how can we change the way we build software to optimize for developer happiness? There are a number of valuable techniques, but I've coalesced around one. The immediate win is a significantly faster unit test suite, running at thousands of tests per second. As a guiding principle, the implementation should not suffer while striving to improve testability. Further, though I'll use Rails as an example, such a technique should be transferable to multiple domains, languages and frameworks.

The approach I'm describing is what I call high-low testing. The name says it all; the goal is to test at a high, acceptance level and the low, unit level. The terms "high" and "low" come from the levels of test abstraction in an application. See the diagram below. In a web app, a higher test abstraction executes code that drives browser interactions. Lower test abstractions work with the pieces of logic that eventually compose a functioning system.

This article outlines the practices and generalized approaches to implementing high-low testing in Rails apps of any size. We'll discuss the roadmap to fast tests and how this mindset promotes better object orientation. Hopefully after considering high-low testing, you'll understand how to increase your developer happiness.

Aside: this article is broken into two major sections: the primary content and the appendix. Feel free to ignore the appendix, though it contains many real-world discussion points.

Seeking a better workflow

Before explaining how to apply high-low testing, let's look at the driving forces behind it.

You may be working in a large legacy codebase. Worst case scenario, there are zero tests and the work you're doing needs to build on existing functionality. Since the tests are either slow or nonexistent, you would like a rapid feedback cycle while adding new features. Or, you're refactoring and need to protect against regressions. How do you improve the feedback cycle while working in a legacy codebase?

You may be working on a medium-sized app whose entire test suite could benefit from improved feedback cycles. You'll incrementally move towards a high-low approach and continually reap the benefits along the way. How do you introduce high-low testing with the goal of easy maintainability as the app ages?

If you're working with a greenfield app, you may not benefit immediately from high-low testing. It requires discipline and can potentially be detrimental to the code. Nonetheless, if you plan on maintaining the project for the long haul, you'll see near constant test suite times and improved productivity. How do you apply high-low testing without inhibiting the software design?

Regardless of application size, our primary goal is to integrate a sane testing philosophy that doesn't constrain the feedback-driven workflow we practice daily. Fast tests are great for building and refactoring, but we also want additional feedback mechanisms that highlight smells in our code.

Dependencies are a necessary evil

Developer happiness is important. When I'm unhappy, it's likely due to intermingled dependencies or a deranged pair. I can't fix crazy. Anyone who's taken on a rescue project knows the first thing you do is upgrade dependencies (maybe the very first thing you do is write some tests). As a first step, it's also the most demoralizing because you don't just "upgrade dependencies," you trudge through the spaghetti, ripping things out, rewriting others, and you crawl out the other side in hopes that everything still works. This is an extreme case of exactly what happens in many Rails applications.

Dependencies are evil. I mean both third party dependencies and those you write yourself. Any time one piece of code depends on another, there's an immediate contingency on maintaining the relationship. Realistically, there's no way to remove all dependencies, not that you'd want to anyway. Building maintainable software is primarily about organizing dependencies so they don't pollute each other. They're a necessary and unavoidable evil, but left unchecked, they'll eventually erode the codebase.

With Rails in particular, it's almost brainless to introduce dependencies. Just drop a line in the Gemfile, or reference a constant by name. When the app needs it, it's ready and available or is immediately autoloaded. Such dependency management propagates itself throughout the framework and the applications built on top. Most apps I've worked on buy into this convenience to maximize developer productivity, but the productivity boost early in a project becomes a hindrance as the project comes of age.

Unsurprisingly, the problem with dependencies is far more pervasive than loading the world via a Gemfile. It works its way through every aspect of every piece of software. Rails unfortunately encourages dependency coupling through eager loading and reliance on the framework. The goal of high-low testing is to remove the discomfort of intermingled dependencies. The resulting code is often better factored, easier and faster to test, and more pleasurable to work with.

Getting started

High-low testing is the practice of working at extreme poles in your test suite. You'll work at the acceptance level to validate happy and questionable paths, and you'll work at the lowest level possible, with the units of your code. See the below diagram. Most Rails applications test across all layers. The integration layer (and acceptance layer) tends to be slow because it's evaluating many parts. When most of an application's business logic is tied to Rails, it must be tested at the integration layer. Depending on your style, you may continue to test at the integration layer, but high-low testing promotes doing so without depending on Rails.

The end result is a full test suite composed of two main characteristics: a speedy unit suite for rapid feedback, and a slow acceptance suite to guard against regressions. I've found the ratio of acceptance to unit tests to be heavily weighted towards unit tests. The more tests that can be pushed to the unit level, the more rapid and rewarding the development cycle becomes. The most poignant benefits surface at the unit level, so that's where we'll focus our efforts.

Let's unpack high-low testing.

Testing high

Testing high means working at an acceptance level and driving the browser to verify most things are working as intended. You may cover around half of your application code with acceptance tests, but that's OK, since you'll also be writing unit tests to guard against the edge cases.

Acceptance testing can be done with any tool that drives the browser, but I prefer Capybara on Poltergeist.

In any project, using high-low testing or otherwise, acceptance tests should not exhaustively cover all possible use cases. They should cover basic cases, with further emphasis on features that provide high business value or expose risk. For instance, login and registration should be tested thoroughly, but testing validations on every form is superfluous.

Acceptance tests will always be slow. They're the product of many layers of implementation working together to produce an anthropomorphic interaction with the software. While unit tests are much more akin to computer-to-computer interaction, acceptance tests give us an extra bit of confidence because they're driving a real browser.

This extra bit of confidence goes a long way to building trust in your test suite, which is the primary reason we test in the first place. We must ensure the software works, with the goal of safe deployment when all dots are green. Few other forms of testing can induce such confidence, which is why acceptance tests are invaluable.

High level tests also execute the entire stack. Everything from the HTTP request, through the router, controller, model and database, and back up to the browser's rendering engine. This is particularly important because acceptance tests can cover parts of the Rails stack that are painful to test. I'm looking at views and controllers.

A single Cucumber feature could cover multiple views and controllers. Testing high means covering pieces of the application that would otherwise burden a developer with excessive maintenance. Unit testing views and controllers introduces unnecessary test churn, so the bang for your buck with acceptance tests is enormous.

Really, high-low testing doesn't change the way you write acceptance tests. Life carries on as usual. You may add some additional acceptance tests to guard against things that would otherwise be tested at the integration layer, but the style is largely the same. The significant changes to a traditional Rails testing approach come in the form of unit tests.

Testing low

Testing low means working at a unit level. When most people think about unit testing Rails apps, it often involves testing methods of models. By "models," I'm referring to classes that inherit from ActiveRecord::Base. In a fat model/skinny controller setup, unit tests would definitely include models. But I avoid fat models like the plague. I want a fast testing environment that encourages better object orientation. To do this, we need to separate ourselves from Rails. It might seem extreme, but it's the most successful way I've scaled projects with high complexity.

This all stems from the nature of dependencies. They (should) always flow downwards. Higher abstractions depend on lower abstractions. As a result, the most dependency-free code resides at the bottom of the stack. The closer we can push code into this dependency-free zone, the easier it is to maintain.

By decoupling our business logic from the framework, we can independently test the logic without loading the framework into memory. The acceptance suite continues to depend on the framework, but by keeping the unit suite separate from the acceptance suite, we can keep our units fast.

Let's look at some code.

Consider an app that employs the fat model, skinny controller paradigm, where all of the business logic resides in models inheriting from ActiveRecord::Base. Let's look at a simple User class that knows how to concatenate a first and last name into a full name:

class User < ActiveRecord::Base
  def full_name
    [first_name, last_name].join(' ')
  end
end

And the tests for this class:

describe User do
  describe '#full_name' do
    it 'concatenates the first and last names' do
      user = User.new(first_name: 'Mike', last_name: 'Pack')
      expect(user.full_name).to eq('Mike Pack')
    end
  end
end

What's painful about this? Primarily, it requires the entire Rails environment be loaded into memory before executing the tests. Sure, early in a project's lifetime loading the Rails environment is snappy, but this utopia fades quickly as dependencies are added. Additionally, tools like Spring are brash reactions to slow Rails boot times, and can introduce substantial complexity. The increasingly slow boot times are largely a factor of increasing dependencies.

High-low testing says to move methods out of classes that depend on Rails. What might traditionally fall into an ActiveRecord model can be moved into a decorator, for instance.

require 'delegate'

class UserDecorator < SimpleDelegator
  def full_name
    [first_name, last_name].join(' ')
  end
end

Notice how we've manually required the delegate library. We'll investigate the benefits of this practice later in the article. The body of the #full_name method is identical; we never needed to be anywhere close to Rails for this behavior.

The UserDecorator class would likely be instantiated within the controller, though it could be used anywhere in the application. Here, we allow the controller to prepare the User instance for the view by wrapping it with the decorator:

class UsersController < ApplicationController
  def show
    @user = UserDecorator.new(User.find(params[:id]))
  end
end

Our view can then call @user.full_name to concatenate the first name with the last name.

The test for the UserDecorator class remains similar, with a couple differences. We'll mock the user object and refer to our new UserDecorator class:

require 'spec_helper'
require 'models/user_decorator'

describe UserDecorator do
  describe '#full_name' do
    it 'concatenates the first and last names' do
      user = double('User', first_name: 'Mike', last_name: 'Pack')
      decorator = UserDecorator.new(user)

      expect(decorator.full_name).to eq('Mike Pack')
    end
  end
end

There are a few important things to note here. Again, we're explicitly requiring our dependencies: the spec helper and the class under test. We're also mocking the user object (using a double). This creates a boundary between the framework and our application code. Essentially, any time our application integrates with a piece of Rails, the Rails portion is mocked. In large applications, mocking the framework is a relatively infrequent practice. It's also surprising how little the framework does for us. Compared to the business logic that makes our apps unique, the framework provides little more than helpers and conventions.

“Rails is a delivery mechanism, not an architecture.” – Uncle Bob Martin

It's worth noting that mocking can be a powerful feedback mechanism. In the ideal world, we wouldn't have to mock at all. But mocking in this instance tells us, "hey, there's an object I don't want to integrate with right now." This extra feedback can be a helpful indicator when striving to reduce the touchpoints with the framework.

Requiring spec_helper is likely not new to you, but the contents of our new spec_helper will be. Let's look at a single line from a default spec_helper, generated by rspec-rails:

require File.expand_path("../../config/environment", __FILE__)

This is the devil, holding your kitchen sink. This single line will load most of Rails, all your Gemfile dependencies, and all your eager loaded files. It's the "magic" behind being able to freely access any autoloaded class within your tests. It comes with a cost: speed. This is the style of dependency loading that must be avoided. Favor fine-grained loading when speed is of the essence.

Here is what a baseline spec_helper would look like in a high-low setup. We require the test support files (helpers for our tests), we add the app/ directory to the load path for convenience when requiring classes in our tests, and we configure RSpec to randomly order the tests:

# Require all test support files (helpers for our tests).
Dir[File.expand_path("../support/**/*.rb", __FILE__)].each { |f| require f }

# Put app/ on the load path so specs can require 'models/user_decorator'.
$:.unshift File.expand_path("../../app", __FILE__)

RSpec.configure do |config|
  config.order = "random"
end

Nothing more is needed to get started.
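One of those support files deserves mention. Later specs call build(:user) and build(:food) directly, which implies FactoryGirl's syntax methods are mixed into RSpec somewhere. A minimal sketch of that wiring, assuming the factory_girl gem (this file isn't shown in the original):

# spec/support/factory_girl.rb
require 'factory_girl'

RSpec.configure do |config|
  # Allows specs to call build(...) without the FactoryGirl prefix.
  config.include FactoryGirl::Syntax::Methods
end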

With this setup, you could be testing hundreds or thousands of classes and modules in sub-second suite times. As I've already expressed, the key is separating our tests from Rails. It turns out, most of the business logic in a typical Rails application is plain old Ruby. Rails provides some accessible helpers, but the bulk of logic can be brainlessly extracted into classes that are oblivious to the framework.

Isolating the framework

I want to be clear about something. For the purposes of this article, I'm referring to "integration tests" as those that utilize the framework. Integration means nothing more than two pieces working together, and integration testing is strongly encouraged. The style of integration testing that's discouraged is that which integrates with the framework. I am not advocating that you mock anything but the framework. Heavy mocking has known drawbacks, and should be avoided.

Mock only the pieces that rely on the framework. Integration between low-level business objects can often be safely tested in conjunction without mocking. What's important is isolating the framework from the business logic. The below diagram illustrates this separation.

The important thing to take away from this diagram is that below the mocking layer, you can test in whatever style is most comfortable. If you want to test multiple objects as they execute in symphony, introduce an integration test. In the cases where you're testing simple business rules that have no dependencies, unit test them normally. In either case, each file will explicitly require its dependencies.

If the mocking layer becomes cumbersome, additional tooling can be leveraged to more easily work with it. For instance, factory_girl can be used to encapsulate the data required of the mocks and separate the framework from the business logic. Notice this factory uses an OpenStruct instead of the implicit User class, which forms the moat needed to keep Rails out of the tests:

require 'ostruct'

FactoryGirl.define do
  factory :user, class: OpenStruct do
    first_name 'Mike'
  end
end

By using OpenStruct, instances created by FactoryGirl will be basic OpenStruct objects instead of objects relying on Rails. We can use these surrogate factories as we would a normal factory:

it 'has a first name' do
  expect(build(:user).first_name).to eq('Mike')
end

Example refactor to isolate the framework

This isolation from Rails encourages better object orientation. Let's refactor a fat model into the more ideal high-low testing structure. The example below is inspired by an actual class in a large client project. I've rewritten it here for anonymity and brevity. The class encapsulates a cooking recipe. It performs two main behaviors:

  • calculate the total calories in the recipe
  • persist a value when someone favorites the recipe (using Redis::Objects)

class Recipe < ActiveRecord::Base
  has_many :ingredients
  has_many :foods, through: :ingredients

  def total_calories
    foods.map(&:calories).inject(0, &:+)
  end

  def favorite
    Redis::Counter.new("favorites:recipes:#{id}").increment
  end
end

A naive programmer would consider this sufficient. It works, right? There's one argument I find compelling in favor of keeping it this way: "if I needed to find something about recipes, the Recipe model is the first place I'd look." But this doesn't scale: as the domain grows and fragments, comprehension suffers. At some point, the model contains many thousands of lines of code and becomes very tiresome to work with. Thousands of small classes with hundreds of lines of code each are much more comprehensible than hundreds of classes with thousands of lines of code.

An experienced programmer will instantly spot the problems with this class. Namely, it performs disparate roles. In addition to the hundreds of methods Rails provides through ActiveRecord::Base, it can also calculate the total calories for the recipe and persist a tangential piece of information regarding user favorites.

If the programmer were cognizant of high-low testing, their instincts would tell them there's a better way to structure this code to enable faster and improved tests. As an important side effect, we also produce a more object-oriented structure. Let's pull the core business logic out of the framework-enabled Recipe model and into standalone classes:

class Recipe < ActiveRecord::Base
  has_many :ingredients
  has_many :foods, through: :ingredients

  def total_calories
    CalorieCalculator.new(foods).total
  end

  def favorite
    FavoritesTracker.new(self).favorite
  end
end
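The extracted classes themselves aren't shown in the refactor, so here's a minimal sketch of what they might look like. The method bodies are lifted straight from the original model; the file locations are assumptions:

# app/models/calorie_calculator.rb
class CalorieCalculator
  def initialize(foods)
    @foods = foods
  end

  # Pure, in-memory summation: no database or framework required.
  def total
    @foods.map(&:calories).inject(0, &:+)
  end
end

# app/models/favorites_tracker.rb
require 'redis/objects'

class FavoritesTracker
  def initialize(recipe)
    @recipe = recipe
  end

  # Persists the favorite count in Redis, outside the relational model.
  def favorite
    Redis::Counter.new("favorites:recipes:#{@recipe.id}").increment
  end
end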

By making this minor change, we gain a slew of benefits:

  • The code is self-documenting and easier to reason about without having to understand the underlying implementation.
  • The multiple responsibilities of calculating total calories and persisting favorites have been moved into discrete classes.
  • The newly introduced classes can be tested in isolation of the Recipe class and Rails.
  • Dependencies have been made explicit through require statements and method inputs.

Before this improvement, to test the #total_calories method, we were required to load Rails. After the refactor, we can test this logic in isolation. The original implementation calculated total calories from foods fetched through its has_many :foods association. Though not impossible, under test, it would have been nasty to calculate the total calories from something other than the association. In the below code, we see it's quite easy to test this logic without Rails:

require 'spec_helper'
require 'models/calorie_calculator'

describe CalorieCalculator do
  describe '#total' do
    it 'sums the total calories for all foods' do
      foods = [build(:food, calories: 50), build(:food, calories: 25)]
      calc = CalorieCalculator.new(foods)

      expect(calc.total).to eq(75)
    end
  end
end

The original code encouraged us to use the database to test its logic, which is unnecessary since the logic works purely in memory. The improved code breaks this coupling by making implicit dependencies explicit. Now, we can more easily test the logic without a database.

We're one step closer to removing Rails from the picture; let's realize this goal now. Everywhere that Recipe#total_calories was being called can be changed to refer to the new CalorieCalculator class. Similarly for the #favorite method. This refactor would likely be done in the controller, but maybe in a presenter object. Below, the new classes are used in the controller and removed from the model:

class RecipesController < ApplicationController
  before_filter :find_recipe

  def show
    @total_calories = CalorieCalculator.new(@recipe.foods).total
  end

  def favorite
    FavoritesTracker.new(@recipe).favorite

    redirect_to recipe_path(@recipe)
  end

  private

  # Implied by the before_filter above; shown for completeness.
  def find_recipe
    @recipe = Recipe.find(params[:id])
  end
end

Now our Recipe class looks like this:

class Recipe < ActiveRecord::Base
  has_many :ingredients
  has_many :foods, through: :ingredients
end

An end-to-end test should be added to round out the coverage. This will ensure that the controller truly does call our newly created classes.

As you can see from the above example, high-low testing encourages better object orientation by separating the business logic from the framework. Our first step was to pull the logic out of the model to make it easier and faster to test. This not only improved testability but drove us towards a design that was easier to reason about.

Our second step reintroduced the procedural nature of the program. Logic flows downwards in the same sense that computers execute instructions sequentially. Large Rails models often contain significant amounts of redirection as models talk amongst each other. By moving our code back into the controller, we've reduced this redirection and exposed its procedural nature, making it easier to understand. As we continue through the article, we'll look at even more ways high-low testing can improve the quality of a codebase.

Feedback, feedback, feedback

High-low testing is all about providing the right feedback at the right times. Do you need immediate feedback about application health as you work your way through a refactor? Keep your unit tests running on every change with Guard. Do you need a sanity check that everything is safe to deploy? Run your acceptance suite.

It's been said many times by others, but feedback cycles are essential to productive software engineering. The goal of high-low testing is to provide quick feedback where you need it, and defer slow feedback for sanity checks. The shortcoming of many current Rails setups is they don't delineate these two desires. Inevitably, everything eventually falls into the slow feedback cycle and it becomes increasingly difficult to recuperate.

Let's look at how we can embrace short feedback cycles in applications of all sizes. Unless you're curious, feel free to skip sections that don't apply to your current projects. We'll talk about working in legacy, middle-aged and greenfield applications.

Working with legacy codebases

The truth about legacy codebases is they will always be legacy. No matter how aggressively we refactor, there will always be a sense of technical debt. The goal is to eliminate future debt.

It's easy to apply high-low testing principles to legacy codebases, but it's done in addition to existing tests. If acceptance tests are already in place, continue to acceptance test in the prescribed fashion. If they're not in place, begin adding them as new features are built. Backfilling tests for existing features is also a great step towards refactoring.

You won't be able to immediately affect existing unit tests. These will require refactoring. But you can introduce new functionality as a suite that runs independent from any existing tests. To do so, create a new thin spec_helper, as described above, and begin requiring that instead of your normal spec_helper that loads Rails. You will now have three test suites:

  • Your normal acceptance suite
  • Your existing unit suite that loads Rails
  • Your new unit suite that does not load Rails

So, how does it fit into your workflow? The way I've done it is to keep my Rails-less unit suite continuously running with Guard. New functionality will be tested in the new manner, and the rapid feedback cycles will be immediately beneficial.

Run your acceptance suite as you would normally, but break your unit tests into separate directories. my_app/spec/slow and my_app/spec/fast would be one way to accomplish this. You'll then run your unit suites independently:

rspec spec/slow
rspec spec/fast

The latter should be run with Guard. If you want to run all units, just target the parent directory:

rspec spec

Of course, this will load the Rails environment and you'll lose the benefits of speedy tests, but it can be done as a sanity check before a deploy, for instance.
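For reference, here's what a minimal Guardfile for the fast suite might look like. This is a sketch assuming the guard-rspec gem and the spec/fast layout above:

# Guardfile
guard :rspec, cmd: 'rspec' do
  # Re-run a fast spec whenever it changes.
  watch(%r{^spec/fast/.+_spec\.rb$})

  # Re-run the matching spec when a class under app/ changes.
  watch(%r{^app/(.+)\.rb$}) { |m| "spec/fast/#{m[1]}_spec.rb" }

  # Re-run the whole fast suite when the spec_helper changes.
  watch('spec/spec_helper.rb') { 'spec/fast' }
end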

The unit suite that requires Rails will continue to be slow, as will the acceptance tests. The slow unit suite will act as checks against regressions, much like the acceptance suite. The new, faster suite will keep you productive moving forward and act as a goalpost for refactorings. To maximize the benefits of the new suite, existing unit tests and code can be reengineered to use it. In the ideal sense, all units of code would be refactored away from Rails, but I won't blow smoke up your ass.

Legacy codebases aren't the most conducive to high-low testing but medium-sized codebases are a perfect fit. Let's look at those next.

Working with medium-sized codebases

High-low testing feels most natural while working in semi-mature codebases which exhibit continual growth yet don't feel bogged down by past decisions. The business domain within codebases of this size is well understood, which means the abstractions encouraged by high-low testing can be expressed easily. Further, while medium-sized codebases still progress regularly, there's often enough time and resources to thoroughly consider design decisions for the sake of longevity.

To be clear, when I use the terms "legacy" and "medium-sized" codebases, I mean for them to be mutually exclusive. A medium-sized codebase can often feel legacy, but I'm referring to medium-sized codebases that have either employed high-low testing from the get-go or have refactored into a high-low testing paradigm.

As it pertains to medium-sized codebases, high-low testing exudes its worth ten times over. The ability to add functionality, refactor, and delete features becomes significantly faster than if a traditional fat model, skinny controller approach was taken. The primary benefit is, of course, the speed of the unit suite.

Consider the case where you're asked to share a feature amongst varying but not dissimilar parts of an application, say billing. If originally the billing code was constructed to work in only one part of the app and now needs to be shared across multiple features, this would generally be considered a serious effort. If the test suite loads Rails, you could be looking at a 3-5 second delay every time the suite is run. Likely, these tests are also slow as they depend on many things and utilize sluggish and unnecessary IO-bound operations.

With high-low testing, hundreds or thousands of specs can be run in under a second with little to no boot time. The feedback is immediate. This is achieved exclusively by mocking slow dependencies. Mocking the database, API responses, file reads, and otherwise slow procedures keeps the suite fast.

Throughout a refactor, the test suite should always be a measure of progress. Normally, you begin by ripping out existing functionality, breaking a ton of tests. You sort of eyeball it from there and begin constructing the necessary changes. You update the code and the tests, and there's a large degree of uncertainty. Don't worry about proceeding while the tests are red. You'll be red, and you'll remain red until you're done refactoring.

By continually running the test suite on every change (because it's so fast), you'll always have a sense of how far you've strayed. Are things breaking that you didn't expect? Is the whole suite failing or only the related segments? Continually running the tests is an excellent reminder of the progress of your refactor. If you bite off too much, your tests are there to tell you. This rapid workflow is unachievable with most suites I've seen.

While high-low testing is a perfect match for medium-sized apps, it can and should be applied throughout the project's lifetime. Let's next look at the implications throughout an app's inception.

Working with greenfield apps

Admittedly, high-low testing is not entirely natural in greenfield apps. It works best when the abstractions are well understood. In a greenfield app, the abstractions are often impossible to see. These apps are changing so rapidly that it would be more harmful to introduce an abstraction than to carry on blindly. Abstractions are generalizations of implementation. With rapid additions, the generalizations being made are full of folly.

But there is a way to apply high-low testing to such a situation, and it requires careful forethought and a bit of patience. The end result is almost always a better design and a stronger basis for refactoring. Let's look at a situation where it's difficult to see abstractions, and therefore difficult to decouple the framework from the business logic.

Say you're building a blog, and you need to list all the posts. Using the Rails way, you'd find yourself constructing a query in a controller action:

class PostsController < ApplicationController
  def index
    @posts = Post.published
  end
end

Surely you either want to test that .published was called on Post or that .published returns the correct set which excludes unpublished posts. It feels awkward to abstract this single line into its own class. If you were to abstract it, what would you call the class? PostFinder? ViewingPostsService? You don't yet have enough context to know where this belongs and what it will include in the future.

One might say you should extract it anyway under the assumption that it will shortly change. Since the unit tests are so fast, it will be an easy refactor. This is a fair assessment. I don't have a silver bullet, but I will say this: hold off on the abstraction. Don't create another class, and don't unit test this logic. Build a barebones acceptance test to cover it. A simple "the posts index page should have only published posts" acceptance test will often suffice. Once you better understand how this logic will change over time, extract it into its own class and write unit tests.
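When that point arrives, the extraction might be as small as a query object. A sketch of one possible shape; the name and interface are assumptions, since the article deliberately defers this step:

class PublishedPostsQuery
  def initialize(relation)
    @relation = relation
  end

  def call
    @relation.where(published: true)
  end
end

# In the controller: @posts = PublishedPostsQuery.new(Post.all).call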

I'm still experimenting in this area, but I think it's worth practicing to form a discipline and awareness of the implications. The goal, of course, is to find a healthy balance of introducing the right abstractions at the right time. With patience, high-low testing is an effective way of achieving that goal.

Ancillary feedback

Testing high-low exposes some fantastic ancillary benefits that sort of just fall into your lap. Foremost is the constant feedback from transparent dependency resolution.

In most Rails apps, you see a lot of code that utilizes autoloading. It's a load-the-world approach that we're trying to avoid. For instance, we may have one class that refers to another:

class User < ActiveRecord::Base
end

class WelcomeUserService
  def send_email(id)
    @user = User.find(id)
    # Send the user an email
  end
end

How did the WelcomeUserService class find the User class? Through conventions and a little setup, Rails is able to autoload the User class on demand, as it's referenced. In short, Rails uses #const_missing to watch for when the User class is referenced, and dynamically loads it at runtime.

In the above example, there is no reason for the WelcomeUserService class to have to manually require the User class. This is a nice convenience, but imagine your class is growing over time and eventually autoloads 10 dependent classes. How will you know how many dependencies this class has? I'll say it again, dependencies are evil.

With high-low testing, you're removing yourself from Rails' autoload behavior. You'll no longer be able to benefit from autoloading, which is a good thing. If you can't live without autoloading, you could construct your own autoloading setup, but I would encourage you to consider the following.

Every time you build a class or test disjunct from Rails, you must explicitly enumerate all of your requirements. The WelcomeUserService class would look more like this:

require 'models/user'

class WelcomeUserService
  def send_email(id)
    @user = User.find(id)
    # Send the user an email
  end
end

Notice we're explicitly requiring the User class at the top. Any other dependencies would be required in a similar fashion. For our tests, we would also require the class under test:

require 'spec_helper'
require 'services/welcome_user_service'

describe WelcomeUserService do
  ...
end

This might appear to be a step backwards, but it's most certainly not. Manually requiring your dependencies is another feedback mechanism for code quality. If the number of dependencies grows large for any given class, not only is it an eyesore, but it's an immediate reminder that something may be awry. If a class depends on too many things, it will become increasingly difficult to setup for testing, and may be an indicator of too many responsibilities. Manually requiring dependencies keeps this in check.

In general, the more feedback mechanisms you can create around dependency coupling, the higher the object cohesion. Such feedback mechanisms encourage better object-oriented design.

Know thy dependencies

Through practicing this technique, one thing has become overwhelmingly apparent: working at the poles produces more dependency revealing code. That's what creating fast tests is all about. If the dependencies are transparent, obvious and managed, building fast tests is simple. If a system's unit tests avoid wandering towards higher abstractions, the dependencies can be managed.

What we're really trying to do is wrangle dependencies in a way that doesn't inhibit future growth. There are two options: boot-the-world style eager loading or independently requiring where necessary. In the interest of fast tests and providing additional feedback mechanisms, high-low testing encourages the latter.

Finding a balance

Testing high means you can feel confident that the individual pieces of your software are glued together and functional. This confidence goes a long way when you're shipping to production multiple times per day. You'll feel more comfortable in your own skin and genuinely reassured that the work you're doing is productive.

Testing low means you can focus on the individual components that comprise the functional system. The rapid feedback mechanisms provided by high-low testing aid in forging well-crafted software. The speed of the unit suite can be a precious asset when refactoring or quickly checking the health of the system. Fast tests that encourage better object orientation help release the endorphins needed to keep your team happy and stable.

Like everything else in software engineering, deciding whether to apply high-low testing is full of tradeoffs. When is the right time to create abstractions? How important are fast tests? How easy is it for the team to understand the goals? These are all highly relevant questions that each organization must discuss.

If a team decides to practice high-low testing, the benefits are invaluable. Near-instantaneous unit tests at scale, better object-oriented design, and a keen understanding of system dependencies are amongst the perks. Such perks can prevent code cruft, especially in larger codebases. Since the principles of high-low testing can be easily conveyed to others, it becomes a nice framework for discussion.

Even if the approaches discussed in this article don't fit your style, the core concepts can be the foundation for further enlightenment. It may seem academic, but I will attest that over the past four years of applying high-low testing, I've found my code to be better factored and easier to maintain. Your mileage may vary, but with enough persistence, I believe most teams will largely prosper both in productivity and happiness.

This concludes the explanation of high-low testing, so feel free to finish reading. If you'd like to explore more real-world tradeoffs, please continue reading the following sections.

The Appendix — Some unanswered questions

This article has mostly outlined some high level details about high-low testing. We haven't addressed many common issues and how to accomplish fast tests within the confines of a real project. Let's address that now.

High-low is not a prescription

What's being described here is not a prescription for success, but an outline for fast tests. Developer happiness is the primary goal. Applying these techniques to your codebase does not guarantee you will produce better object-oriented code. Though alone it will likely encourage better choices, additional architectural paradigms are required to create well-designed and maintainable code.

There are plenty such architectural paradigms that accomplish the goal of decoupling the application from the framework. Two such examples are DCI and hexagonal architecture. Other less-comprehensive examples can be found in this excellent article on decomposing ActiveRecord models. In this article, we've used decorators and service objects as examples of framework decoupling.

Why is testing Rails unique?

In just about all the apps I've worked on, I can say that I've felt happy testing only a handful. As a community growing out of its infancy, we're beginning to realize that the prescribed testing patterns passed down from previous generations are inadequately accommodating the problems we're facing today. Particularly, problems of scale. As compared to even five years ago, Rails has grown, dependencies have grown, apps have grown and teams have grown.

The same principles from the Java world we so vehemently defied are the only principles that will save this community. No, we don't want Rails to become J2EE. Introducing new and impressionable blood to the Ruby world *should* be done in such a way that encourages ease of use. Ease of use often comes from few abstractions, which is why we're here today.

Rails hates business abstractions. Conversely, framework abstractions are OK, like routing constraints. You're encouraged to use three containers: models, views and controllers. Within those containers, you're welcome to use concerns, which are a displacement of logic.

The problem is probably better described as frameworks that discourage abstractions. This is littered throughout Rails. Take, for example, custom validators. You'd think to yourself, "if this system is properly constructed, I'd be able to use standard validators in my custom validator." You'd be partially right.

Let me show you an example. Say I want to use a presence validator in my custom validator:

class MyCustomValidator < ActiveModel::Validator
  def validate(record)
    validator = ActiveModel::Validations::PresenceValidator.new(attributes: [:my_field])
    validator.validate(record)
  end
end

This works if you expect your model to be intrinsically tied to your validations. The above code will properly validate the presence of :my_field, but when the validation fails, errors will be added to the model. What if your model doesn't have an errors field? The PresenceValidator assumes you're working with an ActiveModel model, as if no other types of models exist. Not only that, but your model also needs to respond to various other methods, like #read_attribute_for_validation.

I want something more akin to this API:

class MyCustomValidator < ActiveModel::Validator
  def validate(record)
    validator = ActiveModel::Validations::PresenceValidator.new
    validator.validate('A value') #=> true
    validator.validate(nil) #=> false
    validator.validate('') #=> false
  end
end

There is no need for the PresenceValidator to be tied to ActiveModel, but it is, and so are many other things in Rails. Everyone knows that if you don't follow The Rails Way, you'll likely experience some turbulence. Yes, there should be guidelines, but it should not be the framework's job to mandate how you architect your software.

Isn't the resulting code harder to understand?

An argument could certainly be made to illustrate the deficiencies of code produced as a result of high-low testing. The argument would likely include rhetoric around a mockist style of testing. Indeed, a mockist style can certainly encourage some odd approaches, including dependency injection and superclass inheritance mocking.

These arguments are valid and exhibit important trade offs to weigh. However, if the mocking boundaries are well-defined and controlled, they can be minimized.

The most prominent mocking territory in high-low tested apps is the boundary between Rails and the application logic. In most large applications I've seen, the application logic comprises the majority of the system; the framework merely delivers it to the web. It's important to clearly define this boundary so that the framework does not bleed into the domain.

If this separation is properly maintained, the amount of additional mockist style techniques can be nonexistent, if desired. The application logic can be tested in a classical, mockist or other style, depending on the preferences of the team.

Don't fall down the slippery slope

You may be inclined to load a single Rails dependency because you're used to various APIs. ActiveSupport, for instance, contains some useful tools and you may feel inclined to allow your application to rely on such a library. Adding it seems harmless at first, but beware of the slippery slope of dependencies.

You might think to yourself, "adding ActiveSupport will add, what, 1 second to my test suite boot time?" And you may be correct. But when you consider that ActiveSupport comprises about 43K LOC or 37% of the entire Rails codebase as of 4.1.1, and that you'll likely be using a very very small fraction of the packaged code, the tradeoffs feel far too imbalanced.

Instead, build your own micro-library to handle the conveniences you need. Maybe this request seems ludicrous, but I guarantee you'll be surprised by how few methods you actually need from ActiveSupport. The tradeoff for speed is compelling and warranted.

What about validations, associations and scopes?

You have a few options, none of which are ideal.

Firstly, you shouldn't be testing associations. Let's be serious, if your associations aren't in place, a lot is broken.

If you must test your validations because they're complex, or you don't believe the single line it took to write them is enough of an assurance, one option is to test them at the acceptance level. This is a slow means of verifying validations are in place, but it is effective:

Scenario: User signs up with incomplete data
  Given I am on the registration page
  When I press "Submit"
  Then I should see "Username can't be blank"
  And I should see "Email can't be blank"

Slow and cumbersome, but effective.

If you're testing a complex validation, it's probably best to move the validation into a designated ActiveModel::Validator class. Since this class depends on Rails, you'll need to compose some objects so Rails isn't loaded within your tests. It works well to have a mixin that does the true validation, which gets included in your ActiveModel::Validator class:

module ComplexValidations
  def validate(record)
    record.errors[:base] = 'Failed validations'
  end
end

This would be mixed into the validator:

class ComplexValidator < ActiveModel::Validator
  include ComplexValidations
end

Testing the mixin doesn't require Rails:

require 'spec_helper'
require 'validations/complex_validations'

describe ComplexValidations do
  let(:validator) do
    Class.new.include(ComplexValidations).new
  end

  it 'validates complexities' do
    errors = {}
    record = double('AR Model', errors: errors)

    validator.validate(record)

    expect(errors[:base]).to eq('Failed validations')
  end
end
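For completeness, the validator attaches to a model with validates_with. The model name here is an assumption for illustration:

class Account < ActiveRecord::Base
  validates_with ComplexValidator
end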

Another technique, and my preferred approach, is to introduce a third suite, similar to an existing unit suite in a legacy codebase. The third suite would exist exclusively for testing things of this nature and be comprised of slow unit tests. Of course, this suite should not be continually run with Guard, but will exist for situations that need it, like testing validations, associations and scopes. Be careful though, this introduces a slippery slope that makes it easy to neglect high-low testing and throw everything into the slow unit suite.

Scopes are tricky because they do contain valuable business logic that does warrant tests. Just like validations, verifying scopes can be moved to the acceptance level, but my recommended approach is to introduce a slow unit suite to test these.

If, like me, you cringe at the thought of introducing a slow unit suite, one option is to stub the scopes. Set up a mixin like the previous example and use normal Rails APIs:

module PostScopes
  def published
    where(published: true)
  end
end
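The model side is then a one-liner, extending the mixin onto the class (a sketch):

class Post < ActiveRecord::Base
  extend PostScopes # Post.published now delegates to the mixin
end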

When testing, just assert that the correct query methods are called:

describe PostScopes do
  subject do
    Class.new.extend(PostScopes)
  end

  describe '.published' do
    it 'queries for published posts' do
      expect(subject).to receive(:where).with(published: true)

      subject.published
    end
  end
end

More complex mocking is necessary if the scope performs multiple queries:

def latest
  where(published: true).order('published_at DESC').first
end

describe '.latest' do
  it 'queries for the last published post' do
    relation = double('AR::Relation')

    expect(subject).to receive(:where).with(published: true).and_return(relation)
    expect(relation).to receive(:order).with('published_at DESC').and_return(relation)
    expect(relation).to receive(:first)

    subject.latest
  end
end

This level of mocking is too aggressive for my taste, but it works and allows for all the convenient scoping behavior we're used to. It's a little brittle, but damn fast.

Posted by Mike Pack on 07/15/2014 at 10:45AM

Tags: testing, rails, ruby, high-low, rspec


The Right Way to Code DCI in Ruby

Many articles found in the Ruby community largely oversimplify the use of DCI. These articles, including my own, highlight how DCI injects Roles into objects at runtime, the essence of the DCI architecture. Many posts regard DCI in the following way:

class User; end # Data
module Runner # Role
  def run
    ...
  end
end

user = User.new # Context
user.extend Runner
user.run

There are a few flaws with oversimplified examples like this. First, they read as "this is how to do DCI". DCI is far more than just extending objects. Second, they highlight #extend as the go-to means of adding methods to objects at runtime. In this article, I would like to specifically address the former issue: DCI beyond just extending objects. A followup post will contain a comparison of techniques to inject Roles into objects using #extend and otherwise.

DCI (Data-Context-Interaction)

As stated previously, DCI is about much more than just extending objects at runtime. It's about capturing the end user's mental model and reconstructing that into maintainable code. It's an outside → in approach, similar to BDD, where we regard the user interaction first and the data model second. The outside → in approach is one of the reasons I love the architecture; it fits well into a BDD style, which further promotes testability.

The important thing to know about DCI is that it's about more than just code. It's about process and people. It starts with principles behind Agile and Lean and extends those into code. The real benefit of following DCI is that it plays nicely with Agile and Lean. It's about code maintainability, responding to change, and decoupling what the system does (its functionality) from what the system is (its data model).

I'll take a behavior-driven approach to implementing DCI within a Rails app, starting with the Interactions and moving to the Data model. For the most part, I'm going to write code first then test. Of course, once you have a solid understanding of the components behind DCI, you can write tests first. I just don't feel test-first is a great way of explaining concepts.

User Stories

User stories are an important feature of DCI although not unique to the architecture. They're the starting point of defining what the system does. One of the beauties of starting with user stories is that it fits well into an Agile process. Typically, we'll be given a story which defines our end-user feature. A simplified story might look like the following:

"As a user, I want to add a book to my cart."

At this point, we have a general idea of the feature we'll be implementing.

Aside: A more formal implementation of DCI would require turning a user story into a use case. The use case would then provide us with more clarification on the input, output, motivation, roles, etc.

Write Some Tests

We should have enough at this point to write an acceptance test for this feature. Let's use RSpec and Capybara:

spec/integration/add_to_cart_spec.rb

describe 'as a user' do
  it 'has a link to add the book to my cart' do
    @book = Book.create(:title => 'Lean Architecture')
    visit book_path(@book)
    page.should have_link('Add To Cart')
  end
end

In the spirit of BDD, we've started to identify how our domain model (our Data) will look. We know that Book will contain a title attribute. In the spirit of DCI, we've identified the Context in which this use case is enacted and the Actors which play key parts. The Context is adding a book to the cart. The Actor we've identified is the User.

Realistically, we would add more tests to further cover this feature but the above suits us well for now.

The "Roles"

Actors play Roles. For this specific feature, we really only have one Actor, the User. The User plays the Role of a customer looking to add an item to their cart. Roles describe the algorithms used to define what the system does.

Let's code it up:

app/roles/customer.rb

module Customer
  def add_to_cart(book)
    self.cart << book
  end
end

Creating our Customer Role has helped tease out more information about our Data model, the User. We now know we'll need a #cart method on any Data objects which plays the Customer Role.

The Customer role defined above doesn't expose much about what #cart is. One design decision I made ahead of time, for the sake of simplicity, is to assume the cart will be stored in the database instead of the session. The #cart method defined on any Actor playing the Customer Role should not be an elaborate implementation of a cart. I merely assume a simple association.

Roles also play nicely with polymorphism. The Customer Role could be played by any object that responds to the #cart method. The Role itself never knows what type of object it will augment, leaving that decision up to the Context.
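To make that polymorphism concrete, here's a hedged illustration: any cart-bearing object, ActiveRecord-backed or not, can play the Role. The GuestUser stand-in is an assumption, not part of the original example:

# Any object responding to #cart can play the Customer Role.
GuestUser = Struct.new(:cart)

guest = GuestUser.new([])
guest.extend Customer
guest.add_to_cart(book)
guest.cart #=> [book]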

Write Some Tests

Let's jump back into testing mode and write some tests around our newly created Role.

spec/roles/customer_spec.rb

describe Customer do
  let(:user) { User.new }
  let(:book) { Book.new }

  before do
    user.extend Customer
  end

  describe '#add_to_cart' do
    it 'puts the book in the cart' do
      user.add_to_cart(book)
      user.cart.should include(book)
    end
  end
end

The above test code also expresses how we will be using this Role, the Customer, within a given Context, adding a book to the cart. This makes the segue into actually writing the Context dead simple.

The "Context"

In DCI, the Context is the environment in which Data objects execute their Roles. There is always at least one Context for every user story. Depending on the complexity of the user story, there may be more than one Context, possibly necessitating a story break-down. The goal of the Context is to connect Roles (what the system does) to Data objects (what the system is).

At this point, we know the Role we'll be using, the Customer, and we have a strong idea about the Data object we'll be augmenting, the User.

Let's code it up:

app/contexts/add_to_cart_context.rb

class AddToCartContext
  attr_reader :user, :book

  def self.call(user, book)
    AddToCartContext.new(user, book).call
  end

  def initialize(user, book)
    @user, @book = user, book
    @user.extend Customer
  end

  def call
    @user.add_to_cart(@book)
  end
end

Update: Jim Coplien's implementation of Contexts uses AddToCartContext#execute as the context trigger. To support Ruby idioms, procs and lambdas, the examples have been changed to use AddToCartContext#call.

There's a few key points to note:

  • A Context is defined as a class. The act of instantiating the class and calling its #call method is known as triggering.
  • Having the class method AddToCartContext.call is simply a convenience method to aid in triggering.
  • The essence of DCI is in @user.extend Customer. Augmenting Data objects with Roles ad hoc is what allows for strong decoupling. There're a million ways to inject Roles into objects, #extend being one. In a followup article, I'll address other ways in which this can be accomplished.
  • Passing user and book objects to the Context can lead to naming collisions on Role methods. To help alleviate this, it would be acceptable to pass user_id and book_id into the Context and allow the Context to instantiate the associated objects.
  • A Context should expose the Actors for which it is enabling. In this case, attr_reader is used to expose @user and @book. @book isn't an Actor in this Context, however it's exposed for completeness.
  • Most notably: You should rarely have to (impossibly) #unextend a Role from an object. A Data object will usually only play one Role at a time in a given Context. There should only be one Context per use case (emphasis: per use case, not user story). Therefore, we should rarely need to remove functionality or introduce naming collisions. In DCI, it is acceptable to inject multiple Roles into an object within a given Context. So the problem of naming collisions still resides but should rarely occur.

Write Some Tests

I'm generally not a huge proponent of mocking and stubbing but I think it's appropriate in the case of Contexts because we've already tested running code in our Role specs. At this point we're just testing the integration.

spec/contexts/add_to_cart_context_spec.rb

describe AddToCartContext do
  let(:user) { User.new }
  let(:book) { Book.new }

  it 'adds the book to the users cart' do
    context = AddToCartContext.new(user, book)
    context.user.should_receive(:add_to_cart).with(context.book)
    context.call
  end
end

The main goal of the above code is to make sure we're calling the #add_to_cart method with the correct arguments. We do this by setting the expectation that the user Actor within the AddToCartContext should have its #add_to_cart method called with book as an argument.

There's not much more to DCI. We've covered the Interaction between objects and the Context for which they interact. The important code has already been written. The only thing left is the dumb Data.

The "Data"

Data should be slim. A good rule of thumb is to never define methods on your models. That's not always achievable, though. Better put: "Data object interfaces are simple and minimal: just enough to capture the domain properties, but without operations that are unique to any particular scenario" (Lean Architecture). The Data should really only consist of persistence-level methods, never how the persisted data is used. Let's look at the Book model for which we've already teased out the basic attributes.

class Book < ActiveRecord::Base
  validates :title, :presence => true
end

No methods. Just class-level definitions of persistence, associations and data validation. The ways in which Book is used should not be a concern of the Book model. We could write some tests around the model, and we probably should. Testing validations and associations is fairly standard, so I won't cover it in depth here.
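
Just to anchor the idea, a minimal sketch of such a spec (plain RSpec; nothing here is specific to DCI):

describe Book do
  it 'requires a title' do
    Book.new(:title => nil).should_not be_valid
  end
end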

Keep your Data dumb.

Fitting Into Rails

There's not a whole lot to be said about fitting the above code into Rails. Simply put, we trigger our Context within the Controller.

app/controllers/book_controller.rb

class BookController < ApplicationController
  def add_to_cart
    AddToCartContext.call(current_user, Book.find(params[:id]))
  end
end

Here's a diagram illustrating how DCI complements Rails MVC. The Context becomes a gateway between the user interface and the data model.

MVC + DCI

What We've Done

The following could warrant its own article, but I want to briefly look at some of the benefits of structuring code with DCI.

  • We've highly decoupled the functionality of the system from how the data is actually stored. This gives us the added benefit of compression and easy polymorphism.
  • We've created readable code. It's easy to reason about the code both by the filenames and the algorithms within. It's all very well organized. See Uncle Bob's gripe about file-level readability.
  • Our Data model, what the system is, can remain stable while we progress and refactor Roles, what the system does.
  • We've come closer to representing the end-user mental model. This is the primary goal of MVC, something that has been skewed over time.

Yes, we're adding yet another layer of complexity. We have to keep track of Contexts and Roles on top of our traditional MVC. Contexts, specifically, require more code. We've introduced a little more overhead. However, with this overhead comes a large degree of prosperity. As a developer or team of developers, it's at your discretion whether these benefits can resolve your business and engineering ailments.

Final Words

Problems with DCI exist as well. First, it requires a large paradigm shift. It's designed to complement MVC (Model-View-Controller), so it fits well into Rails, but it requires you to move all your code outside the controller and model. As we all know, the Rails community has a fetish for putting code in models and controllers. The shift is significant, something that would require a large refactor for some apps. However, DCI could probably be refactored in on a case-by-case basis, allowing apps to gradually move from "fat models, skinny controllers" to DCI. Second, it potentially carries performance degradations, due to the fact that objects are extended ad hoc.

The main benefit of DCI in relation to the Ruby community is that it provides a structure for which to discuss maintainable code. There's been a lot of recent discussion in the vein of "'fat models, skinny controllers is bad'; don't put code in your controller OR your model, put it elsewhere." The problem is we're lacking guidance for where our code should live and how it should be structured. We don't want it in the model, we don't want it in the controller, and we certainly don't want it in the view. For most, adhering to these requirements leads to confusion, overengineering, and a general lack of consistency. DCI gives us a blueprint to break the Rails mold and create maintainable, testable, decoupled code.

Aside: There's been other work in this area. Avdi Grimm has a phenomenal book called Objects on Rails which proposes alternative solutions.

Happy architecting!

This article is translated to Serbo-Croatian.

Further Reading

DCI: The King of the Open/Closed Principle
DCI With Ruby Refinements
DCI Role Injection in Ruby
Benchmarking DCI in Ruby

Posted by Mike Pack on 01/24/2012 at 12:58PM

Tags: testing, dci, ruby, rails, architecture


Benchmarking DCI in Ruby

I've recently become quite intrigued with the concepts behind DCI (Data Context and Interaction). I won't go too in depth about what DCI is or why you might use it, that's been discussed many times elsewhere. In short, DCI is an architecture which allows us to delineate our domain objects from the actual functions they perform. It mixes Roles (functionality) into our Data objects when and only when that functionality is needed: when in Context. Most of the value DCI brings to the table derives from the way it forces you to abstract behavior into testable modules.

What I'd like to do is take a look at the performance implications of using DCI in Ruby applications.

I think it should be said upfront that this is purely academic and may have minimal bearing within the ecosystem of a complex app. For that reason, I won't try to draw any vast conclusions.

How to use DCI in Ruby

DCI can be used in Ruby by augmenting your objects with Roles at runtime so that the necessary interactions are available to that object.

class User
  ...
end

module Runner
  def run
    ...
  end
end
 
user = User.new
user.extend Runner
user.run 

In more traditional, idiomatic Ruby you would normally just include the module while defining the class:

class User
  include Runner
  ...
end

user = User.new
user.run 

Hypothesis

Since every call to #extend carries some memory and processing implications, lately I've been wondering whether, while using DCI, we could be incurring a performance hit by extending many objects ad hoc. I decided to profile this to understand if we could be blindly degrading performance and whether there are optimization techniques I should be aware of.

My process involves taking the most simplified example (shown in the above snippets) and benchmarking the traditional approach against the DCI-inclined approach.

I'm running these benchmarks on a MacBook Pro - 2.2 GHz - 8 GB memory.

The Runner Module

Here's the Runner module used in the following examples. It's just one method that does some arbitrary calculation.

runner.rb

module Runner
  def run
    Math.tan(Math::PI / 4)
  end
end

Ruby Benchmark

Using Ruby's Benchmark library, we can measure the amount of time these processes take to execute. First, we'll benchmark the traditional, idiomatic Ruby way: using an include to augment the class.

include_bm.rb

require 'benchmark'
require './runner'

class IncludeUser
  include Runner
end

Benchmark.bm do |bench|
  3.times do
    bench.report('include') do
      1000000.times do
        user = IncludeUser.new
        user.run
      end
    end
  end
end
$ ruby include_bm.rb
         user       system     total       real
include  0.500000   0.000000   0.500000 (  0.497114)
include  0.500000   0.000000   0.500000 (  0.497363)
include  0.490000   0.000000   0.490000 (  0.497342)

The results of this benchmark tell us that executing 1 million "run" operations takes roughly 0.5 seconds.

Let's look at how this compares to the DCI implementation.

dci_bm.rb

require 'benchmark'
require './runner'

class DCIUser; end

Benchmark.bm do |bench|
  3.times do
    bench.report('DCI') do
      1000000.times do
        user = DCIUser.new
        user.extend Runner
        user.run
      end
    end
  end
end
$ ruby dci_bm.rb
     user       system     total       real
DCI  8.430000   0.000000   8.430000 (  8.429382)
DCI  8.490000   0.010000   8.500000 (  8.486804)
DCI  8.450000   0.010000   8.460000 (  8.447363)

Quite a difference! It's probably safe to say at this point that calling extend 1 million times is far less performant than including the module once as the class is defined. The reasoning is simple. Including the module once injects it into the class's method lookup hierarchy. When the run method is called, the hierarchy is traversed and the method is fetched. In the traditional (include) approach, the module never leaves or reenters the hierarchy after it's been defined. Conversely, in DCI, the module enters the hierarchy of each individual object every time extend is called.
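
You can watch this happen in the lookup hierarchy itself. A quick sketch (Ruby 1.9, with a trivial Runner for illustration):

module Runner; def run; end; end

class IncludeUser
  include Runner
end

# include places the module in the class's ancestry, once, for all instances:
IncludeUser.ancestors          # => [IncludeUser, Runner, Object, Kernel, BasicObject]

# extend places the module in each object's own singleton class:
user = Object.new
user.extend Runner
user.singleton_class.ancestors # => [#<Class:#<Object>>, Runner, Object, Kernel, BasicObject]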

Let's profile these two approaches and discover why they're so different.

perftools.rb

Assuming the same class/module structure as above, let's use perftools.rb to profile their execution. Using perftools.rb is a two-step process. First, generate the profile: a summary of where the code is spending its time. Second, display the profile in the designated format. To visualize the components, we'll generate graphs in the GIF format. You'll need the dot tool in order to generate graphs. Check out this presentation for more info on using perftools.rb.

Let's first observe the traditional approach:

include_profile.rb

require 'perftools'
require './runner'

class IncludeUser
  include Runner
end

PerfTools::CpuProfiler.start('/tmp/include_profile') do
  1000000.times do
    user = IncludeUser.new
    user.run
  end
end
$ ruby include_profile.rb
$ pprof.rb --gif /tmp/include_profile > include_profile.gif

include_profile.gif

The above graph tells us that most of the execution time is spent iterating in our test loop. Barely any time is spent creating the objects or executing the arbitrary math calculation. More info on reading the pprof.rb output can be found here.

Now let's take a look at the DCI approach:

dci_profile.rb

require 'perftools'
require './runner'

class DCIUser; end

PerfTools::CpuProfiler.start('/tmp/dci_profile') do
  1000000.times do
    user = DCIUser.new
    user.extend Runner
    user.run
  end
end
$ ruby dci_profile.rb
$ pprof.rb --gif /tmp/dci_profile > dci_profile.gif

dci_profile.gif

The above results tell us that almost half the time is spent extending objects at runtime through Module#extend_object. In this example, the time spent iterating over our test loop is dwarfed by the time taken to extend objects. So, after profiling, we can verify that extending the object is indeed taking up most of our time.

ObjectSpace.count_objects

Let's compare how the number of objects in memory stack up with the two implementations. Ruby 1.9 provides us with the ObjectSpace.count_objects method to inspect all objects currently initialized in memory. It's important to turn off garbage collection as it may be invoked mid-test, skewing the results. Here is the module used to inspect the number of objects currently in memory. It's a modified version of Aaron Patterson's implementation.

allocation.rb

module Allocation
  def self.count
    GC.disable
    before = ObjectSpace.count_objects
    yield
    after = ObjectSpace.count_objects
    after.each { |k,v| after[k] = v - before[k] }
    GC.enable
    after
  end
end

This method turns off the garbage collector, records the number of objects pre-benchmark, runs the benchmark, records the number of objects post-benchmark, then compiles the difference between the two. Let's gather more information by extracting object allocations.

include_space.rb

require './runner'
require './allocation'

class IncludeUser
  include Runner
end

p(Allocation.count do
  1000000.times do
    user = IncludeUser.new
    user.run
  end
end)
$ ruby include_space.rb
{:TOTAL=>2995684, :FREE=>-4344, :T_OBJECT=>1000000, :T_CLASS=>0, :T_MODULE=>0, :T_FLOAT=>2000000, :T_STRING=>27, :T_REGEXP=>0, :T_ARRAY=>0, :T_HASH=>1, :T_BIGNUM=>0, :T_FILE=>0, :T_DATA=>0, :T_COMPLEX=>0, :T_NODE=>0, :T_ICLASS=>0}

Most of the keys in the printed hash refer to Ruby types. The ones we're interested in are :TOTAL, :T_CLASS, :T_ICLASS. The meaning of these keys isn't very well documented but the Ruby source hints at them. Here's my understanding:

:TOTAL is the total number of objects present in memory.
:T_CLASS is the total number of classes that have been declared (in memory).
:T_ICLASS is the total number of internal use classes, or iClasses, used when modules are mixed in.

The DCI approach:

dci_space.rb

require './runner'
require './allocation'

class DCIUser; end

p(Allocation.count do
  1000000.times do
    user = DCIUser.new
    user.extend Runner
    user.run
  end
end)
$ ruby dci_space.rb
{:TOTAL=>4995536, :FREE=>-4492, :T_OBJECT=>1000000, :T_CLASS=>1000000, :T_MODULE=>0, :T_FLOAT=>2000000, :T_STRING=>27, :T_REGEXP=>0, :T_ARRAY=>0, :T_HASH=>1, :T_BIGNUM=>0, :T_FILE=>0, :T_DATA=>0, :T_COMPLEX=>0, :T_NODE=>0, :T_ICLASS=>1000000}

Let's look at the significant differences between the two:

# Include
{:TOTAL=>2995684, :FREE=>-4344, :T_CLASS=>0, :T_ICLASS=>0}

# DCI
{:TOTAL=>4995536, :FREE=>-4492, :T_CLASS=>1000000, :T_ICLASS=>1000000}

These results expose a few facts about the two implementations.

  • The DCI approach keeps roughly 67% more objects in memory (about 5 million versus 3 million).
  • The DCI approach initializes many more classes and internal (mixed-in) classes.

Wrapping It Up

As I said at the beginning, it's quite hard to determine how, if at all, using DCI will affect the performance of a real world application. Certainly, a single web request will never invoke 1 million inclusions of a module at runtime.

What this does show us is that Ruby is not optimized for this architecture. In idiomatic Ruby, modules are usually included during the class definition. It's possible that languages, like Scala, with built-in tools to extend objects ad hoc perform better than Ruby. Scala's traits provide high-level support for this type of functionality and are optimized for use with DCI.
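
One possible mitigation, sketched below (my own experiment, not part of the benchmarks above): if a Role can be bound through delegation instead of mutation, the module is mixed in once at class definition and no per-object internal class is created.

require 'delegate'
require './runner'

class DCIUser; end

class RunnerWrapper < SimpleDelegator
  include Runner # mixed in once, at class definition
end

user = DCIUser.new
RunnerWrapper.new(user).run # no runtime #extend

Benchmarking that variant against #extend would be a reasonable next step; I'd expect it to trade the extension cost for a wrapper allocation.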

I'm still quite interested in DCI. Specifically, in optimizations for Ruby. I'm also quite interested in running these benchmarks against a production app, something that'll just have to wait.

All the code used here can be found on GitHub.

Happy benchmarking!

Posted by Mike Pack on 01/17/2012 at 12:46PM

Tags: dci, ruby, rails, performance, benchmarking, profiling, architecture


Seeding data to your Rails tests with factory_girl

Many applications rely on seed data as a basic set of information to get their app off the ground and functional. Seed data to an application is like wheels to a car. Without the wheels, your car won't function.

The concept of seed data works great when thinking about how the application functions in the real world. However, when running tests against your application, it poses the question: "Well, where did the data come from?" It's like the philosophical question of the chicken or the egg.

You have a test database ready to rock. You might even use rake db:test:prepare to get it set up. Now you run your specs and, uh oh, they fail because your seed data doesn't exist. Let's review some solutions to this problem.

Just seed the test database!

Stop with all the fuss and just seed the test database. rake db:seed and your tests will be happy.

The problem with seeding your test database is you're now creating a dependency, and a large one at that. Every time you plan to execute your tests on a fresh database, you need the additional step of seeding. Not only that, but your tests are written to rely on an outside source to provide the seed data.

Your tests should be able to run in a bubble. Give them an empty database and some application code and they should run to completion. Seeding the test database is an anti-pattern, in my opinion.

Enter the seed data when appropriate

Say you have a User model and after the user is created, they are given a gold star award for signing up. Assume your awards are stored in an awards table with an Award model and are seeded into your application.

When your tests rely on seed data, they might look like this (using factory_girl):

describe 'after signing up as a user' do
  before do
    @user = Factory(:user)
  end

  it 'gives me a gold star award' do
    @user.has_award('gold_star').should be_true
  end
end

In the above example, we're relying upon the fact that the gold star award has already been seeded into the database.

When your tests don't rely on seed data, they might look like this:

describe 'after signing up as a user' do
  before do
    # Add the gold star award
    Factory(:award, :name => 'gold_star')
    @user = Factory(:user)
  end

  it 'gives me a gold star award' do
    @user.has_award('gold_star').should be_true
  end
end

Now you've successfully removed the dependency on the seed data and brought the responsibility for setting up the database into the test suite. This is a great step forward in refactoring the tests, but now, every time you create a User with Factory(:user), you need to add the gold star award to the database. Let's fix this up.

Let factory_girl do the heavy lifting

factory_girl comes with a few built-in callbacks: #after_build, #after_create and #after_stub. Read more about using factory_girl callbacks on thoughtbot's blog (updated docs).

So, in lieu of adding the gold star award right before creating a User in every one of our tests, let's use a callback when creating a User.

FactoryGirl.define do
  factory :user do
    name "Mike Pack"
    after_build do |user|
      Factory(:award, :name => 'gold_star') unless Award.find_by_name('gold_star').present?
    end
  end
end

Now, our tests can be as clean as possible:

describe 'after signing up as a user' do
  before do
    @user = Factory(:user)
  end

  it 'gives me a gold star award' do
    @user.has_award('gold_star').should be_true
  end
end

Every time a user is created with factory_girl, the dependency is built using the after_build callback.

Happy testing!

Posted by Mike Pack on 12/22/2011 at 03:01PM

Tags: rails, rspec, testing, factory_girl


Conditional Indexing with Sunspot

One of my favorite new features of Sunspot 1.3 is the ability to conditionally index an instance of a model based on anything that returns a boolean.

Say I have a Post model to store my blog posts. We want to index only blog posts which are published so users aren't searching on unpublished posts. The syntax looks as follows:

class Post < ActiveRecord::Base
  attr_accessible :title, :content, :published, :external_source

  searchable :if => :published do
    text :title
    text :content
  end
end

Pretty nice. But let's elaborate a little. Say we want to index only published posts but we don't want to index posts where content comes from an external source.

searchable :if => :published, :unless => :external_source do
  ...
end

Let's flip things around. What if we want to index only published blog posts which come from external sources? Supply an array.

searchable :if => [:published, :external_source] do
  ...
end 

What if the conditions for indexing are more complex than a simple boolean method on our model? Supply a proc.

searchable :if => proc { |post| post.content.size > 0 } do
  ...
end 

Indexing with Sunspot just got a whole lot easier.

Happy indexing!

Posted by Mike Pack on 11/16/2011 at 02:32PM

Tags: sunspot, search, indexing, rails


Stubbing Internal Methods and Rails Associations with RSpec

Occasionally, times arise when you would like to unit test the inner workings of a method. As a disclaimer, I don't recommend it, because tests should generally be behavior driven. Tests should treat your methods as black boxes: you put something in, you get something out. How it works internally shouldn't really matter. However, if you would like to test the inner workings of your methods, there are a number of ways to do so with pure Ruby, including :send, :instance_variable_get and others. Testing the innards feels dirty no matter which way you spin it, but I like to at least do it with RSpec.
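
For reference, this kind of poking looks like the following (a sketch against the Library class defined below):

library = Library.new
library.send(:expensive_query)          # invokes the method regardless of visibility
library.instance_variable_get(:@books)  # reads the instance variable directly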

Why Test the Internals?

Let's say you have a method that does some expensive lookup:

class Library < ActiveRecord::Base
  include ExpensiveQueries

  attr_accessor :books
  def books
    @books ||= expensive_query # The expensive query takes 5 seconds
  end
end

The above example should be familiar: you plan to perform something expensive, in Ruby or in the database, and you would like to cache the result in an instance variable so that all subsequent calls to that method draw from the instance variable.

If you were taking a TDD approach, you wouldn't have this class already written. In that case, you know you'll be performing something very expensive and you want to ensure that method caches the result. How do you test this without knowing the internals of the method?

Stubbing with RSpec

Let's say you're writing your tests before you write the above class. You could use RSpec's stubbing library to ensure your method is caching its result.

describe Library do
  describe '#books' do
    it 'caches the result' do
      # Assume some books get associated upon creation
      @library = Library.create!

      # 5 seconds for this call
      the_books = @library.books

      # Stub out the expensive_query method so it raises an error
      @library.stub(:expensive_query) { raise 'Should not execute' }

      # If the value was cached, expensive_query shouldn't be called
      lambda { @library.books }.should_not raise_error
    end
  end
end

The key component here is the @library.stub call. This is also where we're breaking the black box, behavior driven test idiom. We assume at this line that there will be an internal method call named expensive_query. This test is also brittle: if expensive_query ever changes its name to really_expensive_query, our test will break even though the functionality of our method remains the same.
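
A variation that expresses the same intent (a sketch, equally coupled to the method name) is to assert the expensive call happens exactly once:

@library = Library.create!
@library.should_receive(:expensive_query).once.and_return([:a_book])
2.times { @library.books } # the second call must hit the cache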

Stubbing Rails Associations

What if your expensive_query is really an ActiveRecord association? So, let's say your Library class looks more like the following:

class Library < ActiveRecord::Base
  has_many :books

  attr_accessor :authors
  def authors
    @authors ||= books.authors # The expensive query takes 5 seconds
  end
end

You could use the nifty stub_chain method provided by RSpec to stub the books.authors method and ensure it only gets called once.

describe Library do
  describe '#authors' do
    it 'caches the result' do
      # Assume some books and authors get associated upon creation
      @library = Library.create!

      # 5 seconds for this call
      the_authors = @library.authors

      # Stub out the books.authors association so it raises an error
      @library.stub_chain(:books, :authors) { raise 'Should not execute' }

      # If the value was cached, books.authors shouldn't be called
      lambda { @library.authors }.should_not raise_error
    end
  end
end

Arguments to stub_chain represent the associations used. stub_chain could also be used to stub out additional methods which get called within the chain.
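
For instance, a sketch of handing back a canned value instead of raising:

@library.stub_chain(:books, :authors).and_return(['Mike Pack'])
@library.authors # => ['Mike Pack'], without touching the database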

Happy stubbing!

Posted by Mike Pack on 10/07/2011 at 11:20AM

Tags: rails, rspec, stubbing


Managing Devise's current_user, current_admin and current_troll with CanCan

CanCan is awesome. It lets you manage user abilities easily and provides ways to define complex scenarios. I highly recommend it for anyone who has more than one user type (like Troll).

Devise is great for authentication. When you have more than one user type as distinct classes, Devise will create a current_* helper for each, to be used in your controllers and views. So, the User class corresponds to current_user. The Admin class corresponds to current_admin. The Troll class (used to identify Trolls under your application's bridge) corresponds to current_troll.

The problem

CanCan doesn't work with current_admin and current_troll out-of-the-box. It assumes that current_user is defined and current_user's abilities are defined in the Ability class. What if you want to break this paradigm? It turns out CanCan makes this pretty easy. Here are the current_user and Ability assumptions I am referring to:

def current_ability
  @current_ability ||= ::Ability.new(current_user)
end

CanCan defines current_ability on your controller. This grabs an instance of the Ability class for the current user. So it assumes that you have current_user set and you have an Ability class defined. When your user types get more complex than what can be handled by one User model, it's time to make some changes.

Working with numerous Ability classes

Up front, your project might not require many user types that vary greatly from one another. It might make sense to use Rails' nifty STI (Single Table Inheritance) and add all your abilities to one class. This can be nice in some respects. For instance, all users, no matter which type, can be referenced by current_user.
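
For reference, the STI variant might be sketched like this (the users table just needs a type column):

class User < ActiveRecord::Base
  # every user type's abilities pile up in the single Ability class
end

class Admin < User; end
class Troll < User; end

# Devise's current_user now covers admins and trolls alike.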

When your user types get too complex for one User model, your Ability class is too complex as well. In its simplest form, say you have an Ability class that looks as follows:

class Ability
  include CanCan::Ability

  def initialize(user)
    user ||= User.new # guest user (not logged in)
    if user.is_a? Admin
      # Admin abilities
    elsif user.is_a? Troll
      # Troll abilities
    elsif user.new_record?
      # Guest abilities
    else
      # Basic user abilities
    end
  end
end

This structure gives you some flexibility in how you define your abilities, but it's on its way to Maintenance Hell, a deep dark place with no exit.

It would be best to define your abilities in different classes. Here we define UserAbility, AdminAbility, TrollAbility, and GuestAbility.

class UserAbility
  include CanCan::Ability

  def initialize(user)
    # Basic user abilities
  end
end

class AdminAbility
  include CanCan::Ability

  def initialize
    # Admin abilities
  end
end

class TrollAbility
  include CanCan::Ability

  def initialize(user)
    # Troll abilities
  end
end

class GuestAbility
  include CanCan::Ability

  def initialize
    # Guest abilities
  end
end

Keep in mind that if one user type's abilities are a subset of another's, you can inherit from the other ability class. So, in our case, a Troll is a user who lives under a bridge. We don't want Trolls to talk, so we limit their ability to post comments. Otherwise, they can do everything a User can do.

class TrollAbility < UserAbility
  def initialize(user)
    super(user)
    cannot :create, Comment
    # More Troll abilities
  end
end

Hooking up the Ability classes

When you use CanCan's "can? :create, Comment" method, it refers to current_ability to determine whether the given abilities include :create, Comment.

Since CanCan makes the assumption we're working with current_user and strictly Ability, we need to extend the built-in functionality. We do this by instantiating the new Ability classes based on the current user type (defined by Devise). CanCan has a brief wiki post on this topic.

def current_ability
  @current_ability ||= case
                       when current_user
                         UserAbility.new(current_user)
                       when current_admin
                         AdminAbility.new
                       when current_troll
                         TrollAbility.new(current_troll)
                       else
                         GuestAbility.new
                       end
end

Now, when CanCan needs to check abilities (when you call "can? :create, Comment"), your current_ability method will return an instance of the appropriate Ability class.

Happy CanCaning!

Posted by Mike Pack on 07/21/2011 at 09:01PM

Tags: rails, devise, cancan


Testing Mobile Rails Apps with Capybara

Every web app should have a mobile version, and every mobile version should be tested. Testing mobile web apps shouldn't be any more painful than testing desktop apps, assuming you're still serving up HTML, CSS, and JavaScript.

My Setup

For mobile detection, I use ActiveDevice, a User Agent sniffing library with some helper methods. While User Agent sniffing isn't the best approach on the client side (use feature detection with something like Modernizr there), it's a reliable way to detect mobile devices in Rails.

For acceptance testing that doesn't need to be readily demonstrated to stakeholders, I use straight up Capybara with RSpec. Sometimes I use Steak.

The Pain of Testing Mobile

It's difficult to test mobile web apps because Capybara's default drivers all use desktop User Agents. You could pull in Capybara-iphone, but this solution didn't produce the expected results for me: I was given my mobile views but not my mobile layout. Plus, all this driver does is reset the User Agent for the Rack-Test driver. Further, what if you use Selenium as your default driver?

Plataformatec wrote a nice blog post about mobile testing with user agents. The problem is, it relies on Selenium. I was seeking a more concrete, driver-agnostic way to serve mobile views to my tests.

How I Test Mobile

This is a straightforward way to invoke your mobile app from within your tests. I'm not convinced it's the most elegant way, but it's clean and simple.

Setup Your Application

In your ApplicationController, add some support for switching over to your mobile app. This will also come in handy when you want to work on and test your mobile app in your desktop browser.

app/controllers/application_controller.rb

@@format = :html
cattr_accessor :format

Here we simply add a class attribute accessor with a default of :html. This will allow us to say ApplicationController.format anywhere in the app.

Next we need to add a before filter which will set the desired format upon each request.

app/controllers/application_controller.rb

before_filter :establish_format

private

def establish_format
  # If the request is from a true mobile device, don't set the format
  request.format = self.format unless request.format == :mobile
end

Here's what a sample Rails 3 ApplicationController would look like:

app/controllers/application_controller.rb

class ApplicationController < ActionController::Base
  protect_from_forgery
  include ActiveDevice

  # force the mobile version for development:
  # @@format = :mobile
  @@format = :html
  cattr_accessor :format

  before_filter :establish_format

  private

  def establish_format
    # If the request is from a true mobile device, don't set the format
    request.format = self.format unless request.format == :mobile
  end
end

Setup Your Tests

Now, in your tests, you can set the desired format for your application.

spec/integration/mobile/some_spec.rb

require 'spec_helper'

describe 'on a mobile device' do
  before do
    ApplicationController.format = :mobile
  end

  after do
    ApplicationController.format = :html
  end

  describe 'as a guest on the home page' do
    before do
      visit root_path
    end

    it 'does what I want' do
      page.should do_what_i_want
    end
  end
end

By setting ApplicationController.format = :mobile, we force the application to render the mobile version of files, for instance index.mobile.erb. Your application will be invoked, your before filter will run, and you are serving the mobile app to your tests.

Note: You need to reset your format to :html after your mobile tests run so that the tests which follow in your suite run under the default format, :html.
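
If repeating that before/after pair gets tedious, one option (a sketch of my own, not required by this setup) is a tagged around hook in spec_helper.rb:

RSpec.configure do |config|
  config.around(:each, :mobile => true) do |example|
    ApplicationController.format = :mobile
    example.run
    ApplicationController.format = :html # always restore the default
  end
end

# Then tag the group:
# describe 'on a mobile device', :mobile => true do ... end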

Happy testing!

Posted by Mike Pack on 06/16/2011 at 11:05AM

Tags: rails, testing, rspec, capybara, mobile


Dynamically Requesting Facebook Permissions with OmniAuth

One of the major benefits of dynamically requesting the Facebook permissions is the increased rate of users who will allow you to access their account. Facebook puts it nicely, "There is a strong inverse correlation between the number of permissions your app requests and the number of users that will allow those permissions."

This solution uses OmniAuth to handle the authentication. The concept is simple. Ask the user to allow your application access to their most basic information (or the bare minimum your app needs). When they perform an action that requires more than the permissions they have currently allowed, redirect them to Facebook and ask for more permissions.

If you haven't set up OmniAuth, follow Ryan Bates' Railscasts, Part 1 and Part 2.

Configuring OmniAuth

OmniAuth expects you to configure your authentication schemes within your initializers.

config/initializers/omniauth.rb

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET']
end

Now, when you visit /auth/facebook, you will be redirected to Facebook and asked for basic permissions.

In order to permit your app to dynamically change the OmniAuth Strategy, you'll need a controller which has access to your OmniAuth Strategy. OmniAuth provides a pre-authorization setup hook to handle this. Update your omniauth.rb initializer to look like the following:

config/initializers/omniauth.rb

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET'], :setup => true
end

Now when you visit /auth/facebook you'll be redirected to /auth/facebook/setup. You should add a route for this:

config/routes.rb

match '/auth/facebook/setup', :to => 'facebook#setup'

Your Facebook controller with the setup action should look as follows:

app/controllers/facebook_controller.rb

class FacebookController < ApplicationController
  def setup
    request.env['omniauth.strategy'].options[:scope] = session[:fb_permissions]
    render :text => "Setup complete.", :status => 404
  end
end

Note: OmniAuth says, "we render a response with a 404 status to let OmniAuth know that it should continue on with the authentication flow."

Once your FacebookController#setup action has completed, OmniAuth will take it from there and process your request through to Facebook.

Dynamically Setting the Permissions

Requesting additional permissions then looks like this:

app/controllers/some_controller.rb

session[:fb_permissions] = 'user_events'
redirect_to '/auth/facebook'

session[:fb_permissions] is the interface between your two controller actions: the one that wants to request more permissions (some_controller.rb) and the one that wants to modify your OmniAuth Strategy (facebook_controller.rb).

For reference, here's a list of available Facebook permissions you can use, comma-delimited.
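
Multiple permissions are simply joined with commas, for example (permission names as they existed at the time of writing):

session[:fb_permissions] = 'user_events,user_likes'
redirect_to '/auth/facebook'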

That's it. Upon redirection, Facebook will ask to allow the new permissions, redirect back to your app, and you can now successfully make calls to the Facebook API (I use Koala to work with the Facebook API).

One Gotcha

One thing I ran into on OmniAuth 0.2.4 and Rails 3.0.7 is the OmniAuth Strategy available in request.env['omniauth.strategy']. If you have more than one provider in your OmniAuth::Builder DSL, request.env['omniauth.strategy'] will be set to the last entry in the DSL. If you have your initializer set up like the following:

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET'], :setup => true
  provider :twitter, ENV['TWITTER_CONSUMER_KEY'], ENV['TWITTER_CONSUMER_SECRET']
end

request.env['omniauth.strategy'] will be set to #<OmniAuth::Strategies::Twitter>, not exactly what you want. Your Facebook strategy needs to be the last entry in the DSL, like so:

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :twitter, ENV['TWITTER_CONSUMER_KEY'], ENV['TWITTER_CONSUMER_SECRET']
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET'], :setup => true
end

Happy Facebooking!

Posted by Mike Pack on 04/27/2011 at 10:51AM

Tags: ruby, rails, omniauth, facebook