Bounding Unicorns

Implementing Complex Changes Effectively With Tests And Git

Today I want to describe how I tackle implementing complex changes on a project. Example categories of such changes are:

  • Integrating a large library or component into an even larger program.
  • Supporting an additional environment.
  • Upgrading underlying framework or a library.

Specific examples that I dealt with in each of these categories:

  • In phpbb, making rewritten template engine work.
  • Adding python 3 support to pycurl.
  • Upgrading a rails 1 application to rails 2.

The shared trait of these changes is they deal with a large body of code and that code presently works. You want the code to remain working after the change. This is unlike adding a new feature where the feature did not "work" before by virtue of not existing. When changing a working application, the expectation is that it will largely if not entirely continue to work. Users are much more forgiving of new functionality not working than of old functionality breaking on them.

Typically the entire program or system is so large that manually testing it is impractical. In addition, the person implementing the change might not even have a very good grasp on the system in the first place. How can the change be done with as little breakage as possible in as little time as possible?

Test suite

A test suite is extremely important for validating that the system works correctly, or at all. Some environments and cultures write automated tests as part of normal development process, and some don't. My rules are:

  • If there is a test suite but it has failures, the failures must be fixed before any other work can commence.
  • If there is no test suite but I know ahead of time of the big project, I would start writing tests proactively. Any tests are better than nothing, and the more there are, the quicker code will be validated.
  • If there is no test suite and the complex change must be implemented now, I will attempt to write tests concurrently but get them into the existing version of the application prior to merging in the complex feature. This is made possible by git and rebasing, and can actually be fairly straightforward, but does require care and decent foresight.
  • If there is no test suite and no opportunity to create one, I would be wondering why I am working on this project...

With the test suite implemeted and passing, the next step is to set up a continuous itegration server to run the tests automatically if one does not already exists. A continuous integration server, like a test suite, is a virtual requirement for modern software development.


All sane projects today should use version control, but not all version control systems are equal.

Git, with its interactive rebasing specifically and history editing capabilities in general is extremely well suited to long-running, complex work with high quality requirements. Other distributed VCSes may or may not fit the bill - I have not used them thus don't know.

A handy aspect of git is its integration with most other version control systems out there. Even if the project formally uses something other than git, it is often possible to use git as a local VCS with that project.

One change per commit

Your commits should be small. It is much easier to squash commits than it is to split them, therefore, always err on the side of smaller commits.

Signs that your commits are doing too much:

  • You use the word "and" in a commit message.
  • You have a bulleted list in a commit message.
  • You commit once per day or even more rarely.
  • Your commit diff is more than a couple hundred lines long and it is not a result of a single "search and replace" operation.
  • Your commit is so large is slows your browser down to a crawl when you try to view it in github.

I often commit maybe every 30 minutes or so of actual code writing. If I am doing research, I am not writing code and this does not apply. You definitely commit whenever I am done with each bug fix or feature or a defined sub-task of one.

Interactive rebasing

The following steps will heavily rely on git rebase feature, and particularly on interactive rebasing. There are four key operations that will be needed:

  1. Rebasing against current master, or development head.
  2. Reordering commits, to keep related commits together.
  3. Squashing commits, to eliminate changes that have been proven to be wrong and are of no value anymore.
  4. Splitting commits. Assuming you follow the rule of one change per commit, splitting typically is used when you make a change somewhere and update call sites in the same commit, and later want to redo how the change is made and kill the initial change entirely.

Keep task branch small

The principal task is large. The branch it is in will be getting bigger every day. After several days of solid effort it will stop being readable in github and then it will stop being reviewable. If there is a code review policy in place, the implications should be obvious. If there isn't, the fact is that even the person who writes a large change can forget important details in it over time. This has to do with context switches which means in a 2,000 line diff you can forget stuff that you've done a week ago.

The task branch is kept small by continuously extracting anything that is independent enough to work on its own, and making those changes to master. For example, in a Rails 2 to 3 upgrade one of the changes being done is replacement of RAILS_ENV with Rails.env. This is a large change in terms of number of lines of code changed, but easy to review and Rails 2 understands Rails.env as well. This is a perfect example of a change that should be done against master to keep the upgrade branch smaller.

After changes are merged to master, rebase the task branch on master. This is where interactive rebase and small commits start to come in. You can identify conflicts easier in small commits and you can check that the history is being correctly replanted on master via an interactive rebase.

Fix errors first

If you are dealing with a library or framework upgrade, two common issues are features that are removed and deprecated (on which note, "removed" and "deprecated" are not the same thing: features that were removed produce errors when code attempts to use them, whereas features that were deprecated continue to work fine but maybe produce storms of warning messages).

Fix the code that uses removed features first. Ideally the tests should pass while the code continues to use deprecated features. Once all errors are taken care of, you can start dealing with deprecated features.

Keep in mind that addressing deprecation warnings is optional. Do not get stuck trying to rebuild working functionality when parts of your application are outright broken!

Start with core changes

Many applications, such as those using Ruby on Rails, have levels of tests: unit tests on the bottom level, functional tests higher up, integration tests higher up still. If you are doing test-driven development and are fixing the failing tests resulting from your changes, start with the bottom level and work your way up. Make sure all unit tests pass before changing anything related to functional tests.

Keep master green

The test suite should always pass on master. Typically the tests will be broken on the task branch until the task is complete. If the task is not complete but the tests pass, write more tests! And once you add more tests, ask yourself if they will work on master, and if so backport them to master to keep the size of the task branch down.

Editing history

In a non-trivial project, you are likely to jump from one area of the code to another and back, many times. Eventually you will have a history with many commits, some of which turned out to not work. Some of those might be helpful to keep just so you know what you tried, but others are simply hopeless.

Don't keep the useless commits - instead, squash them with other commits to get a history that helps you get to your destination. The question "OK, what have I done with X?" should be easy to answer by glancing on your development history. You should not need to go through "first I did A, then that apparently did not work so I did B, hmm" type of process.

Reordering history - following changes

Especially if you are in a project or company that has code reviews, someone will need to review your work before it is accepted. If you give that person a single commit with a message "rails 2 to 3 upgrade" and a 10,000+ line diff, that person will not be doing any reviewing. Either the work will be accepted on faith, or the reviewer will do a surface review also known as "syntax check". A contribution like that made to a free software project stands good chances of being rejected.

This is where the interactive rebase comes in handy again to reorder the commits, squash and split them to the point where each commit makes sense and logically follows from previous commits. Someone going through the commits should have a clear picture of the changes made including reasoning for them.

That someone might be you! If you are not the only developer, or if the complex task is only a part of what you have to do, chances are there are other commits happening on master concurrently with your work. If they conflict with your complex task you need to figure out how to resolve conflicts. As you will be changing your own implementation, it helps for your implementation to be readable, at least to you.

This process will not be easy to follow the first time, but the more you do it the better you will become at it.

Rebase often

If the master branch continues to receive bugfixes or minor or even not so minor features while you are working on your complex task, make sure to rebase the task branch on master frequently. While you may need to deal with conflicts, each conflict will tend to be small, and it will be with recently done work so resolving it should be easier. Especially if you are the person who continues to commit to master, resolving conflicts with your own work right away is significantly more efficient than postponing conflict resolution until later.

Don't just fix test failures - think about why they happened

A failing test can, generally, be fixed in one of two ways:

  1. Change the test.
  2. Change the code the test is testing.

Usually, the test suites are supposed to change less frequently than the code they are testing. So, typically, if a change breaks a test then the code is broken in some way.

A similar situation can appear when upgrading frameworks and otherwise changing dependencies in major ways. Suppose you have code like this:

def foo():
    # ...

... and you have a test that foo works in a certain way. Now you upgrade a library and foo ceases to work properly. You find out that you can rewrite foo as follows:

def foo():
    # ...

... and that makes the test pass. Maybe bar and quux are different spellings of more-or-less the same thing. You can stop here, or you can think about why where bar worked previously it no longer does. Do you understand why the code used bar? Why bar changed behavior? Is quux really correct? Does anything else in your project rely on the behavior of bar that changed, perhaps calling some variation of bar or code that bar called or code that ends up calling bar in some fashion?

Especially for projects with low test coverage it is very important to think about these questions, because if the underlying semantics of what bar does changed the change may affect code you don't have test coverage for.

To give an example, one of the differences between Python 2 and 3 is string handling. In Python 3 there is no longer a "string-of-bytes" type; strings are either Unicode strings or byte sequences which are not really strings. PycURL has to pass a sequence of bytes to libcurl, and gets Unicode strings from Python code under Python 3 whereas previously it received "byte strings" that could be passed to libcurl as is. The submitted Python 3 patch converted Unicode data to bytes using C standard library, in the process of doing which it 1) leaked memory and 2) did not use Python-supplied encodings, so while the trivial tests using ASCII worked the code would have crashed and burned when faced with nontrivial Unicode data. The solution that I eventually went with involved requiring Python code to encode all Unicode strings before giving them to PycURL, because PycURL could not correctly encode them in all possible cases.