Implementing Complex Changes Effectively With Tests And Git
- Test suite
- One change per commit
- Interactive rebasing
- Keep task branch small
- Fix errors first
- Start with core changes
- Keep master green
- Editing history
- Reordering history - following changes
- Rebase often
- Don't just fix test failures - think about why they happened
Today I want to describe how I tackle implementing complex changes on a project. Example categories of such changes are:
- Integrating a large library or component into an even larger program.
- Supporting an additional environment.
- Upgrading underlying framework or a library.
Specific examples that I dealt with in each of these categories:
- In phpbb, making rewritten template engine work.
- Adding python 3 support to pycurl.
- Upgrading a rails 1 application to rails 2.
The shared trait of these changes is they deal with a large body of code and that code presently works. You want the code to remain working after the change. This is unlike adding a new feature where the feature did not "work" before by virtue of not existing. When changing a working application, the expectation is that it will largely if not entirely continue to work. Users are much more forgiving of new functionality not working than of old functionality breaking on them.
Typically the entire program or system is so large that manually testing it is impractical. In addition, the person implementing the change might not even have a very good grasp on the system in the first place. How can the change be done with as little breakage as possible in as little time as possible?
A test suite is extremely important for validating that the system works correctly, or at all. Some environments and cultures write automated tests as part of normal development process, and some don't. My rules are:
- If there is a test suite but it has failures, the failures must be fixed before any other work can commence.
- If there is no test suite but I know ahead of time of the big project, I would start writing tests proactively. Any tests are better than nothing, and the more there are, the quicker code will be validated.
- If there is no test suite and the complex change must be implemented now, I will attempt to write tests concurrently but get them into the existing version of the application prior to merging in the complex feature. This is made possible by git and rebasing, and can actually be fairly straightforward, but does require care and decent foresight.
- If there is no test suite and no opportunity to create one, I would be wondering why I am working on this project...
With the test suite implemeted and passing, the next step is to set up a continuous itegration server to run the tests automatically if one does not already exists. A continuous integration server, like a test suite, is a virtual requirement for modern software development.
All sane projects today should use version control, but not all version control systems are equal.
Git, with its interactive rebasing specifically and history editing capabilities in general is extremely well suited to long-running, complex work with high quality requirements. Other distributed VCSes may or may not fit the bill - I have not used them thus don't know.
A handy aspect of git is its integration with most other version control systems out there. Even if the project formally uses something other than git, it is often possible to use git as a local VCS with that project.
One change per commit
Your commits should be small. It is much easier to squash commits than it is to split them, therefore, always err on the side of smaller commits.
Signs that your commits are doing too much:
- You use the word "and" in a commit message.
- You have a bulleted list in a commit message.
- You commit once per day or even more rarely.
- Your commit diff is more than a couple hundred lines long and it is not a result of a single "search and replace" operation.
- Your commit is so large is slows your browser down to a crawl when you try to view it in github.
I often commit maybe every 30 minutes or so of actual code writing. If I am doing research, I am not writing code and this does not apply. You definitely commit whenever I am done with each bug fix or feature or a defined sub-task of one.
The following steps will heavily rely on
git rebase feature,
and particularly on interactive rebasing.
There are four key operations that will be needed:
- Rebasing against current
master, or development head.
- Reordering commits, to keep related commits together.
- Squashing commits, to eliminate changes that have been proven to be wrong and are of no value anymore.
- Splitting commits. Assuming you follow the rule of one change per commit, splitting typically is used when you make a change somewhere and update call sites in the same commit, and later want to redo how the change is made and kill the initial change entirely.
Keep task branch small
The principal task is large. The branch it is in will be getting bigger every day. After several days of solid effort it will stop being readable in github and then it will stop being reviewable. If there is a code review policy in place, the implications should be obvious. If there isn't, the fact is that even the person who writes a large change can forget important details in it over time. This has to do with context switches which means in a 2,000 line diff you can forget stuff that you've done a week ago.
The task branch is kept small by continuously extracting anything that
is independent enough to work on its own, and making those changes
to master. For example, in a Rails 2 to 3 upgrade one of the changes
being done is replacement of
Rails.env. This is a
large change in terms of number of lines of code changed, but easy to
review and Rails 2 understands
Rails.env as well. This is a perfect
example of a change that should be done against master to keep the
upgrade branch smaller.
After changes are merged to master, rebase the task branch on master. This is where interactive rebase and small commits start to come in. You can identify conflicts easier in small commits and you can check that the history is being correctly replanted on master via an interactive rebase.
Fix errors first
If you are dealing with a library or framework upgrade, two common issues are features that are removed and deprecated (on which note, "removed" and "deprecated" are not the same thing: features that were removed produce errors when code attempts to use them, whereas features that were deprecated continue to work fine but maybe produce storms of warning messages).
Fix the code that uses removed features first. Ideally the tests should pass while the code continues to use deprecated features. Once all errors are taken care of, you can start dealing with deprecated features.
Keep in mind that addressing deprecation warnings is optional. Do not get stuck trying to rebuild working functionality when parts of your application are outright broken!
Start with core changes
Many applications, such as those using Ruby on Rails, have levels of tests: unit tests on the bottom level, functional tests higher up, integration tests higher up still. If you are doing test-driven development and are fixing the failing tests resulting from your changes, start with the bottom level and work your way up. Make sure all unit tests pass before changing anything related to functional tests.
Keep master green
The test suite should always pass on master. Typically the tests will be broken on the task branch until the task is complete. If the task is not complete but the tests pass, write more tests! And once you add more tests, ask yourself if they will work on master, and if so backport them to master to keep the size of the task branch down.
In a non-trivial project, you are likely to jump from one area of the code to another and back, many times. Eventually you will have a history with many commits, some of which turned out to not work. Some of those might be helpful to keep just so you know what you tried, but others are simply hopeless.
Don't keep the useless commits - instead, squash them with other commits to get a history that helps you get to your destination. The question "OK, what have I done with X?" should be easy to answer by glancing on your development history. You should not need to go through "first I did A, then that apparently did not work so I did B, hmm" type of process.
Reordering history - following changes
Especially if you are in a project or company that has code reviews, someone will need to review your work before it is accepted. If you give that person a single commit with a message "rails 2 to 3 upgrade" and a 10,000+ line diff, that person will not be doing any reviewing. Either the work will be accepted on faith, or the reviewer will do a surface review also known as "syntax check". A contribution like that made to a free software project stands good chances of being rejected.
This is where the interactive rebase comes in handy again to reorder the commits, squash and split them to the point where each commit makes sense and logically follows from previous commits. Someone going through the commits should have a clear picture of the changes made including reasoning for them.
That someone might be you! If you are not the only developer, or if the complex task is only a part of what you have to do, chances are there are other commits happening on master concurrently with your work. If they conflict with your complex task you need to figure out how to resolve conflicts. As you will be changing your own implementation, it helps for your implementation to be readable, at least to you.
This process will not be easy to follow the first time, but the more you do it the better you will become at it.
If the master branch continues to receive bugfixes or minor or even not so minor features while you are working on your complex task, make sure to rebase the task branch on master frequently. While you may need to deal with conflicts, each conflict will tend to be small, and it will be with recently done work so resolving it should be easier. Especially if you are the person who continues to commit to master, resolving conflicts with your own work right away is significantly more efficient than postponing conflict resolution until later.
Don't just fix test failures - think about why they happened
A failing test can, generally, be fixed in one of two ways:
- Change the test.
- Change the code the test is testing.
Usually, the test suites are supposed to change less frequently than the code they are testing. So, typically, if a change breaks a test then the code is broken in some way.
A similar situation can appear when upgrading frameworks and otherwise changing dependencies in major ways. Suppose you have code like this:
def foo(): # ... some_object.bar()
... and you have a test that
foo works in a certain way. Now you upgrade
a library and
foo ceases to work properly. You find out that you can
foo as follows:
def foo(): # ... some_object.quux()
... and that makes the test pass. Maybe
quux are different
spellings of more-or-less the same thing. You can stop here, or you can think
about why where
bar worked previously it no longer does.
Do you understand why the code used
bar changed behavior?
quux really correct? Does anything else in your project rely on the
bar that changed, perhaps calling some variation of
bar called or code that ends up calling
bar in some fashion?
Especially for projects with low test coverage it is very important to
think about these questions, because if the underlying semantics of what
bar does changed the change may affect code you don't have test coverage for.
To give an example, one of the differences between Python 2 and 3 is string handling. In Python 3 there is no longer a "string-of-bytes" type; strings are either Unicode strings or byte sequences which are not really strings. PycURL has to pass a sequence of bytes to libcurl, and gets Unicode strings from Python code under Python 3 whereas previously it received "byte strings" that could be passed to libcurl as is. The submitted Python 3 patch converted Unicode data to bytes using C standard library, in the process of doing which it 1) leaked memory and 2) did not use Python-supplied encodings, so while the trivial tests using ASCII worked the code would have crashed and burned when faced with nontrivial Unicode data. The solution that I eventually went with involved requiring Python code to encode all Unicode strings before giving them to PycURL, because PycURL could not correctly encode them in all possible cases.