Context
You have a very large system with thousands of tests. Your builds keep getting slower and are starting to become a burden for the team. You’re working on new functionality, yet a large portion of the tests exercise code you haven’t touched in months. You want the safety of not regressing existing functionality, but you’re spending hours every day testing code that is less likely to break.
Solution
Create two builds: the fast build and the full build. For the fast build, keep a list of excluded tests and exclude the majority of the tests, so that it runs really fast: around 10 minutes. Only leave in tests that are very likely to break, tests covering what you’re currently working on, and tests of sensitive functionality. Don’t spend much effort classifying tests; just ask whether each one is likely to break. The full build runs all the tests.
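The exclusion list can be applied with a simple filter. The following is a minimal sketch, assuming each test is identified by a name string; the function `select_fast_build` and the test names are hypothetical, not part of any particular test runner.

```python
from typing import Iterable


def select_fast_build(all_tests: Iterable[str], excluded: set[str]) -> list[str]:
    """Return the tests that run in the fast build: everything not excluded.

    The full build simply runs all_tests unfiltered.
    """
    return [test for test in all_tests if test not in excluded]


# Hypothetical test names, for illustration only.
ALL_TESTS = [
    "billing.test_invoice",
    "billing.test_refund",
    "reports.test_yearly_summary",
    "search.test_new_ranking",
]
EXCLUDED = {"billing.test_refund", "reports.test_yearly_summary"}

if __name__ == "__main__":
    print(select_fast_build(ALL_TESTS, EXCLUDED))
```

Because the list records exclusions rather than inclusions, a newly written test runs in the fast build by default until someone decides to exclude it.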
Run the fast build on every commit: locally before you commit, and as your continuous integration commit-verification build.
Run the full build less often: nightly, a couple of times per day, or continuously on a separate machine (which still completes far fewer runs, since each run is much slower). When you do larger refactorings in shared components, you should probably also run the full build. The changeset associated with a full build should be all the changes since the last successful full build.
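Computing that changeset means finding the most recent green full build and taking every revision after it. A minimal sketch, assuming a linear commit history given as an ordered list of revision IDs; `FullBuildRun` and `changeset_for_full_build` are hypothetical names. With Git and a linear history, the same range is what `git log <last-green-sha>..HEAD` lists.

```python
from dataclasses import dataclass


@dataclass
class FullBuildRun:
    revision: str  # revision the full build ran against
    passed: bool   # True if every test passed


def changeset_for_full_build(history: list[FullBuildRun],
                             revisions: list[str]) -> list[str]:
    """Return the revisions committed after the last successful full build.

    history is ordered oldest to newest; revisions is the ordered commit
    log (oldest first) and must contain every revision that appears in history.
    """
    last_green = None
    for run in reversed(history):  # newest run first
        if run.passed:
            last_green = run.revision
            break
    if last_green is None:
        # No full build has ever passed, so every change is suspect.
        return list(revisions)
    return revisions[revisions.index(last_green) + 1:]


# Hypothetical history: r2 passed, r4 failed, r5 has not been built yet.
REVISIONS = ["r1", "r2", "r3", "r4", "r5"]
HISTORY = [FullBuildRun("r2", passed=True), FullBuildRun("r4", passed=False)]

if __name__ == "__main__":
    print(changeset_for_full_build(HISTORY, REVISIONS))
```

A failed full build does not reset the starting point, so r3 and r4 stay in the changeset until a full build passes again.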
Additionally, keep a list of included tests for the fast build. When a test fails in the full build, it should be added to this list automatically so that it runs in the next fast build.
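Combining the two lists, a test runs in the fast build if it is not excluded or if it has been force-included after a full-build failure. A minimal sketch, assuming the full build reports failing tests by name; `update_included` and `fast_build_tests` are hypothetical helpers, and persisting the included list somewhere the CI server can update it (for example a file in version control) is left as an assumption.

```python
from typing import Iterable


def update_included(included: set[str], full_build_failures: Iterable[str]) -> set[str]:
    """Add every test that failed in the full build to the included list."""
    return included | set(full_build_failures)


def fast_build_tests(all_tests: Iterable[str], excluded: set[str],
                     included: set[str]) -> list[str]:
    """Tests not excluded, plus force-included ones (included wins over excluded)."""
    return [t for t in all_tests if t not in excluded or t in included]


# Hypothetical test names, for illustration only.
ALL_TESTS = [
    "billing.test_invoice",
    "billing.test_refund",
    "reports.test_yearly_summary",
    "search.test_new_ranking",
]
EXCLUDED = {"billing.test_refund", "reports.test_yearly_summary"}

if __name__ == "__main__":
    # The nightly full build reports a failure in an excluded test...
    included = update_included(set(), ["reports.test_yearly_summary"])
    # ...so the next fast build picks it up.
    print(fast_build_tests(ALL_TESTS, EXCLUDED, included))
```

Letting the included list override the excluded list means nobody has to edit the exclusion list by hand for a failing test to start running in the fast build.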
Motivation
Continuous integration is all about feedback: the faster the feedback, the faster you can solve a problem and move on. The productivity gains of a 10-minute build are an order of magnitude higher than those of a one-hour build.
With a large, slow build you have essentially already lost rapid feedback. To get some of it back, you introduce a fast build that catches a large portion of the possible failures. You’re not sacrificing safety, because the full build still runs and protects you against regressions; it just reports them later.
This safety is further enhanced by automatically adding a test to the fast build as soon as it fails in the full build: a test that has failed recently can obviously fail again. It also ensures that developers fix the test, since they shouldn’t check in while the fast build is broken.
Extensions
This pattern can be extended to several tiers of builds. For example, you can have tiers for the fast build, the full build, and then the entire functional test suite. Functional tests with all the backend systems plugged in and active are often extremely slow; it’s not uncommon for such suites to take more than 24 hours to run.