Flaky CI Tests: Costs, Causes and Solutions

Flaky CI tests can hurt developer confidence, and ultimately your team’s bottom line. Learn about common causes of flaky tests and how to resolve them.

Continuous integration (CI) is the practice of merging new code into an existing project safely. The safety comes from running a test suite against the codebase with the new code included, to check whether the addition has introduced any breaking changes. Failing tests catch those breaking changes, which can then be fixed before they are merged into the codebase.

However, this system only works as well as its tests do, and not all tests behave consistently. Tests that both pass and fail without any change to the underlying code are considered flaky tests, and they present a number of problems for development teams.
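To make that definition concrete, here is a minimal sketch of how flaky tests might be picked out of CI history. It assumes you can export results as (commit, test name, passed) records; the function name `find_flaky_tests` and the sample data are hypothetical, not part of any particular CI tool.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (commit, test_name, passed) tuples. This record
    shape is an assumption made for illustration.
    """
    outcomes = defaultdict(set)
    for commit, test_name, passed in runs:
        outcomes[(commit, test_name)].add(passed)
    # A test is flaky if one unchanged commit produced both outcomes.
    return sorted({name for (_, name), seen in outcomes.items()
                   if seen == {True, False}})


# Hypothetical CI history.
history = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),     # same commit, different result
    ("abc123", "test_checkout", True),
    ("abc123", "test_checkout", True),
    ("def456", "test_search", False),
    ("def456", "test_search", False),    # consistently failing, not flaky
]

print(find_flaky_tests(history))  # ['test_login']
```

Note that a test that fails every time on a given commit is not flagged: it is simply failing, which is exactly what the test suite is there to report.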

Problems that arise from flaky tests

Crying “wolf”

Flaky tests can desensitize the team to tests that are actually failing. If flakiness regularly produces false positives in your CI test suite, tests that are actually catching race conditions or other bugs are more likely to be dismissed as just more flaky tests.

Slows development

Oftentimes, engineers need to rerun the entire test suite repeatedly before all tests pass. This can slow down other developers, who will likely need to wait in line for their own contributions to be tested. Moreover, the resources available for rerunning tests are finite, and each additional run costs both compute and time.

Business costs

Flaky tests ultimately cost teams time and money. Local builds may be required to identify which tests in the CI suite are flaky, and it then takes additional time to figure out why each test is flaky and to correct it.

Other business operations will feel the delays as well. If every development cycle takes longer because of the time spent tracking and remediating flaky tests, products will take longer to release, which extends the impact of flaky tests to marketing, sales, customer success, and business development efforts.

Common causes of flaky tests

Tests need to be executed in a particular order to pass

If a test both passes and fails without any change to the underlying codebase, odds are that the test only passes when the application is in a particular state, call it “X”. When the tests themselves interact with and change the state of the application, one test may come to depend on another having run first in order to reach the state it needs to pass.

End-to-end testing

End-to-end tests, which chart an entire user flow from start to finish, are inherently prone to flakiness. By their very nature they change the state of the application, and each stage of the test depends on the state change made by the stage before it. In light of this, you may want to limit how much coverage you target with end-to-end tests. For example, write five rather than 50 end-to-end tests.

How to deal with flaky tests

Sooner is better

Generally speaking, the best way to deal with flaky tests is to deal with them quickly. If a newly written test passes only after failing several times with no other changes to the codebase, that is the moment when it is freshest in the mind of the developer who wrote it, and therefore the moment when it is easiest to fix.
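One way to catch a flaky new test while it is still fresh is to run it several times before merging. The following is a minimal sketch of that idea; the helper `count_failures` and both sample tests are hypothetical, and the "fails on every third call" behavior is a deterministic stand-in for real nondeterminism such as timing or network issues.

```python
def count_failures(test_fn, runs=10):
    """Call test_fn `runs` times and return how many calls raised AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return failures


calls = {"n": 0}


def test_sometimes_fails():
    # Stand-in for a flaky test: fails on every third call.
    calls["n"] += 1
    assert calls["n"] % 3 != 0


def test_always_passes():
    assert 1 + 1 == 2


flaky_failures = count_failures(test_sometimes_fails, runs=9)
stable_failures = count_failures(test_always_passes, runs=9)
print(flaky_failures, stable_failures)  # 3 0
```

A test that fails some runs but not all of them, with no code change in between, is a candidate for fixing right away rather than after it has been merged.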

If an older test that has consistently passed suddenly exhibits flaky behavior, there is a good chance that the new code caused the change. The best time to fix code, like tests, is when it is fresh in the mind of the developer who wrote it, because it takes less time to understand exactly what the code is doing and so will likely take less time to fix.

Document flaky tests

For tests that cannot be fixed immediately, document the flaky behavior and open a bug report, so that when the team does have the bandwidth to circle back, they won’t be starting from square one.

Quarantine flaky tests until a fix can be found

If a test continually exhibits flaky behavior and the root cause can’t be determined immediately, it can be quarantined. A quarantined test does not need to pass for the test suite to be considered passing, so newly written code can still be merged into the codebase.
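As a rough sketch of what quarantine means for the pass/fail decision, the snippet below ignores failures from a known list of quarantined tests. The `QUARANTINED` set, the test names, and the `suite_passed` function are hypothetical; real CI systems and test frameworks each have their own mechanisms for this.

```python
# Hypothetical list of tests known to be flaky and awaiting a fix.
QUARANTINED = {"test_upload_retry"}


def suite_passed(results, quarantined=QUARANTINED):
    """Decide whether a run may be merged.

    `results` maps test name -> passed (bool). Failures of quarantined tests
    are ignored; any other failure blocks the merge.
    """
    blocking_failures = [name for name, passed in results.items()
                         if not passed and name not in quarantined]
    return not blocking_failures


# The quarantined test failed, but nothing else did, so the suite passes.
print(suite_passed({"test_login": True, "test_upload_retry": False}))  # True

# A non-quarantined failure still blocks the merge.
print(suite_passed({"test_login": False, "test_upload_retry": False}))  # False
```

Quarantined tests still run and their results are still recorded, which keeps evidence available for whoever picks up the bug report later.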

Let’s Chat: How do you manage flaky tests?

Do you have any good tips for mitigating flaky tests? Join our MacStadium Community Slack and share your experiences with flaky tests and how you fixed them.