Flaky CI Tests: Costs, Causes and Solutions

By Jeff Vincent

June 09, 2021

Continuous integration (CI) is the process by which new code is added safely to an existing project. The safety comes from the fact that a test suite is executed against the codebase with the newly added code to see if the new addition has introduced any breaking changes to the project. Breaking changes are caught by failing tests, and can then be remedied before they are added into the codebase.

However, this system only works as well as its tests perform. A distinct challenge in CI testing is the fact that not all tests behave in a consistent manner. Tests that exhibit both passing and failing behavior without any change to the underlying code are considered flaky tests, and they present a number of problems for development teams.

Problems that arise from flaky tests

Crying “wolf”

Flaky tests can desensitize the team to tests that are actually failing. If false alarms are frequent because of flakiness in your CI test suite, failures that point to genuine race conditions or other bugs are more easily dismissed as flaky too.

Slows development

Oftentimes, engineers will need to rerun the entire test suite repeatedly in order for all tests to pass. This can slow down the work of other developers, as they will likely need to wait in line for their own contributions to be tested. Moreover, the resources required to rerun tests time and again are generally finite, and each run constitutes an additional cost in both resources and time.
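To get a feel for how quickly reruns add up, here is a rough back-of-the-envelope sketch. It assumes every flaky test fails independently at the same rate, which is a simplification, and the figures (20 flaky tests, a 2% failure rate each) are purely illustrative rather than measured. The function names are made up for this example.

```python
def suite_pass_probability(flaky_tests: int, failure_rate: float) -> float:
    """Chance that every flaky test passes in a single run of the suite.

    Assumes each flaky test fails independently with the same probability.
    """
    return (1.0 - failure_rate) ** flaky_tests


def expected_runs_until_green(flaky_tests: int, failure_rate: float) -> float:
    """Average number of full-suite runs needed to see one all-green run."""
    return 1.0 / suite_pass_probability(flaky_tests, failure_rate)


if __name__ == "__main__":
    # Illustrative numbers only: 20 flaky tests, each failing 2% of the time.
    print(round(suite_pass_probability(20, 0.02), 2))     # about 0.67
    print(round(expected_runs_until_green(20, 0.02), 2))  # about 1.5
```

Under these assumptions, roughly one suite run in three fails for reasons that have nothing to do with the code under test, so the team pays for about one and a half runs per change on average.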

Business costs

Flaky tests ultimately cost teams time and money: local builds may be required to identify which tests in the CI suite are flaky, and it then takes additional time to figure out why each test is flaky and to correct it.

Other business operations will feel the delays as well. If every development cycle takes longer because of the time spent tracking and remediating flaky tests, products will take longer to release, which extends the impact of flaky tests to marketing, sales, customer success, and business development efforts.

Common causes of flaky tests

Tests need to be executed in a particular order to pass

If a test both passes and fails without any change to the underlying codebase, odds are that the application state needs to be “X” for the test to pass. However, when the application state is influenced by the tests themselves – i.e., tests interact with and change the state of the application – tests may become dependent on each other in order to achieve the state required for a given test to pass.
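A minimal sketch of this kind of order dependence is shown below. The shared `cart` list is a hypothetical stand-in for whatever state the tests touch (a database, a cache, a file on disk), and `run_in_order` is a toy runner written for this example, not part of any test framework.

```python
# Hypothetical shared state standing in for a database or cache.
cart = []


def test_add_item():
    cart.append("widget")
    assert len(cart) == 1


def test_checkout():
    # Hidden dependency: relies on the item left behind by test_add_item.
    assert cart == ["widget"]


def test_checkout_isolated():
    # Builds the state it needs itself, so execution order no longer matters.
    local_cart = ["widget"]
    assert local_cart == ["widget"]


def run_in_order(tests):
    """Reset the shared state, run the tests in order, return name -> passed."""
    cart.clear()
    results = {}
    for test in tests:
        try:
            test()
            results[test.__name__] = True
        except AssertionError:
            results[test.__name__] = False
    return results


if __name__ == "__main__":
    print(run_in_order([test_add_item, test_checkout]))  # both pass
    print(run_in_order([test_checkout, test_add_item]))  # test_checkout fails
```

Running the same two tests in the opposite order makes `test_checkout` fail even though no code changed. The isolated variant avoids this by setting up its own state instead of inheriting it from a neighbor.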

End-to-end testing

End-to-end tests, which chart an entire user flow from start to finish, are inherently prone to flakiness. Because end-to-end tests change the state of the application by their very nature, each stage of the test depends on the state changes made in the stages before it. In light of this, you may want to limit the degree of coverage you target with end-to-end tests. For example, write five rather than 50 end-to-end tests.

How to deal with flaky tests

Sooner is better

Generally speaking, the best way to deal with flaky tests is to do so quickly. If a newly written test is passing only after having failed several times with no other changes to the codebase, it will be freshest in the mind of the developer who wrote it at that point, and by extension, it will be easiest to fix at that time.

If an older test that has consistently passed suddenly exhibits flaky behavior, there is a good chance that the new code caused the change. The best time to fix code, as with tests, is when it is fresh in the mind of the developer who wrote it: it takes less time to understand exactly what the code is doing, so it will likely take less time to fix.
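One way to catch flakiness early is to rerun a test several times against unchanged code and see whether the outcome varies. The sketch below is a simplified illustration of that idea. `classify` and the three demo tests are hypothetical names invented for this example, and the "every third call" failure is a deterministic stand-in for real nondeterminism such as timing or network issues.

```python
import itertools
from typing import Callable


def classify(test: Callable[[], None], runs: int = 20) -> str:
    """Run a test repeatedly with no code changes and label its behavior."""
    passes = failures = 0
    for _ in range(runs):
        try:
            test()
            passes += 1
        except Exception:
            failures += 1
    if failures == 0:
        return "stable-pass"
    if passes == 0:
        return "stable-fail"
    return "flaky"  # both outcomes were seen against the same code


def always_passes():
    assert True


def always_fails():
    assert False


_calls = itertools.count()


def fails_every_third_call():
    # Deterministic stand-in for a timing- or state-dependent failure.
    if next(_calls) % 3 == 0:
        raise AssertionError("intermittent failure")


if __name__ == "__main__":
    for test in (always_passes, always_fails, fails_every_third_call):
        print(test.__name__, classify(test, runs=9))
```

A test that only fails rarely needs more runs before the flakiness shows up, so the number of repetitions is a trade-off between confidence and CI time.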

Document flaky tests

For tests that cannot be fixed immediately, the behavior should be documented, and a bug report should be opened so that when the team does have the bandwidth to circle back to it, they won’t be starting from square one.

Quarantine flaky tests until a fix can be found

If a test continually exhibits flaky behavior, and the root cause can’t be determined immediately, it can be quarantined. That is, it no longer needs to pass for the test suite to be considered passing, which allows newly written code to be merged into the codebase.
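The quarantine rule can be sketched as a small gate over the suite's results. This is an illustration under stated assumptions, not how any particular CI system implements it: the test names, the `example_results` data, and both helper functions are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical quarantine list; in practice each entry would link to a bug report.
QUARANTINED: Set[str] = {"test_upload_large_file"}


def suite_passed(results: Dict[str, bool], quarantined: Set[str]) -> bool:
    """The suite passes if every test outside the quarantine passed."""
    return all(
        passed for name, passed in results.items() if name not in quarantined
    )


def quarantined_failures(results: Dict[str, bool], quarantined: Set[str]) -> List[str]:
    """Quarantined tests that failed: still reported, but they do not block a merge."""
    return sorted(
        name for name, passed in results.items()
        if name in quarantined and not passed
    )


# Made-up results from a single suite run, for illustration only.
example_results = {
    "test_login": True,
    "test_upload_large_file": False,  # flaky, quarantined
    "test_checkout": True,
}

if __name__ == "__main__":
    print(suite_passed(example_results, QUARANTINED))
    print(quarantined_failures(example_results, QUARANTINED))
```

Reporting the quarantined failures separately keeps them visible, so the quarantine stays a temporary holding area rather than a place where tests are forgotten.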

Additional Reading

  • Flaky Tests at Google and How We Mitigate Them
  • Test Flakiness - Methods for identifying and dealing with flaky tests
  • How to Deal With and Eliminate Flaky Tests
  • Flaky tests | GitLab

Let’s Chat: How do you manage flaky tests?

Do you have any good tips for mitigating flaky tests? Join the MacStadium Community and share your experiences with flaky tests and how you fixed them.
