How to avoid Flaky Tests : Methods

How to avoid Flaky Tests?

Flaky tests are unreliable automated tests that sometimes pass and sometimes fail without any real code changes. They disrupt CI/CD pipelines, waste debugging time, and reduce trust in automation.

Overview

What are Flaky Tests?

Tests with inconsistent outcomes (pass/fail randomly).
Caused by timing, environment setup, or concurrency issues.
Example: A test fails only when run after another test.

Why are Flaky Tests Harmful?

Block CI/CD pipelines with false failures.
Waste developer and tester time in debugging.
Lower confidence in automation results.

How to Avoid Flaky Tests?

Replace Sleep with conditional Wait commands.
Run tests frequently on CI pipelines and real devices.
Document flaky tests to identify recurring patterns.
Ensure test independence by avoiding order dependencies.

This article explains what flaky tests are, why they cause problems, and the most effective ways to prevent them in real-world testing.

What are Flaky Tests?

Ask a developer or tester this question, and they will probably answer by groaning in exasperation. They have good reason to, since flaky tests are notoriously painful to debug.

This is because flaky tests have non-deterministic outcomes. Essentially, a flaky test is one that sometimes passes and sometimes fails when executed against the same code. Inconsistent results in the same environment make a failure hard to reproduce, which in turn makes it hard to pinpoint an actual error in the code.

Additionally, flaky tests are a drain on developer time because even when they fail, they don’t necessarily indicate the existence of a bug. In a software pipeline that runs regression tests before each code commit, flaky tests cause delays, because their failures may appear to be related to one or more commits even though, in reality, they are entirely unrelated.

How to avoid Flaky Tests in a Test Suite?

1. Abandon the Sleep

Almost all automated tests must wait for a web page, app, or at least certain web elements to load before replicating specific user actions to verify software performance. When this waiting is done with Sleep commands, flaky tests tend to appear more frequently.

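The Sleep command asks a test to pause for a specified amount of time before continuing to execute. However, a fixed pause is only a guess at how long loading will take. If the page loads more slowly than expected, for example on a slow network or a busy machine, the pause ends too early and the test fails.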

Fixing this is simple: use Wait commands instead of Sleep. A Wait command pauses a test until a certain condition becomes true or a timeout value is reached, whichever comes first. The test would still be flaky if the elements take longer to load than the timeout value. However, in most cases, the condition becomes true within the timeout, and the test proceeds as soon as it does. When Sleep statements are used, the test waits for the full specified time, no matter what.

A test with a Sleep statement set to 40 seconds waits 40 seconds even if web elements render in 5 seconds. With a Wait command, the test waits 5 seconds even if the timeout is 40 seconds.
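The difference can be sketched with a small polling helper. This is a minimal illustration, not a specific framework’s API: `wait_until`, `element_is_ready`, and the timings are hypothetical names and values chosen for the example. Selenium’s Python bindings offer explicit waits (`WebDriverWait` with expected conditions) that follow the same idea.

```python
import time


def wait_until(condition, timeout=40.0, poll_interval=0.1):
    """Poll `condition` until it is truthy or `timeout` seconds have passed.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Hypothetical stand-in for "the element has rendered":
# it becomes true 0.3 seconds after this line runs.
ready_at = time.monotonic() + 0.3


def element_is_ready():
    return time.monotonic() >= ready_at


start = time.monotonic()
found = wait_until(element_is_ready, timeout=5.0, poll_interval=0.05)
elapsed = time.monotonic() - start
print(f"found={found}, waited {elapsed:.2f}s of a 5s budget")
```

The helper returns shortly after the condition becomes true rather than using up the whole timeout, whereas `time.sleep(5.0)` would always block for the full five seconds.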

2. Get Flaky Tests out in the open

Devs and testers often notice that flaky tests show up in Continuous Integration (CI) environments more often than on their local machines. This is mainly because the entire test suite is run primarily in CI rather than on local development machines, where developers usually run only part of the suite and do so less often.

How is getting more flaky tests to show up related to avoiding them? Simple. It helps devs and testers recognize patterns. Once patterns are identified, one can be careful to avoid them in the future, thus reducing test flakiness in the long run.

Use a CI server from the beginning of the project. Create a branch to find flaky tests. Set up the CI server to schedule a build on this branch as frequently as possible. In a few days, enough builds will be run to cause flaky tests (if any) to pop up.

Execute builds at different times of the day to determine whether flaky tests show up at a specific time; this may help identify their cause.
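The same idea, running a test many times so that intermittent failures surface, can be tried on a small scale before a scheduled CI branch is set up. The sketch below is illustrative only: `run_repeatedly`, `flaky_test`, and `stable_test` are hypothetical names, and the flaky test simulates an intermittent failure with a seeded random number generator.

```python
import random
from collections import Counter


def run_repeatedly(test_fn, runs=200):
    """Call `test_fn` `runs` times and count passes and failures."""
    results = Counter()
    for _ in range(runs):
        try:
            test_fn()
            results["pass"] += 1
        except AssertionError:
            results["fail"] += 1
    return results


rng = random.Random(42)  # seeded so the sketch itself is reproducible


def flaky_test():
    # Hypothetical test that fails roughly 10% of the time.
    assert rng.random() > 0.1


def stable_test():
    assert 1 + 1 == 2


for name, fn in [("flaky_test", flaky_test), ("stable_test", stable_test)]:
    counts = run_repeatedly(fn)
    # A test is suspicious when the same code both passes and fails.
    verdict = "FLAKY" if counts["pass"] and counts["fail"] else "consistent"
    print(f"{name}: {counts['pass']} passed, {counts['fail']} failed -> {verdict}")
```

A test that both passes and fails across identical runs is a candidate for closer investigation; a single run would have reported it as passing most of the time.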

Remember to run all tests on real devices. Emulators and simulators simply do not provide the real user conditions required to monitor a website or app accurately and evaluate their performance.

Running tests directly on real devices removes all room for doubt. Whether the testing is manual or automated with Selenium, real devices are non-negotiable in the testing equation. In the absence of an in-house device lab (one that is regularly updated with new devices and keeps each of them at the highest level of functionality), opt for cloud-based testing infrastructure. BrowserStack provides 2000+ real browsers and devices that can be accessed for testing from anywhere in the world at any time.

Additionally, BrowserStack is set up to facilitate a DevOps testing strategy. Its cloud provides integrations with popular CI/CD and project tools such as Jira, Jenkins, TeamCity, Travis CI, and many more. There are also built-in debugging tools that let testers identify and resolve bugs immediately.

Users can sign up, select a device-browser-OS combination, and start testing for free. They can simulate user conditions such as a poor network connection, low battery, changes in location (both local and global), and different viewport sizes and screen resolutions.

3. Document, Document, Document

Much like pushing flaky tests into the open, documenting them helps to identify patterns, which usually helps to pinpoint a cause. Document every flaky test in the ticketing system. If a test can be fixed, fix it. If not, gather as much data as possible about each failure. Once the frequency and nature of flaky tests become apparent, the team can make informed decisions about a long-term fix.
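As a rough sketch of how documented failures can reveal patterns, the example below groups a handful of records by one attribute at a time. Everything here is hypothetical: the `FlakyReport` fields, the sample entries, and the `summarize` helper are assumptions for illustration, not the schema of any particular ticketing system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FlakyReport:
    test_name: str
    environment: str    # e.g. "ci" or "local"
    hour_of_day: int    # 0-23, when the failing build ran
    previous_test: str  # test that ran immediately before, if known


# Hypothetical entries, as they might be copied out of tickets.
reports = [
    FlakyReport("test_checkout", "ci", 2, "test_login"),
    FlakyReport("test_checkout", "ci", 3, "test_login"),
    FlakyReport("test_search", "ci", 14, "test_cart"),
    FlakyReport("test_checkout", "local", 2, "test_login"),
]


def summarize(reports, field):
    """Count reports by one attribute to surface recurring patterns."""
    return Counter(getattr(r, field) for r in reports).most_common()


for field in ("test_name", "environment", "previous_test"):
    print(field, summarize(reports, field))
```

With the sample data, most failures involve `test_checkout` running after `test_login`, which would suggest looking at shared state between those two tests first.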

Look at Concurrency: Problems with concurrency, such as data races, deadlocks, and atomicity violations, often lead to test flakiness.

In these cases, flakiness comes from the fact that the developer has made incorrect assumptions about the order in which operations run on different threads. Multiple code behaviors in a project can be perfectly legitimate, but if a test takes into account only a portion of these behaviors, its outcome will be non-deterministic.

Resolve this by modifying the test to accept a greater range of code behavior or by adding a synchronization block. Any test with concurrent code might benefit from synchronizing some of its statements. Look at tools like IMUnit for this purpose, as they can be used to test different thread schedules.
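The synchronization approach can be sketched as follows. This is a minimal example using Python’s standard `threading` module rather than IMUnit, and `SafeCounter` and `test_concurrent_increments` are hypothetical names introduced for illustration.

```python
import threading


class SafeCounter:
    """Shared counter whose increment is guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not atomic on its own;
        # the lock turns it into a single synchronized step.
        with self._lock:
            self.value += 1


def test_concurrent_increments(threads=8, per_thread=10_000):
    counter = SafeCounter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    # Join every thread before asserting, so the check never races the workers.
    for t in pool:
        t.join()

    assert counter.value == threads * per_thread
    return counter.value


print(test_concurrent_increments())
```

Two things keep this test deterministic: the lock around the shared update, and joining all threads before the assertion. Asserting while workers are still running, or updating the counter without the lock, would let the result depend on thread scheduling.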

4. Look at Test Order

Sometimes, tests are flaky because they pass or fail depending on which other tests were run before them. In most projects, tests make implicit assumptions about the environment they will run in (database, memory, etc.) without much verification.

Flakiness can occur in the following ways:

A test fails if another particular test runs before it, because the earlier test changes elements in the test environment.
A test fails if another particular test does not run before it, because that other test sets up the environment with the variables necessary for its success.

Resolve this by creating tests that can run independently: tests that set up the environment they need themselves, or that aren’t disturbed by changes in the environment (the latter is usually tough to program). Ideally, each test should also leave the environment in a pristine condition after execution is complete.
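A minimal sketch of this pattern with Python’s built-in `unittest` module is shown below. The `InventoryTests` class and the `run_in_random_order` helper are hypothetical examples: each test builds its own state in `setUp`, cleans up in `tearDown`, and the helper shuffles the execution order to check that the outcome does not depend on it.

```python
import random
import unittest


class InventoryTests(unittest.TestCase):
    def setUp(self):
        # Fresh state per test: nothing leaks in from whichever test ran before.
        self.stock = {"apple": 3}

    def tearDown(self):
        # Leave the environment clean for the next test.
        self.stock.clear()

    def test_remove_item(self):
        self.stock["apple"] -= 1
        self.assertEqual(self.stock["apple"], 2)

    def test_add_item(self):
        self.stock["apple"] += 1
        self.assertEqual(self.stock["apple"], 4)


def run_in_random_order(case, seed):
    """Run the test methods of `case` in a shuffled order; True if all pass."""
    names = list(unittest.TestLoader().getTestCaseNames(case))
    random.Random(seed).shuffle(names)
    suite = unittest.TestSuite(case(name) for name in names)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


all_ok = all(run_in_random_order(InventoryTests, seed) for seed in range(5))
print("order-independent:", all_ok)
```

If the two tests shared one module-level `stock` dictionary instead of creating it in `setUp`, each would pass or fail depending on which ran first, which is exactly the order dependency described above.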

Flaky tests are a thorn in the side of developers and testers alike. It’s hard to avoid them altogether, but a few steps can be taken to minimize their occurrence or to build in quick fixes for when they do show up. Invest some time in these steps, and save time and effort in the long run.