How to Detect and Avoid Playwright Flaky Tests?

Flaky tests in Playwright pass and fail unpredictably without code changes. This kind of inconsistency slows down deployments, makes it hard to trust the test results, and can throw off your CI pipeline.

Flaky behavior usually comes from timing problems, unreliable selectors, or differences in the test environment. Therefore, it’s important to find and fix these issues early so your test automation stays reliable and gives you clear feedback during development.

This article explains how flaky tests affect Playwright projects, why they occur, how to prevent them, and how to detect them using tools like BrowserStack Automate.

What Are Flaky Tests in Playwright?

Flaky tests in Playwright are automated tests that pass during one execution and fail the next, even when no changes have been made to the codebase. They create both false positives and false negatives, which waste debugging time and make it harder to catch real issues.

Impact of Flaky Tests on CI Pipelines

Flaky tests waste time on false failures and slow down releases. Here are some ways they impact CI pipelines.

False alarms: Flaky tests cause failures that are not real and make teams spend time investigating issues that do not actually exist.
Blocked deployments: When test results feel unreliable, teams hesitate to push code, which slows down releases and disrupts the development flow.
Repeated reruns: Tests often need multiple runs to determine if a failure is real or just flaky, which wastes time and resources.
Loss of trust: Over time, teams stop trusting test results and ignore failures, risking missing real problems.
Slower feedback: Flaky tests create noise that makes it take longer to identify and fix genuine bugs, delaying the development process.

Why Do Flaky Tests Occur?

Flaky tests produce inconsistent results despite unchanged code. This occurs because:

Timing issues: Tests fail if the page elements haven’t fully loaded or appeared yet. For example, if a button hasn’t appeared, the test can’t click it and will fail.
Unstable selectors: Tests break when developers change the page layout because the selectors can’t find the right elements anymore. For example, if a developer changes a button’s class name, the test won’t be able to find it and will end up failing.

External service problems: If tests depend on APIs or databases that are slow or unreliable, they’ll fail when those services take too long to respond or crash.
Varying test environments: Testing environments vary because browsers, operating systems, and networks behave and perform differently. These differences affect how pages load or respond, causing tests to pass in one environment but fail in another.
Limited resources: Tests can slow down or fail unexpectedly if the machine running them has limited memory or CPU resources.
Test interference: Tests that share data or state can conflict with each other, leading to unreliable results. For example, if two tests update the same user account at the same time, one might overwrite the other’s changes and lead to inconsistent results.
Random data and async operations: Tests that use random inputs or deal with asynchronous code can behave differently each time they run.

How to Avoid Writing Flaky Tests in Playwright

Stable Playwright tests depend on a handful of specific practices. Each method below comes with a short example:

1. Run and Debug Tests Before Committing

Execute tests locally with npx playwright test, and add the --debug flag to catch failures early. Debug mode opens the Playwright Inspector, where you can step through each action and trace flaky behavior before pushing to CI.

You can also pause a test at a specific point with page.pause():

import { test, expect } from '@playwright/test';

test('example test', async ({ page }) => {
  await page.goto('https://example.com');
  await page.pause(); // Opens Playwright Inspector for debugging
  await expect(page.locator('h1')).toHaveText('Example Domain');
});

2. Avoid Hard Waits

Hard waits pause test execution for a fixed amount of time, regardless of whether the app is ready. If the app loads slower than expected, the test fails; if it loads faster, the test wastes time. Use Playwright’s built-in auto-waiting instead, which checks that an element is visible, stable, and enabled before interacting with it.

// Avoid this
await page.waitForTimeout(5000);

// Use auto-waiting
await page.click('button:has-text("Submit")');
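
To see why condition-based waiting beats a fixed delay, the idea can be sketched as polling with a deadline. The waitUntil helper and its parameters below are hypothetical and only illustrate the principle; this is not how Playwright implements auto-waiting internally, and in real tests the built-in locator actions and assertions should be preferred.

```typescript
// Hypothetical helper for illustration only; not part of Playwright's API.
// Polls `condition` until it returns true or `timeoutMs` elapses.
async function waitUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutMs: number = 5000,
  intervalMs: number = 50,
): Promise<number> {
  const start = Date.now();
  for (;;) {
    if (await condition()) {
      return Date.now() - start; // elapsed time until the condition held
    }
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated app state that becomes ready after roughly 120 ms.
let ready = false;
setTimeout(() => { ready = true; }, 120);

// Resolves shortly after `ready` flips instead of always sleeping for a fixed time.
const waited: Promise<number> = waitUntil(() => ready, 2000, 20);
waited.then((ms) => console.log(`ready after about ${ms} ms`));
```

A fixed 5-second sleep would cost the full 5 seconds on every run; the polling version returns as soon as the condition holds and fails with a clear error if it never does.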

3. Control the Testing Environment

Inconsistent environments can cause tests to behave differently across runs. Use fixed settings for viewport size, browser, and permissions to reduce variability. Set these in playwright.config.ts to ensure every test runs in the same environment:

use: {
  browserName: 'chromium',
  viewport: { width: 1280, height: 720 },
  permissions: ['geolocation'],
}

This helps prevent environment-based flakiness and makes test results more predictable.
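
Beyond browser and viewport, locale and time zone are common sources of drift between a laptop and CI. The following is one possible fuller config, assuming the @playwright/test runner; the locale, time zone, and color scheme values are illustrative assumptions, not settings taken from the original setup.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    browserName: 'chromium',
    viewport: { width: 1280, height: 720 },
    permissions: ['geolocation'],
    locale: 'en-US',      // illustrative value: pins language and number/date formatting
    timezoneId: 'UTC',    // illustrative value: keeps date assertions independent of the host time zone
    colorScheme: 'light', // illustrative value: keeps theme-dependent rendering fixed
  },
});
```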

4. Use Locators Instead of Selectors

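Selectors tied to implementation details, such as CSS classes, generated IDs, or XPath paths, often break when the UI changes. Playwright’s user-facing locators (getByRole, getByText, getByLabel) are more stable because they rely on roles, text, or labels. Use this approach for more reliable element targeting: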

// Avoid
await page.click('#submit-btn');

// Use
await page.getByRole('button', { name: 'Submit' }).click();

5. Avoid Random Values

Random inputs make tests unreliable because they produce different outcomes each run. Use fixed values or seed your randomness to keep test behavior predictable.

Here’s an example of using fixed input values:

const username = 'testuser';
await page.fill('#username', username);
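
When a test genuinely needs varied data, seeding the generator keeps runs reproducible. The sketch below uses a simple linear congruential generator (constants from Numerical Recipes) in place of Math.random(); the seededRandom and makeUsername names are hypothetical, and the generator is only meant for test data, not for anything security-sensitive.

```typescript
// Linear congruential generator: the same seed always yields the same sequence.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0; // force into the unsigned 32-bit range
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // float in [0, 1)
  };
}

// Hypothetical helper: builds a username from the generator instead of Math.random().
function makeUsername(rand: () => number): string {
  return `testuser_${Math.floor(rand() * 1000000)}`;
}

const SEED = 42; // fix or log this value so a failing run can be replayed
const firstRun = makeUsername(seededRandom(SEED));
const secondRun = makeUsername(seededRandom(SEED));
console.log(firstRun === secondRun); // true: same seed, same data
```

Logging the seed at the start of a run is usually enough to replay a failure with identical inputs.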

6. Ensure Test Isolation

Tests that depend on shared state can interfere with each other and create flakiness. Isolating tests ensures each one runs independently with a clean setup.

Here’s how to reset the state before each test:

test.beforeEach(async ({ page }) => {
  await page.goto('https://example.com/reset');
});
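
Resetting state helps, but parallel workers can still collide if they all use the same account. One possible approach, sketched under the assumption that test accounts can be created per worker, is to derive the account from the worker index. The accountFor helper and the naming scheme are hypothetical; workerIndex is a field on Playwright's TestInfo.

```typescript
// Minimal shape of the field used here; Playwright's TestInfo exposes `workerIndex`.
interface WorkerInfo {
  workerIndex: number;
}

// Hypothetical naming scheme: one account per worker, so parallel tests never share a user.
function accountFor(info: WorkerInfo): { email: string; displayName: string } {
  return {
    email: `test-user-${info.workerIndex}@example.com`,
    displayName: `Test User ${info.workerIndex}`,
  };
}

console.log(accountFor({ workerIndex: 3 }).email); // test-user-3@example.com
```

Inside a test, the second argument of the test function (testInfo) can be passed straight in. If accounts must be provisioned in advance, testInfo.parallelIndex may be a better key because it stays between 0 and the worker count minus one.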

7. Configure Automatic Retry

Some failures come from temporary network or environment issues. Setting retries helps detect flaky tests by rerunning failed ones before marking them as broken.

Here’s how to configure retries in Playwright:

module.exports = {
  retries: 2, // Retry failed tests up to 2 times
};
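
A common variation is to retry only on CI, so that local runs surface flaky failures immediately instead of hiding them behind a retry. This sketch assumes the CI provider sets a CI environment variable, which many do but not all.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only on CI; locally a flaky failure shows up on the first attempt.
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'on-first-retry', // record a trace when a test is retried for the first time
  },
});
```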

8. Limit Dependence on External Resources

External services like third-party APIs can be slow or unreliable during tests. Mock these calls using page.route to avoid delays and ensure consistent responses:

await page.route('**/external-api/**', route =>
  route.fulfill({ status: 200, body: JSON.stringify({ data: 'mocked' }) })
);
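
If several tests mock the same kind of endpoint, the call can be wrapped in a small helper. The sketch below declares minimal structural types so it compiles without Playwright installed; they mirror only the parts of page.route and route.fulfill used here, and a real Playwright page is expected to satisfy them (worth verifying in your own project). The mockJson name is hypothetical.

```typescript
// Minimal structural types mirroring the parts of `page.route` and `route.fulfill` used below.
interface RouteLike {
  fulfill(response: { status: number; contentType: string; body: string }): Promise<void>;
}
interface RoutablePage {
  route(pattern: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Hypothetical helper: answer every request matching `pattern` with a fixed JSON payload.
async function mockJson(
  page: RoutablePage,
  pattern: string,
  payload: unknown,
  status: number = 200,
): Promise<void> {
  await page.route(pattern, (route) =>
    route.fulfill({
      status,
      contentType: 'application/json', // tells the app to parse the body as JSON
      body: JSON.stringify(payload),
    }),
  );
}
```

In a test this would be called before navigation, for example await mockJson(page, '**/external-api/**', { data: 'mocked' }), so the route is registered before the page issues its first request.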

How to Detect Flaky Tests in Playwright?

Use these methods to identify flaky tests.

1. Loop Suspected Tests to Confirm Flakiness

If a test fails intermittently in CI but not locally, it may be flaky. Run that specific test multiple times in a controlled local environment to see if it fails at random. Use a loop or a script to automate repeated execution:

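// repeat-test.js (run with: node repeat-test.js)
const { spawnSync } = require('node:child_process');

for (let i = 0; i < 20; i++) {
  // spawnSync blocks until the run finishes; status holds the process exit code
  const result = spawnSync('npx', ['playwright', 'test', 'tests/sample.spec.ts']);
  console.log(`Run ${i + 1}:`, result.status === 0 ? 'passed' : 'failed');
}

This helps confirm whether the failure is intermittent or just a one-time event. Playwright’s --repeat-each option does the same without a script, for example npx playwright test tests/sample.spec.ts --repeat-each=20.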
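
Once the exit codes of repeated runs are collected, they can be summarized into a verdict. The classifyRuns function and its labels below are hypothetical, a minimal sketch of how such a summary might look. Note that a clean set of passes does not prove stability: a test that fails once in a hundred runs will usually pass twenty in a row.

```typescript
type Verdict = 'stable-pass' | 'stable-fail' | 'flaky';

// Classify a test from the exit codes of repeated, identical runs (0 = pass).
function classifyRuns(exitCodes: number[]): { verdict: Verdict; failureRate: number } {
  if (exitCodes.length === 0) {
    throw new Error('At least one run is required');
  }
  const failures = exitCodes.filter((code) => code !== 0).length;
  const failureRate = failures / exitCodes.length;
  let verdict: Verdict = 'flaky'; // mixed results unless proven otherwise below
  if (failures === 0) verdict = 'stable-pass';
  else if (failures === exitCodes.length) verdict = 'stable-fail';
  return { verdict, failureRate };
}

// Example: 20 runs of the same spec, 3 of which failed.
const codes = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0];
console.log(classifyRuns(codes)); // { verdict: 'flaky', failureRate: 0.15 }
```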

2. Use Playwright’s Built-In Retry Feature

Retries help surface tests that fail for non-deterministic reasons. When a test passes only after a retry, it likely depends on unstable factors like timing, network conditions, or inconsistent setup.

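You can enable retries globally or for a group of tests:

// Global setting in playwright.config.ts
retries: 2

// Per-group setting inside a spec file
test.describe('checkout', () => {
  test.describe.configure({ retries: 2 });

  test('example test', async ({ page }) => {
    // test logic
  });
});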

3. Capture Flaky Behavior With TestInfo

TestInfo in Playwright provides metadata about the current test run, such as status, retries, and error details. Logging these helps identify flaky tests by showing how often a test needs rerunning and what causes failures. For example, you can log retry attempts inside your test:

test('retry log demo', async ({ page }, testInfo) => {
  if (testInfo.retry) {
    console.log(`This test is retrying for the ${testInfo.retry} time`);
  }
});

This information highlights tests that behave inconsistently and need investigation.
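
The same fields can also flag tests that only passed after a retry, which is the clearest signal of flakiness. The sketch below works on a subset of TestInfo (title, retry, status) so it runs without Playwright; the flakyNote helper and its log format are hypothetical.

```typescript
// Subset of Playwright's TestInfo used here.
interface RetryInfo {
  title: string;
  retry: number; // 0 on the first attempt, 1 on the first retry, and so on
  status?: 'passed' | 'failed' | 'timedOut' | 'skipped' | 'interrupted';
}

// Returns a log line when a test only passed after being retried, otherwise null.
function flakyNote(info: RetryInfo): string | null {
  if (info.status === 'passed' && info.retry > 0) {
    return `[flaky] "${info.title}" passed on retry #${info.retry}`;
  }
  return null;
}

console.log(flakyNote({ title: 'checkout', retry: 1, status: 'passed' }));
// [flaky] "checkout" passed on retry #1
```

A natural place to call it is a test.afterEach hook, which receives testInfo as its second argument once the test status is known.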

4. Analyze With Trace Viewer and Video

Enable tracing or video for flaky tests to replay the test step-by-step and identify where timing or async failures happen.

use: {
  trace: 'on-first-retry',
  video: 'on-first-retry'
}

5. Run Flaky Tests in Isolation Using test.only

test.only is a Playwright method that runs only the marked test and skips all others. Use it to check whether a flaky test passes alone but fails when run with the full suite, which points to issues caused by shared state, side effects, or test order.

6. Use Strict Locators to Reduce Timing Issues

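Failures such as a timeout while waiting for a locator often happen because the test tries to access an element before it appears. Playwright locators are strict, meaning an action fails if the locator matches more than one element, so target exactly one element and assert its visibility with an auto-retrying assertion. This makes the test wait properly and reduces timing-related flakiness. Here’s how to use it: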

await expect(page.locator('button:has-text("Submit")')).toBeVisible();

7. Monitor Flaky Test Patterns Over Time

Track recurring flaky tests using Playwright’s HTML reports or CI analytics tools. Adding tags like @flaky helps filter these tests and prioritize fixing them based on frequency and impact.
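
To prioritize by frequency, per-run results can be aggregated into a ranking. The record shape and the rankByFlakeRate function below are hypothetical; how the records are exported from a report or CI tool is left open and would need to be adapted to your setup.

```typescript
// Hypothetical record shape: one entry per test per CI run.
interface RunRecord {
  test: string;
  flaky: boolean; // true when the test failed at least once and then passed on retry
}

// Rank tests by how often they were flaky across the recorded runs.
function rankByFlakeRate(
  history: RunRecord[],
): Array<{ test: string; runs: number; flakeRate: number }> {
  const stats = new Map<string, { runs: number; flaky: number }>();
  for (const record of history) {
    const entry = stats.get(record.test) || { runs: 0, flaky: 0 };
    entry.runs += 1;
    if (record.flaky) entry.flaky += 1;
    stats.set(record.test, entry);
  }
  return Array.from(stats, ([test, s]) => ({
    test,
    runs: s.runs,
    flakeRate: s.flaky / s.runs,
  })).sort((a, b) => b.flakeRate - a.flakeRate); // most flaky first
}

const history: RunRecord[] = [
  { test: 'login', flaky: false },
  { test: 'login', flaky: true },
  { test: 'login', flaky: true },
  { test: 'checkout', flaky: false },
  { test: 'checkout', flaky: false },
  { test: 'search', flaky: true },
  { test: 'search', flaky: false },
  { test: 'search', flaky: false },
  { test: 'search', flaky: false },
];
console.log(rankByFlakeRate(history).map((r) => r.test)); // [ 'login', 'search', 'checkout' ]
```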

Fixing Playwright Flaky Tests

Once a flaky test is identified, fix it by targeting the root cause to ensure consistent results:

1. Analyze Failure Details

Use Playwright’s Trace Viewer to inspect failures in detail. Run the following to review test steps, DOM snapshots, and network activity.

npx playwright show-trace trace.zip

2. Use Robust Locators

Replace fragile selectors with Playwright’s role-based or chained locators for stable element targeting. For example:

await page.getByRole('button', { name: 'Submit' }).click();

3. Avoid Hard Waits

Remove fixed delays. Instead, wait for specific app conditions like responses or element states:

await expect(page.locator('.toast-message')).toHaveText('Saved successfully');

4. Handle Async Flows Properly

Ensure navigation and API calls are complete before proceeding. Use Playwright’s built-in waiting and assertions.
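
A frequent async mistake is to start waiting for a response only after the action that triggers it, which can miss a fast response. The sketch below registers the wait first, then performs the action, then awaits the result. It uses minimal structural types that mirror page.waitForResponse so it compiles without Playwright; the actAndWaitForResponse helper and the URL fragment in the usage note are hypothetical.

```typescript
// Minimal structural types mirroring the parts of Playwright's Page and Response used below.
interface ResponseLike {
  url(): string;
  ok(): boolean;
}
interface WaitablePage {
  waitForResponse(predicate: (response: ResponseLike) => boolean): Promise<ResponseLike>;
}

// Hypothetical helper: start listening BEFORE triggering the action, then await both.
async function actAndWaitForResponse(
  page: WaitablePage,
  urlPart: string,
  action: () => Promise<void>,
): Promise<ResponseLike> {
  // 1. Register the wait first (no await yet, so the action can proceed).
  const responsePromise = page.waitForResponse(
    (response) => response.url().indexOf(urlPart) !== -1 && response.ok(),
  );
  // 2. Trigger the request.
  await action();
  // 3. Only now wait for the matching response.
  return responsePromise;
}
```

In a test, the action would typically be a click, for example actAndWaitForResponse(page, '/api/save', () => page.getByRole('button', { name: 'Submit' }).click()).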

5. Run Dependent Tests Sequentially

If tests share state or data, disable parallelism for that scope:

test.describe.configure({ mode: 'serial' });

6. Validate Fixes Across Environments

Test your changes locally, in CI, and across browsers to confirm stability before merging.

How BrowserStack Helps Detect Flaky Tests in Playwright

Playwright handles test flakiness with auto-waits, locator strategies, and trace debugging. However, it runs locally or in headless containers, which may miss environment-specific failures. BrowserStack runs tests on real devices and browsers with actual operating systems, screen sizes, and network conditions.

While Playwright can simulate throttled conditions like slow networks or devices, BrowserStack reveals real-world variability in rendering, performance, and third-party behavior. This makes it easier to identify flakiness tied to platform differences.

To run Playwright tests on BrowserStack Automate and detect flaky issues:

1. Install Dependencies

npm install -D @browserstack/playwright-cli

2. Authenticate Using BrowserStack Credentials

export BROWSERSTACK_USERNAME='your_username'
export BROWSERSTACK_ACCESS_KEY='your_access_key'

3. Set Up Test Configuration

Update playwright.config.ts to include BrowserStack’s project name, build, and test settings. For example:

use: {
  browserName: 'chromium',
  viewport: { width: 1280, height: 720 },
  screenshot: 'on',
  video: 'on',
  trace: 'on',
}

4. Run Tests with BrowserStack

browserstack-playwright --config playwright.config.ts

5. Enable Retries and Traces

retries: 2,
use: {
  trace: 'on-first-retry',
  video: 'on',
}

6. Review Session Logs

Utilize the BrowserStack dashboard to view video recordings, console logs, traces, and failure reasons.

Conclusion

Flaky tests cause intermittent failures due to timing issues, environment differences, or external factors, undermining trust in test results and slowing development. Detecting them requires running tests repeatedly across varied environments and analyzing logs to spot inconsistencies.

Real device testing matters here because it reveals issues caused by actual hardware, operating systems, and network conditions that simulators or local setups miss. BrowserStack provides access to thousands of real devices and browsers, allowing Playwright tests to run in real user conditions. Its detailed logs, videos, and trace data help quickly detect and fix flaky tests, improving test reliability and product quality.
