Flaky tests in Playwright pass and fail unpredictably without code changes. This kind of inconsistency slows down deployments, makes it hard to trust the test results, and can throw off your CI pipeline.
Flaky behavior usually comes from timing problems, unreliable selectors, or differences in the test environment. Therefore, it’s important to find and fix these issues early so your test automation stays reliable and gives you clear feedback during development.
This article explains how flaky tests affect Playwright projects, why they occur, how to prevent them, and how to detect them using tools like BrowserStack Automate.
What Are Flaky Tests in Playwright?
Flaky tests in Playwright are automated tests that pass during one execution and fail the next, even when no changes have been made to the codebase. They create both false positives and false negatives, which waste debugging time and make it harder to catch real issues.
Impact of Flaky Tests on CI Pipelines
Flaky tests waste time on false failures and slow down releases. Here are some ways they impact CI pipelines.
- False alarms: Flaky tests cause failures that are not real and make teams spend time investigating issues that do not actually exist.
- Blocked deployments: When test results feel unreliable, teams hesitate to push code, which slows down releases and disrupts development flow.
- Repeated reruns: Tests often need multiple runs to determine if a failure is real or just flaky, which wastes time and resources.
- Loss of trust: Over time, teams stop trusting test results and ignore failures, risking missing real problems.
- Slower feedback: Flaky tests create noise that makes it take longer to identify and fix genuine bugs, delaying the development process.
Why Do Flaky Tests Occur?
Flaky tests produce inconsistent results despite unchanged code. This occurs because:
- Timing issues: Tests fail if the page elements haven’t fully loaded or appeared yet. For example, if a button hasn’t appeared, the test can’t click it and will fail.
- Unstable selectors: Tests break when developers change the page layout because the selectors can’t find the right elements anymore. For example, if a developer changes a button’s class name, the test won’t be able to find it and will end up failing.
- External service problems: If tests depend on APIs or databases that are slow or unreliable, they’ll fail when those services take too long to respond or crash.
- Varying test environments: Testing environments vary because browsers, operating systems, and networks behave and perform differently. These differences affect how pages load or respond, causing tests to pass in one environment but fail in another.
- Limited resources: Tests can slow down or fail unexpectedly if the machine running them has limited memory or CPU resources.
- Test interference: Tests that share data or state can conflict with each other, leading to unreliable results. For example, if two tests update the same user account at the same time, one might overwrite the other’s changes and lead to inconsistent results.
- Random data and async operations: Tests that use random inputs or deal with asynchronous code can behave differently each time they run.
How to Avoid Writing Flaky Tests in Playwright
Stable Playwright tests depend on a few specific practices that reduce flakiness. Each practice below comes with a short example:
1. Run and Debug Tests Before Committing
Execute tests locally using npx playwright test, and add the --debug flag to step through them in the Playwright Inspector. You can also call page.pause() inside a test to stop at a specific line, which makes it easier to trace flaky behavior before pushing to CI.
Here’s how to pause a test and open the Inspector:
import { test, expect } from '@playwright/test';

test('example test', async ({ page }) => {
  await page.goto('https://example.com');
  await page.pause(); // Opens Playwright Inspector for debugging
  await expect(page.locator('h1')).toHaveText('Example Domain');
});
2. Avoid Hard Waits
Hard waits pause test execution for a fixed amount of time, regardless of whether the app is ready. The test fails when the app loads slower than the delay and wastes time when it loads faster. Use Playwright’s built-in auto-waiting instead, which waits for an element to be visible, stable, and enabled before interacting with it.
// Avoid this: a fixed delay that ignores the app's actual state
await page.waitForTimeout(5000);

// Use auto-waiting: the click waits until the button is actionable
await page.click('button:has-text("Submit")');
3. Control the Testing Environment
Inconsistent environments can cause tests to behave differently across runs. Use fixed settings for viewport size, browser, and permissions to reduce variability. Set these in playwright.config.ts to ensure every test runs in the same environment:
use: {
  browserName: 'chromium',
  viewport: { width: 1280, height: 720 },
  permissions: ['geolocation'],
}
This helps prevent environment-based flakiness and makes test results more predictable.
4. Use Locators Instead of Selectors
CSS or XPath selectors that are tied to the DOM structure often break when the UI changes. Playwright’s user-facing locators, such as getByRole, getByText, and getByLabel, are more stable because they target elements by role, text, or label, the way a user perceives them. Use this approach for more reliable element targeting:
// Avoid
await page.click('#submit-btn');

// Use
await page.getByRole('button', { name: 'Submit' }).click();
5. Avoid Random Values
Random inputs make tests unreliable because they produce different outcomes each run. Use fixed values or seed your randomness to keep test behavior predictable.
Here’s an example of using fixed input values:
const username = 'testuser';
await page.fill('#username', username);
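Where a test genuinely needs varied input, seeding the generator keeps it reproducible. The sketch below uses mulberry32, a small generator picked here purely for illustration; the helper names and the seed value are assumptions and not part of Playwright.

```typescript
// Minimal seeded pseudo-random generator (mulberry32).
// The same seed always yields the same sequence, so a failing run can be replayed.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Hypothetical helper: builds a username from a seeded generator.
function randomUsername(rand: () => number): string {
  const suffix = Math.floor(rand() * 1000000).toString().padStart(6, '0');
  return `user_${suffix}`;
}

const seed = 42; // log this value so a failure can be reproduced later
const first = randomUsername(mulberry32(seed));
const second = randomUsername(mulberry32(seed));
console.log(first === second); // prints true: same seed, same username
```

Logging the seed alongside a failure lets you rerun the test with exactly the same inputs.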
6. Ensure Test Isolation
Tests that depend on shared state can interfere with each other and create flakiness. Isolating tests ensures each one runs independently with a clean setup.
Here’s how to reset the state before each test, assuming the application exposes a reset endpoint:
test.beforeEach(async ({ page }) => {
  await page.goto('https://example.com/reset');
});
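Another route to isolation is giving each test its own data, so that parallel tests never touch the same record. The sketch below derives a distinct account email from a test title and a worker index. The function name and the email domain are hypothetical; in a Playwright test the two inputs could come from testInfo.title and testInfo.workerIndex.

```typescript
// Hypothetical helper: derive a per-test account so parallel tests never share a user.
function uniqueTestEmail(testTitle: string, workerIndex: number): string {
  // Reduce the title to lowercase letters, digits, and single hyphens.
  const slug = testTitle
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return `${slug}-w${workerIndex}@example.test`;
}

console.log(uniqueTestEmail('Update profile name', 0)); // update-profile-name-w0@example.test
console.log(uniqueTestEmail('Update profile name', 1)); // update-profile-name-w1@example.test
```

This only avoids collisions between tests with different titles or workers; data left behind by earlier runs still needs cleanup.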
7. Configure Automatic Retry
Some failures come from temporary network or environment issues. Setting retries helps detect flaky tests by rerunning failed ones before marking them as broken.
Here’s how to configure retries in Playwright:
module.exports = {
  retries: 2, // Retry failed tests up to 2 times
};
8. Limit Dependence on External Resources
External services like third-party APIs can be slow or unreliable during tests. Mock these calls using page.route to avoid delays and ensure consistent responses:
await page.route('**/external-api/**', route =>
  route.fulfill({
    status: 200,
    body: JSON.stringify({ data: 'mocked' }),
  })
);
How to Detect Flaky Tests in Playwright?
Use these methods to identify flaky tests.
1. Loop Suspected Tests to Confirm Flakiness
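If a test fails intermittently in CI but not locally, it may be flaky. Run that specific test multiple times in a controlled local environment to see if it fails at random. Playwright’s --repeat-each flag does this directly (for example, npx playwright test tests/sample.spec.ts --repeat-each=20), or you can automate repeated execution with a short Node script:
import { spawnSync } from 'node:child_process';

for (let i = 0; i < 20; i++) {
  // spawnSync blocks until the run finishes; status is the process exit code
  const result = spawnSync('npx playwright test tests/sample.spec.ts', { shell: true });
  console.log(`Run ${i + 1}:`, result.status);
}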
This helps confirm whether the issue is real or just a one-time failure.
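To make the verdict explicit, the exit codes collected from repeated runs can be reduced to a single label. This is a minimal sketch; the function name and the three labels are illustrative choices and do not come from Playwright.

```typescript
type Verdict = 'stable' | 'flaky' | 'consistently failing';

// Classify a test from the exit codes of repeated runs (0 means the run passed).
function classifyRuns(exitCodes: number[]): Verdict {
  if (exitCodes.length === 0) {
    throw new Error('need at least one run to classify');
  }
  const failures = exitCodes.filter((code) => code !== 0).length;
  if (failures === 0) return 'stable';
  if (failures === exitCodes.length) return 'consistently failing';
  return 'flaky'; // a mix of passes and failures with no code change
}

console.log(classifyRuns([0, 0, 1, 0, 0])); // flaky
console.log(classifyRuns([1, 1, 1])); // consistently failing
```

A test that fails on every run is a real bug or a broken test, not a flaky one, and should be debugged rather than retried.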
2. Use Playwright’s Built-In Retry Feature
Retries help surface tests that fail for non-deterministic reasons. When a test passes only after a retry, it likely depends on unstable factors like timing, network conditions, or inconsistent setup.
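You can enable retries globally in the config or for a specific group of tests:
// Global setting in playwright.config.ts
retries: 2

// Per-group setting inside a test file
test.describe('example group', () => {
  test.describe.configure({ retries: 2 });

  test('example test', async ({ page }) => {
    // test logic
  });
});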
3. Capture Flaky Behavior With TestInfo
TestInfo in Playwright provides metadata about the current test run, such as status, retries, and error details. Logging these helps identify flaky tests by showing how often a test needs rerunning and what causes failures. For example, you can log retry attempts inside your test:
test('retry log demo', async ({ page }, testInfo) => {
  if (testInfo.retry) {
    console.log(`This test is retrying for the ${testInfo.retry} time`);
  }
});
This information highlights tests that behave inconsistently and need investigation.
4. Analyze With Trace Viewer and Video
Enable tracing or video for flaky tests to replay the test step-by-step and identify where timing or async failures happen.
use: {
  trace: 'on-first-retry',
  video: 'on-first-retry',
}
5. Run Flaky Tests in Isolation Using test.only
test.only is a Playwright modifier that runs only the marked test and skips all others. Use it to check whether a flaky test passes alone but fails when run with the full suite, which points to shared state, side effects, or test order as the cause.
6. Use Strict Locators to Reduce Timing Issues
Flaky failures such as an element not being found often happen because the test tries to access elements before they appear. Using a locator that matches exactly one element and asserting its visibility makes the test wait properly and reduces timing-related flakiness. Here’s how to use it:
await expect(page.locator('button:has-text("Submit")')).toBeVisible();
7. Monitor Flaky Test Patterns Over Time
Track recurring flaky tests using Playwright’s HTML reports or CI analytics tools. Adding tags like @flaky helps filter these tests and prioritize fixing them based on frequency and impact.
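Prioritizing by frequency can be as simple as counting how often each test needed a retry to pass. The sketch below assumes a hypothetical record shape (TestOutcome) that you would fill from your own report or CI export; it is not a Playwright API.

```typescript
// Hypothetical record shape; real data would come from a test report or CI analytics export.
interface TestOutcome {
  title: string;
  passedOnRetry: boolean; // true when the test failed first and then passed on a retry
}

interface FlakyStat {
  title: string;
  flakyRuns: number;
  totalRuns: number;
}

// Count how often each test needed a retry, and list the most frequent offenders first.
function rankFlakyTests(outcomes: TestOutcome[]): FlakyStat[] {
  const stats = new Map<string, FlakyStat>();
  for (const outcome of outcomes) {
    const entry = stats.get(outcome.title) ?? { title: outcome.title, flakyRuns: 0, totalRuns: 0 };
    entry.totalRuns += 1;
    if (outcome.passedOnRetry) entry.flakyRuns += 1;
    stats.set(outcome.title, entry);
  }
  return Array.from(stats.values())
    .filter((s) => s.flakyRuns > 0)
    .sort((a, b) => b.flakyRuns / b.totalRuns - a.flakyRuns / a.totalRuns);
}

const runHistory: TestOutcome[] = [
  { title: 'checkout', passedOnRetry: true },
  { title: 'checkout', passedOnRetry: false },
  { title: 'login', passedOnRetry: false },
  { title: 'search', passedOnRetry: true },
  { title: 'search', passedOnRetry: true },
];
console.log(rankFlakyTests(runHistory).map((s) => s.title)); // search first, then checkout
```

Tests that never needed a retry are left out, so the list stays focused on the ones worth investigating.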
Fixing Playwright Flaky Tests
Once a flaky test is identified, fix it by targeting the root cause to ensure consistent results:
1. Analyze Failure Details
Use Playwright’s Trace Viewer to inspect failures in detail. Run the following to review test steps, DOM snapshots, and network activity.
npx playwright show-trace trace.zip
2. Use Robust Locators
Replace fragile selectors with Playwright’s role-based or chained locators for stable element targeting. For example:
await page.getByRole('button', { name: 'Submit' }).click();
3. Avoid Hard Waits
Remove fixed delays. Instead, wait for specific app conditions like responses or element states:
await expect(page.locator('.toast-message')).toHaveText('Saved successfully');
4. Handle Async Flows Properly
Ensure navigation and API calls are complete before proceeding. Use Playwright’s built-in waiting and assertions.
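A common source of async flakiness is starting to wait for a response only after the click that triggers it, so a fast response is missed. The sketch below shows the safer ordering. PageLike and ResponseLike are minimal stand-ins declared here so the example is self-contained; they mirror the waitForResponse, click, url, and ok methods of Playwright’s Page and Response, so a real page object should fit. The helper name is hypothetical.

```typescript
// Minimal stand-ins for the parts of Playwright's Page and Response used here.
interface ResponseLike {
  url(): string;
  ok(): boolean;
}
interface PageLike {
  waitForResponse(predicate: (response: ResponseLike) => boolean): Promise<ResponseLike>;
  click(selector: string): Promise<void>;
}

// Register the wait BEFORE triggering the request, then await both in order.
async function clickAndWaitForResponse(
  page: PageLike,
  selector: string,
  urlPart: string
): Promise<ResponseLike> {
  const responsePromise = page.waitForResponse((response) => response.url().includes(urlPart));
  await page.click(selector);
  const response = await responsePromise;
  if (!response.ok()) {
    throw new Error(`Request to ${response.url()} did not succeed`);
  }
  return response;
}
```

Inside a test this could be called as await clickAndWaitForResponse(page, 'button:has-text("Save")', '/api/save'), where /api/save is a placeholder for your own endpoint.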
5. Run Dependent Tests Sequentially
If tests share state or data, disable parallelism for that scope:
test.describe.configure({ mode: 'serial' });
6. Validate Fixes Across Environments
Test your changes locally, in CI, and across browsers to confirm stability before merging.
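One way to cover several browsers from a single config is Playwright’s projects option. The following is a minimal sketch rather than a recommended setup: the project names are arbitrary, and retrying only when a CI environment variable is set is an assumption about how your pipeline is configured.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  // Assumption: the CI system sets a CI environment variable.
  // Retrying only there keeps local runs strict, so flakiness surfaces immediately.
  retries: process.env.CI ? 2 : 0,
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
});
```

A single browser can then be targeted with npx playwright test --project=firefox.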
How BrowserStack Helps Detect Flaky Tests in Playwright
Playwright handles test flakiness with auto-waits, locator strategies, and trace debugging. However, it runs locally or in headless containers, which may miss environment-specific failures. BrowserStack runs tests on real devices and browsers with actual operating systems, screen sizes, and network conditions.
While Playwright can simulate throttled conditions like slow networks or devices, BrowserStack reveals real-world variability in rendering, performance, and third-party behavior. This makes it easier to identify flakiness tied to platform differences.
To run Playwright tests on BrowserStack Automate and detect flaky issues:
1. Install Dependencies
npm install -D @browserstack/playwright-cli
2. Authenticate Using BrowserStack Credentials
export BROWSERSTACK_USERNAME='your_username'
export BROWSERSTACK_ACCESS_KEY='your_access_key'
3. Set Up Test Configuration
Update playwright.config.ts to include BrowserStack’s project name, build, and test settings. For example:
use: {
  browserName: 'chromium',
  viewport: { width: 1280, height: 720 },
  screenshot: 'on',
  video: 'on',
  trace: 'on',
}
4. Run Tests with BrowserStack
browserstack-playwright --config playwright.config.ts
5. Enable Retries and Traces
retries: 2,
use: {
  trace: 'on-first-retry',
  video: 'on',
}
6. Review Session Logs
Utilize the BrowserStack dashboard to view video recordings, console logs, traces, and failure reasons.
Conclusion
Flaky tests cause intermittent failures due to timing issues, environment differences, or external factors, undermining trust in test results and slowing development. Detecting them requires running tests repeatedly across varied environments and analyzing logs to spot inconsistencies.
Real device testing matters here because it reveals issues caused by actual hardware, operating systems, and network conditions that simulators or local setups miss. BrowserStack provides access to thousands of real devices and browsers, allowing Playwright tests to run in real user conditions. Its detailed logs, videos, and trace data help quickly detect and fix flaky tests, improving test reliability and product quality.