Automate Tests on Safari using Playwright

I’ve seen it happen countless times: teams validate their tests in Chromium, only to have Safari-specific issues sneak into production, such as layout shifts, broken input behaviors, and autoplay quirks that never show up in Chromium. That’s why I rely on Playwright. It lets me run tests on the WebKit engine, which powers Safari, so I can catch those WebKit-only bugs before they affect my users.

Overview

Common Safari Automation Challenges

Flaky navigation waits after clicks: page.waitForNavigation() intermittently times out in WebKit.
File uploads behave inconsistently: Issues with file uploads when interacting with OS dialogs.
Date pickers and custom inputs differ in WebKit: Inconsistent behavior when interacting with date pickers or custom input fields.
WebKit-only visual regressions: UI elements may render differently in WebKit, causing visual regressions.

Why Safari Tests Often Fail in CI Environments

Resource contention and timing variance: CI machines often have limited resources, causing issues with test reliability.
Differences between headless and headed behavior: Subtle behavior differences may occur when WebKit is run headless in CI but headed locally.

Best Practices for Stable Safari Automation with Playwright

Isolate WebKit with Playwright Test projects
Use semantic selectors
Avoid pixel-perfect assertions
Stabilize navigation
Minimize repeated logins
Collect traces on retry
Run a curated Safari smoke suite

This article explains how to automate tests on Safari using Playwright, ensuring your cross-browser tests are solid and reliable.

What Makes Safari Testing Different from Other Browsers

Safari is not “just another browser target.” Even when the same standards exist, WebKit’s behavior can diverge in ways that matter for E2E automation.

Rendering and layout differences that break UI assertions

Safari/WebKit can render fonts, sub-pixel rounding, sticky positioning, and scroll behaviors differently than Chromium. Visual or DOM-position assertions that are stable in Chrome can become flaky in WebKit, especially when:

Assertions depend on exact element bounding boxes or pixel offsets.
Tests click elements near edges where hit-testing differs.
Layout shifts occur due to font loading differences.

Practical mitigation: prefer semantic assertions (role/text/state) over pixel-based checks, and ensure stable font loading (self-hosted fonts, deterministic network, or preloaded fonts) for critical pages.
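One way to approach the font-loading point is to wait for the browser's font set to settle before making any layout-sensitive assertion. The sketch below is a minimal illustration, not a prescribed pattern: `waitForFonts` and `EvaluatingPage` are hypothetical names introduced here, and the interface is a deliberately small structural stand-in that a Playwright `Page` is assumed to satisfy, since `page.evaluate` accepts a string expression.

```typescript
// Hypothetical helper (not part of Playwright): waits until the page reports
// that its pending font loads have finished.

// Minimal structural type so the helper has no import-time dependencies.
// Assumption: Playwright's `Page` satisfies it, because `page.evaluate`
// accepts a string expression and returns a promise.
interface EvaluatingPage {
  evaluate(expression: string): Promise<unknown>;
}

// `document.fonts.ready` is a standard promise that resolves once font
// loading for the current document has completed.
const FONTS_READY_EXPRESSION = "document.fonts.ready.then(() => true)";

async function waitForFonts(page: EvaluatingPage): Promise<void> {
  await page.evaluate(FONTS_READY_EXPRESSION);
}
```

In a spec, this could be called as `await waitForFonts(page)` right after `page.goto(...)` and before any bounding-box or screenshot assertion.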

Web API gaps and “same API, different behavior”

Safari can implement APIs with different edge-case behavior, especially around:

WebRTC/media permissions and autoplay policies.
Storage quotas and third-party cookie/storage partitioning behavior.
Input events on iOS-like interactions (touch, focus/blur ordering).

That is why “runs on Chrome” is not a proxy for “works on Safari.”

How Playwright Supports Safari via WebKit

Playwright supports three browser engines: Chromium, Firefox, and WebKit. Safari coverage in Playwright is achieved by running tests against WebKit, not by driving the Safari application directly.

WebKit in Playwright vs real Safari

Playwright downloads and runs a WebKit build that is close to Safari’s engine behavior, but it is still not identical to the full Safari browser app with Apple’s proprietary integrations, Safari-specific UI features, and OS-level services.

What this means in practice:

WebKit automation is usually enough to catch most rendering and standards-related regressions that would appear in Safari.
Some bugs reproduce only on specific Safari versions, specific macOS/iOS builds, or real devices where performance, memory pressure, and OS policies differ.


Prerequisites for Automating Safari Tests with Playwright

Before writing Safari-focused tests, ensure the setup supports deterministic runs and clear debugging.

Technical requirements

Node.js LTS installed (recommended for Playwright Test workflows).
Playwright installed with browsers, including WebKit.
A test runner strategy that isolates state per test (Playwright Test does this by default via browser contexts).

Test design requirements

Safari reliability is less about “special Safari commands” and more about reducing nondeterminism:

Avoid timing-based waits; rely on locator auto-wait and explicit conditions.
Keep selectors resilient (role-based locators, test IDs).
Control test data and environment to avoid backend-induced flakiness.

Setting Up Playwright for Safari (WebKit) Automation

Playwright’s WebKit runs on Windows, Linux, and macOS, so Safari-like coverage is possible even without a Mac.

Install Playwright and WebKit

npm init -y
npm i -D @playwright/test
npx playwright install

The last command downloads the browser engines, including WebKit.

Create a baseline test

// tests/safari-smoke.spec.ts
import { test, expect } from "@playwright/test";

test("homepage loads and main heading is visible (WebKit)", async ({ page }) => {
  await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
  await expect(page.getByRole("heading", { name: /example/i })).toBeVisible();
});

Configure projects so WebKit runs consistently

Playwright Test supports projects so the same suite can run across engines.

// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry",
    video: "retain-on-failure",
    screenshot: "only-on-failure",
    baseURL: "https://example.com",
  },
  projects: [
    {
      name: "webkit-safari-like",
      use: {
        ...devices["Desktop Safari"],
        browserName: "webkit",
      },
    },
  ],
});

This makes the intent explicit: the project is WebKit with Safari-like device settings. (Device presets are supported by Playwright’s emulation options.)
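If the same suite should also run on the other engines, the projects array can be extended. The following is a sketch that assumes Playwright's built-in "Desktop Chrome" and "Desktop Firefox" device presets are acceptable defaults; the project names `chromium` and `firefox` are illustrative choices.

```typescript
// playwright.config.ts (illustrative variant covering all three engines)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests",
  projects: [
    // Hypothetical project names for the non-WebKit engines.
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    // Same WebKit project name used throughout this article.
    { name: "webkit-safari-like", use: { ...devices["Desktop Safari"] } },
  ],
});
```

Each project can then be selected individually with the --project flag, or all of them run together when no project is specified.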

Running Playwright Tests on Safari Locally

Once configured, WebKit runs the same way as other engines.

Run only WebKit

npx playwright test --project=webkit-safari-like

Run headed for debugging UI issues

npx playwright test --project=webkit-safari-like --headed

Run with Playwright UI mode when iterating

npx playwright test --project=webkit-safari-like --ui

Local WebKit is excellent for catching many Safari-type issues early, especially CSS/layout, event ordering, and common Web API differences.

Key Limitations of Local Safari Testing

Local WebKit testing is valuable, but it does not fully replace real Safari on real Apple hardware.

1. WebKit is not the Safari application

Playwright WebKit does not execute tests in the actual Safari browser app. It automates a WebKit build that approximates Safari behavior.

Practical impact:

Some Safari-only bugs tied to Safari’s versioning and OS integrations may not reproduce.
Debugging parity can vary because Safari’s tooling and protocols differ from Chromium-based ecosystems.

2. iOS Safari behavior is strongly device- and OS-dependent

Playwright includes Mobile Safari emulation capabilities (viewport, UA, touch), but emulation is not the same as real iOS Safari constraints like memory pressure, backgrounding, and real network conditions.

If Safari reliability on iPhones/iPads is the primary risk, real device coverage becomes important.
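For reference, Mobile Safari emulation is typically enabled through a device preset. The snippet below is a minimal sketch: "iPhone 13" is one of Playwright's bundled device descriptors, while the project name `webkit-mobile-safari-like` is a hypothetical label chosen for this example. As noted above, this emulates viewport, user agent, and touch, not the constraints of a real device.

```typescript
// playwright.config.ts (snippet): emulated Mobile Safari alongside desktop WebKit
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    { name: "webkit-safari-like", use: { ...devices["Desktop Safari"] } },
    {
      // Hypothetical project name; the preset supplies viewport, user agent,
      // touch support, and selects WebKit as the browser.
      name: "webkit-mobile-safari-like",
      use: { ...devices["iPhone 13"] },
    },
  ],
});
```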

Handling Safari-Specific WebKit Issues in Playwright

Safari automation gets stable faster when tests explicitly account for WebKit patterns.

Stabilize interactions with explicit, state-based assertions

Clicks that “work in Chrome” can fail in WebKit if an element is overlapped, animating, or not truly clickable.

import { test, expect } from "@playwright/test";

test("submit flow waits for button to be enabled", async ({ page }) => {
  await page.goto("/checkout");
  const payButton = page.getByRole("button", { name: "Pay" });
  await expect(payButton).toBeVisible();
  await expect(payButton).toBeEnabled(); // avoids racing transitions
  await payButton.click();
  await expect(page.getByText("Payment confirmed")).toBeVisible();
});

Why this helps on WebKit:

It avoids “timing clicks” during transitions.
It ensures the element is interactable, not just present in the DOM.

Prefer role-based locators over brittle CSS

Safari layout differences can affect DOM structure or pseudo-elements in ways that break strict CSS-based assumptions. Role-based locators are more resilient when markup shifts slightly.

const email = page.getByRole("textbox", { name: "Email" });
await email.fill("qa@example.com");

Use storageState to avoid fragile login flows

Authentication flows are frequent sources of Safari flakiness due to redirects, cookie policies, and third-party storage behavior. Store authenticated state once and reuse it.

// tests/auth.setup.ts
import { test as setup } from "@playwright/test";

setup("authenticate", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_USER!);
  await page.getByLabel("Password").fill(process.env.E2E_PASS!);
  await page.getByRole("button", { name: "Sign in" }).click();
  await page.waitForURL("**/dashboard");
  // Persist cookies and local storage so later tests can start authenticated.
  await page.context().storageState({ path: "storageState.json" });
});

// playwright.config.ts (snippet)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    // Runs auth.setup.ts first so storageState.json exists before the WebKit tests start.
    { name: "setup", testMatch: /.*\.setup\.ts/ },
    {
      name: "webkit-safari-like",
      dependencies: ["setup"],
      use: {
        browserName: "webkit",
        storageState: "storageState.json",
      },
    },
  ],
});

This reduces test time and avoids repeating sensitive, failure-prone login steps.

Common Safari Automation Challenges and Fixes

These are common WebKit/Safari pain points and specific Playwright tactics to address them.

1. Challenge: flaky navigation waits after clicks

Symptom: page.waitForNavigation() intermittently times out in WebKit.

Fix: wait for a deterministic outcome (URL pattern, element presence) instead of raw navigation.

await page.getByRole("link", { name: "Orders" }).click();
await page.waitForURL("**/orders");
await expect(page.getByRole("heading", { name: "Orders" })).toBeVisible();

2. Challenge: file uploads behave inconsistently

Fix: use Playwright’s setInputFiles on the <input type="file"> element rather than driving OS dialogs.

const upload = page.locator('input[type="file"]');
await upload.setInputFiles("fixtures/invoice.pdf");
await expect(page.getByText("invoice.pdf")).toBeVisible();

3. Challenge: date pickers and custom inputs differ in WebKit

Fix: avoid clicking pixel-perfect calendar UIs where possible. Prefer filling the underlying input value if the control supports it, and then trigger change/blur.

const date = page.getByLabel("Start date");
await date.fill("2026-01-13");
await date.blur();
await expect(page.getByText("Jan 13, 2026")).toBeVisible();

4. Challenge: WebKit-only visual regressions

Fix: keep UI assertions semantic and add targeted screenshots only for critical components, not every page.

await expect(page.locator("[data-testid=pricing-card]")).toHaveScreenshot("pricing-card-webkit.png");
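Targeted screenshots tend to be less brittle when a small tolerance is configured once rather than per assertion. The following is a sketch only: the 1% threshold is an assumed value for illustration and would need tuning against real WebKit renders.

```typescript
// playwright.config.ts (snippet): tolerances for targeted screenshot checks
import { defineConfig } from "@playwright/test";

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      // Assumed threshold: allow up to 1% of pixels to differ before failing.
      maxDiffPixelRatio: 0.01,
      // Stated explicitly for clarity: animations are stopped before capture.
      animations: "disabled",
    },
  },
});
```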

Writing Reliable Playwright Tests for Safari

Safari stability comes from making tests deterministic and isolating state.

Build a Safari-focused smoke suite

Not every test must run on WebKit. A practical pattern is:

Run the full suite on Chromium for speed and breadth.
Run a curated smoke/regression suite on WebKit covering the highest-risk flows: auth, checkout, media, complex layouts, and critical forms.

In Playwright Test, that can be done using tags:

import { test } from "@playwright/test";

test.describe("critical @safari", () => {
  test("checkout works", async ({ page }) => {
    // critical Safari coverage here
  });
});

Then run only Safari-tagged tests on WebKit:

npx playwright test --project=webkit-safari-like -g "@safari"
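As an alternative to passing the filter on every run, the tag could be attached to the WebKit project itself. This is a sketch under the assumption that a separate Chromium project (named `chromium` here for illustration) runs the full suite.

```typescript
// playwright.config.ts (snippet): restrict the WebKit project to tagged tests
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    // Hypothetical project name; runs every test.
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    {
      name: "webkit-safari-like",
      use: { ...devices["Desktop Safari"] },
      // Only tests whose title (including describe titles) matches run here.
      grep: /@safari/,
    },
  ],
});
```

With this in place, a plain `npx playwright test` would run everything on Chromium and only the tagged tests on WebKit.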

Keep timeouts intentional, not inflated

Increasing global timeouts often hides WebKit performance issues rather than fixing them. Better approach:

Keep reasonable global timeouts.
Add targeted waits for known slow points (first load, heavy API calls).
Capture traces on retry to diagnose timing variance.

Playwright’s use options support trace/video/screenshot collection to debug failures faster.
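To make "reasonable global timeouts" concrete, the values can be written out in config instead of left implicit. In this sketch the test and expect budgets match Playwright's defaults, while the action and navigation values are assumptions chosen for illustration.

```typescript
// playwright.config.ts (snippet): explicit, modest timeouts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  timeout: 30_000, // per-test budget (Playwright's default)
  expect: { timeout: 5_000 }, // per-assertion budget (Playwright's default)
  use: {
    // Assumed values for illustration; tune to the application under test.
    actionTimeout: 10_000,
    navigationTimeout: 15_000,
  },
});
```

For a known slow point, a single assertion can carry its own budget, for example `await expect(locator).toBeVisible({ timeout: 15_000 })`, which keeps the longer wait local to the step that needs it.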

Debugging and Tracing Safari Tests in Playwright

Safari-like debugging is easiest when failures produce artifacts that explain “what happened,” not just “it failed.”

Use traces for WebKit failures

Enable tracing on first retry in config (already shown). When a test flakes in WebKit, the trace typically reveals:

Whether the click happened
Whether the page navigated
Whether the target element ever became visible/enabled
Network timings around the failure

Capture WebKit console and page errors

import { test } from "@playwright/test";

test("captures browser errors", async ({ page }) => {
  // Collect uncaught page exceptions and console.error output in one place.
  const errors: string[] = [];
  page.on("pageerror", (err) => errors.push(err.message));
  page.on("console", (msg) => {
    if (msg.type() === "error") errors.push(msg.text());
  });
  await page.goto("/settings");
  // assertions…
});

This is especially useful for WebKit-only runtime errors caused by unsupported APIs or subtle differences.
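If several specs need the same listeners, the pattern can be pulled into a small helper. The sketch below is an assumption-laden illustration: `collectBrowserErrors`, `ErrorEventSource`, and `ConsoleMessageLike` are hypothetical names, and the two interfaces are minimal structural stand-ins that Playwright's `Page` and `ConsoleMessage` are expected to satisfy.

```typescript
// Hypothetical helper, not a Playwright API: registers listeners and returns
// the array that accumulates error messages while the page runs.

// Small structural types standing in for Playwright's ConsoleMessage and Page.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}

interface ErrorEventSource {
  on(event: "pageerror", listener: (error: Error) => void): unknown;
  on(event: "console", listener: (message: ConsoleMessageLike) => void): unknown;
}

function collectBrowserErrors(page: ErrorEventSource): string[] {
  const errors: string[] = [];
  // Uncaught exceptions thrown inside the page.
  page.on("pageerror", (error) => errors.push(`pageerror: ${error.message}`));
  // console.error(...) calls; other console levels are ignored.
  page.on("console", (message) => {
    if (message.type() === "error") errors.push(`console: ${message.text()}`);
  });
  return errors;
}
```

In a spec, this might be used as `const errors = collectBrowserErrors(page)` before `page.goto(...)`, followed by `expect(errors).toEqual([])` once the flow has finished.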

Cross-Browser Validation: Safari vs Chromium and Firefox

Playwright projects allow running the same test logic across engines, which is valuable for isolating whether a failure is:

WebKit-specific behavior
A real product bug exposed by stricter handling
Test design that assumes Chromium behavior

Playwright supports running tests on Chromium, Firefox, and WebKit using one API and project configuration.

A practical approach:

If a test fails only on WebKit, check selectors, event timing, and web API usage.
If it fails on all engines, it is likely a product regression or unstable test data.

Run Playwright tests on real Safari, Chromium, and Firefox with BrowserStack Automate to quickly identify cross-browser issues and ensure consistent performance across all major browsers.


Running Playwright Safari Tests in CI Pipelines

CI introduces constraints that make WebKit failures more likely if the pipeline is not tuned.

Use retries strategically and collect artifacts

For CI:

Set retries: 2 for WebKit projects.
Collect trace/video/screenshot on failure or retry.
Ensure the CI machine has sufficient resources; WebKit can be more sensitive to CPU throttling.

Split WebKit into a separate CI job

Keep feedback fast:

Job 1: Chromium full suite (fast, broad).
Job 2: WebKit Safari smoke suite (targeted, high-signal).
Optional Job 3: nightly WebKit expanded suite.

This structure prevents WebKit execution time from slowing every PR while still catching Safari regressions early.

Why Safari Tests Often Fail in CI Environments

Safari-like runs are often stable locally but flaky in CI for predictable reasons.

1. Resource contention and timing variance

CI machines often have:

Shared CPU
Limited memory
No GPU acceleration
Background processes competing for resources

Symptoms: elements not ready when expected, animations slower, and timeouts that occur only under load.

Fixes:

Reduce parallelism for WebKit jobs.
Avoid global slowMo; instead, add explicit state-based waits.
Prefer waitForURL / expect(locator).toBeVisible() over navigation waits.

2. Differences between headless and headed behavior

If WebKit is run headless in CI but headed locally, subtle behavior differences can appear. Running headed for debugging (or reproducing the same mode locally as CI) reduces churn.
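Both CI points in this section, reduced parallelism and a consistent headless mode, can be pinned in config so local and CI runs differ as little as possible. This is a minimal sketch: the worker count of 2 is an assumed value, and `isCI` is a local helper name introduced for the example.

```typescript
// playwright.config.ts (snippet): CI-oriented settings
import { defineConfig, devices } from "@playwright/test";

// Most CI providers set the CI environment variable.
const isCI = !!process.env.CI;

export default defineConfig({
  // Assumed value: cap parallel workers on shared CI runners;
  // undefined keeps Playwright's default locally.
  workers: isCI ? 2 : undefined,
  projects: [
    {
      name: "webkit-safari-like",
      use: {
        ...devices["Desktop Safari"],
        // Headless is Playwright's default; stating it makes the CI mode explicit.
        headless: true,
      },
    },
  ],
});
```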

Best Practices for Stable Safari Automation with Playwright

Safari stability improves when test engineering decisions assume WebKit will behave differently.

Use Playwright Test projects to isolate WebKit and keep configuration explicit.
Keep selectors semantic: roles, labels, and stable test IDs.
Avoid pixel-perfect assertions except where necessary; prefer state and content assertions.
Stabilize navigation by asserting outcomes (URL + visible landmarks).
Minimize repeated logins using storageState.
Collect traces on retry to debug WebKit-only failures quickly.
Run a curated Safari smoke suite on every PR, and expand coverage nightly.

Why Automate Safari Playwright Tests with BrowserStack?

Running Safari Playwright tests on BrowserStack solves gaps that local WebKit execution and CI machines cannot fully cover. The value is not just scale, but accuracy and confidence in real Safari behavior.

Real Safari, not just WebKit: Executes tests on actual Safari browsers instead of WebKit builds, exposing Safari-only bugs tied to Apple’s browser and OS behavior.
Safari version accuracy: Validates Playwright tests across real Safari and macOS versions to catch issues caused by Safari version fragmentation.
Stable CI execution: Eliminates flaky Safari failures caused by shared CI resources, headless differences, and timing variance.
True iOS Safari coverage: Runs Playwright tests on real iPhones and iPads where mobile Safari behavior differs from desktop and emulation.
Actionable debugging artifacts: Provides videos, logs, and screenshots to diagnose Safari-specific failures without local reproduction.
No macOS infrastructure overhead: Removes the need to maintain Mac hardware while enabling parallel Safari test execution at scale.
Higher production confidence: Ensures Safari regressions are caught before release, not after real users hit them.


Conclusion

Automating tests on Safari using Playwright is practical when the goal is to catch WebKit-specific regressions early and keep cross-browser confidence high. WebKit projects, resilient locators, deterministic waits, and trace-driven debugging eliminate most Safari flakiness. For coverage that must match real users on real Apple hardware and specific Safari versions, running Playwright on BrowserStack Automate adds the final layer of confidence where local WebKit alone can fall short.
