I’ve seen it happen countless times: teams validate their tests in Chromium, only to have Safari-specific issues sneak into production. Layout shifts, broken input behaviors, and autoplay quirks are typical examples that never show up in Chromium. That’s why I rely on Playwright. It lets me run tests on the WebKit engine, which powers Safari, so I can catch those WebKit-only bugs before they affect my users.
Overview
- Flaky navigation waits after clicks: page.waitForNavigation() intermittently times out in WebKit.
- File uploads behave inconsistently: Issues with file uploads when interacting with OS dialogs.
- Date pickers and custom inputs differ in WebKit: Inconsistent behavior when interacting with date pickers or custom input fields.
- WebKit-only visual regressions: UI elements may render differently in WebKit, causing visual regressions.
Why Safari Tests Often Fail in CI Environments
- Resource contention and timing variance: CI machines often have limited resources, causing issues with test reliability.
- Differences between headless and headed behavior: Subtle behavior differences may occur when WebKit is run headless in CI but headed locally.
Best Practices for Stable Safari Automation with Playwright
- Isolate WebKit with Playwright Test projects
- Use semantic selectors
- Avoid pixel-perfect assertions
- Stabilize navigation
- Minimize repeated logins
- Collect traces on retry
- Run a curated Safari smoke suite
This article explains how to automate tests on Safari using Playwright, ensuring your cross-browser tests are solid and reliable.
What Makes Safari Testing Different from Other Browsers
Safari is not “just another browser target.” Even when the same standards exist, WebKit’s behavior can diverge in ways that matter for E2E automation.
Rendering and layout differences that break UI assertions
Safari/WebKit can render fonts, sub-pixel rounding, sticky positioning, and scroll behaviors differently than Chromium. Visual or DOM-position assertions that are stable in Chrome can become flaky in WebKit, especially when:
- Assertions depend on exact element bounding boxes or pixel offsets.
- Tests click elements near edges where hit-testing differs.
- Layout shifts occur due to font loading differences.
Practical mitigation: prefer semantic assertions (role/text/state) over pixel-based checks, and ensure stable font loading (self-hosted fonts, deterministic network, or preloaded fonts) for critical pages.
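As a rough sketch of that mitigation, the test below waits for web fonts to finish loading and then asserts on role and state rather than on coordinates. The `/pricing` route, the "Pro plan" heading, and the "Start free trial" button are hypothetical names chosen for illustration, and a `baseURL` is assumed to be set in the Playwright config.

```typescript
import { test, expect } from "@playwright/test";

test("pricing card is usable regardless of exact layout", async ({ page }) => {
  // Hypothetical route; assumes baseURL is set in playwright.config.ts.
  await page.goto("/pricing");

  // Wait until the browser reports that all web fonts have loaded, so a
  // late font swap does not shift the layout in the middle of the test.
  await page.evaluate(async () => {
    await document.fonts.ready;
  });

  // Semantic checks: neither depends on pixel offsets or bounding boxes.
  await expect(page.getByRole("heading", { name: "Pro plan" })).toBeVisible();
  await expect(page.getByRole("button", { name: "Start free trial" })).toBeEnabled();
});
```

Because neither assertion reads element geometry, small sub-pixel or font-metric differences between engines should not affect the result.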
Web API gaps and “same API, different behavior”
Safari can implement APIs with different edge-case behavior, especially around:
- WebRTC/media permissions and autoplay policies.
- Storage quotas and third-party cookie/storage partitioning behavior.
- Input events on iOS-like interactions (touch, focus/blur ordering).
That is why “runs on Chrome” is not a proxy for “works on Safari.”
How Playwright Supports Safari via WebKit
Playwright supports three browser engines: Chromium, Firefox, and WebKit. Safari coverage in Playwright is achieved by running tests against WebKit, not by driving the Safari application directly.
WebKit in Playwright vs real Safari
Playwright downloads and runs a WebKit build that is close to Safari’s engine behavior, but it is still not identical to the full Safari browser app with Apple’s proprietary integrations, Safari-specific UI features, and OS-level services.
What this means in practice:
- WebKit automation is usually enough to catch most rendering and standards-related regressions that would appear in Safari.
- Some bugs reproduce only on specific Safari versions, specific macOS/iOS builds, or real devices where performance, memory pressure, and OS policies differ.
Prerequisites for Automating Safari Tests with Playwright
Before writing Safari-focused tests, ensure the setup supports deterministic runs and clear debugging.
Technical requirements
- Node.js LTS installed (recommended for Playwright Test workflows).
- Playwright installed with browsers, including WebKit.
- A test runner strategy that isolates state per test (Playwright Test does this by default via browser contexts).
Test design requirements
Safari reliability is less about “special Safari commands” and more about reducing nondeterminism:
- Avoid timing-based waits; rely on locator auto-wait and explicit conditions.
- Keep selectors resilient (role-based locators, test IDs).
- Control test data and environment to avoid backend-induced flakiness.
Setting Up Playwright for Safari (WebKit) Automation
Playwright’s WebKit runs on Windows, Linux, and macOS, so Safari-like coverage is possible even without a Mac.
Install Playwright and WebKit
npm init -y
npm i -D @playwright/test
npx playwright install
This downloads the browser engines, including WebKit.
Create a baseline test
// tests/safari-smoke.spec.ts
import { test, expect } from "@playwright/test";
test("homepage loads and primary CTA is visible (WebKit)", async ({ page }) => {
  await page.goto("https://example.com", { waitUntil: "domcontentloaded" });
  await expect(page.getByRole("heading", { name: /example/i })).toBeVisible();
});
Configure projects so WebKit runs consistently
Playwright Test supports projects so the same suite can run across engines.
// playwright.config.ts
import { defineConfig, devices } from "@playwright/test";
export default defineConfig({
  testDir: "./tests",
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry",
    video: "retain-on-failure",
    screenshot: "only-on-failure",
    baseURL: "https://example.com",
  },
  projects: [
    {
      name: "webkit-safari-like",
      use: {
        ...devices["Desktop Safari"],
        browserName: "webkit",
      },
    },
  ],
});
This makes the intent explicit: the project is WebKit with Safari-like device settings. (Device presets are supported by Playwright’s emulation options.)
Running Playwright Tests on Safari Locally
Once configured, WebKit runs the same way as other engines.
Run only WebKit
npx playwright test --project=webkit-safari-like
Run headed for debugging UI issues
npx playwright test --project=webkit-safari-like --headed
Run with Playwright UI mode when iterating
npx playwright test --project=webkit-safari-like --ui
Local WebKit is excellent for catching many Safari-type issues early, especially CSS/layout, event ordering, and common Web API differences.
Key Limitations of Local Safari Testing
Local WebKit testing is valuable, but it does not fully replace real Safari on real Apple hardware.
1. WebKit is not the Safari application
Playwright WebKit does not execute tests in the actual Safari browser app. It automates a WebKit build that approximates Safari behavior.
Practical impact:
- Some Safari-only bugs tied to Safari’s versioning and OS integrations may not reproduce.
- Debugging parity can vary because Safari’s tooling and protocols differ from Chromium-based ecosystems.
2. iOS Safari behavior is strongly device- and OS-dependent
Playwright includes Mobile Safari emulation capabilities (viewport, UA, touch), but emulation is not the same as real iOS Safari constraints like memory pressure, backgrounding, and real network conditions.
If Safari reliability on iPhones/iPads is the primary risk, real device coverage becomes important.
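For reference, a minimal sketch of what Mobile Safari emulation looks like in a Playwright config follows. The project name is arbitrary, and `"iPhone 13"` is just one of the device descriptors Playwright ships. This only emulates viewport, user agent, and touch on top of WebKit; it does not reproduce real-device constraints.

```typescript
// playwright.config.ts (sketch)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    {
      // Project name is arbitrary; chosen here for illustration.
      name: "mobile-safari-emulated",
      // The "iPhone 13" descriptor sets viewport, user agent, and touch
      // support, and uses WebKit as its default browser engine.
      use: { ...devices["iPhone 13"] },
    },
  ],
});
```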
Handling Safari-Specific WebKit Issues in Playwright
Safari automation gets stable faster when tests explicitly account for WebKit patterns.
Stabilize interactions with explicit, state-based assertions
Clicks that “work in Chrome” can fail in WebKit if an element is overlapped, animating, or not truly clickable.
import { test, expect } from "@playwright/test";
test("submit flow waits for button to be enabled", async ({ page }) => {
  await page.goto("/checkout");
  const payButton = page.getByRole("button", { name: "Pay" });
  await expect(payButton).toBeVisible();
  await expect(payButton).toBeEnabled(); // avoids racing transitions
  await payButton.click();
  await expect(page.getByText("Payment confirmed")).toBeVisible();
});
Why this helps on WebKit:
- It avoids “timing clicks” during transitions.
- It ensures the element is interactable, not just present in the DOM.
Prefer role-based locators over brittle CSS
Safari layout differences can affect DOM structure or pseudo-elements in ways that break strict CSS-based assumptions. Role-based locators are more resilient when markup shifts slightly.
const email = page.getByRole("textbox", { name: "Email" });
await email.fill("qa@example.com");
Use storageState to avoid fragile login flows
Authentication flows are frequent sources of Safari flakiness due to redirects, cookie policies, and third-party storage behavior. Store authenticated state once and reuse it.
// tests/auth.setup.ts
import { test as setup } from "@playwright/test";
setup("authenticate", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_USER!);
  await page.getByLabel("Password").fill(process.env.E2E_PASS!);
  await page.getByRole("button", { name: "Sign in" }).click();
  await page.waitForURL("**/dashboard");
  // Persist cookies and local storage so later tests start authenticated.
  await page.context().storageState({ path: "storageState.json" });
});
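// playwright.config.ts (snippet)
import { defineConfig } from "@playwright/test";
export default defineConfig({
  projects: [
    // Runs tests/auth.setup.ts first so storageState.json exists before the
    // WebKit project starts; the default test match would skip *.setup.ts.
    { name: "setup", testMatch: /.*\.setup\.ts/ },
    {
      name: "webkit-safari-like",
      dependencies: ["setup"],
      use: {
        browserName: "webkit",
        storageState: "storageState.json",
      },
    },
  ],
});
This reduces test time and avoids repeating sensitive, failure-prone login steps.
Common Safari Automation Challenges and Fixes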
These are common WebKit/Safari pain points and specific Playwright tactics to address them.
1. Challenge: flaky navigation waits after clicks
Symptom: page.waitForNavigation() intermittently times out in WebKit.
Fix: wait for a deterministic outcome (URL pattern, element presence) instead of raw navigation.
await page.getByRole("link", { name: "Orders" }).click();
await page.waitForURL("**/orders");
await expect(page.getByRole("heading", { name: "Orders" })).toBeVisible();
2. Challenge: file uploads behave inconsistently
Fix: use Playwright’s setInputFiles on the <input type="file"> element rather than OS dialogs.
const upload = page.locator('input[type="file"]');
await upload.setInputFiles("fixtures/invoice.pdf");
await expect(page.getByText("invoice.pdf")).toBeVisible();
3. Challenge: date pickers and custom inputs differ in WebKit
Fix: avoid clicking pixel-perfect calendar UIs where possible. Prefer filling the underlying input value if the control supports it, and then trigger change/blur.
const date = page.getByLabel("Start date");
await date.fill("2026-01-13");
await date.blur();
await expect(page.getByText("Jan 13, 2026")).toBeVisible();
4. Challenge: WebKit-only visual regressions
Fix: keep UI assertions semantic and add targeted screenshots only for critical components, not every page.
await expect(page.locator("[data-testid=pricing-card]")).toHaveScreenshot("pricing-card-webkit.png");
Writing Reliable Playwright Tests for Safari
Safari stability comes from making tests deterministic and isolating state.
Build a Safari-focused smoke suite
Not every test must run on WebKit. A practical pattern is:
- Run the full suite on Chromium for speed and breadth.
- Run a curated smoke/regression suite on WebKit covering the highest-risk flows: auth, checkout, media, complex layouts, and critical forms.
In Playwright Test, that can be done using tags:
import { test } from "@playwright/test";
test.describe("critical @safari", () => {
  test("checkout works", async ({ page }) => {
    // critical Safari coverage here
  });
});
Then run only Safari-tagged tests on WebKit:
npx playwright test --project=webkit-safari-like -g "@safari"
Keep timeouts intentional, not inflated
Increasing global timeouts often hides WebKit performance issues rather than fixing them. Better approach:
- Keep reasonable global timeouts.
- Add targeted waits for known slow points (first load, heavy API calls).
- Capture traces on retry to diagnose timing variance.
Playwright’s use options support trace/video/screenshot collection to debug failures faster.
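One way to keep timeouts intentional is to extend the budget only on the assertion that guards a known slow point, as in the sketch below. The `/reports` route, the table and button names, and the 15-second figure are illustrative assumptions, not recommendations from Playwright.

```typescript
import { test, expect } from "@playwright/test";

test("report page finishes its slow first load", async ({ page }) => {
  // Hypothetical route; assumes baseURL is set in playwright.config.ts.
  await page.goto("/reports");

  // Targeted wait: only this known-slow assertion gets a longer budget.
  await expect(page.getByRole("table", { name: "Monthly report" })).toBeVisible({
    timeout: 15_000,
  });

  // Everything else keeps the default expect timeout, so a genuine WebKit
  // slowdown elsewhere still surfaces as a failure.
  await expect(page.getByRole("button", { name: "Export" })).toBeEnabled();
});
```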
Debugging and Tracing Safari Tests in Playwright
Safari-like debugging is easiest when failures produce artifacts that explain “what happened,” not just “it failed.”
Use traces for WebKit failures
Enable tracing on first retry in config (already shown). When a test flakes in WebKit, the trace typically reveals:
- Whether the click happened
- Whether the page navigated
- Whether the target element ever became visible/enabled
- Network timings around the failure
Capture WebKit console and page errors
import { test } from "@playwright/test";
test("captures browser errors", async ({ page }) => {
  const errors: string[] = [];
  // Uncaught exceptions thrown inside the page.
  page.on("pageerror", (err) => errors.push(err.message));
  // console.error output from the page.
  page.on("console", (msg) => {
    if (msg.type() === "error") errors.push(msg.text());
  });
  await page.goto("/settings");
  // assertions…
});
This is especially useful for WebKit-only runtime errors caused by unsupported APIs or subtle differences.
Cross-Browser Validation: Safari vs Chromium and Firefox
Playwright projects allow running the same test logic across engines, which is valuable for isolating whether a failure is:
- WebKit-specific behavior
- A real product bug exposed by stricter handling
- Test design that assumes Chromium-specific behavior
Playwright supports running tests on Chromium, Firefox, and WebKit using one API and project configuration.
A practical approach:
- If a test fails only on WebKit, check selectors, event timing, and web API usage.
- If it fails on all engines, it is likely a product regression or unstable test data.
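The triage rule above can be sketched as a small helper. This is an illustrative sketch only: the `triage` function and its category names are hypothetical and not part of Playwright. It takes one pass/fail outcome per engine for a single test.

```typescript
type Engine = "chromium" | "firefox" | "webkit";
type Outcome = "passed" | "failed";
type Triage =
  | "no-failure"
  | "webkit-specific"
  | "likely-product-or-data"
  | "needs-manual-review";

// Classify one test's per-engine outcomes using the rule of thumb above.
function triage(results: Record<Engine, Outcome>): Triage {
  const engines = Object.keys(results) as Engine[];
  const failed = engines.filter((engine) => results[engine] === "failed");

  if (failed.length === 0) return "no-failure";
  // Fails everywhere: probably a product regression or unstable test data.
  if (failed.length === engines.length) return "likely-product-or-data";
  // Fails only on WebKit: check selectors, event timing, and web API usage.
  if (failed.length === 1 && failed[0] === "webkit") return "webkit-specific";
  // Any other combination does not match either rule.
  return "needs-manual-review";
}

const verdict = triage({ chromium: "passed", firefox: "passed", webkit: "failed" });
// verdict is "webkit-specific"
```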
Run Playwright tests on real Safari, Chromium, and Firefox with BrowserStack Automate to quickly identify cross-browser issues and ensure consistent performance across all major browsers.
Running Playwright Safari Tests in CI Pipelines
CI introduces constraints that make WebKit failures more likely if the pipeline is not tuned.
Use retries strategically and collect artifacts
For CI:
- Set retries: 2 for WebKit projects.
- Collect trace/video/screenshot on failure or retry.
- Ensure the CI machine has sufficient resources; WebKit can be more sensitive to CPU throttling.
Split WebKit into a separate CI job
Keep feedback fast:
- Job 1: Chromium full suite (fast, broad).
- Job 2: WebKit Safari smoke suite (targeted, high-signal).
- Optional Job 3: nightly WebKit expanded suite.
This structure prevents WebKit execution time from slowing every PR while still catching Safari regressions early.
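To make the job split concrete, here is a hedged sketch of a helper that maps a CI trigger to the Playwright CLI arguments for each job. The `planJobs` function, the job names, and the `chromium` project name are assumptions for illustration; only `--project` and `--grep` are real Playwright CLI flags, and `webkit-safari-like` is the project defined earlier.

```typescript
type Trigger = "pull_request" | "nightly";

interface CiJob {
  name: string;
  // Arguments appended to `npx playwright test`.
  args: string[];
}

// Build the list of jobs for a given trigger.
function planJobs(trigger: Trigger): CiJob[] {
  const jobs: CiJob[] = [
    // Job 1: full suite on Chromium (assumes a project named "chromium").
    { name: "chromium-full", args: ["--project=chromium"] },
    // Job 2: only @safari-tagged tests on WebKit.
    {
      name: "webkit-smoke",
      args: ["--project=webkit-safari-like", "--grep", "@safari"],
    },
  ];
  if (trigger === "nightly") {
    // Optional Job 3: the whole suite on WebKit, nightly only.
    jobs.push({ name: "webkit-expanded", args: ["--project=webkit-safari-like"] });
  }
  return jobs;
}
```

Each CI job would then run `npx playwright test` followed by its `args`, so pull requests pay only for the smoke suite on WebKit.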
Why Safari Tests Often Fail in CI Environments
Safari-like runs are often stable locally but flaky in CI for predictable reasons.
1. Resource contention and timing variance
CI machines often have:
- Shared CPU
- Limited memory
- No GPU acceleration
- Background processes competing for resources
Symptoms: elements not ready when expected, animations slower, and timeouts that occur only under load.
Fixes:
- Reduce parallelism for WebKit jobs.
- Avoid global slowMo; instead, add explicit state-based waits.
- Prefer waitForURL / expect(locator).toBeVisible() over navigation waits.
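Reducing parallelism can be expressed through the `workers` option in the Playwright config. The helper below is a sketch under assumptions: `WEBKIT_WORKERS` is a hypothetical environment variable, and the fallback of 2 workers is an arbitrary starting point to tune against the actual CI machine.

```typescript
// Decide how many workers to use for WebKit runs.
// Returns undefined locally so Playwright picks its own default.
function webkitWorkers(env: Record<string, string | undefined>): number | undefined {
  if (!env.CI) return undefined;
  // WEBKIT_WORKERS is a hypothetical override, not a Playwright setting.
  const requested = Number(env.WEBKIT_WORKERS);
  // Fall back to a conservative count when the override is missing or invalid.
  return Number.isInteger(requested) && requested > 0 ? requested : 2;
}
```

In `playwright.config.ts` this could be wired up as `workers: webkitWorkers(process.env)` inside `defineConfig`.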
2. Differences between headless and headed behavior
If WebKit is run headless in CI but headed locally, subtle behavior differences can appear. Running headed for debugging (or reproducing the same mode locally as CI) reduces churn.
Best Practices for Stable Safari Automation with Playwright
Safari stability improves when test engineering decisions assume WebKit will behave differently.
- Use Playwright Test projects to isolate WebKit and keep configuration explicit.
- Keep selectors semantic: roles, labels, and stable test IDs.
- Avoid pixel-perfect assertions except where necessary; prefer state and content assertions.
- Stabilize navigation by asserting outcomes (URL + visible landmarks).
- Minimize repeated logins using storageState.
- Collect traces on retry to debug WebKit-only failures quickly.
- Run a curated Safari smoke suite on every PR, and expand coverage nightly.
Why Automate Safari Playwright Tests with BrowserStack?
Running Safari Playwright tests on BrowserStack solves gaps that local WebKit execution and CI machines cannot fully cover. The value is not just scale, but accuracy and confidence in real Safari behavior.
- Real Safari, not just WebKit: Executes tests on actual Safari browsers instead of WebKit builds, exposing Safari-only bugs tied to Apple’s browser and OS behavior.
- Safari version accuracy: Validates Playwright tests across real Safari and macOS versions to catch issues caused by Safari version fragmentation.
- Stable CI execution: Eliminates flaky Safari failures caused by shared CI resources, headless differences, and timing variance.
- True iOS Safari coverage: Runs Playwright tests on real iPhones and iPads where mobile Safari behavior differs from desktop and emulation.
- Actionable debugging artifacts: Provides videos, logs, and screenshots to diagnose Safari-specific failures without local reproduction.
- No macOS infrastructure overhead: Removes the need to maintain Mac hardware while enabling parallel Safari test execution at scale.
- Higher production confidence: Ensures Safari regressions are caught before release, not after real users hit them.
Conclusion
Automating tests on Safari using Playwright is practical when the goal is to catch WebKit-specific regressions early and keep cross-browser confidence high. WebKit projects, resilient locators, deterministic waits, and trace-driven debugging eliminate most Safari flakiness. For coverage that must match real users on real Apple hardware and specific Safari versions, running Playwright on BrowserStack Automate adds the final layer of confidence where local WebKit alone can fall short.



