Events and alerts in Automate Turboscale

Event Alerts provide real-time visibility into the health of your self-hosted grid. Use them to monitor for potential problems and quickly resolve issues that might disrupt your testing. You can:

  • View real-time events when you open the page.
  • Review the last 7 days of event history for a cluster.
  • See impact and fix suggestions for each error or warning.
  • Surface cluster-level events in the build dashboard so teams notice issues while looking at tests.

By monitoring key infrastructure events, you can identify and resolve issues that might affect your test runs on your own, often before they cause widespread failures.

This guide explains how to use Event Alerts to monitor your grid, understand event logs, and take corrective action.

Prerequisites

  • Events are collected only from clusters where the BrowserStack Agent is installed and running.
  • Install and configure the Agent on every cluster you want to monitor.
  • If the Agent is not installed, the Grid Management > Events page shows setup steps, and no events appear for that cluster.
  • You need access to Grid Management in the project.

Find events

Event alerts appear in two locations, giving you both a high-level overview and a build-specific view of your grid’s health.

Cluster-level alerts

Cluster-level alerts provide a centralized view of your entire grid’s health. You can find them in the Grid Management section. These events are captured continuously, even when no tests are running, helping you monitor the overall stability of your infrastructure.

  • Use this view to monitor the overall health of nodes and pods across your entire cluster.

  • These alerts appear under Grid Management > Events.

Build-level alerts

Build-level alerts appear directly on the Automate Dashboard when an infrastructure event occurs that may have directly impacted a specific test build. This helps you quickly correlate a test failure with an underlying grid issue.

  • Use this view to quickly diagnose if a test failure was caused by a problem with the grid infrastructure.
  • This alert appears as a banner on the Build Details page.
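The correlation described above can be sketched as a time-window match between a build and the cluster events. This is an illustration only, not how the dashboard is implemented: the `GridEvent` shape, the `events_during_build` helper, and the two-minute margin are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class GridEvent:
    """Hypothetical shape of a cluster event; the field names are assumptions."""
    event_type: str
    occurred_at: datetime


def events_during_build(events, build_start, build_end, margin=timedelta(minutes=2)):
    """Return events that occurred inside the build window, widened by `margin`."""
    window_start = build_start - margin
    window_end = build_end + margin
    return [e for e in events if window_start <= e.occurred_at <= window_end]


# Example: a 30-minute build and three events around it.
start = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
events = [
    GridEvent("OOMKilling", start + timedelta(minutes=5)),
    GridEvent("Evicted", start + timedelta(hours=2)),
    GridEvent("NodeNotReady", start - timedelta(minutes=1)),
]
for event in events_during_build(events, start, start + timedelta(minutes=30)):
    print(event.event_type, event.occurred_at.isoformat())
```

The margin exists because an infrastructure problem that begins shortly before a build starts can still affect it; tune or remove it to suit your own triage.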

View and filter events

To access and filter your cluster’s event logs:

  1. From the left-hand navigation menu, select Grid Management.

  2. Click the Events tab to open the event log view.

  3. By default, you will see events from the last hour. To change the time range, use the filter buttons:

    • 15m, 30m, 1H, 1D: Select a predefined time range.
    • Custom: Click the date field to open a calendar and select a custom date range. You can view logs for up to the last 7 days.

After following these steps, you’ll see a list of all cluster events that occurred within your selected time frame.
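If you keep your own copy of event data, the same time-range logic can be approximated as follows. This is a minimal sketch under stated assumptions: the preset labels come from the filter buttons above, but their mapping to durations, the function names, and the `(timestamp, event_type)` tuple shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Labels match the filter buttons; the durations are assumed from the labels.
PRESETS = {
    "15m": timedelta(minutes=15),
    "30m": timedelta(minutes=30),
    "1H": timedelta(hours=1),
    "1D": timedelta(days=1),
}
MAX_HISTORY = timedelta(days=7)  # logs are viewable for up to the last 7 days


def preset_window(label, now):
    """Return (start, end) for a preset button such as "1H"."""
    return now - PRESETS[label], now


def custom_window(start, end, now):
    """Clamp a custom range so it never reaches past the 7-day history or into the future."""
    if start > end:
        raise ValueError("start must not be after end")
    earliest = now - MAX_HISTORY
    return max(start, earliest), min(end, now)


def filter_events(events, start, end):
    """Keep (timestamp, event_type) pairs whose timestamp falls in [start, end]."""
    return [e for e in events if start <= e[0] <= end]


now = datetime.now(timezone.utc)
sample = [(now - timedelta(minutes=10), "OOMKilling"), (now - timedelta(hours=3), "Evicted")]
print(filter_events(sample, *preset_window("1H", now)))
```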

Understanding event details

Each event log provides details to help you diagnose the issue. For critical errors, we also provide suggested actions.

Here are some common build events you might encounter:

| Event Type | Cause | Impact | Fix |
| --- | --- | --- | --- |
| NodeNotReady | Kubelet stops heartbeats (network, crash, overload) | Pods unreachable, may be evicted | Check kubelet logs; Verify network; Restart kubelet; Check node CPU/memory |
| NodeHasDiskPressure | Disk usage > 85% | Node unschedulable, pods evicted | Clean unused images; Remove old logs; Add disk space; Configure log rotation; Check large files |
| OOMKilling | Kernel killed process (OOM) | Pod restart, disruption | Increase pod memory limits; Add node memory; Review memory usage; Enable swap; Optimize app memory usage |
| FailedScheduling | Scheduler cannot place pod | Pod stuck Pending | Check requests vs capacity; Review affinity/taints; Add nodes; Adjust constraints |
| FailedCreatePodContainer | Runtime failed to create container | Pod creation fails | Verify image exists; Check runtime health; Review security contexts; Check resources |
| Evicted | Pod removed (resource pressure, policy) | Pod terminated, rescheduled | Reduce resource pressure; Review QoS/priority; Check eviction policies; Increase requests |
| BackoffLimitExceeded | Job exceeded retries | Job marked failed | Increase backoffLimit; Fix app issues; Review job settings; Check resources |
| FailedRetrieveImagePullSecret | Cannot access registry secret | Image pull fails | Verify secret exists; Check credentials; Review serviceAccount settings; Test registry auth |
| FailedCreatePodSandBox | Runtime cannot create sandbox | Pod creation fails | Check runtime logs; Verify CNI; Review network config; Check resources |
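Before a NodeHasDiskPressure event fires, you can spot-check disk usage on a node yourself. The sketch below uses only the Python standard library and compares usage against the 85% figure from the NodeHasDiskPressure entry; the actual threshold on a node depends on its kubelet configuration, and the function names here are hypothetical.

```python
import shutil

# Figure quoted for NodeHasDiskPressure; your kubelet eviction settings may differ.
DISK_PRESSURE_PERCENT = 85.0


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def has_disk_pressure(used_percent, threshold=DISK_PRESSURE_PERCENT):
    """True when usage is strictly above the threshold."""
    return used_percent > threshold


percent = disk_usage_percent(".")
print(f"{percent:.1f}% used, pressure: {has_disk_pressure(percent)}")
```

Run it against the mount points that hold container images and logs, since those are the usual sources of growth.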

Here are some common cluster events you might encounter:

| Event Type | Cause | Impact | Fix |
| --- | --- | --- | --- |
| KubeletIsDown | Kubelet stopped/crashed | Node mgmt stops, pods unresponsive, node NotReady | Restart kubelet; Check logs; Verify API connectivity; Check system resources |
| PIDPressure | PID limit exceeded | Node unschedulable, prevents processes | Increase PID limit; Restart nodes; Review workloads; Monitor PID usage |
| MemoryPressure | Node memory exceeded threshold | Node unschedulable, pods evicted | Review requests/limits; Scale down workloads; Add memory; Enable swap; Monitor usage |
| FailedMount | Volume mount failed | Pod startup fails | Verify volume; Check permissions; Review storage class; Check PV/PVC binding |
| WorkflowFailed | Workflow step failed/timeout | Entire workflow failed | Check step logs; Review retry policies; Fix failing steps; Check timeouts/resources |
| NetworkNotReady | CNI/plugin issue | Pod cannot communicate | Check CNI; Restart network daemon; Verify policies; Check node config |
| FailedToUpdateEndpointSlices | Service controller failed | Service routing broken | Check controller logs; Verify RBAC; Restart controller; Review selector config |
| SyncLoadBalancerFailed | Cloud LB sync failed | External access broken | Check cloud API; Review controller logs; Check LB config; Verify quotas |
| FailedAttachVolume | Cannot attach PV | Pod cannot start | Check volume availability; Verify zone; Review storage class; Check limits |
| FilesystemIsReadOnly | FS mounted read-only | Apps cannot write | Check corruption; Remount RW; Review options; Check storage health |
| InvalidDiskCapacity | Requested disk invalid | Pod cannot start | Review limits; Check storage class; Verify cloud limits; Adjust PVC request |
| FailedToScaleUpGroup | Autoscaler cannot add nodes | Pods pending | Check quotas; Verify autoscaler config; Review node group; Check subnet/security groups |
| ScaleUpTimedOut | Provisioning too slow | Scaling aborted | Check provisioning time; Increase timeout; Check availability; Review config |
| KubeletServingCertificateInvalid | Expired/misconfigured cert | Insecure API comms | Renew certs; Check validity; Restart kubelet; Verify CA chain |
| Drain | Node being drained | Workloads moved | Monitor evacuation; Handle stuck pods; Verify redistribution; Check storage |
| ContainerRuntimeIsDown | Runtime stopped/unresponsive | Node cannot manage containers | Restart runtime; Check logs; Verify resources; Review config |
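Several of these event types share names with Kubernetes event reasons, so you may want to cross-check them against the cluster directly. The sketch below counts Warning events by reason from the JSON that `kubectl get events -A -o json` produces. It assumes only the standard Kubernetes Event fields `type` and `reason`; how the Events page sources its data is not covered here, and `count_warning_reasons` is a hypothetical helper.

```python
import json
from collections import Counter


def count_warning_reasons(events_json):
    """Count Warning events by reason in `kubectl get events -A -o json` output."""
    items = json.loads(events_json).get("items", [])
    return Counter(
        item.get("reason", "Unknown")
        for item in items
        if item.get("type") == "Warning"
    )


# Stand-in for real kubectl output, trimmed to the two fields the sketch reads.
SAMPLE = json.dumps({"items": [
    {"type": "Warning", "reason": "FailedMount"},
    {"type": "Warning", "reason": "FailedMount"},
    {"type": "Warning", "reason": "FailedScheduling"},
    {"type": "Normal", "reason": "Scheduled"},
]})

for reason, count in count_warning_reasons(SAMPLE).most_common():
    print(f"{reason}: {count}")
```

Sorting by frequency is a simple way to decide which fix suggestions to work through first.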
