
weather.com is accessed by millions of people each day, so it is critical for the team to ensure each customer gets the best experience possible. For this to happen, the team must check that releases work well across their entire user base, regardless of browser, device or OS used.

Todd Eaton, Head of DevOps and Web Operations at IBM & Weather, and Vikas Thange, Lead SDET at Xpanxion LLC, hosted a webinar to share how the team achieves quality at speed.

The duo explain that the team used to have large, infrequent releases with an unacceptable number of failures. They didn't have enough test coverage to catch all failures, and they weren't releasing fast enough.

Todd and Vikas describe how they revamped their test-and-release process to increase test coverage, reduce test failure rate, and release multiple times a day. This resulted in 90% automated testing (up from 30%) and a hybrid framework for high-level testing that is now a benchmark in the industry.

Along the way, they were asked questions about weather.com's test-and-release process, automations, regressions, and best practices. Here's a roundup of their answers:

What are some best practices for manual teams moving to automation?

  1. You need a robust, reliable automation framework that enables fast adoption and early wins. Building it requires a significant up-front investment, but it pays off in the end (a minimal illustration follows this list).
  2. You need buy-in and willingness from all stakeholders involved before migrating from manual testing to complete automation.
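
To make the "fast adoption" point concrete, here is a minimal sketch of the kind of reusable building block such a framework typically provides: a page object that hides Selenium waits and locators behind readable actions. It assumes Selenium 4; the page, URL, and locators are hypothetical and are not weather.com's actual code.

```java
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

// Hypothetical page object: test authors call readable methods such as searchFor(),
// while the Selenium plumbing (waits, locators) stays inside the framework.
public class LocationSearchPage {
    private final WebDriver driver;
    private final WebDriverWait wait;

    // Placeholder locators for illustration only.
    private final By searchBox = By.id("location-search");
    private final By firstSuggestion = By.cssSelector(".search-suggestion");

    public LocationSearchPage(WebDriver driver) {
        this.driver = driver;
        this.wait = new WebDriverWait(driver, Duration.ofSeconds(10));
    }

    public LocationSearchPage open() {
        driver.get("https://staging.example.com"); // hypothetical environment URL
        return this;
    }

    public void searchFor(String location) {
        wait.until(ExpectedConditions.visibilityOfElementLocated(searchBox)).sendKeys(location);
        wait.until(ExpectedConditions.elementToBeClickable(firstSuggestion)).click();
    }
}
```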

At what stage do you start writing automation tests for new features?

You should start writing automation tests for new features while those features are still in development. Don't consider development 'done' until all automated tests are in place.
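
For example, a feature-level test like the hedged sketch below can be written as soon as the feature's acceptance criteria are known, even before the implementation lands. It uses JUnit 5 and Selenium; the URL, locator, and banner text are placeholders, not weather.com's real markup.

```java
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import static org.junit.jupiter.api.Assertions.assertTrue;

// Hypothetical test written alongside development of a "severe weather banner" feature.
class SevereWeatherBannerTest {

    private WebDriver driver;

    @BeforeEach
    void setUp() {
        driver = new ChromeDriver(); // local run; CI would target a grid or BrowserStack instead
    }

    @Test
    void bannerIsShownWhenAnAlertIsActive() {
        driver.get("https://staging.example.com/?forceAlert=true"); // hypothetical test hook
        String bannerText = driver.findElement(By.cssSelector("[data-testid='alert-banner']")).getText();
        assertTrue(bannerText.contains("Severe Weather Alert"));
    }

    @AfterEach
    void tearDown() {
        driver.quit();
    }
}
```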

What are the steps and best practices we need to follow before code merge?

Step 1: QA tests the functionality locally and works with the developer on fixes.
Step 2: The developer runs unit tests along with any local tests, including security, accessibility and performance tests.
Step 3: The developer builds a PR and submits it for a code review.
Step 4: A code reviewer reviews the PR. They either comment on the PR and send it back to the developer, or approve it and move the Jira ticket to 'Dev complete'.
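
As a small illustration of step 2, this is the kind of fast local unit test a developer might run before opening the PR. The helper being tested is hypothetical; it just stands in for whatever logic the feature touches.

```java
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical unit test run locally before raising the pull request (step 2).
// The formatting helper below is a stand-in for real application logic.
class TemperatureFormatterTest {

    static String format(double celsius, boolean imperial) {
        return imperial
                ? Math.round(celsius * 9 / 5 + 32) + "°F"
                : Math.round(celsius) + "°C";
    }

    @Test
    void formatsMetricAndImperialUnits() {
        assertEquals("20°C", format(20.4, false));
        assertEquals("69°F", format(20.4, true));
    }
}
```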

What does your QA and release tech stack look like?

Our web products are React and Angular apps running in Kubernetes. Because of this, we are able to spin up and down environments to run tests on containers that look like production.

We use a hybrid test automation framework that is decoupled from all test case management work. Test cases are written, maintained, and executed in the test management system, which also keeps historic runs and results.
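
One common way to achieve this decoupling is to have the framework expose a small set of reusable actions while the test cases themselves arrive as plain data from the test management tool. The sketch below illustrates that idea only; the keywords, step format, and stubbed implementations are assumptions, not weather.com's actual framework.

```java
import java.util.List;
import java.util.Map;

// Illustration of decoupling: test cases live in the test management system and are
// fed to the framework as data (keyword + arguments). The keyword bodies are stubbed
// with print statements here; a real framework would drive WebDriver instead.
public class KeywordRunner {

    record Step(String keyword, Map<String, String> args) {}

    public void run(List<Step> steps) {
        for (Step step : steps) {
            switch (step.keyword()) {
                case "openPage"   -> System.out.println("open " + step.args().get("url"));
                case "click"      -> System.out.println("click " + step.args().get("locator"));
                case "assertText" -> System.out.println("assert text " + step.args().get("expected"));
                default -> throw new IllegalArgumentException("Unknown keyword: " + step.keyword());
            }
        }
    }

    public static void main(String[] args) {
        // In practice these steps would be pulled from the test management system's API.
        new KeywordRunner().run(List.of(
                new Step("openPage", Map.of("url", "https://staging.example.com")),
                new Step("click", Map.of("locator", "#location-search")),
                new Step("assertText", Map.of("expected", "10 Day Forecast"))
        ));
    }
}
```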

Our hybrid framework is based on Selenium/Java. We run this against browsers or devices, either in our private internal grid or on BrowserStack.
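
As a hedged sketch of what that looks like in code: the same RemoteWebDriver call can target either an internal grid or BrowserStack's hub, switched by a flag. The grid URL is hypothetical, and the BrowserStack credentials are read from environment variables here purely for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Map;

import org.openqa.selenium.MutableCapabilities;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.RemoteWebDriver;

// Sketch: one factory method, two targets. The internal grid URL is a placeholder.
public class DriverFactory {

    public static WebDriver create(boolean useBrowserStack) throws MalformedURLException {
        MutableCapabilities caps = new MutableCapabilities();
        caps.setCapability("browserName", "chrome");

        String hubUrl;
        if (useBrowserStack) {
            // Vendor-specific options go under the W3C-style "bstack:options" key (Selenium 4).
            caps.setCapability("bstack:options", Map.of("os", "Windows", "osVersion", "11"));
            hubUrl = "https://" + System.getenv("BROWSERSTACK_USERNAME") + ":"
                    + System.getenv("BROWSERSTACK_ACCESS_KEY")
                    + "@hub-cloud.browserstack.com/wd/hub";
        } else {
            hubUrl = "http://selenium-grid.internal.example.com:4444/wd/hub"; // hypothetical internal grid
        }
        return new RemoteWebDriver(new URL(hubUrl), caps);
    }
}
```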

We use Jenkins to pass parameters and run tests. These tests can be called individually, scheduled for daily runs, or called from the build/deploy pipeline.
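
On the test side, those Jenkins parameters typically reach the suite as system properties. The sketch below assumes a Maven-style invocation and illustrative parameter names; neither is confirmed by the webinar.

```java
// Sketch of test code picking up parameters passed in from Jenkins, e.g.
//   mvn test -Dsuite=smoke -Dbrowser=chrome -Denv=staging
// The parameter names and defaults are assumptions for illustration.
public class RunConfig {

    public static final String SUITE   = System.getProperty("suite", "smoke");
    public static final String BROWSER = System.getProperty("browser", "chrome");
    public static final String ENV     = System.getProperty("env", "staging");

    public static void main(String[] args) {
        System.out.printf("Running the %s suite on %s against %s%n", SUITE, BROWSER, ENV);
    }
}
```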

We also perform visual regression testing with a custom framework built on WebDriver and aShot; performance testing with WebPageTest for page load and JMeter for load testing; accessibility testing with IBM accessibility scanning tools; and security testing with OWASP ZAP.
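
For reference, a WebDriver + aShot visual check generally follows this shape: capture a full-page screenshot, compare it against a stored baseline, and save a marked-up diff when the images differ. This is a generic sketch, not weather.com's custom framework; the URL and file paths are placeholders.

```java
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import ru.yandex.qatools.ashot.AShot;
import ru.yandex.qatools.ashot.Screenshot;
import ru.yandex.qatools.ashot.comparison.ImageDiff;
import ru.yandex.qatools.ashot.comparison.ImageDiffer;
import ru.yandex.qatools.ashot.shooting.ShootingStrategies;

// Sketch of a visual regression check: full-page screenshot vs. stored baseline.
public class VisualCheck {

    public static void main(String[] args) throws IOException {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://staging.example.com"); // hypothetical page under test

            // Stitch the full page by scrolling the viewport (100 ms between scrolls).
            Screenshot actual = new AShot()
                    .shootingStrategy(ShootingStrategies.viewportPasting(100))
                    .takeScreenshot(driver);

            Screenshot baseline = new Screenshot(ImageIO.read(new File("baselines/home.png")));
            ImageDiff diff = new ImageDiffer().makeDiff(baseline, actual);

            if (diff.hasDiff()) {
                ImageIO.write(diff.getMarkedImage(), "png", new File("diffs/home-diff.png"));
                System.out.println("Visual difference detected: " + diff.getDiffSize() + " differing pixels");
            }
        } finally {
            driver.quit();
        }
    }
}
```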

Our release stack uses Jenkins (Groovy) hosted on IBM Cloud with a Kubernetes cluster to pull artifacts from Artifactory and run build and deploy pipelines. These pipelines are built by the Web DevOps team, while steps and additional test calls or approvals are provided by the Dev and QA teams.

How many test cases do you run in your automation suite? How many tests do you execute in parallel?

The number of tests varies depending on the application and when the execution takes place. We execute smoke tests for every release; each smoke run takes approximately one hour. We also run nightly regressions (about 5 hours) and weekend regressions (12+ hours).

For desktop-based tests, we run 70 test cases in parallel. For mobile, it's about 30.
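
To give a sense of how a thread count like that is wired up, here is a hedged sketch using TestNG's programmatic API. TestNG itself is an assumption (the webinar doesn't name the runner), and the suite and class names are placeholders.

```java
import java.util.List;

import org.testng.TestNG;
import org.testng.xml.XmlClass;
import org.testng.xml.XmlSuite;
import org.testng.xml.XmlTest;

// Sketch: build a TestNG suite in code and run test methods in parallel with a
// thread count in line with the ~70 desktop sessions mentioned above.
public class ParallelSuiteRunner {

    public static void main(String[] args) {
        XmlSuite suite = new XmlSuite();
        suite.setName("desktop-regression");
        suite.setParallel(XmlSuite.ParallelMode.METHODS); // run individual test methods concurrently
        suite.setThreadCount(70);

        XmlTest test = new XmlTest(suite);
        test.setName("chrome-desktop");
        test.setXmlClasses(List.of(new XmlClass("tests.HomePageTest"))); // hypothetical test class

        TestNG testng = new TestNG();
        testng.setXmlSuites(List.of(suite));
        testng.run();
    }
}
```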

How many devices do you test your regression suite on?

We use analytics reports to decide which devices to test on. Generally, we test the last 2-3 versions of Android and iPhone devices in regression. If the reports show other devices that warrant testing, we adjust our list accordingly.

What's the best setup to bring down release failures? What parameters should we consider while setting up the process?

There is no single best setup that applies to everyone. What we did works for us, but it may not work for other teams because of differences in how they operate.

With that in mind, there are things you can do to bring down your release failures. The first is to perform extensive testing prior to code merge. Developers and testers should test the code with their various tools and ensure code is working as expected, prior to merging.

The next is to write automated tests, so when code is ready to be merged, you can test it in an integrated environment on your supported browsers and devices. Your tests need to be flexible enough to run on multiple devices, and if the tests are not, you risk releasing bugs or delaying the process.

Lastly, you should have a robust regression set and help from developers to identify impact areas you can review prior to releasing. Running a large set of tests can give a lot of false positives, and you need to have the ability to weed through the noise to identify the important failures—and how they affect your application.

How do we scale from monthly releases to daily releases?

This is not something you can do right away. At weather.com, we went from twice-monthly releases to weekly, then to 2-3 times a week, then to daily, and finally to multiple times a day. You need to ensure your release process is solid enough to catch bugs reliably.

Beyond that, we had to convince management that our test-and-release process would protect us from downtime—and that when failures did occur, we could find out why something failed and how to mitigate it.

Overall, we had to gradually get people to trust the process and prove to them that it was doable.

How did you gain confidence in automation?

We have not automated all processes. We still do some manual exploratory testing and have a manual approval process for deploying to production.

But, to increase confidence, we did a couple of things:

  1. Organized review meetings with the entire QA team and the product owner for the test cases developed for each new feature.
  2. Ensured each test case was vetted thoroughly for failures.

Did you encounter any pushback while making these changes? How did you convince your stakeholders to get on board?

We didn't get much pushback, but our stakeholders wanted to ensure we had enough controls in place to mitigate risk.

We started out with easy wins (like using a detailed checklist and showing that each process in the checklist was automated), and when our stakeholders felt more comfortable automating processes, we increased velocity.

We also increased communication about the releases to a wider audience, over multiple channels. This way, stakeholders could stay informed about any changes made.

We were very transparent about our release failures, which increased the stakeholders' confidence in the process and brought many more of them on board.

Does org structure affect the testing process?

In our case, the org structure did not affect the testing process. This could be because the QA group was given autonomy, even though we reported to different managers (QA, Dev, Operations).