Architecture of Selenium WebDriver
By Sonal Dwivedi, Community Contributor - September 4, 2023
Testing is an indispensable part of software development, and no product should be launched without it. In a fast-moving, competitive market, products also have to be developed and released quickly, which creates the need for automation testing. For web automation, Selenium is one of the most popular open-source test automation frameworks used across the industry.
In 2011, Selenium RC and Selenium WebDriver were merged to form Selenium 2. Selenium 3 followed in 2016 with bug fixes, security enhancements, and support for modern browsers. Selenium 4 is the latest major release; it adds several new features and enhancements over previous versions and is fully W3C compliant.
What is Selenium WebDriver
Through test scripts, WebDriver simulates user actions: it navigates through web pages, interacts with elements (such as buttons, text fields, dropdown menus, forms, and links), submits forms, and performs validations and assertions, among other things.
As per the Selenium documentation: “WebDriver drives a browser natively, as a user would, either locally or on a remote machine using the Selenium server, marks a leap forward in terms of browser automation.”
Let us first look at the Selenium 3 architecture and then at Selenium 4, which makes it easier to see what Selenium 4 improves on.
Architecture of Selenium WebDriver (Selenium 3)
The Selenium WebDriver architecture is made up of four major components:
- Selenium Client Libraries: Selenium provides language bindings for multiple languages such as Ruby, Python, Java, etc.
- JSON Wire Protocol over HTTP: the protocol used to transfer information between the client libraries and the browser drivers, with requests and responses encoded as JSON.
- Browser Drivers: Selenium browser drivers are native to each browser and interact with the browser by establishing a connection with it. Selenium supports different browser drivers such as ChromeDriver, GeckoDriver, Microsoft Edge WebDriver, SafariDriver, and InternetExplorerDriver.
- Browsers: Selenium provides support for multiple browsers such as Chrome, Firefox, Safari, Internet Explorer, etc.
The diagram below depicts the Selenium 3 WebDriver architecture:
Selenium 3 Architecture
In Selenium 3, the client libraries do not communicate with the browser drivers directly. The JSON Wire Protocol acts as a mediator between client and server, encoding and decoding the requests and responses they exchange. This results in limited browser interaction, inefficient communication, and a lack of standardization, which ultimately led to flaky tests and slower test execution.
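To make the encode/decode step more concrete, here is a minimal Python sketch of the two response shapes a client of that era could receive. The legacy JSON Wire Protocol reported the outcome through a numeric `status` field (0 for success), while W3C-style drivers return only a `value` and report errors as a string code inside it. The helper name `normalize_response` and the sample payloads are assumptions made for illustration; this is not Selenium's actual client code.

```python
import json


def normalize_response(raw: str) -> dict:
    """Decode a driver response body into {"ok", "value", "error"}.

    Hypothetical helper: accepts both the legacy JSON Wire shape
    and the W3C WebDriver shape.
    """
    body = json.loads(raw)
    value = body.get("value")

    # Legacy JSON Wire Protocol: numeric "status", where 0 means success.
    if "status" in body:
        ok = body["status"] == 0
        error = None if ok else f"status {body['status']}"
        return {"ok": ok, "value": value, "error": error}

    # W3C WebDriver: an error is an object inside "value" with a string code.
    if isinstance(value, dict) and "error" in value:
        return {"ok": False, "value": value, "error": value["error"]}

    return {"ok": True, "value": value, "error": None}


# Sample payloads (made up for this sketch).
legacy = '{"sessionId": "abc", "status": 0, "value": "Example Domain"}'
w3c = '{"value": "Example Domain"}'
w3c_error = '{"value": {"error": "no such element", "message": "not found", "stacktrace": ""}}'

print(normalize_response(legacy))
print(normalize_response(w3c))
print(normalize_response(w3c_error)["error"])
```

Having to recognise and translate two dialects like this is the kind of extra work that a single standard protocol removes.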
Architecture of Selenium 4 WebDriver
The architecture of Selenium 4 is similar to that of Selenium 3; however, it uses the W3C WebDriver protocol instead of the JSON Wire Protocol for communication between the client libraries and the browser drivers.
The diagram below depicts the Selenium 4 WebDriver architecture:
Selenium 4 Architecture
WebDriver in Selenium 4 is fully W3C compliant!
What does this mean? Let us first understand what W3C is.
W3C stands for the World Wide Web Consortium, an international community that develops and maintains standards and guidelines for the World Wide Web. The main aim of the W3C is to ensure the long-term growth and interoperability of the Web.
It creates open standards and specifications that promote compatibility and consistency across web technologies and platforms. When we say Selenium 4 is W3C compliant, we mean that Selenium adheres to the standards and specifications laid down by the W3C for web automation.
Browsers and browser drivers in the Selenium architecture follow the W3C standard, but the Selenium 3 WebDriver client did not. Hence, the JSON Wire Protocol was used to encode and decode requests and responses. Selenium 4 WebDriver was made W3C compliant so that the client libraries and the browser drivers can communicate directly, and this improved communication led to more stability.
This has also enhanced browser compatibility, performance, and efficiency, because requests and responses no longer need to be translated between the JSON Wire Protocol and the W3C protocol. Communication between the WebDriver client and the browser driver still takes place over HTTP, but both sides now speak the same standardised protocol.
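One visible consequence of W3C compliance is the naming of capabilities when a session is created. The Python sketch below checks a capability set against a subset of the standard names from the W3C WebDriver specification and builds the body of a new-session request. The helper names and sample capabilities are assumptions for illustration, and the set of standard names is deliberately not exhaustive.

```python
import json

# Standard capability names from the W3C WebDriver specification (subset).
W3C_STANDARD = {
    "browserName", "browserVersion", "platformName", "acceptInsecureCerts",
    "pageLoadStrategy", "proxy", "setWindowRect", "timeouts",
    "strictFileInteractability", "unhandledPromptBehavior",
}


def non_compliant_keys(caps: dict) -> list:
    """Return names that are neither standard nor vendor-prefixed (containing ':')."""
    return sorted(k for k in caps if k not in W3C_STANDARD and ":" not in k)


def new_session_payload(caps: dict) -> str:
    """Build the JSON body for POST /session in the W3C format."""
    bad = non_compliant_keys(caps)
    if bad:
        raise ValueError(f"not W3C compliant: {bad}")
    return json.dumps({"capabilities": {"alwaysMatch": caps, "firstMatch": [{}]}})


# Legacy-style names such as "version" and "platform" are not W3C names.
legacy_caps = {"browserName": "chrome", "version": "90", "platform": "WINDOWS"}
w3c_caps = {
    "browserName": "chrome",
    "browserVersion": "90",
    "platformName": "windows",
    "goog:chromeOptions": {"args": ["--headless"]},
}

print(non_compliant_keys(legacy_caps))  # ['platform', 'version']
print(new_session_payload(w3c_caps))
```

Vendor-specific settings stay possible under the standard as long as their names carry a prefix, as `goog:chromeOptions` does here.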
The following points explain how the client and the server communicate using the WebDriver protocol:
- The WebDriver client serialises the request into the standardised format specified by the WebDriver protocol, which is a JSON payload sent over HTTP.
- The serialised request is transmitted to the browser driver, which acts as a bridge between the WebDriver client and the web browser.
- The browser driver processes the serialised request and performs the necessary actions on the web browser.
- The browser driver generates a response to the command, which includes relevant data such as the returned value and whether the command succeeded or failed.
- The browser driver serialises the response into the standardised format defined by the WebDriver protocol and transmits it back to the client.
- The client receives the response from the browser driver and deserialises it. It extracts the relevant information and uses it to verify the success or failure of the command execution.
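The round trip described above can be sketched in a few lines of Python. Here `fake_browser_driver` is a hypothetical in-process stand-in for a real driver such as ChromeDriver, and `execute` plays the role of the client; both names are assumptions for illustration. The endpoint paths, the `{"value": ...}` response envelope, the error fields, and the element identifier key follow the W3C WebDriver specification.

```python
import json

# Key under which the W3C protocol returns a web element reference.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def fake_browser_driver(method: str, path: str, body: str):
    """Stand-in for a browser driver: returns (HTTP status, JSON response body)."""
    request = json.loads(body) if body else {}
    if method == "POST" and path.endswith("/url"):
        # A real driver would navigate the browser here.
        return 200, json.dumps({"value": None})
    if method == "POST" and path.endswith("/element"):
        if request.get("value") == "#missing":
            error = {"error": "no such element",
                     "message": "Unable to locate element: #missing",
                     "stacktrace": ""}
            return 404, json.dumps({"value": error})
        return 200, json.dumps({"value": {ELEMENT_KEY: "element-1"}})
    error = {"error": "unknown command", "message": path, "stacktrace": ""}
    return 404, json.dumps({"value": error})


def execute(session_id: str, method: str, suffix: str, params=None):
    """Client side: serialise, send, then deserialise and check the outcome."""
    body = json.dumps(params or {})                      # serialise the request
    status, raw = fake_browser_driver(                   # driver acts and replies
        method, f"/session/{session_id}{suffix}", body)
    value = json.loads(raw)["value"]                     # deserialise the response
    if status >= 400:
        raise RuntimeError(f"{value['error']}: {value['message']}")
    return value


print(execute("s1", "POST", "/url", {"url": "https://example.com"}))
print(execute("s1", "POST", "/element", {"using": "css selector", "value": "#login"}))
try:
    execute("s1", "POST", "/element", {"using": "css selector", "value": "#missing"})
except RuntimeError as exc:
    print("command failed:", exc)
```

In a real setup the call to `fake_browser_driver` would be an HTTP request to the driver's local server, and the driver would translate it into actions on the browser.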
Difference between Architecture of Selenium 3 & Selenium 4
With the release of Selenium 4, there are some significant differences between Selenium 3 and Selenium 4, which are highlighted below:
1. Communication between client and server: The Selenium 3 architecture uses the JSON Wire Protocol to transfer information between the client and the server over HTTP. The protocol serialises object data to JSON and deserialises it back. Selenium 4 has dropped the JSON Wire Protocol in favour of the W3C WebDriver protocol, so the client and the server communicate directly without that translation step.
2. W3C compliance: Selenium 3 does not fully adhere to the W3C guidelines, whereas Selenium 4 is fully W3C compliant.
3. Selenium Grid: In Selenium Grid 3, testers had to start the hub and the node separately every time they needed to run their automated tests. In Selenium Grid 4, the hub and node are packaged in a single jar, and starting the server in standalone mode brings up both, so they no longer need to be started separately.
4. ChromeDriver: In Selenium 3, the ChromeDriver class directly extended the RemoteWebDriver class; in Selenium 4, the ChromeDriver class extends ChromiumDriver.
5. Selenium IDE: Selenium IDE is a record-and-playback tool that supported only the Firefox browser in Selenium 3. In Selenium 4, it supports Chrome as well as Firefox. A new plug-in system allows any browser to plug into the new Selenium IDE with its own locator strategy and IDE plugin. It also allows parallel test execution and provides metrics on the tests executed, such as their PASS/FAIL status.
6. Relative Locators: Relative locators, newly introduced in Selenium 4, allow locating elements based on their position relative to other web elements on the page, using methods such as above(), below(), toLeftOf(), toRightOf(), and near(). Selenium 3 lacked this feature.
7. Chrome DevTools Protocol (CDP): Selenium 3 has no support for the Chrome DevTools Protocol. Selenium 4 supports CDP, which provides access to a wide range of advanced browser debugging and automation capabilities. Testers can benefit from features such as DOM inspection, performance profiling, and network traffic analysis.
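To give a feel for the CDP support mentioned in the last point, here is a minimal Python sketch of the protocol's message format. CDP messages are JSON objects: a command carries an `id`, a `method` of the form `Domain.method`, and `params`; a reply echoes the `id`; an event has no `id`. The class `CdpMessageBuilder` and the function `classify` are hypothetical helpers written for this sketch and are not part of Selenium's API. The sketch only builds and inspects messages and does not open a connection to a browser.

```python
import itertools
import json


class CdpMessageBuilder:
    """Hypothetical helper that builds CDP command messages with unique ids."""

    def __init__(self):
        self._ids = itertools.count(1)

    def command(self, method: str, **params) -> str:
        # Each command gets the next id so replies can be matched to it.
        return json.dumps({"id": next(self._ids), "method": method, "params": params})


def classify(raw: str) -> str:
    """Tell a command reply (success or error) apart from a browser event."""
    msg = json.loads(raw)
    if "id" in msg:
        return "error" if "error" in msg else "result"
    return "event"


builder = CdpMessageBuilder()
enable_network = builder.command("Network.enable")
get_metrics = builder.command("Performance.getMetrics")

print(enable_network)  # {"id": 1, "method": "Network.enable", "params": {}}
print(get_metrics)
print(classify('{"id": 1, "result": {}}'))
print(classify('{"method": "Network.requestWillBeSent", "params": {}}'))
```

Unlike WebDriver commands, which map to HTTP endpoints, these messages are exchanged over a persistent connection to the browser, which is what makes event-driven features such as network traffic analysis possible.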
This article should help you choose the Selenium WebDriver version that best suits your project. Whichever version you choose, it is always wise to test on real devices and browsers to get reliable results. BrowserStack’s Real Device Cloud provides access to 3000+ real devices and browsers for testing under real conditions. It supports cross-browser testing and parallel testing, helping you deliver a seamless user experience across browsers and devices.