Architecture of Selenium WebDriver
By Sonal Dwivedi, Community Contributor - September 4, 2023
Testing is an indispensable part of software development, and no product should be launched without proper testing. In a fast-moving, competitive market, products also need to be developed and released quickly, which is why automation testing is needed. For web automation, Selenium is one of the most popular open-source test automation frameworks in the industry.
Selenium History
In 2004, Jason Huggins, a software engineer at ThoughtWorks, wrote a JavaScript program called JavaScriptTestRunner to test web applications through scripts. It gained momentum in the testing community and was later open-sourced and renamed Selenium Core.
It allowed developers to automate web browsers by executing JavaScript commands directly in the browser. Later, in 2006, Selenium Core evolved into Selenium Remote Control (Selenium RC), also known as Selenium 1. Selenium RC introduced a server component that acted as a proxy between the test script and the browser. This enabled cross-browser testing and supported multiple programming languages.
In 2009, Simon Stewart (then at Google) created a new cross-platform library called WebDriver to automate browser testing. It was designed to overcome the complexities of Selenium RC and provide a simple, consistent interface by using native browser automation APIs rather than JavaScript injection.
In 2011, Selenium RC and Selenium WebDriver were combined to form Selenium 2. Selenium 3 followed in 2016 with bug fixes, security enhancements, and support for modern browsers. Selenium 4 is the latest major release; it adds several new features and enhancements over previous versions and is fully W3C compliant.
What is Selenium WebDriver
Selenium WebDriver is a popular open-source library and a key component of the Selenium automation framework used to automate testing of web applications. It is a collection of APIs that gives developers and testers a programming interface for writing scripts in languages such as Java, JavaScript, C#, and Python to automate browser actions and retrieve information from web pages.
Through test scripts, WebDriver simulates user actions: it navigates through web pages, interacts with elements (such as buttons, text fields, dropdown menus, forms, and links), submits forms, and performs validations and assertions.
As per the Selenium documentation, “WebDriver drives a browser natively, as a user would, either locally or on a remote machine using the Selenium server, marks a leap forward in terms of browser automation.”
Let us first look at the Selenium 3 architecture before moving on to Selenium 4, which will help show how Selenium 4 improves on its predecessor.
Architecture of Selenium WebDriver (Selenium 3)
Selenium WebDriver Architecture is made up of four major components:
- Selenium Client library: Selenium provides language bindings (client libraries) for multiple languages such as Ruby, Python, and Java.
- JSON Wire Protocol over HTTP: JSON is an acronym for JavaScript Object Notation. It is an open, text-based data-interchange format; the JSON Wire Protocol uses it to carry commands and responses between client and server over HTTP.
- Browser Drivers: Each browser has its own driver, which receives commands from the client and controls the browser through the browser's native automation support. Selenium supports different browser drivers such as ChromeDriver, GeckoDriver, Microsoft Edge WebDriver, SafariDriver, and InternetExplorerDriver.
- Browsers: Selenium provides support for multiple browsers such as Chrome, Firefox, Safari, and Internet Explorer.
The diagram below depicts the Selenium 3 WebDriver architecture:
[Figure: Selenium 3 Architecture]
In Selenium 3, the client libraries (Java, Python, JavaScript, etc.) and the browser drivers did not speak the same protocol. The client libraries sent commands using the JSON Wire Protocol, while the browser drivers had largely moved to the W3C WebDriver protocol.
Therefore, requests from the client and responses from the driver had to be encoded and decoded between the two protocols along the way. This extra translation step, combined with a lack of standardization, led to inconsistent behavior across browsers, flaky tests, and slower test execution.
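To make the encode/decode step concrete, here is a minimal Python sketch of how the same "find element" result looks in the legacy JSON Wire Protocol and in the W3C WebDriver format, along with one direction of the translation between them. The two element key names come from the respective protocol specifications; the helper name jsonwp_to_w3c and the sample IDs are hypothetical, and real Selenium bindings handle many more cases than this.

```python
import json

# Element key defined by the W3C WebDriver specification.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"
# Element key used by the legacy JSON Wire Protocol.
JSONWP_ELEMENT_KEY = "ELEMENT"


def jsonwp_to_w3c(response: dict) -> dict:
    """Translate a successful JSON Wire Protocol response into W3C shape.

    Hypothetical helper for illustration only: it drops the legacy
    top-level "status" and "sessionId" fields and renames the element key.
    """
    if response.get("status", 0) != 0:
        raise ValueError("only successful (status 0) responses are handled here")
    value = response.get("value")
    if isinstance(value, dict) and JSONWP_ELEMENT_KEY in value:
        value = {W3C_ELEMENT_KEY: value[JSONWP_ELEMENT_KEY]}
    return {"value": value}


# A legacy-style "find element" response (session and element IDs are made up).
legacy = {"sessionId": "abc123", "status": 0, "value": {"ELEMENT": "e-42"}}
print(json.dumps(jsonwp_to_w3c(legacy)))
```

Every command and response that crossed the protocol boundary needed this kind of reshaping, which is the overhead the W3C-only design of Selenium 4 removes.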
Architecture of Selenium 4 WebDriver
The architecture of Selenium 4 is similar to that of Selenium 3; however, it uses the W3C WebDriver protocol instead of the JSON Wire Protocol for communication between the client libraries and the browser drivers.
The diagram below depicts the Selenium 4 WebDriver architecture:
[Figure: Selenium 4 Architecture]
WebDriver in Selenium 4 is fully W3C compliant!
What does this mean? Let us first understand what the W3C is.
W3C stands for the World Wide Web Consortium, an international community that develops and maintains standards and guidelines for the World Wide Web. The main aim of the W3C is to ensure the long-term growth and interoperability of the Web.
It creates open standards and specifications that promote compatibility and consistency across various web technologies and platforms. When we say Selenium 4 is W3C compliant, it means that Selenium adheres to the standards and specifications laid down by the W3C for web automation.
Modern browsers and browser drivers follow the W3C WebDriver standard, but the Selenium 3 client libraries did not fully do so. Hence, the JSON Wire Protocol was used, and requests and responses had to be encoded and decoded. Selenium 4 WebDriver was made W3C compliant so that the client libraries and the browser drivers can communicate directly, without translation. This more direct communication makes tests more stable.
This has also improved browser compatibility, performance, and efficiency, because the requests and responses exchanged between the WebDriver client and the browser driver no longer need to be translated between protocols. Commands still travel as JSON over HTTP, but both sides now use the same W3C-defined format.
The following steps describe the communication between client and server using the WebDriver protocol:
- A command execution request is sent by the Selenium client (test script written in any language such as Java, Python, JavaScript, etc) to perform various actions on the browser such as navigating to a URL, interacting with the elements, executing code and so on.
- The WebDriver client serialises the request into the standardised format specified by the WebDriver protocol: an HTTP request whose body is JSON.
- The serialised request is transmitted to the browser driver which acts as a bridge between the WebDriver client and the Web browser.
- The browser driver processes the serialised request and then performs the necessary actions on the Web browser.
- The browser driver generates a response to the command, which includes the success or failure status and any relevant data.
- The browser driver serialises the response into the standardised format defined by the WebDriver protocol and transmits it back to the client.
- The client receives the response from the browser driver and deserialises it. It extracts the relevant information, which the test can use to verify whether the command succeeded or failed.
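The steps above can be sketched in a few lines of Python. This is a simplified illustration, not how any real driver is implemented: FakeBrowserDriver is a hypothetical in-process stand-in for a driver such as ChromeDriver, and the helper names are made up. The endpoint path, the JSON bodies, and the error names follow the W3C WebDriver specification for the "navigate to" command.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HttpRequest:
    method: str
    path: str
    body: str  # JSON text, as it would travel over HTTP


def serialise_navigate(session_id: str, url: str) -> HttpRequest:
    """Client side: turn a 'navigate to URL' command into a W3C-style request."""
    return HttpRequest("POST", f"/session/{session_id}/url", json.dumps({"url": url}))


class FakeBrowserDriver:
    """Hypothetical stand-in for a browser driver; no real browser is involved."""

    def __init__(self) -> None:
        self.current_url = "about:blank"

    def handle(self, request: HttpRequest) -> Tuple[int, str]:
        """Driver side: decode the request, act on the 'browser', encode a response."""
        payload = json.loads(request.body)
        if request.method == "POST" and request.path.endswith("/url"):
            if "url" not in payload:
                error = {"error": "invalid argument", "message": "missing 'url'", "stacktrace": ""}
                return 400, json.dumps({"value": error})
            self.current_url = payload["url"]  # pretend the browser navigated
            return 200, json.dumps({"value": None})
        error = {"error": "unknown command", "message": request.path, "stacktrace": ""}
        return 404, json.dumps({"value": error})


def deserialise(status: int, body: str):
    """Client side: decode the response and surface failures as exceptions."""
    value = json.loads(body)["value"]
    if status != 200:
        raise RuntimeError(f"{value['error']}: {value['message']}")
    return value


driver = FakeBrowserDriver()
request = serialise_navigate("session-1", "https://example.com")
status, body = driver.handle(request)
print(deserialise(status, body), driver.current_url)
```

In a real setup the request would be sent over HTTP to the driver's local server rather than passed to a method, but the serialise, process, respond, and deserialise stages are the same.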
Difference between Architecture of Selenium 3 & Selenium 4
With the release of Selenium 4, there are some significant differences between Selenium 3 and Selenium 4, which are highlighted below:
1. Communication between client and server: The Selenium 3 architecture uses the JSON Wire Protocol to transfer information between the client and the server over HTTP, serialising objects to JSON and deserialising them back. Selenium 4 has dropped the JSON Wire Protocol in favor of the W3C WebDriver protocol, so the client and the server communicate directly without a translation step.
2. W3C compliant: Selenium 3 does not fully adhere to W3C standards, whereas Selenium 4 is fully W3C compliant.
3. Selenium Grid: In Selenium Grid 3, testers had to start the hub and the nodes as separate processes before running their automated tests. Selenium Grid 4 ships as a single jar that can also run in standalone mode, where one command starts the hub and node together.
4. ChromeDriver: In Selenium 3, the ChromeDriver class directly extended the RemoteWebDriver class; in Selenium 4, the ChromeDriver class extends ChromiumDriver.
5. Selenium IDE: Selenium IDE is a record-and-playback tool that supported only the Firefox browser in the Selenium 3 era. With Selenium 4, it supports Chrome along with Firefox. A new plug-in system allows any browser to plug into the new Selenium IDE with its own locator strategy and IDE plugin. It also supports parallel test execution and reports metrics such as the total number of tests executed and their pass/fail status.
6. Relative Locators: Relative locators, newly introduced in Selenium 4, allow locating elements by their position relative to other web elements on the page, using methods such as above(), below(), toLeftOf(), toRightOf(), and near(). Selenium 3 lacked this feature.
7. Chrome DevTools Protocol (CDP): Selenium 3 has no support for the Chrome DevTools Protocol. Selenium 4 supports CDP, which provides access to a wide range of advanced browser debugging and automation capabilities. Testers can benefit from features such as DOM inspection, performance profiling, and network traffic analysis.
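As a rough illustration of the relative locators and CDP support listed above, here is a short Python sketch using the Selenium 4 bindings. It assumes Selenium 4 and a Chrome browser are installed, and that the page under test has an element with the id "email" followed by another input field; the function name and that page structure are hypothetical. Note that the Python bindings use snake_case names (for example to_left_of) where the Java methods listed above use camelCase.

```python
def tag_of_field_below_email(page_url: str) -> str:
    """Open a page, find the <input> below the 'email' element, and enable CDP network events.

    Illustrative sketch only; the page layout it relies on is an assumption.
    """
    # Imported inside the function so that loading this file does not
    # require Selenium or a browser to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.relative_locator import locate_with

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        # Relative locator: the <input> rendered below the element with id "email".
        field = driver.find_element(
            locate_with(By.TAG_NAME, "input").below({By.ID: "email"})
        )
        # CDP: switch on network event tracking in the Chromium-based browser.
        driver.execute_cdp_cmd("Network.enable", {})
        return field.tag_name
    finally:
        # Always close the browser, even if a lookup fails.
        driver.quit()
```

Calling tag_of_field_below_email with the URL of a suitable login page would launch Chrome, so it is best tried against a test environment.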
This article should help you choose the right Selenium WebDriver version for your project. Whichever version you choose, it is always wise to test on real devices and browsers to get reliable results. BrowserStack’s Real Device Cloud offers 3000+ real devices and browsers for testing under real user conditions. It supports cross-browser testing and parallel testing, helping you deliver a seamless user experience across browsers and devices.