How to find broken links in Selenium
Shreya Bose, Technical Content Writer at BrowserStack - February 2, 2023
Before discussing how to find broken links using Selenium WebDriver, let’s address a more fundamental question.
What are Broken Links?
A link is an HTML element that takes users from one web page to another when they click on it.
A broken link, also called a dead link, is one that does not work: it does not lead to the web page it is meant to. This usually happens because the website or the particular web page is down or no longer exists. When someone clicks on a broken link, an error message is displayed.
Broken links may also result from a server error that prevents the corresponding page from being displayed. A working URL returns a 2xx HTTP status code, or a 3xx code that redirects to a working page. Broken links return 4xx or 5xx status codes.
A 4xx status code indicates a client-side error, while a 5xx status code indicates a server-side error.
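The status code ranges described above can be sketched as a small helper. This is a minimal illustration only; the class name `StatusClassifier` and the returned labels are assumptions and do not come from Selenium or the Java standard library.

```java
// Hypothetical helper: the class name, method name, and labels are illustrative only.
class StatusClassifier {

    /** Maps an HTTP status code to its broad class. */
    static String classify(int code) {
        if (code >= 200 && code < 300) return "success";
        if (code >= 300 && code < 400) return "redirect";
        if (code >= 400 && code < 500) return "client error";
        if (code >= 500 && code < 600) return "server error";
        return "other";
    }

    public static void main(String[] args) {
        // Print the class of a few representative codes.
        for (int code : new int[] {200, 301, 404, 503}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

For link checking, only the last two classes count as broken.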
HTTP Status Codes for Broken Links
HTTP Status Code | Definition |
---|---|
400 (Bad Request) | Server unable to process request as URL is incorrect |
400 (Bad Request – Bad Host) | Server unable to process request as host name is invalid |
400 (Bad Request – Bad URL) | Server cannot process the request as the URL is malformed, e.g. missing characters such as brackets or slashes |
400 (Bad Request – Empty) | Response returned by the server is empty with no content & no response code |
400 (Bad Request – Timeout) | HTTP requests have timed out |
400 (Bad Request – Reset) | Server is unable to process the request, as it is busy processing other requests or has been misconfigured by site owner |
404 (Page Not Found) | Page is not available on the server |
403 (Forbidden) | Server understands the request but refuses to fulfill it, as the client lacks permission |
410 (Gone) | Page is gone. This code is more permanent than 404 |
408 (Request Time Out) | Server has timed out waiting for the request |
503 (Service Unavailable) | Server is temporarily overloaded and cannot process the request |
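Java's `HttpURLConnection` class defines named constants for the standard codes in this table, which can make a report easier to read. The sketch below maps a response code to the label used in the table; the class name `BrokenLinkReasons` and the `describe` method are hypothetical, while the `HTTP_*` constants are part of `java.net.HttpURLConnection`.

```java
import java.net.HttpURLConnection;

// Hypothetical helper: only the HTTP_* constants come from the JDK.
class BrokenLinkReasons {

    /** Returns the table label for a status code, or a generic fallback. */
    static String describe(int code) {
        switch (code) {
            case HttpURLConnection.HTTP_BAD_REQUEST:    return "Bad Request";          // 400
            case HttpURLConnection.HTTP_FORBIDDEN:      return "Forbidden";            // 403
            case HttpURLConnection.HTTP_NOT_FOUND:      return "Page Not Found";       // 404
            case HttpURLConnection.HTTP_CLIENT_TIMEOUT: return "Request Time Out";     // 408
            case HttpURLConnection.HTTP_GONE:           return "Gone";                 // 410
            case HttpURLConnection.HTTP_UNAVAILABLE:    return "Service Unavailable";  // 503
            default:                                    return "Unlisted status " + code;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(404));
        System.out.println(describe(410));
    }
}
```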
Why check for Broken Links in Selenium?
If a user clicks on a broken link, they will be directed to an error page, which makes for a sub-par user experience. Broken links defeat the purpose of having the website in the first place because users cannot find the information or service they are looking for.
Every link on a website must be tested to ensure that it functions as expected. However, most websites contain hundreds (sometimes thousands) of links, so testing each one manually would take excessive time, effort, and resources. Automated Selenium testing makes that manual effort unnecessary.
Common Reasons for Broken Links
- 404 Page Not Found – The destination web page has been removed by the owner
- 400 Bad Request – The server cannot process the HTTP request triggered by the link because the URL address requested is wrong
- Due to the user’s firewall settings, the browser cannot access the destination web page
- The link is misspelled
How to identify broken links in Selenium WebDriver
Checking for broken links in Selenium is straightforward. On a web page, hyperlinks are implemented using the HTML anchor (<a>) tag. The script needs to locate every anchor tag on the page, read the URL from each one, and request each URL to check whether it is broken.
Use the following steps to identify broken links in Selenium:
- Collect all the links present on a web page based on the <a> tag
- Send HTTP request for each link
- Verify the HTTP response code
- Determine if the link is valid or broken based on the HTTP response code
- Repeat the process for all links captured with the first step
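If you’re wondering how to find broken images using Selenium WebDriver, use the same process, collecting <img> tags and reading their src attribute instead of <a> tags and their href attribute.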
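Pages often repeat the same link in the header, body, and footer, so it can help to remove duplicates between the first and second steps above. The following is a minimal sketch, assuming the href values have already been read from the page into a list of strings; the class name `LinkCollector` and the example.com URLs are placeholders.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: names and sample URLs are placeholders.
class LinkCollector {

    /** Drops null, blank, and duplicate hrefs while keeping page order. */
    static Set<String> uniqueUrls(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href != null && !href.trim().isEmpty()) {
                unique.add(href.trim());
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "http://www.example.com/", null, "",
                "http://www.example.com/", "http://www.example.com/about");
        // Prints [http://www.example.com/, http://www.example.com/about]
        System.out.println(uniqueUrls(hrefs));
    }
}
```

Each remaining URL then needs only one HTTP request, however many times it appears on the page.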
Finding Broken Links in Selenium: Example
    package automationPractice;

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.Iterator;
    import java.util.List;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class BrokenLinks {

        private static WebDriver driver = null;

        public static void main(String[] args) {
            String homePage = "http://www.zlti.com";
            String url = "";
            HttpURLConnection huc = null;
            int respCode = 200;

            // Open the home page in Chrome
            driver = new ChromeDriver();
            driver.manage().window().maximize();
            driver.get(homePage);

            // Collect every anchor element on the page
            List<WebElement> links = driver.findElements(By.tagName("a"));
            Iterator<WebElement> it = links.iterator();

            while (it.hasNext()) {
                url = it.next().getAttribute("href");
                System.out.println(url);

                // Skip anchors without a usable href
                if (url == null || url.isEmpty()) {
                    System.out.println("URL is either not configured for anchor tag or it is empty");
                    continue;
                }

                // Skip links that point to another domain
                if (!url.startsWith(homePage)) {
                    System.out.println("URL belongs to another domain, skipping it.");
                    continue;
                }

                try {
                    // Send a HEAD request and read the response code
                    huc = (HttpURLConnection) (new URL(url).openConnection());
                    huc.setRequestMethod("HEAD");
                    huc.connect();
                    respCode = huc.getResponseCode();

                    if (respCode >= 400) {
                        System.out.println(url + " is a broken link");
                    } else {
                        System.out.println(url + " is a valid link");
                    }
                } catch (MalformedURLException e) {
                    e.printStackTrace();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }

            driver.quit();
        }
    }
Let’s go through the code for a closer understanding of its functionality.
1. Import Packages
Import the class below along with the default imports:
import java.net.HttpURLConnection;
The methods in this class allow the tester to send HTTP requests and capture the HTTP response codes returned by the server.
2. Collect all links on the web page
Find all the links on the webpage and place them in a list:
List<WebElement> links = driver.findElements(By.tagName("a"));
Obtain an Iterator to move through the list of links:
Iterator<WebElement> it = links.iterator();
3. Identify and Validate URLs
This step is about checking if a certain URL belongs to a third-party domain or if it is empty/null.
The code below retrieves the href of the anchor tag and stores it in the url variable.
url = it.next().getAttribute("href");
If the URL is null or empty, skip the remaining steps for this link.
    if (url == null || url.isEmpty()) {
        System.out.println("URL is either not configured for anchor tag or it is empty");
        continue;
    }
If the URL belongs to the main domain, continue. If it belongs to a third-party domain, skip the remaining steps for this link.
    if (!url.startsWith(homePage)) {
        System.out.println("URL belongs to another domain, skipping it.");
        continue;
    }
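A plain prefix comparison treats an https:// link to the same site as a different domain, and it accepts any host that merely begins with the home page string. One possible alternative, shown here only as a sketch, compares the host parts using `java.net.URI`. The class name `SameSiteFilter` and the sample links are assumptions; hrefs that `URI` cannot parse (for example, ones containing spaces) are treated as not belonging to the site.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: an alternative to url.startsWith(homePage).
class SameSiteFilter {

    /** Returns true when both URLs parse and share the same host name. */
    static boolean sameHost(String homePage, String link) {
        try {
            String homeHost = new URI(homePage).getHost();
            String linkHost = new URI(link).getHost();
            // getHost() is null for links such as mailto:, which makes this false.
            return homeHost != null && homeHost.equalsIgnoreCase(linkHost);
        } catch (URISyntaxException e) {
            return false; // unparsable href: treat as not part of the site
        }
    }

    public static void main(String[] args) {
        String homePage = "http://www.zlti.com";
        // The paths and hosts below are made-up samples.
        System.out.println(sameHost(homePage, "https://www.zlti.com/contact"));     // true
        System.out.println(sameHost(homePage, "http://www.zlti.com.example.org/")); // false
        System.out.println(sameHost(homePage, "mailto:someone@example.com"));       // false
    }
}
```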
4. Send HTTP request
Methods in the HttpURLConnection class send HTTP requests and capture the HTTP response code. Therefore, the output of the openConnection() method (a URLConnection) is cast to HttpURLConnection.
huc = (HttpURLConnection)(new URL(url).openConnection());
If the request method is set to “HEAD” instead of “GET”, only the headers are returned, not the document body.
huc.setRequestMethod("HEAD");
Invoking the connect() method opens the connection to the URL. The response to the request is then read in the next step.
huc.connect();
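By default, `HttpURLConnection` uses a timeout of zero, which means it can wait indefinitely on a server that never answers. The sketch below prepares the same HEAD request with explicit connect and read timeouts. The class name `HeadRequest`, the `prepare` method, and the 5000 ms value are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: builds the request but does not open the connection.
class HeadRequest {

    /** Returns a HEAD request for the URL with the given timeout applied. */
    static HttpURLConnection prepare(String url, int timeoutMillis) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(url).openConnection();
        huc.setRequestMethod("HEAD");         // headers only, no document body
        huc.setConnectTimeout(timeoutMillis); // limit on opening the connection
        huc.setReadTimeout(timeoutMillis);    // limit on waiting for the response
        return huc;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection huc = prepare("http://www.zlti.com", 5000);
        System.out.println(huc.getRequestMethod() + " with timeout " + huc.getConnectTimeout() + " ms");
    }
}
```

A caller would then invoke `connect()` and `getResponseCode()` on the returned object, as in the walkthrough.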
5. Validate Links
Use the getResponseCode() method to get the HTTP response code for the previously sent HTTP request.
respCode = huc.getResponseCode();
Check the link status (broken or not) based on the response code:

    if (respCode >= 400) {
        System.out.println(url + " is a broken link");
    } else {
        System.out.println(url + " is a valid link");
    }
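In the full example, a link whose request throws an IOException (for instance, an unknown host or a timeout) only produces a stack trace and is never reported as broken. One way to cover that case, sketched here under assumed names, is to return a sentinel value when no response code could be obtained and to treat that value as broken too. `LinkStatus`, `UNREACHABLE`, and the 5000 ms timeout are illustrative choices, not part of the original script.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: reports unreachable links as broken instead of only logging them.
class LinkStatus {

    /** Sentinel for links that produced no HTTP response at all. */
    static final int UNREACHABLE = -1;

    /** Returns the HTTP response code, or UNREACHABLE if the request failed. */
    static int statusOf(String url) {
        HttpURLConnection huc = null;
        try {
            huc = (HttpURLConnection) new URL(url).openConnection();
            huc.setRequestMethod("HEAD");
            huc.setConnectTimeout(5000);
            huc.setReadTimeout(5000);
            return huc.getResponseCode();
        } catch (IOException | ClassCastException e) {
            // Malformed URL, network failure, or a non-HTTP URL such as mailto:
            return UNREACHABLE;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    /** A link is broken if it was unreachable or returned a 4xx or 5xx code. */
    static boolean isBroken(int respCode) {
        return respCode == UNREACHABLE || respCode >= 400;
    }

    public static void main(String[] args) {
        int code = statusOf("http://www.zlti.com");
        System.out.println(code + (isBroken(code) ? " is a broken link" : " is a valid link"));
    }
}
```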
Finding broken links is an integral part of website development and testing. With the method described in this article, testers can identify malfunctioning links quickly and accurately. Broken links that reach production damage the user experience, so they should be caught before release. This is why knowing how to test for broken links in Selenium is an important part of a tester’s toolkit.