
How to find broken links in Selenium

Shreya Bose, Technical Content Writer at BrowserStack

Before discussing how to find broken links in Selenium, let’s address a more fundamental question.

What are Broken Links?

To start with, a link is an HTML element that takes users from one web page to another when they click on it. It is a means of navigating between different web pages on the internet.

A broken link, often also called a dead link, is one that does not work, i.e. it does not lead to the web page it is meant to. This usually occurs because the website or the particular web page is down or does not exist. When someone clicks on a broken link, an error message is displayed.

Broken links may result from a server error that prevents the corresponding page from being displayed. A working link returns a 2xx HTTP status code, possibly after one or more 3xx redirects. A broken link returns a 4xx or 5xx status code.

A 4xx status code indicates a client-side error, such as a request for a page that does not exist, while a 5xx status code indicates a server-side error.
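The status classes described above can be expressed as a small helper. This is a minimal sketch: the class name StatusClass and its methods are illustrative names chosen for this example, not part of Selenium or the JDK.

```java
// Illustrative helper (hypothetical names): maps an HTTP status code to its class.
class StatusClass {

    static String describe(int code) {
        if (code >= 100 && code < 200) return "informational";
        if (code >= 200 && code < 300) return "success";
        if (code >= 300 && code < 400) return "redirection";
        if (code >= 400 && code < 500) return "client error";
        if (code >= 500 && code < 600) return "server error";
        return "unknown";
    }

    // A link is treated as broken when the server answers with 4xx or 5xx.
    static boolean isBroken(int code) {
        return code >= 400 && code < 600;
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 301, 404, 503}) {
            System.out.println(code + " -> " + describe(code) + ", broken: " + isBroken(code));
        }
    }
}
```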

Why check for Broken Links?

If a user clicks on a broken link, they are directed to an error page. This contributes to a bad user experience. Broken links defeat the purpose of having the website in the first place because users cannot find the information or service they are looking for.

Every link on a website must be tested to ensure that it is functioning as expected. However, given that most websites have hundreds (sometimes thousands) of links, manually testing each one would require excessive amounts of time, effort, and resources. Since the check can be automated with Selenium, manual testing is also unnecessary.

Reasons for Broken Links

  • 404 Page Not Found – The destination web page does not exist, for example because the owner has removed it
  • 400 Bad Request – The server cannot process the HTTP request triggered by the link because the requested URL is malformed
  • Due to the user’s firewall settings, the browser cannot access the destination web page
  • The link is misspelled

How to check broken links with Selenium WebDriver?

Use the following steps to identify broken links in Selenium:

  1. Collect all the links present on a web page based on the <a> tag
  2. Send an HTTP request for each link
  3. Verify the HTTP response code
  4. Determine if the link is valid or broken based on the HTTP response code
  5. Repeat the process for all links captured in the first step

To find broken images using Selenium WebDriver, use the same process: collect the <img> elements instead of the <a> elements, and check the URL held in each element’s src attribute.

Code Snippet for Finding Broken Links in Selenium: Example

package automationPractice;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Iterator;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class BrokenLinks {

    private static WebDriver driver = null;

    public static void main(String[] args) {

        String homePage = "http://www.zlti.com";
        String url = "";
        HttpURLConnection huc = null;
        int respCode = 200;

        driver = new ChromeDriver();

        driver.manage().window().maximize();

        driver.get(homePage);

        // Collect every anchor element on the page
        List<WebElement> links = driver.findElements(By.tagName("a"));

        Iterator<WebElement> it = links.iterator();

        while(it.hasNext()){

            url = it.next().getAttribute("href");

            System.out.println(url);

            // Skip anchors without a usable href
            if(url == null || url.isEmpty()){
                System.out.println("URL is either not configured for anchor tag or it is empty");
                continue;
            }

            // Skip links that do not start with the home page URL
            if(!url.startsWith(homePage)){
                System.out.println("URL belongs to another domain, skipping it.");
                continue;
            }

            try {
                huc = (HttpURLConnection)(new URL(url).openConnection());

                // HEAD asks for the headers only, not the document body
                huc.setRequestMethod("HEAD");

                huc.connect();

                respCode = huc.getResponseCode();

                // 4xx and 5xx responses mark the link as broken
                if(respCode >= 400){
                    System.out.println(url+" is a broken link");
                }
                else{
                    System.out.println(url+" is a valid link");
                }

            } catch (MalformedURLException e) {
                // The href could not be parsed as a URL
                e.printStackTrace();
            } catch (IOException e) {
                // The connection failed or no response could be read
                e.printStackTrace();
            }
        }

        driver.quit();

    }
}


Let’s go through the code for a closer understanding of its functionality.

1. Import Packages

Import the class below in addition to the standard Java and Selenium imports:

import java.net.HttpURLConnection;

The methods of this class allow the tester to send HTTP requests and read the HTTP response codes returned by the server.

2. Collect all links in the web page

Find all the links on the webpage and place them in a list:

List<WebElement> links = driver.findElements(By.tagName("a"));

Obtain an Iterator to move through the list of links:

Iterator<WebElement> it = links.iterator();
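Pages often repeat the same link in the header, the footer, and the body, so the collected list can contain duplicates that would each trigger a separate HTTP request. The sketch below shows one way to de-duplicate href values before checking them. It assumes the hrefs have already been read into plain strings; the class name UniqueLinks, the method distinctHrefs, and the example.com URLs are illustrative and do not come from the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper (hypothetical names): removes repeated href values.
class UniqueLinks {

    // Returns the distinct, non-empty hrefs in the order they first appeared.
    static List<String> distinctHrefs(List<String> hrefs) {
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href != null && !href.isEmpty()) {
                seen.add(href); // a set ignores values it already contains
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Stand-in values for what getAttribute("href") might return on a page.
        List<String> hrefs = Arrays.asList(
                "https://example.com/",
                "https://example.com/about",
                null,
                "https://example.com/",
                "");
        System.out.println(distinctHrefs(hrefs));
    }
}
```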

3. Identify and Validate URLs

This step is about checking if a certain URL belongs to a third-party domain or if it is empty/null.

The code below retrieves the href of the anchor tag and stores it in the url variable.

url = it.next().getAttribute("href");

If the URL is null or empty, skip the remaining steps for this link.

if(url == null || url.isEmpty()){
    System.out.println("URL is either not configured for anchor tag or it is empty");
    continue;
}
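An href can be non-empty and still not be something an HTTP request can check, for example a mailto:, tel:, or javascript: link, or a URL with a misspelled scheme. The sketch below is one possible extra filter built on java.net.URI. The class name HrefFilter and the method isHttpUrl are illustrative names, and the sample values in main are made up for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper (hypothetical names): keeps only hrefs an HTTP check can handle.
class HrefFilter {

    // True only for absolute http/https URLs that include a host.
    static boolean isHttpUrl(String href) {
        if (href == null || href.isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(href.trim());
            String scheme = uri.getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // the value cannot be parsed as a URI at all
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com/page",
            "mailto:someone@example.com",
            "javascript:void(0)",
            "htp://example.com"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isHttpUrl(sample));
        }
    }
}
```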

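If the URL belongs to the main domain, continue. If it belongs to a third-party domain, skip the remaining steps for this link. Note that startsWith() compares the full prefix, including the scheme, so an https:// link on the same site is skipped when homePage begins with http://.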

if(!url.startsWith(homePage)){
    System.out.println("URL belongs to another domain, skipping it.");
    continue;
}
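A prefix comparison is simple, but it treats http:// and https:// versions of the same site as different, and it accepts any host that merely begins with the home page's host name. As a hedged alternative, the sketch below compares host names parsed by java.net.URI. The class name DomainFilter and the method sameHost are illustrative names, and the /contact path and the example.org host are invented for the demonstration.

```java
import java.net.URI;

// Illustrative helper (hypothetical names): compares hosts instead of string prefixes.
class DomainFilter {

    // True when both URLs parse and name the same host, regardless of scheme or path.
    static boolean sameHost(String homePage, String link) {
        if (homePage == null || link == null) {
            return false;
        }
        try {
            String homeHost = URI.create(homePage).getHost();
            String linkHost = URI.create(link).getHost();
            return homeHost != null && homeHost.equalsIgnoreCase(linkHost);
        } catch (IllegalArgumentException e) {
            return false; // one of the values is not a valid URI
        }
    }

    public static void main(String[] args) {
        String homePage = "http://www.zlti.com";
        // Same site over https: startsWith(homePage) would reject this one.
        System.out.println(sameHost(homePage, "https://www.zlti.com/contact"));
        // Different host sharing the prefix: startsWith(homePage) would accept this one.
        System.out.println(sameHost(homePage, "http://www.zlti.com.example.org/"));
    }
}
```

Whether subdomains should count as the same site is a policy choice this sketch does not make; it requires an exact host match.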

4. Send HTTP request

The HttpURLConnection class provides methods to send HTTP requests and capture the HTTP response code. Because openConnection() returns a URLConnection, its result is cast to HttpURLConnection.

huc = (HttpURLConnection)(new URL(url).openConnection());

If testers set the request method to “HEAD” instead of “GET”, the server returns only the headers, not the document body.

huc.setRequestMethod("HEAD");

Invoking the connect() method opens the connection to the URL. The explicit call is optional, because getResponseCode() opens the connection itself if it has not been opened yet.

huc.connect();
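Two practical details are worth considering at this step. HttpURLConnection waits indefinitely by default, so a single unresponsive server can stall the whole run, and some servers may answer HEAD with 405 (Method Not Allowed) or 501 (Not Implemented) even though the page itself is fine. The sketch below sets timeouts and retries with GET in those two cases. The class name LinkProbe, its method names, the 5-second timeout, and the use of -1 for "no response" are assumptions made for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Illustrative helper (hypothetical names): status check with timeouts and a GET fallback.
class LinkProbe {

    private static final int TIMEOUT_MS = 5000; // assumed value; tune it for the site under test

    // Sends one request and returns the status code, or -1 if no HTTP response was obtained.
    static int request(String url, String method) {
        if (url == null) {
            return -1;
        }
        HttpURLConnection huc = null;
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return -1; // not an http/https URL
            }
            huc = (HttpURLConnection) conn;
            huc.setRequestMethod(method);
            huc.setConnectTimeout(TIMEOUT_MS);
            huc.setReadTimeout(TIMEOUT_MS);
            return huc.getResponseCode();
        } catch (IOException | IllegalArgumentException e) {
            return -1; // malformed URL, timeout, or connection failure
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }

    // Tries HEAD first and falls back to GET when the server rejects HEAD.
    static int statusOf(String url) {
        int code = request(url, "HEAD");
        if (code == HttpURLConnection.HTTP_BAD_METHOD
                || code == HttpURLConnection.HTTP_NOT_IMPLEMENTED) {
            code = request(url, "GET");
        }
        return code;
    }

    public static void main(String[] args) {
        // Pass a URL on the command line to probe it.
        if (args.length > 0) {
            System.out.println(args[0] + " -> " + statusOf(args[0]));
        }
    }
}
```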

5. Validate Links

Use the getResponseCode() method to get the HTTP response code for the previously sent HTTP request.

respCode = huc.getResponseCode();

Check the link status (broken or not) based on the response code. Codes of 400 and above cover both 4xx and 5xx errors:

if(respCode >= 400){
    System.out.println(url+" is a broken link");
}
else{
    System.out.println(url+" is a valid link");
}
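Printing one line per link works for a quick check, but on a page with many links the broken ones are easy to miss in the output. One option is to record each status code and list the failures at the end. The sketch below assumes the results have been stored in a map from URL to status code. The class name LinkReport, the method brokenLinks, the convention that a negative code means no response, and the sample data are all invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper (hypothetical names): summarizes the results of a link check.
class LinkReport {

    // Returns the URLs whose code is 400 or above, or negative (this sketch's marker for no response).
    static List<String> brokenLinks(Map<String, Integer> statusByUrl) {
        List<String> broken = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : statusByUrl.entrySet()) {
            int code = entry.getValue();
            if (code >= 400 || code < 0) {
                broken.add(entry.getKey());
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Made-up sample results, not output from a real run.
        Map<String, Integer> statusByUrl = new LinkedHashMap<>();
        statusByUrl.put("https://example.com/", 200);
        statusByUrl.put("https://example.com/old-page", 404);
        statusByUrl.put("https://example.com/report", 500);

        List<String> broken = brokenLinks(statusByUrl);
        System.out.println(broken.size() + " broken link(s): " + broken);
    }
}
```

In a test framework, the returned list could be asserted to be empty so that any broken link fails the test.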

Checking and fixing broken links is an integral part of website development and testing. By using the method described in this article, testers can identify malfunctioning links quickly and correctly. Allowing broken links to pass into the production stage would severely damage user experience, and must be prevented with extreme thoroughness. This is where Selenium becomes a tester’s best friend.
