Extract URLs from a sitemap

Learn how to extract URLs from a sitemap and download them to a CSV file with the Extract sitemap utility.

With Website Scanner, you can scan specific URLs, a large list of URLs, or even entire websites. Manually adding all URLs for an entire website can be impractical and time-consuming.

If you already have the list of URLs in a CSV file, you can directly upload the file.

If you do not have the list, use the Extract sitemap utility in Website Scanner to automatically extract the URLs from your sitemap and save them to a CSV file. Then, upload the file for scanning.

How does Extract sitemap work?

You can access the Extract sitemap utility when setting up a website scan. It opens in a new tab within Accessibility Testing. To use the utility, provide a valid URL, and then click Fetch sitemap to extract the URLs.

At present, the Extract sitemap utility extracts up to 10,000 URLs from a sitemap. If your sitemap contains more than 10,000 URLs, the additional URLs are ignored.
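
To make the limit concrete, here is a minimal Python sketch of capped extraction. It is not the utility's actual implementation: the function name `extract_sitemap_urls` and the `limit` parameter are hypothetical, and the sketch assumes a plain sitemap that uses the standard sitemaps.org namespace (it does not follow sitemap index files).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 10_000  # limit stated in this article


def extract_sitemap_urls(xml_text, limit=MAX_URLS):
    """Return at most `limit` <loc> values from a sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
        if len(urls) >= limit:
            break  # anything past the limit is ignored
    return urls
```

URLs are kept in document order, so in this sketch the entries that get dropped are the ones that appear last in the sitemap. Whether the utility drops entries in the same order is not stated here.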

What is a valid URL?

Any one of the following is a valid URL:

  • XML sitemap URL (example.com/sitemap.xml)
  • Domain URL (example.com)
  • Subdomain URL (subdomain.example.com)
  • File path (example.com/subfolder)
  • Any page in the domain or subdomain (example.com/page.html)

When you provide the sitemap URL, the utility directly extracts all URLs in the sitemap.

When you provide any of the other valid URLs, the utility:

  1. Identifies the domain or the subdomain URL
  2. For the given domain or subdomain, locates the robots.txt file
  3. Uses the sitemap details in robots.txt to navigate to the XML sitemap URL
  4. Extracts all URLs in the sitemap
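
Steps 1 to 3 above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the utility's code: the helper names `robots_txt_url` and `sitemap_urls_from_robots` are hypothetical, inputs without a scheme are assumed to be HTTPS, and downloading robots.txt is left out so the example stays self-contained.

```python
from urllib.parse import urlsplit


def robots_txt_url(user_input):
    """Steps 1-2: reduce any accepted input to its host and point at robots.txt."""
    # Inputs such as "example.com/page.html" carry no scheme; assume HTTPS.
    if "://" not in user_input:
        user_input = "https://" + user_input
    parts = urlsplit(user_input)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def sitemap_urls_from_robots(robots_text):
    """Step 3: collect the target of every 'Sitemap:' line in robots.txt."""
    found = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found
```

For example, `robots_txt_url("example.com/subfolder")` yields `https://example.com/robots.txt`, and a robots.txt line such as `Sitemap: https://example.com/sitemap.xml` yields that sitemap URL. On Python 3.8 and later, the standard library's `urllib.robotparser.RobotFileParser.site_maps()` offers a similar lookup.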

Steps to extract URLs

Follow these steps to extract URLs from a sitemap using Extract sitemap:

  1. In the Set up a website scan window, click Extract URLs from sitemap.
     The Extract sitemap utility opens in a new browser tab.
  2. Enter the domain URL or the sitemap URL, and then click Fetch sitemap.
     All the pages in your website are displayed in a hierarchical, tree-like format under Filter pages by path name. Next to it, all the pages are listed under n of N pages selected. By default, all the pages are selected.
  3. To exclude some pages or subfolders from the scan, deselect them.
     The count of selected pages decreases accordingly.
  4. When ready, click Download List.
     The URLs you selected are downloaded to a CSV file.
  5. Go to the tab that has the Set up a website scan window open.
  6. Upload the CSV file you downloaded.
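
If you prefer to prepare the CSV file yourself, the filtering and download steps can be approximated with a short Python sketch. The helper names `exclude_paths` and `write_url_csv` are hypothetical, and the one-URL-per-row layout without a header is an assumption; check a file downloaded from the utility to confirm the layout that the upload expects.

```python
import csv
from urllib.parse import urlsplit


def exclude_paths(urls, excluded_prefixes):
    """Keep URLs whose path is not under any excluded subfolder (hypothetical helper)."""
    prefixes = [p.rstrip("/") for p in excluded_prefixes]
    kept = []
    for url in urls:
        path = urlsplit(url).path or "/"
        # A URL is excluded if it is the subfolder itself or sits below it.
        if not any(path == p or path.startswith(p + "/") for p in prefixes):
            kept.append(url)
    return kept


def write_url_csv(urls, handle):
    """Write one URL per row, with no header (assumed layout)."""
    writer = csv.writer(handle)
    for url in urls:
        writer.writerow([url])
```

To write a real file, open it with `open("urls.csv", "w", newline="")` and pass the handle to `write_url_csv`. Matching is done on whole path segments, so excluding `/blog` removes `/blog/post-1` but keeps `/blogger`.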
