How to get HTML source of a Web Element in Selenium WebDriver
Shreya Bose, Technical Content Writer at BrowserStack - December 11, 2019
Before diving into the code, let’s take a moment to understand the terms involved in the question above.
- What is HTML Source? This refers to the HTML code underlying a certain web element on a web page. Do not confuse this with the HTML <source> tag.
- What is a Web element? Anything that appears on a web page is a web element. Most obviously, this refers to text boxes, checkboxes, buttons or any other fields that display or require data from the user.
Web elements can also mean the tags that reside within the HTML code underlying the web page. Essentially, interaction with the HTML code is interaction with a web element. Such elements usually have unique identifiers, such as ID, name or unique classes.
For example, in order to highlight text on a page, one would have to interact with the “body”, a “div” and perhaps even a “p” element.
It is common for web elements to occur within other web elements. To locate them, one can use mechanisms such as XPath or CSS Selectors.
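As a rough illustration of locating a nested element, the sketch below finds a paragraph inside a div, once with a CSS selector and once with XPath. The function name, the `content` id and the page structure are hypothetical; `find_element` and `By` are the locator API in the Selenium 4 Python bindings.

```python
def find_nested_paragraph(driver):
    """Locate the first <p> inside <div id="content"> in two ways.

    `driver` is assumed to be a Selenium WebDriver instance. The id
    "content" is a made-up example, not taken from a real page.
    """
    # Imported here so the module still loads where Selenium is not installed.
    from selenium.webdriver.common.by import By

    # CSS selector: a <p> that is a descendant of the div with id "content".
    by_css = driver.find_element(By.CSS_SELECTOR, "div#content p")

    # XPath: the same target expressed as a path through the document tree.
    by_xpath = driver.find_element(By.XPATH, "//div[@id='content']//p")

    return by_css, by_xpath
```

Both locators are intended to resolve to the same element, provided the id is unique on the page. Which one to use is largely a matter of readability.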
How to retrieve the HTML source of a web element using Python?
To start with, download the Python bindings for Selenium WebDriver.
- One can do this from the PyPI page for Selenium package.
- Alternatively, one can use pip to install the Selenium package. pip is bundled with the python.org installers for Python 3.4 and later, so it is usually already available. Install Selenium with the following command:
pip install selenium
It is also possible to use virtualenv to create isolated Python environments. Python 3 ships the venv module in the standard library, which works much like virtualenv (the older pyvenv script was deprecated in Python 3.6).
* Notes for Windows users
- Install Python 3.6 with the Windows installer provided on the python.org download page.
- Start a command prompt using the cmd.exe program. Then run the pip command given below to install Selenium, adjusting the path to match the Python version and location installed on your machine.
C:\Python35\Scripts\pip.exe install selenium
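Once the install finishes, one way to confirm that the package is visible to the interpreter is to ask Python for its version. This is a minimal sketch using only the standard library (it assumes Python 3.8 or later for `importlib.metadata`); the helper name `installed_selenium_version` is hypothetical.

```python
from importlib import metadata


def installed_selenium_version():
    """Return the installed selenium version string, or None if it is missing."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


version = installed_selenium_version()
if version is None:
    print("selenium is not installed in this environment")
else:
    print(f"selenium {version} is installed")
```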
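Now, here’s how to get a web element, where wd is a WebDriver instance (Selenium 4 replaced the older find_element_by_* helpers with find_element and By):
from selenium.webdriver.common.by import By
elem = wd.find_element(By.CSS_SELECTOR, '#my-id')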
Here’s how to get the HTML source for the full page:
wd.page_source
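In the Python bindings, the full-page markup is available through the driver's `page_source` property. The following is a minimal sketch, assuming `driver` is an already-started WebDriver session; the helper name `save_page_source` and the default file name are hypothetical.

```python
from pathlib import Path


def save_page_source(driver, path="page.html"):
    """Write the current page's HTML to `path` and return the markup."""
    html = driver.page_source  # HTML of the whole page, as a string
    Path(path).write_text(html, encoding="utf-8")
    return html
```

With a live session, this would be called as `save_page_source(wd)` after `wd.get(...)` has loaded the page of interest.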
There are two ways to get the HTML source of a web element using Selenium:
Method #1 – Read the innerHTML attribute to get the source of the content of the element. innerHTML is a property of a DOM element whose value is the HTML that exists in between the opening tag and ending tag.
For example, for the markup below, the innerHTML property of the p element carries the value “ a text ” (the surrounding spaces are part of the content):
<p> a text </p>
This property can be used to retrieve or dynamically insert content in a web page. However, if it is used to do anything beyond inserting simple text, it may behave differently across browsers.
innerHTML began as a proprietary Internet Explorer feature and was later standardized with HTML5. In current specifications it is defined as a property of Element.
Read the innerHTML property to get the HTML source in Selenium with the following syntax:
elem.get_attribute('innerHTML')
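As a minimal sketch of Method #1, the helper below reads `innerHTML` through the element's `get_attribute` method. The names `inner_html` and `demo` are hypothetical, and `demo` assumes Selenium 4 with a local Chrome install; it loads a tiny inline page through a `data:` URL and is not called automatically.

```python
def inner_html(element):
    """Return the markup between the element's opening and closing tags."""
    return element.get_attribute("innerHTML")


def demo():
    """Open a small inline page in Chrome and print a paragraph's innerHTML.

    Assumes Selenium 4 and a local Chrome install. Call it manually.
    """
    # Imported lazily so inner_html() can be reused without a browser setup.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("data:text/html,<p id='my-id'> a <b>text</b> </p>")
        elem = driver.find_element(By.CSS_SELECTOR, "#my-id")
        print(inner_html(elem))
    finally:
        # Always close the browser, even if locating the element fails.
        driver.quit()
```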
Method #2 – Read the outerHTML property to get the source of the element including the element itself. outerHTML is an element property whose value is the HTML of the selected element, that is, its opening and closing tags plus everything between them.
For example, the outerHTML property of the div in the code below carries a value that contains the div itself and the span inside it.
<div> <span>Hello there!</span> </div>
Read the outerHTML property to get the HTML source in Selenium with the following syntax:
elem.get_attribute('outerHTML')
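To see how the two methods differ on the same element, the sketch below returns both values side by side. It assumes `element` is a WebElement obtained from `find_element`; the helper names `outer_html` and `describe` are hypothetical.

```python
def outer_html(element):
    """Return the element's own tags plus everything inside them."""
    return element.get_attribute("outerHTML")


def describe(element):
    """Return both views of an element's markup for side-by-side comparison."""
    return {
        "inner": element.get_attribute("innerHTML"),  # content only
        "outer": outer_html(element),                 # content plus own tags
    }
```

For the div shown above, the `outer` value would include the `<div>` tags themselves, while `inner` would hold only the span and the whitespace around it.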
With the code above, automated Selenium tests can capture the HTML source of specific web elements and examine it for anomalies. Spotting anomalies quickly shortens debugging, which helps teams ship websites with a better user experience in less time.