What's New in Selenium 4

Selenium has been on the market for many years and remains one of the most widely used browser automation tools. This blog walks through the new features in Selenium 4 and how we can use them in our code.

First, let's look at the architecture of Selenium 3 and Selenium 4 and see how they differ.

Architecture: Selenium 3 Vs. Selenium 4

Selenium 3 Architecture

As the diagram shows, the Selenium client library provides bindings for the supported languages, such as Java, Ruby, C#, and Python.
When we execute our tests, each command in the Selenium code we have written (the client) is converted to JSON. This is done through the JSON Wire Protocol over HTTP.
The generated JSON is sent to the browser driver (the server) over HTTP. Each browser has its own driver (chromedriver, geckodriver, etc.).
The browser driver interprets the JSON it receives and executes the corresponding commands on its browser.
The browser driver then receives the response from the browser and sends it back to the client as JSON.
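
To make this flow more concrete, here is a minimal sketch of what a single command such as driver.get(url) might look like once the client library has turned it into an HTTP request with a JSON body. The class name, method names, session id, and host are assumptions made for illustration, and the JSON escaping is simplified. The real client libraries do this work internally.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: shows the shape of the HTTP request a client library might
// send to a browser driver for a "navigate to URL" command.
// Class name, method names, session id and host are illustrative assumptions.
class WebDriverCommandSketch {

    // Builds the JSON body for the navigate command: {"url":"..."}
    static String navigateBody(String url) {
        // Simplified escaping: only backslashes and double quotes are handled.
        String escaped = url.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"url\":\"" + escaped + "\"}";
    }

    // Builds the request line, headers and body as plain text.
    static String navigateRequest(String host, String sessionId, String url) {
        String body = navigateBody(url);
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST /session/" + sessionId + "/url HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/json; charset=utf-8\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        // "localhost:9515" and the session id are placeholders, not real values.
        System.out.println(navigateRequest("localhost:9515", "hypothetical-session-id",
                "https://rahulshettyacademy.com/angularpractice/"));
    }
}
```

The browser driver answers with an HTTP response whose body is also JSON, which the client library turns back into a return value or an exception.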

Selenium 4 Architecture

The architecture of Selenium 4 differs from Selenium 3 in one important way: it no longer uses the JSON Wire Protocol.
In Selenium 3, commands went out in the JSON Wire Protocol, and wherever a browser driver expected W3C-style messages, requests and responses had to be encoded and decoded between the two dialects.
In Selenium 4, the client and the browser driver both speak the W3C WebDriver protocol, so information passes back and forth between them directly, without that translation step. Commands still travel as JSON over HTTP.
W3C (the World Wide Web Consortium) develops web standards, and its WebDriver standard promotes compatibility across WebDriver implementations.
Because every browser driver now follows the same set of rules, tests behave more consistently across browsers.
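
One concrete difference between the two dialects is how a found element is identified in a response: the JSON Wire Protocol used the key "ELEMENT", while the W3C WebDriver standard uses the key "element-6066-11e4-a52e-4f735466cecf". The sketch below shows the kind of handling a client needs when both dialects are in play. The class and method names are hypothetical, and the response is modelled as a plain Map instead of parsed JSON.

```java
import java.util.Map;

// Sketch only: reads an element id from a response value in either dialect.
// ElementKeySketch and elementId are illustrative names, not Selenium API.
class ElementKeySketch {
    // Key defined by the W3C WebDriver standard for element references.
    static final String W3C_KEY = "element-6066-11e4-a52e-4f735466cecf";
    // Key used by the older JSON Wire Protocol.
    static final String LEGACY_KEY = "ELEMENT";

    static String elementId(Map<String, String> value) {
        if (value.containsKey(W3C_KEY)) {
            return value.get(W3C_KEY);
        }
        if (value.containsKey(LEGACY_KEY)) {
            return value.get(LEGACY_KEY);
        }
        throw new IllegalArgumentException("No element reference in response: " + value);
    }

    public static void main(String[] args) {
        System.out.println(elementId(Map.of(W3C_KEY, "w3c-id")));       // prints "w3c-id"
        System.out.println(elementId(Map.of(LEGACY_KEY, "legacy-id"))); // prints "legacy-id"
    }
}
```

When both sides speak only the W3C dialect, the second branch is no longer needed.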

New features in Selenium 4

Relative Locators
Handling Multiple Windows and Tabs
Partial screenshots
Capturing Height and Width of WebElement (UX validation)
Chrome DevTools
W3C WebDriver Protocol

Relative Locators

Locators uniquely identify web elements on a web page. To interact with any element on the page, we first need to identify it using a locator.
Locators can be difficult to maintain. If we don't choose them carefully, they may break when the page changes.
Relative locators are easy to use and implement because they do not require an exact locator for the target element; they use nearby web elements to identify it.
Their main advantage is finding elements that are otherwise difficult to locate, i.e. elements with no attributes that uniquely identify them.
For example, we might need to identify one text box among five others that all have the same class name. In this case, we can locate a nearby web element that is easy to identify and describe the text box relative to it.
For instance, to find the label above the name text box, we first locate the text box as nameEditBox and then call driver.findElement(withTagName("label").above(nameEditBox)).

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;
// Note: later Selenium 4 releases deprecate withTagName in favor of RelativeLocator.with(By.tagName(...)).
import static org.openqa.selenium.support.locators.RelativeLocator.withTagName;

public class Selenium4RelativeLocators {
    @Test
    public void relativeLocators() {
        System.setProperty("webdriver.chrome.driver", "/home/ankur/Documents/Selenium/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get("https://rahulshettyacademy.com/angularpractice/");

        // Print the text of the label located above the name text box.
        WebElement nameEditBox = driver.findElement(By.cssSelector("[name='name']"));
        System.out.println(driver.findElement(withTagName("label").above(nameEditBox)).getText());

        // Click the checkbox to the left of the ice cream label.
        WebElement iceCreamLabel = driver.findElement(By.xpath("//label[text()='Check me out if you Love IceCreams!']"));
        driver.findElement(withTagName("input").toLeftOf(iceCreamLabel)).click();

        // Type a date into the input below the date of birth label.
        WebElement dateOfBirth = driver.findElement(By.cssSelector("[for='dateofBirth']"));
        driver.findElement(withTagName("input").below(dateOfBirth)).sendKeys("02/02/1992");

        // Print the text of the label to the right of the first radio button.
        WebElement rb = driver.findElement(By.id("inlineRadio1"));
        System.out.println(driver.findElement(withTagName("label").toRightOf(rb)).getText());

        // quit() ends the session and closes every window the driver opened.
        driver.quit();
    }
}
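
Relative locators work from where elements are drawn on the page. The sketch below illustrates the idea behind "above" using plain rectangles. The Box class and the isAbove and above helpers are hypothetical, and the rule is deliberately simplified; it is not Selenium's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a simplified geometric reading of "above".
class RelativePositionSketch {

    // Hypothetical stand-in for an element's rectangle (pixels, origin at top-left).
    static final class Box {
        final String name;
        final int x, y, width, height;

        Box(String name, int x, int y, int width, int height) {
            this.name = name;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        int bottom() {
            return y + height;
        }
    }

    // A candidate counts as "above" when its bottom edge is at or above the anchor's top edge.
    static boolean isAbove(Box candidate, Box anchor) {
        return candidate.bottom() <= anchor.y;
    }

    // Returns the names of all candidates above the anchor, nearest first.
    static List<String> above(List<Box> candidates, Box anchor) {
        List<Box> hits = new ArrayList<>();
        for (Box c : candidates) {
            if (isAbove(c, anchor)) {
                hits.add(c);
            }
        }
        hits.sort((a, b) -> Integer.compare(anchor.y - a.bottom(), anchor.y - b.bottom()));
        List<String> names = new ArrayList<>();
        for (Box h : hits) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Box nameInput = new Box("nameInput", 20, 100, 300, 30);
        List<Box> labels = List.of(
                new Box("nameLabel", 20, 70, 100, 20),
                new Box("emailLabel", 20, 140, 100, 20),
                new Box("pageTitle", 20, 10, 400, 40));
        System.out.println(above(labels, nameInput)); // prints [nameLabel, pageTitle]
    }
}
```

The same pattern extends to below, toLeftOf, and toRightOf by comparing the other edges of the rectangles.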

Handling Multiple Windows and Tabs

Selenium 4 adds a dedicated API for opening new tabs and windows. Previously there was no built-in way to do this, and tests had to fall back on workarounds such as running JavaScript. Now we can open a new tab or window and load a new URL in it.
This is done with the newWindow() method, which opens the tab or window and switches the driver to it.
This helps in situations where we need to automate multiple tabs or windows in a single test case.
For example, after a single login we can open additional tabs to exercise different parts of the application within the same session.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WindowType;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;

public class Selenium4MultipleWindowsOrTabsTest {
    @Test
    public void multipleWindowHandler() {
        System.setProperty("webdriver.chrome.driver", "/home/ankur/Documents/Selenium/chromedriver");
        // Use ChromeDriver so the driver matches the chromedriver path configured above.
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get("https://rahulshettyacademy.com/angularpractice/");

        // Remember the original tab before opening a new one.
        String parentId = driver.getWindowHandle();

        // newWindow() opens a new tab and switches the driver to it.
        driver.switchTo().newWindow(WindowType.TAB);
        driver.get("https://rahulshettyacademy.com");
        String sampleInput = driver.findElements(By.cssSelector("a[href*='https://courses.rahulshettyacademy.com/p/']")).get(1).getText();

        // Switch back to the original tab and type the text captured from the new tab.
        driver.switchTo().window(parentId);
        driver.findElement(By.cssSelector("input[class='form-control ng-untouched ng-pristine ng-invalid']")).sendKeys(sampleInput);

        // quit() closes both tabs and ends the session.
        driver.quit();
    }
}
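
When several window handles need to be told apart, for instance when a tab is opened by the page itself and not by the test, it is safer not to rely on the order in which handles are returned. One possible approach is to compare the set of handles before and after the new tab appears. The sketch below shows that comparison with plain sets; the class name, method name, and handle values are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch only: finds the handle that appears in "after" but not in "before".
class NewWindowHandleSketch {

    static String findNewHandle(Set<String> before, Set<String> after) {
        Set<String> added = new LinkedHashSet<>(after);
        added.removeAll(before);
        if (added.size() != 1) {
            throw new IllegalStateException("Expected exactly one new window, found " + added.size());
        }
        return added.iterator().next();
    }

    public static void main(String[] args) {
        // In a real test these sets would come from driver.getWindowHandles().
        Set<String> before = Set.of("tab-1");
        Set<String> after = Set.of("tab-1", "tab-2");
        System.out.println(findNewHandle(before, after)); // prints "tab-2"
    }
}
```

The returned handle can then be passed to driver.switchTo().window(...).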

Partial screenshots

With Selenium 4 we can take a screenshot of a single web element. Previously we could capture the full page, but not an individual element.
In the example below, the screenshot is copied to userNameScreenshot.png, a relative path that resolves to the directory the test runs from, which is typically the project's root.
This is useful for debugging: if a test fails, we can capture the relevant element at that moment and store it.
An element screenshot also lets us check, after an automated run, whether the text shown on the UI is correct.

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;
import java.io.File;
import java.io.IOException;

public class SeleniumPartialScreenshotTest {
    @Test
    public void partialScreenshot() throws IOException {
        System.setProperty("webdriver.chrome.driver", "/home/ankur/Documents/Selenium/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get("https://rahulshettyacademy.com/angularpractice/");

        WebElement userName = driver.findElement(By.cssSelector("input[class='form-control ng-untouched ng-pristine ng-invalid']"));
        userName.sendKeys("sampleText");

        // Capture only the userName element, then copy the file next to the project.
        File screenshotFile = userName.getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(screenshotFile, new File("userNameScreenshot.png"));

        // quit() ends the session and closes the browser.
        driver.quit();
    }
}
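
For comparison, one workaround before element screenshots were available was to take a full-page screenshot and crop it to the element's rectangle. The sketch below shows only the cropping step, using the Java standard library. A blank synthetic image stands in for a real screenshot, and the class name, method name, and coordinates are hypothetical.

```java
import java.awt.image.BufferedImage;

// Sketch only: crops a full-page image to an element's rectangle.
class CropScreenshotSketch {

    static BufferedImage cropToElement(BufferedImage fullPage, int x, int y, int width, int height) {
        // Clamp the rectangle to the image bounds so getSubimage does not throw.
        int left = Math.max(0, x);
        int top = Math.max(0, y);
        int right = Math.min(fullPage.getWidth(), x + width);
        int bottom = Math.min(fullPage.getHeight(), y + height);
        if (right <= left || bottom <= top) {
            throw new IllegalArgumentException("Element rectangle lies outside the screenshot");
        }
        return fullPage.getSubimage(left, top, right - left, bottom - top);
    }

    public static void main(String[] args) {
        // Blank 1280x720 image used as a stand-in for a real full-page screenshot.
        BufferedImage page = new BufferedImage(1280, 720, BufferedImage.TYPE_INT_RGB);
        BufferedImage cropped = cropToElement(page, 100, 200, 300, 40);
        System.out.println(cropped.getWidth() + "x" + cropped.getHeight()); // prints 300x40
    }
}
```

With getScreenshotAs() on the element itself, this extra cropping step is no longer needed.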

Capturing Height and Width of WebElement (UX validation)

With Selenium 4 we can also perform basic UX validation by capturing the height and width of web elements. This works for almost all web elements.
To reduce manual effort, we can add assertions that web elements have the size specified in the wireframes or the client's requirements.
The dimensions are available through the methods getRect().getHeight() and getRect().getWidth().

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.annotations.Test;

public class Selenium4CapturingHeightWidthOfWebelementTest {
    @Test
    public void captureWidthAndHeight() {
        System.setProperty("webdriver.chrome.driver", "/home/ankur/Documents/Selenium/chromedriver");
        WebDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get("https://rahulshettyacademy.com/angularpractice/");

        WebElement userName = driver.findElement(By.cssSelector("input[class='form-control ng-untouched ng-pristine ng-invalid']"));
        userName.sendKeys("SampleText");

        // Print the rendered height and width of the element in pixels.
        System.out.println(userName.getRect().getHeight());
        System.out.println(userName.getRect().getWidth());

        // quit() ends the session and closes the browser.
        driver.quit();
    }
}
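
To turn the printed values into a check, we can compare them against the expected size with a small tolerance, since rendering can differ by a pixel or two between browsers. The sketch below is one way to write such a check. The class and method names, the expected sizes, and the tolerance are all hypothetical values chosen for illustration.

```java
// Sketch only: compares a measured element size against an expected size.
class SizeAssertionSketch {

    static boolean withinTolerance(int actual, int expected, int tolerancePx) {
        return Math.abs(actual - expected) <= tolerancePx;
    }

    static void assertSize(String elementName, int actualWidth, int actualHeight,
                           int expectedWidth, int expectedHeight, int tolerancePx) {
        if (!withinTolerance(actualWidth, expectedWidth, tolerancePx)
                || !withinTolerance(actualHeight, expectedHeight, tolerancePx)) {
            throw new AssertionError(elementName + " is " + actualWidth + "x" + actualHeight
                    + " but expected " + expectedWidth + "x" + expectedHeight
                    + " (tolerance " + tolerancePx + "px)");
        }
    }

    public static void main(String[] args) {
        // In a real test the actual values would come from getRect().getWidth() and getRect().getHeight().
        assertSize("userName", 498, 38, 500, 38, 2);
        System.out.println("userName size is within tolerance");
    }
}
```

In a TestNG test, the same comparison could be expressed with the framework's own assertion methods.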

Chrome DevTools

Chrome DevTools (Developer Tools) are built into the Chrome web browser. Previously, if we wanted to use the developer tools, we had to do so manually. Now we can access them from Selenium.
DevTools help us track what is going on in the browser and diagnose problems. For example, while testing the frontend, a feature might fail because of a failing backend call.
We can catch such failures using Chrome DevTools.
DevTools capabilities include:
- Inspecting Network Activity – looking at the network calls the page makes
- Handling Developer Options – enabling or disabling developer options in the browser
- Viewing the DOM – inspecting the Document Object Model
- Measuring Performance – measuring the performance of your web application
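
Under the hood, these capabilities are driven by Chrome DevTools Protocol messages: small JSON objects that carry an id, a method name, and parameters. The sketch below only illustrates the shape of such messages. The CdpMessageSketch class is hypothetical, and when working through Selenium the messages are built for us, so there is no need to write them by hand.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: builds DevTools-style command messages as JSON strings.
class CdpMessageSketch {
    // Each command carries a unique id so the reply can be matched to it.
    private final AtomicInteger nextId = new AtomicInteger(1);

    // paramsJson must already be valid JSON, for example "{}".
    String command(String method, String paramsJson) {
        return "{\"id\":" + nextId.getAndIncrement()
                + ",\"method\":\"" + method + "\""
                + ",\"params\":" + paramsJson + "}";
    }

    public static void main(String[] args) {
        CdpMessageSketch cdp = new CdpMessageSketch();
        // prints {"id":1,"method":"Network.enable","params":{}}
        System.out.println(cdp.command("Network.enable", "{}"));
        // prints {"id":2,"method":"Performance.enable","params":{}}
        System.out.println(cdp.command("Performance.enable", "{}"));
    }
}
```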

W3C WebDriver Protocol

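The W3C WebDriver protocol lets the client and server interact with each other without the older JSON Wire Protocol.
Under the JSON Wire Protocol, Selenium commands were converted to that protocol's JSON format and sent to the browser drivers over HTTP, with translation needed wherever a driver expected W3C-style messages.
Now the client and the browser drivers both follow the W3C standard, so they communicate directly without that translation. Commands are still sent as JSON over HTTP.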
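
A visible example of the difference is the body of the request that starts a session. The JSON Wire Protocol placed the requested capabilities under "desiredCapabilities", while the W3C standard nests them under "capabilities" with an "alwaysMatch" object. The sketch below builds both bodies as strings for comparison. The class and method names are hypothetical, and only the browserName capability is shown.

```java
// Sketch only: contrasts the new-session request body in the two dialects.
class NewSessionPayloadSketch {

    // Older JSON Wire Protocol shape.
    static String legacyPayload(String browserName) {
        return "{\"desiredCapabilities\":{\"browserName\":\"" + browserName + "\"}}";
    }

    // W3C WebDriver shape.
    static String w3cPayload(String browserName) {
        return "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"" + browserName + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(legacyPayload("chrome"));
        System.out.println(w3cPayload("chrome"));
    }
}
```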

So this was a short introduction to what's new in Selenium 4. See you in the next one.