Selenium with Tesseract - Image Testing

Hi folks,

In this blog, we will explore how to test the content of an image (basically, the text in the image) using a combination of Selenium and Tesseract. So, let’s get started.

What is Tesseract?

Tesseract OCR is an open-source optical character recognition engine, originally developed at HP Laboratories. Since version 4, it contains two OCR engines: an LSTM (Long Short-Term Memory) neural-network engine and a legacy engine that works by recognizing character patterns. To dive deeper, check out the official documentation at https://tesseract-ocr.github.io/.

I’m pretty sure we all know what Selenium is, so let’s move ahead.

Why test the content of an image?

In our functional tests, it is good to cover all the aspects and attributes present on a web page. Without Tesseract, we can only check whether the image is present. But what if we need to test the content of the image? Consider a scenario where an image says, “In case of an outage, reach out to this number”. We can always check the text on the image manually, but we should aim to automate as much of the testing as possible.

This is where Tesseract comes into the picture, as we’ll see in the rest of this blog.

Integrating Tesseract with Selenium is just one of several ways to accomplish this, but for this tutorial, let’s move ahead with Tesseract.

So, without further ado, let’s see how to do it.

Setup

Assuming we are using Maven as the build tool, we first need to add the tess4j dependency (the Java wrapper for Tesseract) to our pom.xml file.

        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>5.0.0</version>
            <scope>test</scope>
        </dependency>

You can check for the latest version here: https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j.

Once you build your Maven project after adding this dependency, the Tesseract libraries should be accessible within your project. For reference, these are the dependencies I used in my project:

    <dependencies>
        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>4.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.testng</groupId>
            <artifactId>testng</artifactId>
            <version>7.4.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>5.0.0</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

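It is also recommended to install Tesseract on your local machine. Whether this is strictly required depends on your platform: tess4j bundles native libraries for Windows, while on Linux it generally loads the system’s Tesseract library. To install it on a Debian- or Ubuntu-based Linux system, execute the following command.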

sudo apt-get install tesseract-ocr

And that’s it, the setup is now complete. Moving on to the implementation part.

Implementation

The approach I’ve taken is pretty simple. First, locate the WebElement of the image. Next, take a screenshot of the page and crop it to the element’s location and size. Then, use “ImageIO” to write the cropped image to a PNG (or JPEG) file. That’s half of the work done.

To do so, I’ve created a Java method that takes a WebElement as input. For reference, here is the code snippet.

import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.WrapsDriver;

public class TesseractExtender {

    public static void capturePicture(WebElement element) throws IOException {

        // cast the element to WrapsDriver to get hold of the driver that found it
        WrapsDriver wrapsDriver = (WrapsDriver) element;

        // take a screenshot of the visible page using that driver
        File screen = ((TakesScreenshot) wrapsDriver.getWrappedDriver())
                .getScreenshotAs(OutputType.FILE);

        // load the captured screenshot into a BufferedImage
        BufferedImage image = ImageIO.read(screen);

        // get the width and height of the WebElement using getSize()
        int width = element.getSize().getWidth();
        int height = element.getSize().getHeight();

        // create a java.awt.Rectangle using width and height
        Rectangle rect = new Rectangle(width, height);

        // get the location of the WebElement as a Point;
        // this provides the X & Y coordinates of its top-left corner
        Point point = element.getLocation();

        // crop the screenshot to the element's location and size;
        // this gives the image data specific to the WebElement
        BufferedImage dest = image.getSubimage(point.getX(), point.getY(), rect.width,
                rect.height);

        // write the cropped image data to a new file
        ImageIO.write(dest, "png", new File("src/test/resources/testImage.png"));
    }
}
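
The crop in capturePicture treats the element’s location and size as screenshot pixel coordinates. Two cases can trip this up: on HiDPI displays the screenshot is usually larger than the CSS-pixel values Selenium reports, and an element that extends past the visible viewport makes getSubimage throw a RasterFormatException. Below is a minimal, JDK-only sketch of a safer crop. The class ElementCrop, the method cropToElement and its devicePixelRatio parameter are hypothetical; the ratio is assumed to be supplied by you (for instance read from the browser’s window.devicePixelRatio via JavascriptExecutor.executeScript).

```java
import java.awt.image.BufferedImage;

public class ElementCrop {

    // Hypothetical helper: crops a screenshot to an element's bounds.
    // x, y, width, height are the CSS-pixel values Selenium reports;
    // devicePixelRatio is an assumed input (1.0 on a standard display).
    public static BufferedImage cropToElement(BufferedImage screenshot,
                                              int x, int y, int width, int height,
                                              double devicePixelRatio) {
        // scale CSS-pixel coordinates to screenshot pixels
        int px = (int) Math.round(x * devicePixelRatio);
        int py = (int) Math.round(y * devicePixelRatio);
        int pw = (int) Math.round(width * devicePixelRatio);
        int ph = (int) Math.round(height * devicePixelRatio);

        // clamp the origin inside the screenshot
        px = Math.max(0, Math.min(px, screenshot.getWidth() - 1));
        py = Math.max(0, Math.min(py, screenshot.getHeight() - 1));

        // clamp the size so getSubimage never reads past the edges
        pw = Math.max(1, Math.min(pw, screenshot.getWidth() - px));
        ph = Math.max(1, Math.min(ph, screenshot.getHeight() - py));

        return screenshot.getSubimage(px, py, pw, ph);
    }

    public static void main(String[] args) {
        // an 800x600 "screenshot" with an element hanging off the bottom-right corner
        BufferedImage page = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        BufferedImage crop = cropToElement(page, 700, 500, 300, 200, 1.0);
        System.out.println(crop.getWidth() + "x" + crop.getHeight()); // prints 100x100
    }
}
```

Clamping trades a hard failure for a partial crop. If you would rather have the test fail when the element is off-screen, scroll it into view before taking the screenshot and drop the clamping.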

The next step is to read that image with Tesseract and extract its content. For this, we first need to create a Tesseract instance and then set the path of the “tessdata” directory.

“tessdata” holds the files Tesseract needs to run, most importantly the trained language data (for example, eng.traineddata for English) along with some configuration files. You can find these files by extracting the tess4j jar itself.
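
A missing or misplaced traineddata file typically only surfaces once doOCR runs, often as a fairly cryptic native error, so it can help to check for the language data up front. The sketch below uses only the JDK. TessdataCheck, resolveTessdata and hasLanguageData are hypothetical names, and it assumes that the TESSDATA_PREFIX environment variable, when set, points directly at the tessdata directory (the Tesseract 4+ convention), falling back to the src/test/resources/tessdata path used in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataCheck {

    // Hypothetical helper: use the TESSDATA_PREFIX value if one is given,
    // otherwise fall back to the project-local tessdata directory.
    public static Path resolveTessdata(String tessdataPrefix, Path fallback) {
        if (tessdataPrefix != null && !tessdataPrefix.trim().isEmpty()) {
            return Paths.get(tessdataPrefix.trim());
        }
        return fallback;
    }

    // Tesseract looks for <lang>.traineddata (e.g. eng.traineddata) in the data directory.
    public static boolean hasLanguageData(Path tessdataDir, String lang) {
        return Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"));
    }

    public static void main(String[] args) {
        Path dir = resolveTessdata(System.getenv("TESSDATA_PREFIX"),
                Paths.get("src/test/resources/tessdata"));
        if (hasLanguageData(dir, "eng")) {
            System.out.println("Found eng.traineddata in " + dir.toAbsolutePath());
        } else {
            System.out.println("eng.traineddata is missing from " + dir.toAbsolutePath());
        }
    }
}
```

In a TestNG setup method, you could fail fast with Assert.assertTrue(TessdataCheck.hasLanguageData(dir, "eng")) and then pass dir.toString() to tesseract.setDatapath(...).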

Once this is done, we extract the image content by calling the “doOCR” method on the Tesseract instance. This method takes the image we want to read (here, the file we saved) as input and returns the recognized text as a String. Finally, we can assert on the extracted text to validate the image content.
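
Tesseract’s output usually keeps the image’s line breaks and can vary in spacing, so a strict contains check can fail even when the text was read correctly (for example, when “outage, reach out” wraps onto two lines in the image). Here is a minimal, JDK-only sketch of a looser comparison. The OcrText class and its methods are hypothetical, and ignoring case is a design choice you may not want for every assertion.

```java
import java.util.Locale;

public class OcrText {

    // Collapses every run of whitespace (spaces, tabs, and the newlines
    // Tesseract emits at line breaks) into one space, trims, and lower-cases.
    public static String normalize(String ocrOutput) {
        if (ocrOutput == null) {
            return "";
        }
        return ocrOutput.replaceAll("\\s+", " ").trim().toLowerCase(Locale.ROOT);
    }

    // True when the expected phrase appears in the OCR output,
    // ignoring case and whitespace/line-break differences.
    public static boolean containsText(String ocrOutput, String expected) {
        return normalize(ocrOutput).contains(normalize(expected));
    }

    public static void main(String[] args) {
        String ocr = "In case of an\noutage,  reach out\nto this number\n";
        System.out.println(containsText(ocr, "in case of an outage, reach out")); // prints true
    }
}
```

In the TestNG test, the assertion could then read Assert.assertTrue(OcrText.containsText(result, "In case of an outage"), result), which also prints the raw OCR output when it fails.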

The test script that I’ve used is given below if you want to take a peek at the implementation.

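import java.io.File;
import java.io.IOException;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.testng.Assert;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;

public class ImageTesting extends TesseractExtender {
    WebDriver driver;

    @BeforeClass
    public void setup() {
        // point Selenium at the local ChromeDriver binary
        System.setProperty("webdriver.chrome.driver", "src/test/resources/chromedriver_linux64/chromedriver");
        driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get("https://challengepost-s3-challengepost.netdna-ssl.com/photos/production/software_photos/001/205/265/datas/original.png");
    }

    @AfterClass
    public void tearDown() {
        // quit only if the driver was actually created
        if (driver != null) {
            driver.quit();
        }
    }

    @Test(testName = "DUMMY_TEST")
    public void dummyTest() throws IOException, TesseractException {
        WebElement image = driver.findElement(By.tagName("img"));

        // call the method to write the image to the resources folder
        TesseractExtender.capturePicture(image);

        // create a Tesseract instance (tess4j's direct interface)
        // and point it at the tessdata directory
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("src/test/resources/tessdata");

        // doOCR retrieves the text from the image captured by Selenium
        String result = tesseract.doOCR(new File("src/test/resources/testImage.png"));
        System.out.println(result);
        Assert.assertTrue(result.contains("TEST"));
    }
}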