OpenCV (Open Source Computer Vision Library) is an open-source library for Computer Vision and Image Processing tasks. It has a modular structure and includes several shared and static libraries. OpenCV is written in C++ and can be used from languages such as Python, C++ and Java. Some of the applications of OpenCV include Edge Detection, Face Detection, Object Detection and Face Recognition.
Using OpenCV we can process images to extract the desired information from them. In Python, OpenCV represents images as NumPy (Numerical Python) arrays; NumPy is a highly optimized library for numerical operations.
To use OpenCV, install it by running the command below in the Anaconda Prompt:
conda install -c anaconda opencv
Basics of OpenCV in Python
1. Reading an Image
# Import the OpenCV library
import cv2
import numpy as np

# Load an image
img = cv2.imread(r"dogs.jpg", 1)

# Show image
cv2.imshow('image', img)  # 1st arg - window name, 2nd - our image
cv2.waitKey(0)
cv2.destroyAllWindows()
The imread() function can read the image in different modes, selected by its second argument:
|Flag||Description||Value|
|cv2.IMREAD_COLOR||Loads a color image (the default)||1|
|cv2.IMREAD_GRAYSCALE||Loads the image in grayscale mode||0|
|cv2.IMREAD_UNCHANGED||Loads the image as is, including the alpha channel||-1|
cv2.imshow() takes two parameters: the name of the display window and the image to display.
cv2.waitKey() is a keyboard binding function. Its argument is the time in milliseconds. The function waits for specified milliseconds for any keyboard event. If you press any key in that time, the program continues. If 0 is passed, it waits indefinitely for a key stroke.
cv2.destroyAllWindows() simply destroys all the windows we created. If you want to destroy any specific window, use the function cv2.destroyWindow() where you pass the exact window name as the argument.
2. Get details of image
print("Shape of image : ", img.shape)     # tuple of (rows, columns, channels)
print("Size of image : ", img.size)       # total number of elements (rows * columns * channels)
print("Datatype of image : ", img.dtype)  # data type of the pixel values
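As a small illustration of how these attributes relate to each other, the sketch below uses a synthetic NumPy array in place of an image loaded with cv2.imread(); the 480 x 640 dimensions are an arbitrary assumption.

```python
import numpy as np

# A synthetic 480x640 color image stands in for one returned by cv2.imread().
img = np.zeros((480, 640, 3), dtype=np.uint8)

height, width, channels = img.shape
print(height, width, channels)                # 480 640 3
print(img.size == height * width * channels)  # True
print(img.dtype)                              # uint8

# A grayscale image has only two dimensions, so unpacking three values would fail.
gray = img[:, :, 0]
print(gray.shape)                             # (480, 640)
```

Note that shape lists the height (rows) before the width (columns), which is the opposite of the (width, height) order that functions such as cv2.resize() expect.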
3. Resize and Rescale image
We resize and rescale images and video files to reduce computational strain. Large media files tend to store a lot of information, and displaying them takes a lot of processing. By resizing, we are actually trying to get rid of some of that information.
Usually we try to downscale the width and height of images. In the code below, the width and height of the image are each rescaled to 75% of their original values.
import cv2

# Load an image
img = cv2.imread(r"dogs.jpg", 1)

def rescaleFrame(frame, scale=0.75):
    # Rescale image; shape is (rows, columns, channels)
    width = int(frame.shape[1] * scale)
    height = int(frame.shape[0] * scale)
    dimensions = (width, height)  # cv2.resize expects (width, height)
    return cv2.resize(frame, dimensions, interpolation=cv2.INTER_AREA)

# Resize image
frame_resized = rescaleFrame(img)

# Show image
cv2.imshow('Original image', img)
cv2.imshow('Resized image', frame_resized)
cv2.waitKey(0)
cv2.destroyAllWindows()
4. Draw shapes and add text to images
import numpy as np
import cv2

img = np.zeros([512, 512, 3], np.uint8)  # draw a blank image

# Draw a rectangle
img = cv2.rectangle(img, (384, 0), (510, 128), (0, 0, 255), thickness=4)

# Add text in image
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img, 'OpenCV', (10, 500), font, 4, (255, 255, 255), 2, cv2.LINE_AA)

cv2.imshow("shapes", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
np.zeros() creates an array filled with zeros, which is displayed as a blank (black) image.
1. cv2.rectangle() function takes the below parameters to draw a rectangle.
Image (img), Vertex of rectangle (point1), Vertex of rectangle opposite to point1 (point2), Color of rectangle and Thickness of the outline. The color (0,0,255) represents red because OpenCV orders color channels as BGR.
2. cv2.putText() function adds text to an image. It takes the below parameters.
Image (img), Text to add to the image, Position where the text is added (its bottom-left corner), followed by fontFace, fontScale, Color, Thickness and LineType for the text.
Similarly, we can also draw a circle, ellipse, line and polylines.
Essential Functions in OpenCV
In this section, we will look at some of the functions that are commonly applied to image data. These are often preprocessing steps for building Machine Learning and Deep Learning models.
Image Blurring refers to making the image less clear or distinct. It is done with the help of various low-pass filter kernels. It helps in removing noise from an image and smoothing it. Commonly used types of blurring include Gaussian Blur, Median Blur and Bilateral Blur. Gaussian blur is the result of blurring an image with a Gaussian function.
Canny edge detection is a popular edge detection algorithm. It is usually applied to blurred images to extract the edges or features from the image, since blurring reduces noise that would otherwise produce spurious edges.
Image Dilating expands the bright (foreground) regions of an image, so it increases the object area. It helps join broken parts of an object in an image, for example gaps in detected edges, and combined with erosion it is also used for noise removal. The code below displays the above operations performed on an image:
import cv2
import numpy as np

img = cv2.imread(r"dog_1.jpg", 1)

# Blur image; the third argument is sigmaX, and 0 derives it from the kernel size
blur = cv2.GaussianBlur(img, (5, 5), 0)

# Edge cascade
cascade = cv2.Canny(img, 125, 175)

# Edge cascade of blurred image
canny = cv2.Canny(blur, 125, 175)

# Dilating the image
kernel = np.ones((7, 7), np.uint8)  # 7x7 kernel; size & iterations can be changed
dilated = cv2.dilate(canny, kernel, iterations=4)

cv2.imshow('Original image', img)
cv2.imshow('Edge cascade', cascade)
cv2.imshow('Blurred image Edge cascade', canny)
cv2.imshow('Dilated image', dilated)
cv2.waitKey(0)
cv2.destroyAllWindows()
Basic image transformation techniques include Image Translation, Rotation, Flipping, Cropping and Resizing of images.
Translation is basically shifting an image along the x and y axes. Using translation you can shift an image up (-y), down (+y), left (-x), right (+x) or any combination of these.
Rotation rotates an image by some angle. OpenCV allows you to specify any point to rotate the image around. With cv2.getRotationMatrix2D(), a positive angle rotates counter-clockwise and a negative angle rotates clockwise.
Flipping is mirroring an image. We can flip an image vertically (0), horizontally (1), or both vertically and horizontally (-1). The values 0, 1 and -1 are the flip codes passed to cv2.flip().
The code below shows translation, rotation and flipping applied to an image:
import cv2 as cv
import numpy as np

img = cv.imread(r"dog_1.jpg", 1)
cv.imshow('Dog', img)

# Translation
def translate(img, x, y):
    # x, y specify the shift in pixels along each axis
    transMat = np.float32([[1, 0, x], [0, 1, y]])  # translation matrix
    dimensions = (img.shape[1], img.shape[0])      # image width, height
    return cv.warpAffine(img, transMat, dimensions)

translated = translate(img, -50, 50)
cv.imshow('Translated', translated)

# Rotation
def rotate(img, angle, rotPoint=None):
    # angle of rotation in degrees, rotation point as input
    (height, width) = img.shape[:2]
    if rotPoint is None:
        rotPoint = (width // 2, height // 2)  # rotate around center
    rotMat = cv.getRotationMatrix2D(rotPoint, angle, 1.0)
    dimensions = (width, height)
    return cv.warpAffine(img, rotMat, dimensions)

rotated = rotate(img, -45)  # clockwise rotation
cv.imshow('Rotated', rotated)

rotated_rotated = rotate(img, -90)  # rotate the original image by 90 degrees clockwise
cv.imshow('Rotated Rotated', rotated_rotated)

# Flipping
flip = cv.flip(img, -1)
cv.imshow('Flip', flip)

cv.waitKey(0)
cv.destroyAllWindows()
We have learned the basics of working with images in Python using OpenCV. These operations can also be applied to video data, because OpenCV can process videos frame by frame.
Keep exploring OpenCV!