Making developers awesome at machine learning
When you work with OpenCV, you most often work with still images. However, you may find it useful to create an animation from multiple images. Showing images in rapid succession may give you a different insight, and introducing a time axis can make your work easier to visualize.
In this post, you will see how to create a video clip in OpenCV. As an example, you will also learn some basic image manipulation techniques to create the frames.
Kick-start your project with my book Machine Learning in OpenCV. It provides self-study tutorials with working code.

How to Transform Images and Create Video with OpenCV
Photo by KAL VISUALS. Some rights reserved.
This post is divided into two parts; they are:

- Creating frames with the Ken Burns effect
- Writing the frames into a video file
You are going to create a lot of images by following other posts. Maybe it is to visualize the progress of your machine learning project, or to show how a computer vision technique manipulates your image. To make things simpler, you are going to apply the simplest manipulation to an input image: cropping.
The task in this post is to create the Ken Burns effect. It is a panning and zooming technique named after the filmmaker Ken Burns:
Instead of showing a large static photo on screen, the Ken Burns effect crops to a detail, then pans across the image.
— Wikipedia, “Ken Burns effect”
Let’s see how you can create the Ken Burns effect in Python code using OpenCV. We start with an image, for example the bird picture below that you can download from Wikipedia:

A picture of Buthraupis montana cucullata. Photo by Charles J. Sharp. (CC BY-SA)
This picture is 4563×3042 pixels. Opening it with OpenCV is easy:
```python
import cv2

imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
img = cv2.imread(imgfile, cv2.IMREAD_COLOR)
cv2.imshow("bird", img)
cv2.waitKey(0)
```
The image read by OpenCV, img, is indeed a numpy array of shape (3042, 4563, 3) with data type uint8 (8-bit unsigned integer), because it is a color image in which each pixel is represented as BGR values between 0 and 255.
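As a quick sanity check of this convention, here is a sketch using a zero-filled array as a stand-in for the loaded photo (so it runs without the image file):

```python
import numpy as np

# stand-in for cv2.imread() output: rows (height) x columns (width) x BGR channels
img = np.zeros((3042, 4563, 3), dtype=np.uint8)

h, w, channels = img.shape
print(h, w, channels)          # 3042 4563 3
print(img.dtype)               # uint8
blue, green, red = img[0, 0]   # one pixel, in BGR order
```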
The Ken Burns effect is to zoom and pan. Each frame in the video is a crop of the original image (which is then zoomed to fill the screen). Cropping an image given as a numpy array is easy, since numpy already provides the slicing syntax for you:
```python
cropped = img[y0:y1, x0:x1]
```
The image is a three-dimensional numpy array. The first two dimensions are for height and width, respectively (the same as how you set a coordinate for a matrix). Hence you can use the numpy slicing syntax to take pixels $y_0$ to $y_1$ in the vertical direction and pixels $x_0$ to $x_1$ in the horizontal direction (remember that in a matrix, coordinates are numbered from top to bottom and from left to right).
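To make the slicing concrete, here is a small sketch on a synthetic array (the numbers are illustrative, not from the post):

```python
import numpy as np

# a fake 100x200 "image" with 3 channels
img = np.arange(100 * 200 * 3, dtype=np.uint8).reshape(100, 200, 3)

# rows y0..y1-1 vertically, columns x0..x1-1 horizontally
y0, y1, x0, x1 = 10, 50, 30, 90
cropped = img[y0:y1, x0:x1]
print(cropped.shape)  # (40, 60, 3) -- height first, then width
```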
Cropping a picture means taking a picture of dimension $W\times H$ into a smaller dimension $W'\times H'$. In order to make a video, you want to create frames of a fixed dimension, so the cropped dimension $W'\times H'$ needs to be resized. Moreover, to avoid distortion, the cropped image also needs to be at a predefined aspect ratio.
To resize an image, you could define a new numpy array, then calculate and fill in the pixel values one by one. There are many ways to calculate a pixel value, such as linear interpolation or simply copying over the nearest pixel. If you try to implement the resize operation yourself, you will find it is not hard but still quite cumbersome. Hence the easier way is to use OpenCV's native function, such as the following:
```python
resized = cv2.resize(cropped, dsize=target_dim, interpolation=cv2.INTER_LINEAR)
```
The function cv2.resize() takes an image and the target dimension as a tuple of (width, height) in pixels and returns a new numpy array. You can specify the algorithm for resizing; the above uses linear interpolation, which looks good in most cases.
These are basically all the ways you need to manipulate an image in OpenCV for this task, namely cropping and resizing.
With these, you can build your Ken Burns animation. The flow is as follows: for each frame, compute the pan center and zoom level, crop the image accordingly, resize the crop to the video dimension, and finally write all frames to a video file.
Let’s start with the constants: assume you are going to create a two-second 720p video (resolution 1280×720) at 25 FPS (which is quite low but visually acceptable). The pan will start at the point 40% from the left and 60% from the top of the image, and end at the point 50% from the left and 50% from the top. The zoom will start at 70% of the original image, then zoom out to 100%.
```python
imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
video_dim = (1280, 720)
fps = 25
duration = 2.0
start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale = 0.7
end_scale = 1.0
```
You are going to crop the image many times to create frames (precisely, 2×25 = 50 frames). Therefore it is beneficial to create a function for cropping:
```python
def crop(img, x, y, w, h):
    x0, y0 = max(0, x - w//2), max(0, y - h//2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]
```
This cropping function takes an image, the tentative center position in pixel coordinates, and the width and height in pixels. The two max() functions ensure the crop does not start beyond the top or left image border. Cropping is done using numpy slicing syntax.
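A quick check of the function's behavior on a synthetic array, including the border clamping. One caveat worth knowing: near the right or bottom edge, numpy slicing silently shortens the crop rather than shifting it, so the returned frame can be smaller than requested there.

```python
import numpy as np

def crop(img, x, y, w, h):
    # clamp the top-left corner so the crop never starts outside the image
    x0, y0 = max(0, x - w//2), max(0, y - h//2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]

img = np.zeros((100, 200, 3), dtype=np.uint8)

print(crop(img, 100, 50, 40, 30).shape)  # (30, 40, 3): fully inside the image
print(crop(img, 5, 5, 40, 30).shape)     # (30, 40, 3): clamped to start at (0, 0)
print(crop(img, 195, 95, 40, 30).shape)  # (20, 25, 3): silently shortened at the far edge
```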
If you consider the current point in time to be at fraction $\alpha$ of the entire duration, you can use an affine transform to calculate the exact zoom level and position of the pan. In terms of the relative position of the pan center (as a fraction of the original width and height), the affine transform gives
```python
rx = end_center[0]*alpha + start_center[0]*(1 - alpha)
ry = end_center[1]*alpha + start_center[1]*(1 - alpha)
```
where alpha is between 0 and 1. Similarly, the zoom level is
```python
scale = end_scale*alpha + start_scale*(1 - alpha)
```
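For example, halfway through the clip (alpha = 0.5), the constants above give values midway between start and end; this is pure arithmetic and needs no OpenCV:

```python
start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale, end_scale = 0.7, 1.0

alpha = 0.5  # halfway through the clip
rx = end_center[0]*alpha + start_center[0]*(1 - alpha)  # midpoint: about 0.45
ry = end_center[1]*alpha + start_center[1]*(1 - alpha)  # midpoint: about 0.55
scale = end_scale*alpha + start_scale*(1 - alpha)       # midpoint: about 0.85
```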
Given the original image size and the scale, you can calculate the cropped image size by multiplication. But since the aspect ratio of the image may differ from that of the video, you should adjust the cropped dimension to fit the video aspect ratio. Assuming the image numpy array is img and the zoom level is the scale calculated above, the cropped size can be calculated as:
```python
orig_shape = img.shape[:2]
if orig_shape[1]/orig_shape[0] > video_dim[0]/video_dim[1]:
    h = int(orig_shape[0] * scale)
    w = int(h * video_dim[0] / video_dim[1])
else:
    w = int(orig_shape[1] * scale)
    h = int(w * video_dim[1] / video_dim[0])
```
The above compares the aspect ratio (width divided by height) of the image and the video, applies the zoom level to the more constrained edge, and calculates the other edge from the target aspect ratio.
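Tracing this logic with the example photo's dimensions (pure Python; the numbers come from the 4563×3042 picture and the 1280×720 video):

```python
video_dim = (1280, 720)      # (width, height)
orig_shape = (3042, 4563)    # (height, width), as numpy reports it
scale = 0.7

# image ratio 4563/3042 = 1.5 is narrower than video ratio 1280/720 ~ 1.78,
# so width is the limiting edge: scale the width, derive the height
if orig_shape[1] / orig_shape[0] > video_dim[0] / video_dim[1]:
    h = int(orig_shape[0] * scale)
    w = int(h * video_dim[0] / video_dim[1])
else:
    w = int(orig_shape[1] * scale)
    h = int(w * video_dim[1] / video_dim[0])

print(w, h)                                   # 3194 1796
print(round(w / h, 3), round(1280 / 720, 3))  # both close to 1.778
```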
Once you know how many frames you need, you can use a for-loop to create each frame with a different affine parameter alpha, which can be obtained using the numpy function np.linspace(). The complete code is as follows:
```python
import cv2
import numpy as np

imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
video_dim = (1280, 720)
fps = 25
duration = 2.0
start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale = 0.7
end_scale = 1.0

img = cv2.imread(imgfile, cv2.IMREAD_COLOR)
orig_shape = img.shape[:2]

def crop(img, x, y, w, h):
    x0, y0 = max(0, x - w//2), max(0, y - h//2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]

num_frames = int(fps * duration)
frames = []
for alpha in np.linspace(0, 1, num_frames):
    rx = end_center[0]*alpha + start_center[0]*(1 - alpha)
    ry = end_center[1]*alpha + start_center[1]*(1 - alpha)
    x = int(orig_shape[1] * rx)
    y = int(orig_shape[0] * ry)
    scale = end_scale*alpha + start_scale*(1 - alpha)

    # determine how to crop based on the aspect ratio of width/height
    if orig_shape[1]/orig_shape[0] > video_dim[0]/video_dim[1]:
        h = int(orig_shape[0] * scale)
        w = int(h * video_dim[0] / video_dim[1])
    else:
        w = int(orig_shape[1] * scale)
        h = int(w * video_dim[1] / video_dim[0])

    # crop, scale to video size, and save the frame
    cropped = crop(img, x, y, w, h)
    scaled = cv2.resize(cropped, dsize=video_dim, interpolation=cv2.INTER_LINEAR)
    frames.append(scaled)

# write to MP4 file
vidwriter = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, video_dim)
for frame in frames:
    vidwriter.write(frame)
vidwriter.release()
```
The last few lines are how you use OpenCV to write a video. You create a VideoWriter object with the filename, format, FPS, and resolution specified. Then you write the frames one by one and release the object to close the written file.
The created video looks like this. A preview is as follows:

Preview of the created video. Viewing this requires a supported browser.
From the example in the previous section, you saw how to create a VideoWriter object:
```python
vidwriter = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, video_dim)
```
Unlike writing an image file (such as JPEG or PNG), the format of the video that OpenCV creates is not inferred from the filename. It is specified by the second parameter, the FourCC, which is a code of four characters. You can find the FourCC codes and the corresponding video formats from the list at the following URL:
However, not all FourCC codes can be used, because OpenCV creates the video using the FFmpeg tool. You can find the list of supported video formats using the command:
```shell
ffmpeg -codecs
```
Be sure that the ffmpeg command you run is the same binary OpenCV uses. Also note that the output of the above command only tells you which formats ffmpeg supports, not the corresponding FourCC codes. You need to look up the codes elsewhere, such as from the abovementioned URL.
To check whether you can use a particular FourCC code, you must try it out and see whether the writer opens successfully:
```python
try:
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter("temp.mkv", fourcc, 30, (640, 480))
    assert writer.isOpened()
    print("Supported")
except:
    print("Not supported")
```
In this post, you learned how to create a video in OpenCV. The video created is built from a sequence of frames (i.e., no audio), each an image of a fixed size. As an example, you learned how to apply the Ken Burns effect to a picture, in which you applied cropping and resizing, with an affine parameter controlling the pan and zoom.
And finally, you wrote the frames into a video file using the VideoWriter object in OpenCV.

...using OpenCV in advanced ways and work beyond pixels
Discover how in my new Ebook:
Machine Learning in OpenCV
It provides self-study tutorials with all working code in Python to turn you from a novice to expert. It equips you with
logistic regression, random forest, SVM, k-means clustering, neural networks, and much more... all using the machine learning module in OpenCV







I'm Jason Brownlee PhD and I help developers get results with machine learning.