Making developers awesome at machine learning
When you work with OpenCV, you most often work with still images. However, you may find it useful to create an animation from multiple images. Showing images in rapid succession may give you a different insight, and introducing a time axis can make your work easier to visualize.
In this post, you will see how to create a video clip in OpenCV. As an example, you will also learn some basic image manipulation techniques to create the frames.
Kick-start your project with my book Machine Learning in OpenCV. It provides self-study tutorials with working code.

How to Transform Images and Create Video with OpenCV
Photo by KAL VISUALS. Some rights reserved.
This post is divided into two parts; they are:

- Creating frames with the Ken Burns effect
- Writing the frames into a video file
You are going to create a lot of images by following other posts. Maybe it is to visualize the progress of your machine learning project, or to show how a computer vision technique manipulates your image. To make things simpler, you are going to apply the simplest manipulation to an input image: cropping.
The task in this post is to create the Ken Burns effect. It is a panning and zooming technique named after the filmmaker Ken Burns:
Instead of showing a large static photo on screen, the Ken Burns effect crops to a detail, then pans across the image.
— Wikipedia, “Ken Burns effect”
Let’s see how you can create the Ken Burns effect in Python code using OpenCV. We start with an image, for example the bird picture below that you can download from Wikipedia:

A picture of Buthraupis montana cucullata. Photo by Charles J. Sharp. (CC BY-SA)
This picture is 4563×3042 pixels. Opening it with OpenCV is easy:
```python
import cv2

imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
img = cv2.imread(imgfile, cv2.IMREAD_COLOR)
cv2.imshow("bird", img)
cv2.waitKey(0)
```
The image read by OpenCV, img, is indeed a numpy array of shape (3042, 4563, 3) with data type uint8 (8-bit unsigned integer), because it is a color image in which each pixel is represented as BGR values between 0 and 255.
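As a quick sanity check of this convention, here is a sketch using a zero-filled array as a stand-in for the loaded photo (so it runs without the image file):

```python
import numpy as np

# stand-in for cv2.imread() output: rows (height) x columns (width) x BGR channels
img = np.zeros((3042, 4563, 3), dtype=np.uint8)

h, w, channels = img.shape
print(h, w, channels)          # 3042 4563 3
print(img.dtype)               # uint8
blue, green, red = img[0, 0]   # one pixel, in BGR order
```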
The Ken Burns effect is to zoom and pan. Each frame in the video is a crop of the original image (which is then zoomed to fill the screen). Cropping an image given as a numpy array is easy, since numpy already provides the slicing syntax for you:
```python
cropped = img[y0:y1, x0:x1]
```
The image is a three-dimensional numpy array. The first two dimensions are for height and width, respectively (the same as how you set a coordinate for a matrix). Hence you can use the numpy slicing syntax to take pixels $y_0$ to $y_1$ in the vertical direction and pixels $x_0$ to $x_1$ in the horizontal direction (remember that in a matrix, coordinates are numbered from top to bottom and from left to right).
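To make the slicing concrete, here is a small sketch on a synthetic array (the numbers are illustrative, not from the post):

```python
import numpy as np

# a fake 100x200 "image" with 3 channels
img = np.arange(100 * 200 * 3, dtype=np.uint8).reshape(100, 200, 3)

# rows y0..y1-1 vertically, columns x0..x1-1 horizontally
y0, y1, x0, x1 = 10, 50, 30, 90
cropped = img[y0:y1, x0:x1]
print(cropped.shape)  # (40, 60, 3) -- height first, then width
```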
Cropping a picture means taking a picture of dimension $W\times H$ into a smaller dimension $W'\times H'$. In order to make a video, you want to create frames of a fixed dimension, so the cropped dimension $W'\times H'$ needs to be resized. Moreover, to avoid distortion, the cropped image also needs to be at a predefined aspect ratio.
To resize an image, you could define a new numpy array, then calculate and fill in the pixel values one by one. There are many ways to calculate a pixel value, such as linear interpolation or simply copying over the nearest pixel. If you try to implement the resize operation yourself, you will find it is not hard but still quite cumbersome. Hence the easier way is to use OpenCV's native function, such as the following:
```python
resized = cv2.resize(cropped, dsize=target_dim, interpolation=cv2.INTER_LINEAR)
```
The function cv2.resize() takes an image and the target dimension as a tuple of (width, height) in pixels and returns a new numpy array. You can specify the algorithm for resizing; the above uses linear interpolation, which looks good in most cases.
These are basically all the ways you need to manipulate an image in OpenCV for this task, namely cropping and resizing.
With these, you can build your Ken Burns animation. The flow is as follows: for each frame, compute the pan center and zoom level, crop the image accordingly, resize the crop to the video dimension, and finally write all frames to a video file.
Let’s start with the constants: assume you are going to create a two-second 720p video (resolution 1280×720) at 25 FPS (which is quite low but visually acceptable). The pan will start at the point 40% from the left and 60% from the top of the image, and end at the point 50% from the left and 50% from the top. The zoom will start at 70% of the original image, then zoom out to 100%.
```python
imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
video_dim = (1280, 720)
fps = 25
duration = 2.0
start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale = 0.7
end_scale = 1.0
```
You are going to crop the image many times to create frames (precisely, 2×25 = 50 frames). Therefore it is beneficial to create a function for cropping:
```python
def crop(img, x, y, w, h):
    x0, y0 = max(0, x - w//2), max(0, y - h//2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]
```
This cropping function takes an image, the tentative center position in pixel coordinates, and the width and height in pixels. The two max() functions ensure the crop does not start beyond the top or left image border. Cropping is done using numpy slicing syntax.
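A quick check of the function's behavior on a synthetic array, including the border clamping. One caveat worth knowing: near the right or bottom edge, numpy slicing silently shortens the crop rather than shifting it, so the returned frame can be smaller than requested there.

```python
import numpy as np

def crop(img, x, y, w, h):
    # clamp the top-left corner so the crop never starts outside the image
    x0, y0 = max(0, x - w//2), max(0, y - h//2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]

img = np.zeros((100, 200, 3), dtype=np.uint8)

print(crop(img, 100, 50, 40, 30).shape)  # (30, 40, 3): fully inside the image
print(crop(img, 5, 5, 40, 30).shape)     # (30, 40, 3): clamped to start at (0, 0)
print(crop(img, 195, 95, 40, 30).shape)  # (20, 25, 3): silently shortened at the far edge
```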
If you consider the current point in time to be at fraction $\alpha$ of the entire duration, you can use an affine transform to calculate the exact zoom level and position of the pan. In terms of the relative position of the pan center (as a fraction of the original width and height), the affine transform gives
```python
rx = end_center[0]*alpha + start_center[0]*(1 - alpha)
ry = end_center[1]*alpha + start_center[1]*(1 - alpha)
```
where alpha is between 0 and 1. Similarly, the zoom level is
```python
scale = end_scale*alpha + start_scale*(1 - alpha)
```
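For example, halfway through the clip (alpha = 0.5), the constants above give values midway between start and end; this is pure arithmetic and needs no OpenCV:

```python
start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale, end_scale = 0.7, 1.0

alpha = 0.5  # halfway through the clip
rx = end_center[0]*alpha + start_center[0]*(1 - alpha)  # midpoint: about 0.45
ry = end_center[1]*alpha + start_center[1]*(1 - alpha)  # midpoint: about 0.55
scale = end_scale*alpha + start_scale*(1 - alpha)       # midpoint: about 0.85
```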
Given the original image size and the scale, you can calculate the cropped image size by multiplication. But since the aspect ratio of the image may differ from that of the video, you should adjust the cropped dimension to fit the video aspect ratio. Assuming the image numpy array is img and the zoom level is the scale calculated above, the cropped size can be calculated as:
```python
orig_shape = img.shape[:2]
if orig_shape[1]/orig_shape[0] > video_dim[0]/video_dim[1]:
    h = int(orig_shape[0] * scale)
    w = int(h * video_dim[0] / video_dim[1])
else:
    w = int(orig_shape[1] * scale)
    h = int(w * video_dim[1] / video_dim[0])
```
The above compares the aspect ratio (width divided by height) of the image and the video, applies the zoom level to the more constrained edge, and calculates the other edge from the target aspect ratio.
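Tracing this logic with the example photo's dimensions (pure Python; the numbers come from the 4563×3042 picture and the 1280×720 video):

```python
video_dim = (1280, 720)      # (width, height)
orig_shape = (3042, 4563)    # (height, width), as numpy reports it
scale = 0.7

# image ratio 4563/3042 = 1.5 is narrower than video ratio 1280/720 ~ 1.78,
# so width is the limiting edge: scale the width, derive the height
if orig_shape[1] / orig_shape[0] > video_dim[0] / video_dim[1]:
    h = int(orig_shape[0] * scale)
    w = int(h * video_dim[0] / video_dim[1])
else:
    w = int(orig_shape[1] * scale)
    h = int(w * video_dim[1] / video_dim[0])

print(w, h)                                   # 3194 1796
print(round(w / h, 3), round(1280 / 720, 3))  # both close to 1.778
```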
Once you know how many frames you need, you can use a for-loop to create each frame with a different affine parameter alpha, which can be obtained using the numpy function np.linspace(). The complete code is as follows:
```python
import cv2
import numpy as np

imgfile = "Hooded_mountain_tanager_(Buthraupis_montana_cucullata)_Caldas.jpg"
video_dim = (1280, 720)
fps = 25
duration = 2.0
start_center = (0.4, 0.6)
end_center = (0.5, 0.5)
start_scale = 0.7
end_scale = 1.0

img = cv2.imread(imgfile, cv2.IMREAD_COLOR)
orig_shape = img.shape[:2]

def crop(img, x, y, w, h):
    x0, y0 = max(0, x - w//2), max(0, y - h//2)
    x1, y1 = x0 + w, y0 + h
    return img[y0:y1, x0:x1]

num_frames = int(fps * duration)
frames = []
for alpha in np.linspace(0, 1, num_frames):
    rx = end_center[0]*alpha + start_center[0]*(1 - alpha)
    ry = end_center[1]*alpha + start_center[1]*(1 - alpha)
    x = int(orig_shape[1] * rx)
    y = int(orig_shape[0] * ry)
    scale = end_scale*alpha + start_scale*(1 - alpha)

    # determine how to crop based on the aspect ratio of width/height
    if orig_shape[1]/orig_shape[0] > video_dim[0]/video_dim[1]:
        h = int(orig_shape[0] * scale)
        w = int(h * video_dim[0] / video_dim[1])
    else:
        w = int(orig_shape[1] * scale)
        h = int(w * video_dim[1] / video_dim[0])

    # crop, scale to video size, and save the frame
    cropped = crop(img, x, y, w, h)
    scaled = cv2.resize(cropped, dsize=video_dim, interpolation=cv2.INTER_LINEAR)
    frames.append(scaled)

# write to MP4 file
vidwriter = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, video_dim)
for frame in frames:
    vidwriter.write(frame)
vidwriter.release()
```
The last few lines are how you use OpenCV to write a video. You create a VideoWriter object with the filename, format, FPS, and resolution specified. Then you write the frames one by one and release the object to close the written file.
The created video looks like this. A preview is as follows:

Preview of the created video. Viewing this requires a supported browser.
From the example in the previous section, you saw how to create a VideoWriter object:
```python
vidwriter = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, video_dim)
```
Unlike writing an image file (such as JPEG or PNG), the format of the video that OpenCV creates is not inferred from the filename. It is specified by the second parameter, the FourCC, which is a code of four characters. You can find the FourCC codes and the corresponding video formats from the list at the following URL:
However, not all FourCC codes can be used, because OpenCV creates the video using the FFmpeg tool. You can find the list of supported video formats using the command:
```shell
ffmpeg -codecs
```
Be sure that the ffmpeg command you run is the same binary OpenCV uses. Also note that the output of the above command only tells you which formats ffmpeg supports, not the corresponding FourCC codes. You need to look up the codes elsewhere, such as from the abovementioned URL.
To check whether you can use a particular FourCC code, you must try it out and see whether the writer opens successfully:
```python
try:
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter("temp.mkv", fourcc, 30, (640, 480))
    assert writer.isOpened()
    print("Supported")
except:
    print("Not supported")
```
In this post, you learned how to create a video in OpenCV. The video created is built from a sequence of frames (i.e., no audio), each an image of a fixed size. As an example, you learned how to apply the Ken Burns effect to a picture, in which you applied cropping and resizing, with an affine parameter controlling the pan and zoom.
And finally, you wrote the frames into a video file using the VideoWriter object in OpenCV.

...using OpenCV in advanced ways and work beyond pixels
Discover how in my new Ebook:
Machine Learning in OpenCV
It provides self-study tutorials with all working code in Python to turn you from a novice to expert. It equips you with
logistic regression, random forest, SVM, k-means clustering, neural networks, and much more... all using the machine learning module in OpenCV







I'm Jason Brownlee PhD and I help developers get results with machine learning.