Automated Movie Frame Extracting and Geotagging

This is a short tutorial on an automated method to extract and geotag movie frames.  One specific use case: you have just flown a survey with your quadcopter using a 2-axis gimbal pointing straight down and a GoPro action cam in movie mode.  Now you’d like to create a stitched map from your data using tools like Pix4D or Agisoft.

The most interesting part of this article is the method I have developed to correlate the frame timing of a movie with the aircraft’s flight data log.  This correlation process yields a result such that for any and every frame of the movie, I can find the exact corresponding time in the flight log, and for any time in the flight log, I can find the corresponding video frame.  Once this relationship is established, it is a simple matter to walk through the flight log and pull frames based on the desired conditions (for example, grab a frame at some time interval, while above some altitude AGL, and only when oriented +/- 10 degrees from north or south.)

Video Analysis

The first step of the process is to analyze the frame to frame motion in the video.


Example of feature detection in a single frame.

  1. For each video frame we run a feature detection algorithm (such as SIFT, SURF, ORB, etc.) and then compute a descriptor for each detected feature.
  2. Find all the feature matches between frame “n-1” and frame “n”.  This is done using standard FLANN matching, followed by a RANSAC-based homography matrix solver, and then discarding outliers.  This approach has a natural advantage of being able to ignore extraneous features from the prop or the nose of the aircraft, because those features don’t fit the overall consensus of the homography matrix.
  3. Given the set of matching features between frame “n-1” and frame “n”, I then compute a best-fit rigid affine transformation from the feature locations (u, v) in one frame to the next.  The affine transformation can be decomposed into a rotational component, a translation (x, y) component, and a scale component.
  4. Finally I log the frame number, frame time (starting at t=0.0 for the first frame), the rotation (deg), the x translation (pixels), and the y translation (pixels).
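The decomposition in step 3 is just a little trigonometry.  Here is a minimal NumPy sketch (not the actual script), assuming the 2x3 rigid/similarity affine form that OpenCV-style rigid affine estimators return:

```python
import numpy as np

def decompose_rigid_affine(A):
    # A is a 2x3 rigid (similarity) affine of the form
    # [[s*cos(t), -s*sin(t), tx],
    #  [s*sin(t),  s*cos(t), ty]]
    rot_deg = np.degrees(np.arctan2(A[1, 0], A[0, 0]))  # rotation component
    scale = np.hypot(A[0, 0], A[1, 0])                  # uniform scale
    tx, ty = A[0, 2], A[1, 2]                           # translation (pixels)
    return rot_deg, (tx, ty), scale

# Build a known transform: 5 deg rotation, translate (3, -2) px, scale 1.0
t = np.radians(5.0)
A = np.array([[np.cos(t), -np.sin(t),  3.0],
              [np.sin(t),  np.cos(t), -2.0]])
rot, (tx, ty), scale = decompose_rigid_affine(A)
```

Round-tripping a known transform like this is a handy sanity check that the sign conventions match whichever estimator you use.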

The cool, tricky observation

I haven’t seen anyone else do anything like this before, so I’ll pretend I’ve come up with something new and cool.  I know there is never anything new under the sun, but it’s fun to rediscover things for oneself.

Use case #1: an Iris quadcopter with a two-axis Tarot gimbal and a GoPro Hero 3 pointing straight down.  Because the gimbal is only 2-axis, the camera tracks the yaw motion of the quadcopter exactly.  The camera is looking straight down, so the camera roll rate should exactly match the quadcopter’s yaw rate.  I have shown I can compute the frame-to-frame roll rate using computer vision techniques, and we can save the Iris flight log.  If these two signal channels aren’t too noisy or biased relative to each other, perhaps we can find a way to correlate them and figure out the time offset.


3DR Iris + Tarot 2-axis gimbal

Use case #2: a Senior Telemaster fixed-wing aircraft with a Mobius action cam fixed to the airframe looking straight forward.  In this example, camera roll should exactly correlate with aircraft roll.  Camera x translation should map to aircraft yaw, and camera y translation should map to aircraft pitch.


Senior Telemaster with forward looking camera.

In all cases this method requires that at least one of the camera axes is fixed relative to at least one of the aircraft axes.  If you are running a 3-axis gimbal you are out of luck … but perhaps with this method in mind and a bit of ingenuity, alternative methods could be devised to find matching points in the video versus the flight log.

Flight data correlation

This is the easy part.  After processing the movie file, we now have a log of the frame-to-frame motion.  We also have the flight log from the aircraft.  Here are the steps to correlate the two data logs.


Correlated sensor data streams.

  1. Load both data logs (movie log and flight log) into a big array and resample the data at a consistent interval.  I have found that resampling at 30 Hz seems to work well enough.  I have experimented with fitting a spline curve through lower rate data to smooth it out.  It makes the plots look prettier, but I’m sure it does not improve the accuracy of the correlation.
  2. I coded this process up in python.  Luckily python (numpy) has a function that takes two time sequences as input and does a brute-force correlation.  It slides one data stream along the other and computes a correlation value for every possible overlap.  This is why it is important to resample both data streams at the same fixed sample rate.
    ycorr = np.correlate(movie_interp[:,1], flight_interp[:,1], mode='full')
  3. When you plot out “ycorr”, you will hopefully see a spike in the plot and that should correspond to the best fit of the two data streams.
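Recovering the actual time offset from “ycorr” takes a little index arithmetic.  Here is a self-contained sketch with synthetic data (a random signal stands in for the resampled yaw-rate channels, and the variable names are made up for illustration):

```python
import numpy as np

hz = 30.0            # common resample rate
dt = 1.0 / hz

# Synthetic flight-log channel: 20 s of white noise at 30 Hz.
rng = np.random.default_rng(0)
flight = rng.standard_normal(600)

# Pretend the movie started 4.0 s into the flight log: the movie
# channel is the same signal, minus the first 120 samples.
shift = 120
movie = flight[shift:]

ycorr = np.correlate(movie, flight, mode='full')

# In 'full' mode, output index len(flight)-1 corresponds to zero lag,
# so the peak location converts to a lag in samples like this:
lag = int(np.argmax(ycorr)) - (len(flight) - 1)
offset_sec = -lag * dt   # flight_time = movie_time + offset_sec
```

With real, noisy data the peak is less razor sharp, but the same arithmetic applies; here offset_sec recovers the 4.0 s shift.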


Plot of data overlay position vs. correlation.

Geotagging movie frames



Raw GoPro frame grab showing significant lens distortion.  Latitude = 44.69231071, Longitude = -93.06131655, Altitude = 322.1578

The important result of the correlation step is that we have now determined the exact offset in seconds between the movie log and the flight log.  We can use this to easily map a point in one data file to a point in the other data file.

Movie encoding formats are sequential, and the compression algorithms require previous frames to generate the next frame.  Thus the geotagging script steps through the movie file frame by frame and finds the matching point in the flight log data file.
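In code the frame-to-log mapping is tiny.  A hedged sketch, with made-up example values for the frame rate and the correlation offset:

```python
fps = 29.97        # frame rate reported by the movie container (example value)
offset_sec = 4.0   # offset found by the correlation step (example value)

def flight_time_for_frame(frame_num):
    # Movie frame times start at t=0.0 for the first frame, so the
    # matching flight-log timestamp is just frame time plus the offset.
    return frame_num / fps + offset_sec
```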

For each frame that matches the extraction conditions, it is a simple matter to look up the corresponding longitude, latitude, and altitude from the flight log.  My script provides an example of selecting movie frames based on conditions in the flight log.  I know that the flight was planned so the transects were flown north/south and the target altitude was about 40m AGL.  I specifically coded the script to extract movie frames at a specified interval in seconds, but only consider frames taken when the quadcopter was above 35m AGL and oriented within +/- 10 degrees of either north or south.  The script is written in python so it could easily be adjusted for other constraints.
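A sketch of what such a selection predicate might look like (the function and parameter names are hypothetical, not the actual script’s):

```python
def frame_wanted(time_sec, alt_agl_m, yaw_deg, last_grab_sec,
                 interval_sec=2.0, min_agl_m=35.0, heading_tol_deg=10.0):
    # Enough time elapsed since the last grabbed frame?
    if time_sec - last_grab_sec < interval_sec:
        return False
    # High enough above ground level?
    if alt_agl_m < min_agl_m:
        return False
    # Heading within +/- tol of north (0/360 deg) or south (180 deg)?
    yaw = yaw_deg % 360.0
    near_north = min(yaw, 360.0 - yaw) <= heading_tol_deg
    near_south = abs(yaw - 180.0) <= heading_tol_deg
    return near_north or near_south
```

For example, a frame at 355 degrees heading, 40 m AGL, ten seconds after the last grab passes; the same frame headed east (90 degrees) does not.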

The script writes each selected frame to disk using the opencv imwrite() function, and then uses the python “pyexiv2” module to write the geotag information into the exif header for that image.
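One wrinkle worth noting: EXIF stores GPS coordinates as degree/minute/second rationals plus a reference letter, so the flight log’s decimal degrees need a conversion before being handed to pyexiv2 (or any EXIF writer).  A minimal sketch of that conversion; the helper name is mine, not the script’s:

```python
from fractions import Fraction

def decimal_to_exif_dms(coord, is_latitude):
    # Convert a signed decimal-degree coordinate into the
    # (degrees, minutes, seconds) rational triplet plus reference
    # letter ('N'/'S' or 'E'/'W') that the EXIF GPSInfo tags expect.
    if is_latitude:
        ref = 'N' if coord >= 0 else 'S'
    else:
        ref = 'E' if coord >= 0 else 'W'
    coord = abs(coord)
    deg = int(coord)
    minutes = int((coord - deg) * 60.0)
    seconds = (coord - deg) * 3600.0 - minutes * 60.0
    # Seconds stored as a rational with 1/100 s precision.
    return [Fraction(deg, 1), Fraction(minutes, 1),
            Fraction(round(seconds * 100), 100)], ref

dms, ref = decimal_to_exif_dms(44.69231071, is_latitude=True)
# 44.69231071 deg -> 44 deg, 41 min, ~32.32 sec, ref 'N'
```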


A screen grab from Pix4D showing the physical location of all the captured GoPro movie frames.


Aerial surveying and mapping

The initial use case for this code was to automate the process of extracting frames from a GoPro movie and geotagging them in preparation for handing the image set over to Pix4D for stitching and mapping.


Final stitch result from 120 geotagged GoPro movie frames.

Using video as a truth reference to analyze sensor quality

It is interesting to see how accurately the video roll rate corresponds to the IMU gyro roll rate (assuming a forward-looking camera now).  It is also interesting in the plots to see how the two data streams track exactly for some periods of time, but diverge by some slowly varying bias for other periods of time.  I believe this shows the variable bias of MEMS gyro sensors.  It would be interesting to run down this path a little further and see whether the bias correlates with g-force in a coupled axis.

Visual odometry and real time mapping

Given feature detection and matching from one frame to the next, knowledge of the camera pose at each frame, OpenCV’s solvePnP() and triangulatePoints() functions, and a working bundle adjuster … what could be done to map the surface or compute visual odometry during a GPS outage?

Source Code

The source code for all my image analysis experimentation can be found at the University of Minnesota UAV Lab github page.  It is distributed under the MIT open-source license.

Comments or questions?

I’d love to see your comments or questions in the comments section at the end of this page!

Image Stitching Tutorial Part #1: Introduction



During the summer of 2014 I began investigating image stitching techniques and technologies for a NOAA sponsored UAS marine survey project.  In the summer of 2015 I was hired by the University of Minnesota Department of Aerospace Engineering and Mechanics to work on a Precision Agriculture project that also involves UASs and aerial image stitching.

Over the past few months I have developed a functional open-source image stitching pipeline written in python and opencv.  It is my intention with this series of blog postings to introduce this work and further explain our approach to aerial image processing and stitching.

Any software development project is a journey of discovery and education, so I would love to hear your thoughts, feedback, and questions in the comments area of any of these posts.  The python code described here will be released under the MIT open-source software license (one of my to-do list items is to publish this project code, so that will happen “soon.”)


The world already has several high quality commercial image stitching tools as well as several cloud based systems that are free to use.  Why develop yet another image stitching pipeline?  There are several reasons we began putting this software tool chain together.

  • We are looking at the usefulness and benefits of ‘direct georeferencing.’  If we have accurate camera pose information (time, location, and camera orientation of each image), then how can this improve the image analysis, feature detection, feature matching, stitching, and rendering process?
  • One of the core strengths of the UMN Aerospace Engineering Department is a high quality 15-state Kalman filter attitude determination system.  This system uses inertial sensors (gyros and accelerometers) in combination with a GPS to accurately estimate an aircraft’s ‘true’ orientation and position.  Our department is uniquely positioned to provide a high quality camera pose estimate and thus examine ‘direct georeferencing’ based image processing.
  • Commercial software and closed source cloud solutions do not enable the research community to easily ask questions and test ideas and theories.
  • We hope to quantify the sensor quality required to perform useful direct georeferencing as well as the various sources of uncertainty that can influence the results.
  • We would love to involve the larger community in our work, and we anticipate there will be some wider interest in free/open image processing and stitching tools that anyone can modify and run on their own computer.


I will be posting new tutorials in this series as they are written.  Here is a quick look ahead at what topics I plan to cover:

  • Direct Georeferencing
  • Image stitching basics
  • Introduction to the open-source software tool chain
  • Aircraft vs. camera poses and directly visualizing your georeferenced data set
  • Feature detection
  • Feature matching
  • Sparse Bundle Adjustment
  • Seamless 3D and 2D image mosaics, DEMs, triangle meshes, etc.

Throughout the image collection and image stitching process there is art, science, engineering, math, software, hardware, aircraft, skill, and maybe a bit of luck once in a while (!) that all come together in order to produce a successful aerial imaging result.

Software Download

The software referenced in this tutorial series is licensed with the MIT license and available on the University of Minnesota UAV Lab public github page under the ImageAnalysis repository.


Adventures in Aerial Image Stitching

A small UAV + a camera = aerial pictures.




This is pretty cool just by itself.  The above images are downsampled, but at full resolution you can pick out some pretty nice details.  (Click on the following image to see the full/raw pixel resolution of the area.)


The next logical step of course is to stitch all these individual images together into a larger map.  The questions are: What software is available to do image stitching?  How well does it work?  Are there free options?  Do I need to explore developing my own software tool set?


Various aerial imaging sites have set the bar at near visual perfection.  When we look at Google Maps (for example), the edges of runways and roads are exactly straight, and it is almost impossible to find any visible seam or anomaly in their data set.  However, it is well known that Google imagery can be several meters off from its true position, especially away from well-traveled areas.  Also, their imagery can be a bit dated and is lower resolution than we can achieve with our own cameras … these are the reasons we might want to fly a camera and get more detailed, more current, and perhaps more accurately placed imagery.


Of course the first goal is to meet our expectations. 🙂  I am very averse to grueling manual stitching processes, so the second goal is to develop a highly automated process with minimal manual intervention needed.  A third goal is to be able to present the data in a way that is useful and manageable to the end user.

Attempt #1: Hugin

Hugin is a free/open-source image stitching tool.  It appears to be well developed and very capable, supporting a wide variety of stitching and projection modes.  At its core it uses SIFT to identify features and create a set of keypoints.  It then builds a KD-tree and uses fast approximate nearest-neighbor search to find matching features between image pairs.  This is pretty state of the art stuff as far as my investigation into this field has shown.

Unfortunately I could not find a way to make Hugin deal with a set of pictures taken mostly straight down and from a moving camera position.  Hugin seems to be optimized for personal panoramas … the sort of pictures you would take from the top of a mountain when just one shot can’t capture the full vista.  Stitching aerial images together involves a moving camera vantage point, and this seems to confuse all of Hugin’s built-in assumptions.

I couldn’t find a way to coax Hugin into doing the task.  If you know how to make this work with Hugin, please let me know!  Send me an email or comment at the bottom of this post!

Attempt #2: Visual SFM + CMPMVS

Someone suggested a site I should check out.  It looks like a great resource, and they have outlined a processing path using a number of free or open-source tools.

Unfortunately I came up short with this tool path as well.  From the pictures and docs I could find on these software packages, it appears that the primary goal of this site (and the referenced software packages) is to create a 3D surface model from the aerial pictures.  This is a really cool thing to see when it works, but it’s not the direction I am going with my work.  I’m more interested in building top-down maps.

Am I missing something here?  Can this software be used to stitch photos together into larger seamless aerial maps?  Please let me know!

Attempt #3: Microsoft ICE (Image Composite Editor)

Ok, now we are getting somewhere.  MS ICE is a slick program.  It’s highly automated to the point of not even offering much ability for user intervention.  You simply throw a pile of pictures at it, and it finds keypoint matches, and tries to stitch a panorama together for you.  It’s easy to use, and does some nice work.  However, it does not take any geo information into consideration.  As it fits images together you can see evidence of progressively increased scale and orientation distortion.  It has trouble getting all the edges to line up just right, and occasionally it fits an image into a completely wrong spot.  But it does feather the edges of the seams so the final result has a nice look to it.  Here is an example.  (Click the image for a larger version.)


The result is rotated about 180 degrees off, and the scale at the top is grossly exaggerated compared to the scale at the bottom of the image.  If you look closely, it has a lot of trouble matching up the straight line edges in the image.  So ICE does a commendable job for what I’ve given it, but I’m still way short of my goal.

Here is another image set stitched with ICE.  You can see it does a better job avoiding progressive scaling errors on this set.  However, linear features are still crooked, there are many visual discontinuities, and in one spot it has completely bungled the fit and inserted a fragment in entirely the wrong place.  So it still falls pretty short of my goal of making a perfectly scaled, positioned, and seamless map that would be useful for science.


Attempt #4: Write my own stitching software

How hard could it be … ? 😉

  1. Find the features/keypoints in all the images.
  2. Compute a descriptor for each keypoint.
  3. Match keypoint descriptors between all possible pairs of images.
  4. Filter out bad matches.
  5. Transform each image so that its keypoint positions match exactly (maybe closely? maybe roughly on the same planet as ….?) the positions of those same keypoints as found in all other matching images.
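As a concrete example of step 4, a common first-pass filter is Lowe’s ratio test: keep a match only when the best descriptor distance clearly beats the second best.  This brute-force NumPy sketch is a simplified stand-in for illustration (a real pipeline would typically follow it with a RANSAC geometric-consistency check):

```python
import numpy as np

def ratio_test_filter(desc_a, desc_b, ratio=0.75):
    # For each descriptor in image A, find its two nearest neighbors in
    # image B by L2 distance; keep the pairing only if the best match is
    # clearly better than the runner-up (Lowe's ratio test).
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

# Toy descriptors: the first A-descriptor sits right on top of B's first
# descriptor; the second is ambiguous (equidistant from two B descriptors).
desc_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc_a = np.array([[0.1, 0.0], [5.0, 0.0]])
matches = ratio_test_filter(desc_a, desc_b)
# Only the unambiguous pairing survives: matches == [(0, 0)]
```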

I do have an advantage I haven’t mentioned until now:  I have pretty accurate knowledge of where the camera was when each image was taken, including the roll, pitch, and yaw (“true” heading).  I am running a 15-state Kalman filter that estimates attitude from the GPS + inertial sensors.  Thus it converges to “true” heading: not magnetic heading, not ground track, but true orientation.  Knowing true heading is critically important for accurately projecting images into map space.

The following image shows the OpenCV “ORB” feature detector in action along with the feature matching between two images.


Compare the following fit to the first ICE fit above.  You can see a myriad of tiny discrepancies.  I’ve made no attempt to feather the edges of the seams, and in fact I’m drawing every image in the data set with partial translucency.  But this fit does a pretty good job of preserving the overall geographically correct scale, position, and orientation of all the individual images.


Here is a second data set taken of the same area.  This corresponds to the second ICE picture above.  Hopefully you can see that straight line edges, orientations, and scaling are better preserved.


Perhaps you might also notice that because my own software tool set understands the camera location when each image is taken, the projection of the image into map space is more accurately warped (none of the projected images have straight edge lines).

Do you have any thoughts, ideas, or suggestions?

This is my first adventure with image stitching and feature matching.  I am not pretending to be an expert here.  The purpose of this post is to hopefully get some feedback from people who have been down this road before and perhaps found a better or different path through.  I’m sure I’ve missed some good tools, and good ideas that would improve the final result.  I would love to hear your comments and suggestions and experiences.  Can I make one of these data sets available to experiment with?

To be continued….

Expect updates, edits, and additions to this posting as I continue to chew on this subject matter.