10 Sep 2008

Core Animation Tutorial: Rendering QuickTime Movies In A CAOpenGLLayer

by Matt Long

I’ve been experimenting a great deal lately with OpenGL and QuickTime, trying to see how the two technologies work together. It’s been a bit challenging, but fortunately Apple provides two really great resources. Number one, sample code: I’ve been able to learn a lot just from the samples they provide with the developer tools as well as online. And second, the cocoa-dev and quicktime-api lists are great. Lots of brilliant people there who are willing to share their knowledge. It’s very helpful, the prohibition on discussing the iPhone SDK notwithstanding.

Getting two technologies to work together can be a challenge, especially when the bridges between them are not clearly laid out in the documentation. As I pointed out, Apple provides some excellent sample code to help you along, but there is no hand-holding approach to any of it. I actually appreciate that, having come from the Windows world, where it seems that sometimes all you get is hand-holding because Microsoft doesn’t trust you, the developer, to figure things out and really own what you’re doing. But I digress (it wouldn’t be a CIMGF post if I didn’t dig on MS a bit).

Like peanut butter and chocolate, what you get when you put together QuickTime and OpenGL is something greater than either of them left on their own (ok, this is subjective. Not everyone likes peanut butter and chocolate together, but again, I digress).

If you read the Core Video Programming Guide from Apple, you’ll see the reasons they give for using Core Video:

CoreVideo is necessary only if you want to manipulate individual video frames. For example, the following types of video processing would require CoreVideo:

  • Color correction or other filtering, such as provided by Core Image filters
  • Physical transforms of the video images (such as warping, or mapping on to a surface)
  • Adding video to an OpenGL scene
  • Adding additional information to frames, such as a visible timecode
  • Compositing multiple video streams

If all you need to do is display a movie, you should simply use either a QTMovieView or, if you want to stick with the Core Animation route, a QTMovieLayer. They both function similarly; however, the view provides a lot of UI features that you won’t have to implement yourself, such as a scrubber or play/pause buttons. Plus, the view is very fast and efficient. I’m in the process of exploring the performance differences between the two, but I will save my comments about that for another post.
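For reference, the QTMovieLayer route looks roughly like this. This is just a minimal sketch to show the shape of it; the file path and the contentView variable are placeholders, not something from the demo project.

#import <QTKit/QTKit.h>
#import <QuartzCore/QuartzCore.h>

// Minimal sketch of the QTMovieLayer approach. The file path and
// contentView are placeholders, not from the demo project.
NSError *error = nil;
QTMovie *movie = [QTMovie movieWithFile:@"/path/to/movie.mov" error:&error];
if( movie )
{
    QTMovieLayer *movieLayer = [QTMovieLayer layerWithMovie:movie];
    movieLayer.frame = NSRectToCGRect([contentView bounds]);

    // Host the layer in a layer-backed view
    [contentView setWantsLayer:YES];
    [[contentView layer] addSublayer:movieLayer];

    [movie play];
}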

For our example code we are most interested in the third point above: adding video to an OpenGL scene. It seems that new QuickTime developers often want to know how to manipulate movie images before displaying them, and this often leads them to pursue adding subviews to the movie view, which can become a big mess. Because we are using OpenGL, doing other drawing on the scene is very fast. I won’t kid you: OpenGL is a pain. I don’t know anybody who loves it, but everybody respects it because of its raw speed.

Point number five above, compositing multiple video streams, is also interesting. While I won’t be covering it in this post, I will say that it makes a world of difference performance-wise if you composite the movies into an OpenGL scene. If you’ve ever tried to run multiple videos simultaneously in two different views or layers, you know it can get pretty herky-jerky. You can see why it is necessary to use OpenGL instead.

The OpenGL QuickTime Two Step

Ok, it actually takes more than two steps; however, when you are working with Core Animation layers, things get a whole lot easier than they are when rendering a movie in an NSOpenGLView. Here is what you get for free, as the kids say.

  • You don’t have to set up the OpenGL context. It is already available for you to send your OpenGL calls to.
  • The viewport for display is already configured.
  • You don’t need to set up a display link callback.

What took over 400 lines of code when rendering a QuickTime movie with no filters to an NSOpenGLView now takes only around 150 lines. Any time you can reduce code to something simpler, it makes life easier. It also makes it much easier to grok, in my opinion.
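To give you the shape of things before we dig into the details, here is a rough skeleton of the layer subclass this post builds, along with how it gets hosted. The names MovieLayer and contentView are mine, not the demo project’s. Setting asynchronous to YES is what tells Core Animation to keep asking the layer whether it should draw, which is why no display link is needed.

#import <QuartzCore/QuartzCore.h>
#import <QTKit/QTKit.h>
#import <QuickTime/QuickTime.h>

// Rough skeleton of the CAOpenGLLayer subclass used in this post.
// MovieLayer and contentView are illustrative names only.
@interface MovieLayer : CAOpenGLLayer
{
    QTMovie             *movie;
    QTVisualContextRef   qtVisualContext;
    CVImageBufferRef     currentFrame;
    GLfloat              lowerLeft[2], lowerRight[2];
    GLfloat              upperRight[2], upperLeft[2];
}
- (void)setupVisualContext:(CGLContextObj)glContext
           withPixelFormat:(CGLPixelFormatObj)pixelFormat;
@end

// Hosting the layer in a layer-backed view:
MovieLayer *movieLayer = [MovieLayer layer];
movieLayer.frame = NSRectToCGRect([contentView bounds]);

// Core Animation calls -canDrawInCGLContext:... periodically when
// the layer is asynchronous, so we never touch a display link.
movieLayer.asynchronous = YES;

[contentView setWantsLayer:YES];
[[contentView layer] addSublayer:movieLayer];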

There really are two primary steps you take when using a CAOpenGLLayer. First, you check to see if you should draw. Then, depending upon the answer, drawInCGLContext either gets called or it doesn’t. Really, that’s it. Determining whether or not you should draw depends upon what you are trying to do. In our case, we only want to draw if all of the following are true:

  • The movie is actually playing back
  • The visual context for the movie has been initialized
  • The visual context has a new image ready to be rendered

If all of these are true, then our call to canDrawInCGLContext returns YES. Here is the code I use to check these constraints in canDrawInCGLContext:

- (BOOL)canDrawInCGLContext:(CGLContextObj)glContext 
                pixelFormat:(CGLPixelFormatObj)pixelFormat 
               forLayerTime:(CFTimeInterval)timeInterval 
                displayTime:(const CVTimeStamp *)timeStamp
{ 
    // There is no point in trying to draw anything if our
    // movie is not playing.
    if( [movie rate] <= 0.0 )
        return NO;
    
    if( !qtVisualContext )
    {
        // If our visual context for our QTMovie has not been set up
        // we initialize it now
        [self setupVisualContext:glContext withPixelFormat:pixelFormat];
    }

    // Check to see if a new frame (image) is ready to be drawn at
    // the time specified.
    if(QTVisualContextIsNewImageAvailable(qtVisualContext,timeStamp))
    {
        // Release the previous frame
        CVOpenGLTextureRelease(currentFrame);
        
        // Copy the current frame into our image buffer
        QTVisualContextCopyImageForTime(qtVisualContext,
                                        NULL,
                                        timeStamp,
                                        &currentFrame);
        
        // Returns the texture coordinates for the part of the image that should be displayed
        CVOpenGLTextureGetCleanTexCoords(
                            currentFrame, 
                            lowerLeft, 
                            lowerRight, 
                            upperRight, upperLeft);
        return YES;
    }
    
    return NO;
} 

The call to set up the visual context is where we associate the QuickTime movie itself with a QTVisualContextRef, which is what OpenGL needs to draw the current frame. We will then use this object to load image data into a CVImageBufferRef, which can be used for rendering with OpenGL. Here is the code to set up the visual context.

- (void)setupVisualContext:(CGLContextObj)glContext 
           withPixelFormat:(CGLPixelFormatObj)pixelFormat;
{
    OSStatus			    error;
    
    NSDictionary	    *attributes = nil;
    attributes = [NSDictionary dictionaryWithObjectsAndKeys:
                  [NSDictionary dictionaryWithObjectsAndKeys:
                   [NSNumber numberWithFloat:[self frame].size.width],
                   kQTVisualContextTargetDimensions_WidthKey,
                   [NSNumber numberWithFloat:[self frame].size.height],
                   kQTVisualContextTargetDimensions_HeightKey, nil], 
                  kQTVisualContextTargetDimensionsKey, 
                  [NSDictionary dictionaryWithObjectsAndKeys:
                   [NSNumber numberWithFloat:[self frame].size.width], 
                   kCVPixelBufferWidthKey, 
                   [NSNumber numberWithFloat:[self frame].size.height], 
                   kCVPixelBufferHeightKey, nil], 
                  kQTVisualContextPixelBufferAttributesKey,
                  nil];
    
    // Create our QuickTime visual context
    error = QTOpenGLTextureContextCreate(NULL,
                                         glContext,
                                         pixelFormat,
                                         (CFDictionaryRef)attributes,
                                         &qtVisualContext);

    // Associate it with our movie.
    SetMovieVisualContext([movie quickTimeMovie],qtVisualContext);
}

Next we check to see if there is an image ready using:

if(QTVisualContextIsNewImageAvailable(qtVisualContext,timeStamp))

And then we copy the image to our CVImageBufferRef with:

// Copy the current frame into our image buffer
QTVisualContextCopyImageForTime(qtVisualContext,
                                NULL,
                                timeStamp,
                                &currentFrame);

Now it's all a matter of rendering the frame for the current time stamp.

But Wait! What TimeStamp?

If you asked this question, then you are a very astute reader. In order to obtain the next image, we simply passed the CVTimeStamp parameter, timeStamp, to our call to QTVisualContextCopyImageForTime. But how do we even have a timestamp? Isn’t that something we need to get from a display link? If you’re asking what a display link is at this point, take a look at the Core Video Programming Guide, which states:

To simplify synchronization of video with a display’s refresh rate, Core Video provides a special timer called a display link. The display link runs as a separate high priority thread, which is not affected by interactions within your application process.

In the past, synchronizing your video frames with the display’s refresh rate was often a problem, especially if you also had audio. You could only make simple guesses for when to output a frame (by using a timer, for example), which didn’t take into account possible latency from user interactions, CPU loading, window compositing and so on. The Core Video display link can make intelligent estimates for when a frame needs to be output, based on display type and latencies.

I will provide a more complete answer to this question in the future, as I am still studying it myself. However, I will mention that a display link callback is unnecessary in this context because the CAOpenGLLayer provides one for us. The timeStamp parameter is all we need in order to get the current frame, assuming that the movie is playing back.
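For comparison, here is roughly what the display link setup looks like when you are rendering into an NSOpenGLView instead, which is the boilerplate the CAOpenGLLayer spares us. This is a sketch of the standard Core Video pattern, not code from the demo project.

#import <CoreVideo/CoreVideo.h>

// Sketch of the display link setup CAOpenGLLayer makes unnecessary.
// In a view-based renderer, inOutputTime is the CVTimeStamp you
// would hand to QTVisualContextCopyImageForTime.
static CVReturn MyDisplayLinkCallback(CVDisplayLinkRef displayLink,
                                      const CVTimeStamp *inNow,
                                      const CVTimeStamp *inOutputTime,
                                      CVOptionFlags flagsIn,
                                      CVOptionFlags *flagsOut,
                                      void *displayLinkContext)
{
    // Grab and draw the frame for inOutputTime here.
    return kCVReturnSuccess;
}

CVDisplayLinkRef displayLink;
CVDisplayLinkCreateWithActiveCGDisplays(&displayLink);
CVDisplayLinkSetOutputCallback(displayLink, MyDisplayLinkCallback, NULL);
CVDisplayLinkStart(displayLink);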

Drawing The Frame

There is a special group of people who really get OpenGL. I salute all of you to whom this applies. You are amazing. I, however, only write as much of it as necessary, and you'll see that most of the code I have here is simply copied and pasted from sample code I got from Apple. I am starting to understand it more and more, but it still makes my brain hurt. Here is my drawing code for when a frame is ready to be rendered.

- (void)drawInCGLContext:(CGLContextObj)glContext 
             pixelFormat:(CGLPixelFormatObj)pixelFormat 
            forLayerTime:(CFTimeInterval)interval 
             displayTime:(const CVTimeStamp *)timeStamp
{
    NSRect    bounds = NSRectFromCGRect([self bounds]);
    
    GLfloat 	minX, minY, maxX, maxY;        
    
    minX = NSMinX(bounds);
    minY = NSMinY(bounds);
    maxX = NSMaxX(bounds);
    maxY = NSMaxY(bounds);
    
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho( minX, maxX, minY, maxY, -1.0, 1.0);
    
    glClearColor(0.0, 0.0, 0.0, 0.0);	     
    glClear(GL_COLOR_BUFFER_BIT);
    
    CGRect imageRect = [self frame];
    // Enable target for the current frame
    glEnable(CVOpenGLTextureGetTarget(currentFrame));
    // Bind to the current frame
    // This tells OpenGL which texture we want
    // to draw so that when we make our glTexCoord and 
    // glVertex calls, our current frame gets drawn
    // to the context.
    glBindTexture(CVOpenGLTextureGetTarget(currentFrame), 
                  CVOpenGLTextureGetName(currentFrame));
    glMatrixMode(GL_TEXTURE);
    glLoadIdentity();
    glColor4f(1.0, 1.0, 1.0, 1.0);
    glBegin(GL_QUADS);

    // Draw the quads
    glTexCoord2f(upperLeft[0], upperLeft[1]);
    glVertex2f  (imageRect.origin.x, 
                 imageRect.origin.y + imageRect.size.height);
    glTexCoord2f(upperRight[0], upperRight[1]);
    glVertex2f  (imageRect.origin.x + imageRect.size.width, 
                 imageRect.origin.y + imageRect.size.height);
    glTexCoord2f(lowerRight[0], lowerRight[1]);
    glVertex2f  (imageRect.origin.x + imageRect.size.width, 
                 imageRect.origin.y);
    glTexCoord2f(lowerLeft[0], lowerLeft[1]);
    glVertex2f  (imageRect.origin.x, imageRect.origin.y);
    
    glEnd();
    
    // The CAOpenGLLayer is responsible for flushing
    // the OpenGL context, so we call super
    [super drawInCGLContext:glContext 
                pixelFormat:pixelFormat 
               forLayerTime:interval 
                displayTime:timeStamp];

    // Task the context
    QTVisualContextTask(qtVisualContext);
    
}

If you're not familiar with OpenGL it would help you to know that it's all about the current state. What does this mean? Well, simply put, it means that the call you are making right now, applies to whatever state the context is in. These two calls are what are the most important for our purposes.

glEnable(CVOpenGLTextureGetTarget(currentFrame));
glBindTexture(CVOpenGLTextureGetTarget(currentFrame), 
                  CVOpenGLTextureGetName(currentFrame));

With these two calls we have told OpenGL which texture to use. Now every subsequent call applies to this texture until the state is changed to something else. So when we set a color or draw a quad, it applies to the texture that has been set here.
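This is also where the "adding video to an OpenGL scene" point from earlier pays off. Once you disable the frame's texture target, you can keep issuing ordinary OpenGL calls into the same context. Here is a hypothetical example, placed inside drawInCGLContext right after the glEnd() call for the frame quad and before the call to super, that draws a translucent strip along the bottom of the layer. It is not part of the demo project.

// Hypothetical overlay drawn after the movie frame and before the
// call to super, which flushes the context. Not part of the demo.
glDisable(CVOpenGLTextureGetTarget(currentFrame));

glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

// A half-transparent black strip along the bottom 40 points.
glColor4f(0.0, 0.0, 0.0, 0.5);
glBegin(GL_QUADS);
glVertex2f(minX, minY);
glVertex2f(maxX, minY);
glVertex2f(maxX, minY + 40.0f);
glVertex2f(minX, minY + 40.0f);
glEnd();

glDisable(GL_BLEND);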

Conclusion

I love both of these technologies, QuickTime and OpenGL. They are so powerful. It's harnessing the power that's the trick. I've got some other ideas for some related posts that I plan to cover in the weeks to come, but this was a real breakthrough for me. With the help of John Clayton, Jean-Daniel Dupas, and David Duncan on the cocoa-dev list, I was able to get the sample code put together for this post. Feel free to ask questions in the comments. I will do my best to answer, but I'm still pretty new to these technologies. Write some code yourself and have fun. This is really exciting stuff. Until next time.

About The Demo Code

John Clayton had some issues getting the code to work on his Mac Pro. I successfully ran it on my MacBook Pro and the family iMac without any issues. Anyhow, we're not sure what the problem is, so if you do run into trouble, let me know. Maybe we can figure it out. Meanwhile, we're investigating it as we have time.

Update: John Clayton figured it out. Apparently the visual context needs to be reset because the pixel format is not correct on the first run. We now just reset the visual context in the call to -copyCGLContextForPixelFormat: and everything seems happy. The demo code has been updated to reflect the change.
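Based on John's fix, the override looks roughly like this. This is a sketch of the idea; the updated demo project has the actual change.

// Sketch of the fix: tear down any existing visual context when the
// layer hands out a new CGL context, so it gets rebuilt with the
// correct pixel format the next time canDrawInCGLContext runs.
- (CGLContextObj)copyCGLContextForPixelFormat:(CGLPixelFormatObj)pixelFormat
{
    if( qtVisualContext )
    {
        QTVisualContextRelease(qtVisualContext);
        qtVisualContext = NULL;
    }
    return [super copyCGLContextForPixelFormat:pixelFormat];
}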

Quicktime CAOpenGLLayer Demo

Comments

nickludlam says:

Trying this on a C2D Macbook Pro, I’m getting a window which flicks briefly from black to white and stops. It appears to do nothing, and takes up about 50% of my CPU time. There’s nothing revealing in the log output either.

Matt Long says:

@nickludlam

Lovely! The CPU part doesn’t surprise me so much, but I’m not sure why you are not getting the video to display. Will see what I can find out.

-Matt

vade says:

Hi.

I use a similar path for my Quartz Composer powered realtime video mixer app in development. However, I don’t use Core Animation and am not very familiar with it, but I’d imagine you need to flush after you push the frame to the CAOpenGLLayer, unless the super implementation flushes for you? I know I have to flush with my code.

As far as the display link is concerned, it’s surprisingly easy to set up, and for those looking to power some effects easily, I highly suggest using Quartz Composer and QCRenderers. You can simply pass your currentFrame variable into a QCRenderer’s published image input port. That’s it! So awesome.

vade says:

Oh. I forgot to thank you for all your posts, they’ve been helpful to a sprouting cocoa developer. Thank you!

Matt Long says:

@vade

Just a quick note on your comment: the call to super’s drawInCGLContext does do the flush for you. I didn’t know that before either, but someone pointed it out on the cocoa-dev list.

Thanks and I’m glad the site is helpful to you.

-Matt

maru says:

Hi,

I’m wondering how I can render video captured via QTCapture in a CAOpenGLLayer. I guess I need a Core Video feature to compose some captured video into a single stream, but I cannot find any clue… Any idea?

Matt Long says:

@maru

I’m not sure exactly what you are trying to do. If video has already been captured, you can just load it into a QTMovie and use the tutorial on this page to render into a CAOpenGLLayer. If you are talking about displaying the video in real time, then you should look into using QTCaptureLayer.

Best Regards,

-Matt

jwwalker says:

The demo isn’t working for me… it displays the initial image of the movie, but doesn’t play. It looks like QTVisualContextIsNewImageAvailable is never returning true. This is on a Mac Pro running 10.6.3.

JetForMe says:

I have the same problem as jwwalker, 10.6.4.

pause says:

Hey,
What if you’re decoding the frames yourself and you don’t have a timestamp? All I have is an NSData containing pixel values. I’ve saved them out to a ppm file and sure enough, there is a picture in there. All the tutorials I have found from Apple use Quicktime as a source using all these handy dandy QT functions. CoreVideo documentation says I can display my own decoded video. Any thoughts?