Importing and displaying large data sets in Core Data

Aug 2011

by Marcus Zarra

*Note: This is a re-print from the Mac Developer Network.*

I tend to frequent the Stack Overflow website and watch for questions about Core Data. Generally, if it is a question I can answer, I will; if it is a question I can’t answer, I dig in for a fun session of exploration and attempt to find the answer.

One question that I have been seeing with tremendous regularity is how to import and display data on Cocoa Touch devices. Generally the question stems from one of a few problems:

* Trying to import a large amount of data causes the UI to hang.
* Trying to save a large block of data causes the UI to hang.
* Saving on exit causes the OS to kill the app.
* Importing on launch causes the OS to kill the app.

All of these “problems” are symptoms of the same two issues.

## Issues with large imports ##

Importing a large amount of data, either from disk or from a network, takes time and memory. If this is done on the main/UI thread then it becomes obvious to the user because the user interface is slow or unresponsive. This leads to a poor experience for the user.

On the other side of this, once you get the data imported into Core Data you need to save it. If you are using a SQLite database then it can take a considerable amount of time. If you are using a binary store then it will take even longer. Again, a poor user experience.

The answer to this problem involves multi-threading and breaking the job down into smaller pieces.

## Multi-threading

There is a persistent rumor that you cannot use Core Data in a multi-threaded environment. This is patently false. Core Data works very well in a multi-threaded environment but you need to play by the rules:

* One `NSManagedObjectContext` per thread.
* `NSManagedObject` instances cannot pass between threads.

As long as we follow those two rules, we can use Core Data in a multi-threaded environment.
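A common way to respect the second rule is to hand another thread an `NSManagedObjectID` instead of the object itself, and let that thread look the object up in its own context. The following is a minimal sketch of that idea, not code from the sample project; the function name `ZSObjectInContext` is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper (not part of the sample project).
// Resolves an object ID that came from another thread into an object that
// belongs to `threadContext`, which must be the calling thread's own context.
// The original object should have been saved first so that its ID is permanent.
static NSManagedObject *ZSObjectInContext(NSManagedObjectID *objectID,
                                          NSManagedObjectContext *threadContext)
{
    NSError *error = nil;
    NSManagedObject *object = [threadContext existingObjectWithID:objectID error:&error];
    if (!object) {
        NSLog(@"Could not load %@: %@", objectID, error);
    }
    return object;
}
```

On the sending side you would pass along `[object objectID]` once the object has been saved, never the `NSManagedObject` itself.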

## Breaking It Apart

The biggest issue with imports is that they are generally huge. That is going to kill performance even if it is done on a background thread. This is especially apparent on Cocoa Touch right now because of the single-core devices that it runs on. Therefore, the first thing we **must** do is break that import apart.

I recommend splitting the incoming data (XML/JSON/whatever) into multiple files that you save on disk. Number these files sequentially. When the data is fully transferred from the network then start processing them. In addition, keep track of which file you are currently working on. When that file is complete and saved, move on to the next file and update the marker. You can store the marker either in the `NSUserDefaults` or inside of the metadata of the `NSPersistentStore` itself. In either case the idea is that if the user quits in the middle of the import you do not need to start over at zero. You pick up right where you left off.
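As a rough sketch of the marker idea using `NSUserDefaults`: the defaults key, the function names and the `importAndSave` block below are all hypothetical and are not taken from the sample project. The marker records the last file that was fully imported and saved, so a relaunch resumes with the next one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical defaults key; choose whatever name suits your app.
static NSString * const kImportMarkerKey = @"ZSLastCompletedImportFile";

// Index of the next file to import; 0 when nothing has been imported yet.
static NSInteger ZSNextImportFileIndex(void)
{
    NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
    if (![defaults objectForKey:kImportMarkerKey]) return 0;
    return [defaults integerForKey:kImportMarkerKey] + 1;
}

// Call only after the file at `index` has been imported and the context saved.
static void ZSMarkImportFileComplete(NSInteger index)
{
    NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
    [defaults setInteger:index forKey:kImportMarkerKey];
    [defaults synchronize];
}

// Walks the sequentially numbered files, starting where the last run stopped.
// `importAndSave` is an assumed block that imports one file and saves the context.
static void ZSResumeImport(NSArray *filePaths, BOOL (^importAndSave)(NSString *path))
{
    NSInteger count = (NSInteger)[filePaths count];
    for (NSInteger index = ZSNextImportFileIndex(); index < count; ++index) {
        // Stop on failure; the marker is untouched, so this file is retried next time.
        if (!importAndSave([filePaths objectAtIndex:index])) break;
        ZSMarkImportFileComplete(index);
    }
}
```

Because the marker only advances after a successful save, quitting mid-file costs at most one file's worth of repeated work.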

## Code Example

In the attached sample project I have created a very trivial example of an import occurring on a background thread. If you were pulling in data from the net as I described above, you would have numerous `NSOperation` objects queued up to run. However, in this example we have a single `NSOperation` that sleeps periodically.

The heart of this example is the periodic saves. We want to save frequently enough that each save is very quick. In addition, when the save occurs, we want to update the main `NSManagedObjectContext` so that it can refresh which will in turn update the `NSFetchedResultsController` which will in turn update the UI. Crazy enough? Let’s dive in.

### RootViewController `-startImport:`

If you are familiar with Core Data on Cocoa Touch then most of this class will be very familiar. Other than code clean up it is straight from the current Xcode template. Having said that, there are a couple of differences.

The `-startImport:` method is fired from tapping the start button in the `UINavigationItem` shown at the top of the screen. This method will create the `NSOperationQueue` if it has not already been created and then it will create our `NSOperation` which will do the simulated work in the background.

```objc
- (void)startImport:(id)sender
{
    // Disable both bar buttons while the import runs.
    [[[self navigationItem] rightBarButtonItem] setEnabled:NO];
    [[[self navigationItem] leftBarButtonItem] setEnabled:NO];

    // Lazily create the queue the first time through.
    if (![self operationQueue]) {
        operationQueue = [[NSOperationQueue alloc] init];
    }

    // Hand the operation the coordinator, not a context; it builds its own context.
    ZSImportOperation *op = [[ZSImportOperation alloc] init];
    [op setPersistentStoreCoordinator:[[self managedObjectContext] persistentStoreCoordinator]];
    [op setRunSpeed:0.25];
    [op setEntriesToCreate:1000];
    [op setSaveFrequency:10];

    [operationQueue addOperation:op];
    [op release], op = nil;
}
```

Note that we have several parameters that we are setting on the `ZSImportOperation` instance. This dependency injection allows us to pass along the `NSPersistentStoreCoordinator` to the operation so that it can construct its own `NSManagedObjectContext` once it starts. We cannot create the `NSManagedObjectContext` ourselves and pass it along because that will break the thread barrier and the results are undefined (*i.e.* very bad).

We also tell it how many objects to create, how long to pause between each creation and how often to save (*i.e.* switch files). Once everything is configured we pass it along to the `NSOperationQueue` and it starts.

### RootViewController `-contextChanged:`

This is where the magic happens as they say. This method is called whenever any `NSManagedObjectContext` in the entire application saves. We configured this in the `-viewDidLoad` method by calling:

```objc
[[NSNotificationCenter defaultCenter] addObserver:self
                                         selector:@selector(contextChanged:)
                                             name:NSManagedObjectContextDidSaveNotification
                                           object:nil];
```

When this method is fired we know that the data has changed somehow, somewhere. So we need to narrow it down and confirm it is a data change we care about.

```objc
- (void)contextChanged:(NSNotification*)notification
{
    // Ignore saves made by our own context.
    if ([notification object] == [self managedObjectContext]) return;

    // Bounce to the main thread if we were called from a background thread.
    if (![NSThread isMainThread]) {
        [self performSelectorOnMainThread:@selector(contextChanged:) withObject:notification waitUntilDone:YES];
        return;
    }

    [[self managedObjectContext] mergeChangesFromContextDidSaveNotification:notification];
}
```

Fortunately we are not in a document based application so we really just need to check that the object pointer in the notification is not pointing to our `NSManagedObjectContext`. If it is, we bail.

We also need to make sure that all work done is on the main thread. Now I could just perform the last line of code in the `-performSelectorOnMainThread:withObject:waitUntilDone:` method and in this case I probably should. However this is another option in case the code you need to perform on the main thread is complex and you don’t want to write 12 `-performSelectorOnMainThread:withObject:waitUntilDone:` calls or write yet another method to be called. In this example, if we are not on the main thread we recursively call ourselves from the main thread and wait. This will guarantee that what happens next will always be on the main thread.

The final line is what triggers everything. This line of code is telling the main `NSManagedObjectContext` that the underlying data has changed and that it needs to update itself. This will in turn cause the `NSFetchedResultsController` to get woken up which finally updates the UI.
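For comparison, the simpler alternative mentioned above, where only the merge call itself is handed to the main thread, might look roughly like this. It is a sketch rather than code from the sample project, and the helper name `ZSMergeSaveOnMainThread` is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Hypothetical helper; call it from the did-save notification handler.
// `mainContext` is the context that lives on the main thread.
static void ZSMergeSaveOnMainThread(NSManagedObjectContext *mainContext,
                                    NSNotification *notification)
{
    // Ignore saves made by the main context itself.
    if ([notification object] == mainContext) return;

    // Run the merge on the main thread and wait for it to finish.
    // If we are already on the main thread this executes immediately.
    [mainContext performSelectorOnMainThread:@selector(mergeChangesFromContextDidSaveNotification:)
                                  withObject:notification
                               waitUntilDone:YES];
}
```

This works well when the merge is the only thing that has to happen on the main thread; the recursive approach pays off once there is more main-thread work to do.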

### ZSImportOperation `-main`

This `NSOperation` subclass is the most trivial example I could come up with.

```objc
- (void)main
{
    ZAssert([self persistentStoreCoordinator], @"PSC is nil");
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    // An empty port gives the run loop an input source so it can idle in -runUntilDate:.
    [[NSRunLoop currentRunLoop] addPort:[NSPort port] forMode:NSRunLoopCommonModes];

    // This context is created on, and only used by, the operation's thread.
    NSManagedObjectContext *moc = [[NSManagedObjectContext alloc] init];
    [moc setPersistentStoreCoordinator:[self persistentStoreCoordinator]];

    NSError *error = nil;

    for (NSInteger index = 0; index < [self entriesToCreate]; ++index) {
        id object = [NSEntityDescription insertNewObjectForEntityForName:@"Event" inManagedObjectContext:moc];
        [object setValue:[NSString stringWithFormat:@"User %ld", (long)index] forKey:@"name"];
        [object setValue:[NSNumber numberWithInteger:(arc4random() % 99)] forKey:@"age"];

        // Simulate work by sleeping on the run loop.
        [[NSRunLoop currentRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:[self runSpeed]]];

        if (index % [self saveFrequency] != 0) continue;

        ZAssert([moc save:&error], @"Error saving context on operation: %@\n%@", [error localizedDescription], [error userInfo]);
        DLog(@"saving background context");
        [moc reset];
        [pool release];
        pool = [[NSAutoreleasePool alloc] init];
    }

    ZAssert([moc save:&error], @"Error saving context on operation: %@\n%@", [error localizedDescription], [error userInfo]);
    [moc release], moc = nil;

    [[NSNotificationCenter defaultCenter] postNotificationName:kImportRoutineComplete object:self];
    [pool release], pool = nil;
}
```

It starts by adding an empty port to the run loop so that we can lazily sleep this thread, and then creates a separate `NSManagedObjectContext` as per the rules. From there we start looping based on the set `entriesToCreate` value.

Inside of the loop we create a new `NSManagedObject` and give it some random values. We then sleep based on the `runSpeed` value. From there we check to see if it is time to save and if not we move on to the next iteration of the loop.

If it is time to save we then perform a `-save:` on the `NSManagedObjectContext` instance wrapped inside of a `ZAssert`. The `ZAssert` allows us to do a conditional check on the results and then spit out either a `NSAssert` or `NSLog` depending on whether we are compiling in debug (which would throw an assertion) or production (which will spit out an `NSLog`). Once the save is complete we reset the `NSManagedObjectContext` instance to release the memory it is holding onto and then drain the `NSAutoreleasePool`.
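The `ZAssert` and `DLog` macros come from the sample project and are not shown in this article. As a rough sketch of how macros with the described behavior could be written (an assumption about their shape, not the project's actual definitions, and keyed on a `DEBUG` preprocessor flag that is also assumed):

```objc
#import <Foundation/Foundation.h>

// Sketch only. In both configurations the condition is always evaluated,
// which matters because -main puts the -save: call inside the macro.
#ifdef DEBUG

// Debug: log with the function name, and raise an assertion on failure.
#define DLog(...) NSLog(@"%s %@", __PRETTY_FUNCTION__, [NSString stringWithFormat:__VA_ARGS__])
#define ZAssert(condition, ...) \
    do { \
        if (!(condition)) { \
            [[NSAssertionHandler currentHandler] \
                handleFailureInFunction:[NSString stringWithUTF8String:__PRETTY_FUNCTION__] \
                                   file:[NSString stringWithUTF8String:__FILE__] \
                             lineNumber:__LINE__ \
                            description:__VA_ARGS__]; \
        } \
    } while (0)

#else

// Production: DLog compiles away, and a failed ZAssert only logs.
#define DLog(...) do { } while (0)
#define ZAssert(condition, ...) \
    do { \
        if (!(condition)) { \
            NSLog(@"%s %@", __PRETTY_FUNCTION__, [NSString stringWithFormat:__VA_ARGS__]); \
        } \
    } while (0)

#endif
```

Wrapping `[moc save:&error]` in a macro like this is only safe because the condition runs in every build; a plain `NSAssert` can be compiled out entirely, which would silently skip the save.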
Once all of the requested objects have been created we perform one final save of the `NSManagedObjectContext` and drain the `NSAutoreleasePool` one final time. We also send out an `NSNotification` just so that we can update the UI and re-enable the buttons.

## Wrap Up ##

If you run the sample application you will notice a few things:

* The app runs smoothly even while processing data.
* The app does not crash on launch due to the OS killing it.
* The app does not crash on exit because the save takes too long.

Naturally this is more work for us as the developers but the end result is an application that can handle any amount of data without fear of freezing the UI or having the OS kill our application.

You can download the sample project here.
