Performance Profiling in Android

Or, how to quickly pinpoint bottlenecks with systrace.

The What
  • Application lets you shoot videos.
  • Videos are chunked & uploaded to a server.
  • Videos with progress are shown in a listview.
The Context

An app I'm working on makes heavy use of Android's SyncAdapter framework. This is an exceptionally neat piece of the system, but that's a story for another day.

I use the framework to upload large files in 512kb chunks to a RESTful API using Retrofit & a chunked uploader. After each successful chunk upload, the progress of the file is saved to disk. This is so that you can safely drop out of a network connection, run out of battery or turn off your phone and resume the upload when you get back to a solid connection.

A SimpleCursorAdapter-backed listview shows each file, as well as a progress bar. One of the neat freebies you get with the SimpleCursorAdapter is that your list items will automatically be refreshed in bindView() whenever the underlying data is changed, which means we can just do;

(a contrived example)

@Override
public void bindView(View v, Context context, Cursor cursor) {  
    ...
    long progress 
        = cursor.getLong(cursor.getColumnIndex("progress"));
    progressBar.setProgress( progress / file.length() * 100)    
    ...
}

Next time the progress is stored to disk, bindView() is called again and the progress bar is reset. No messing about with threads or observables, really neat. This may seem like a tangent - but we'll come back to this point.

So what's wrong?

However - when uploading large files, our UI was becoming noticeably sluggish. For a number of irrelevant reasons, we immediately thought it wasn't to do with the UI, rather something to do with the SyncAdapter framework and potentially the heavy I/O operations that were happening.

Since I thought it could be a thorny issue, it was time to do some initial profiling with TraceView on the SyncAdapter process rather than waste time making guesses by commenting bits & pieces out. Nothing immediately popped out, so it was time to boot up DDMS and get a bigger picture with systrace (that's this guy).

You'll be prompted with the following settings screen, which lets you select which processes should be captured by systrace. Obviously we want to profile the Sync Manager, but a few processes were thrown in, just in case.

As soon as you hit OK, systrace will start profiling the device for the desired duration.

Opening up trace.html in a browser will give something like the following.

You can navigate around with WASD to dial in the desired granularity.

It's quite straightforward. You've got the processes being run on the CPU displayed in the top pane, and you can fold/unfold the lower panels to show or hide certain particular processes. Each block shown on the lower panel is a 'chunk' of processing time, and they map directly to the CPU section.

Here's how the SyncAdapter process looked.

(the 'Did Not Finish' blocks can be ignored - they're the result of the trace buffer overflowing, hence the ending of that process never being recorded. Increasing the trace buffer size will solve this in most cases.)

Weird - everything here looks perfect, no excessive computation at all. The small 'blips' show a file chunk being pushed over the API, which is exactly what we expect.

This completely rules out anything to do with the network - so what does the main (UI) process look like?

Boom. UI thread is totally overloaded. What's causing it exactly though? Well, if we line it up & zoom into the CPU view, we should see a large number of similarly-colored-blocks, which represents the same thing being called excessively.

Those dark blue ones ? le.h264.decoder - which is OMX.google.h264.decoder. This implies that there's some heavy video decoding happening unknowingly on the UI thread.

A quick look back over the CursorAdapter's code confirmed suspicions. A thumbnail was being created from each video item. Presumably, this thumbnail was being sampled by the h264.decoder and was being run in every call to bindView(). This happened every time the backing data was updated - every time the progress was saved, which was chewing the UI thread up quite heavily.

The Fix

Simple, cache the generated thumbnails. This is straightforward with bitmaps if you're using something like Picasso, but we're generating video thumbnails, which isn't supported by Picasso's default library.

Fortunately, we didn't have to implement any custom memory/disk cache system, since we can implement some of our own image loading logic while still utilising Picasso's fantastic caching by using the RequestHandler.

While the details aren't entirely relevant to performance profiling, a real world example with some background context nicely shows how useful the systrace tool can be in quickly pinpointing suspicious behaviour.

comments powered by Disqus