February 01, 2012

How the ECDSA algorithm works

Youness Alaoui

To popular demand, I have decided to try and explain how the ECDSA algorithm works. I’ve been struggling a bit to understand it properly and while I found a lot of documentation about it, I haven’t really found any “ECDSA for newbies” anywhere. So I thought it would be good to explain in simple terms how it works so others can learn from my research. I have found some websites that explain the basic principles but nowhere near enough to actually understand it, others that explains things without any basics, making it incomprehensible, and others that go way too deep into the the mathematics behind it.

ECDSA stands for “Elliptic Curve Digital Signature Algorithm”, it’s used to create a digital signature of data (a file for example) in order to allow you to verify its authenticity without compromising its security. Think of it like a real signature, you can recognize someone’s signature, but you can’t forge it without others knowing. The ECDSA algorithm is basically all about mathematics.. so I think it’s important to start by saying : “hey kids, don’t slack off at school, listen to your teachers, that stuff might be useful for you some day!” :) But these maths are fairly complicated, so while I’ll try to vulgarize it and make it understandable for non technical people, you will still probably need some knowledge in mathematics to understand it properly. I will do this in two parts, one that is a sort of high level explanation about how it works, and another where I dig deeper into its inner workings to complete your understanding. Note however that I’ve just recently learned this stuff, so I’m definitely not an expert on the matter.

So the principle is simple, you have a mathematical equation which draws a curve on a graph, and you choose a random point on that curve and consider that your point of origin. Then you generate a random number, this is your private key, you do some magical mathematical equation using that random number and that “point of origin” and you get a second point on the curve, that’s your public key. When you want to sign a file, you will use this private key (the random number) with a hash of the file (a unique number to represent the file) into a magical equation and that will give you your signature. The signature itself is divided into two parts, called R and S. In order to verify that the signature is correct, you only need the public key (that point on the curve that was generated using the private key) and you put that into another magical equation with one part of the signature (S), and if it was signed correctly using the the private key, it will give you the other part of the signature (R). So to make it short, a signature consists of two numbers, R and S, and you use a private key to generate R and S, and if a mathematical equation using the public key and S gives you R, then the signature is valid. There is no way to know the private key or to create a signature using only the public key.

Alright, now for the more in depth understanding, I suggest you take an aspirin right now as this might hurt! :P

Let’s start with the basics (which may be boring for people who know about it, but is mandatory for those who don’t) : ECDSA uses only integer mathematics, there are no floating points (this means possible values are 1, 2, 3, etc.. but not 1.5..),  also, the range of the numbers is bound by how many bits are used in the signature (more bits means higher numbers, means more security as it becomes harder to ‘guess’ the critical numbers used in the equation), as you should know, computers use ‘bits’ to represent data, a bit is a ‘digit’ in binary notation (0 and 1) and 8 bits represent one byte. Every time you add one bit, the maximum number that can be represented doubles, with 4 bits you can represent values 0 to 15 (for a total of 16 possible values), with 5 bits, you can represent 32 values, with 6 bits, you can represent 64 values, etc.. one byte (8 bits) can represent 256 values, and 32 bits can represent 4294967296 values (4 Giga).. Usually ECDSA will use 160 bits total, so that makes… well, a very huge number with 49 digits in it…

ECDSA is used with a SHA1 cryptographic hash of the message to sign (the file). A hash is simply another mathematical equation that you apply on every byte of data which will give you a number that is unique to your data. Like for example, the sum of the values of all bytes may be considered a very dumb hash function. So if anything changes in the message (the file) then the hash will be completely different. In the case of the SHA1 hash algorithm, it will always be 20 bytes (160 bits). It’s very useful to validate that a file has not been modified or corrupted, you get the 20 bytes hash for a file of any size, and you can easily recalculate that hash to make sure it matches. What ECDSA signs is actually that hash, so if the data changes, the hash changes, and the signature isn’t valid anymore.

Now, how does it work? Well Elliptic Curve cryptography is based on an equation of the form :

y^2 = (x^3 + a * x + b) mod p

First thing you notice is that there is a modulo and that the ‘y‘ is a square. This means that for any x coordinate, you will have two values of y and that the curve is symmetric on the X axis. The modulo is a prime number and makes sure that all the values are within our range of 160 bits and it allows the use of “modular square root” and “modular multiplicative inverse” mathematics which make calculating stuff easier (I think). Since we have a modulo (p) , it means that the possible values of y^2 are between  0 and p-1, which gives us p total possible values. However, since we are dealing with integers, only a smaller subset of those values will be a “perfect square” (the square value of two integers), which gives us N possible points on the curve where N < p (N being the number of perfect squares between 0 and p). Since each x will yield two points (positive and negative values of the square-root of y^2), this means that there are N/2 possible ‘x‘ coordinates that are valid and that give a point on the curve. So this elliptic curve has a finite number of points on it, and it’s all because of the integer calculations and the modulus. Another thing you need to know about Elliptic curves, is the notion of “point addition“. It is defined as adding one point P to another point Q will lead to a point S such that if you draw a line from P to Q, it will intersect the curve on a third point R which is the negative value of S (remember that the curve is symmetric on the X axis). In this case, we define R = -S to represent the symmetrical point of R on the X axis. This is easier to illustrate with an image : So you can see a curve of the form y^2 = x^3 + ax + b (where a = -4 and b = 0), which is symmetric on the X axis, and where P+Q is the symmetrical point through X of the point R which is the third intersection of a line going from P to Q. In the same manner, if you do P + P,  it will be the symmetrical point of R which is the intersection of the line that is a tangent to the point P.. And P + P + P is the addition between the resulting point of P+P with the point P since P + P + P can be written as (P+P) + P.. This defines the “point multiplication” where k*P is the addition of the point P to itself k times… here are two examples showing this :  

Here, you can see two elliptic curves, and a point P from which you draw the tangent, it intersects the curve with a third point, and its symmetric point it 2P, then from there, you draw a line from 2P and P and it will intersect the curve, and the symmetrical point is 3P. etc… you can keep doing that for the point multiplication. You can also already guess why you need to take the symmetric point of R when doing the addition, otherwise, multiple additions of the same point will always give the same line and the same three intersections.

One particularity of this point multiplication is that if you have a point R = k*P, where you know R and you know P, there is no way to find out what the value of ‘k‘ is. Since there is no point subtraction or point division, you cannot just resolve k = R/P. Also, since you could be doing millions of  point additions, you will just end up on another point on the curve, and you’d have no way of knowing “how” you got there. You can’t reverse this operation, and you can’t find the value ‘k‘ which was multiplied with your point P to give you the resulting point R.

This thing where you can’t find the multiplicand even when you know the original and destination points is the whole basis of the security behind the ECDSA algorithm, and the principle is called a “trap door function“.

Now that we’ve handled the “basics”, let’s talk about the actual ECDSA signature algorithm. For ECDSA, you first need to know your curve parameters, those are a, b, p, N and G. You already know that ‘a‘ and ‘b‘ are the parameters of the curve function (y^2 = x^3 + ax + b), that ‘p‘ is the prime modulus,  and that ‘N‘ is the number of points of the curve, but there is also ‘G‘ that is needed for ECDSA, and it represents a ‘reference point’ or a point of origin if you prefer. Those curve parameters are important and without knowing them, you obviously can’t sign or verify a signature. Yes, verifying a signature isn’t just about knowing the public key, you also need to know the curve parameters for which this public key is derived from.

So first of all, you will have a private and a public key.. the private key is a random number (of 20 bytes) that is generated, and the public key is a point on the curve generated from the point multiplication of G with the private key. We set ‘dA‘ as the private key (random number) and ‘Qa‘ as the public key (a point), so we have : Qa = dA * G (where G is the point of reference in the curve parameters).

So how do you sign a file/message ? First, you need to know that the signature is 40 bytes and is represented by two values of 20 bytes each, the first one is called R and the second one is called S.. so the pair (R, S) together is your ECDSA signature.. now here’s how you can create those two values in order to sign a file.. first you must generate a random value ‘k‘ (of 20 byes), and use point multiplication to calculate the point P=k*G. That point’s x value will represent ‘R‘. Since the point on the curve P is represented by its (x, y) coordinates (each being 20 bytes long), you only need the ‘x‘ value (20 bytes) for the signature, and that value will be called ‘R‘. Now all you need is the ‘S‘ value.

To calculate S, you must make a SHA1 hash of the message, this gives you a 20 bytes value that you will consider as a very huge integer number and we’ll call it ‘z‘. Now you can calculate S using the equation :

S = k^-1 (z + dA * R) mod p

Note here the k^-1 which is the ‘modular multiplicative inverse‘ of k… it’s basically the inverse of k, but since we are dealing with integer numbers, then that’s not possible, so it’s a number such that (k^-1 * k ) mod p is equal to 1. And again, I remind you that k is the random number used to generate R, z is the hash of the message to sign, dA is the private key and R is the x coordinate of k*G (where G is the point of origin of the curve parameters).

Now that you have your signature, you want to verify it, it’s also quite simple, and you only need the public key (and curve parameters of course) to do that. You use this equation to calculate a point P :

P=  S^-1*z*G + S^-1 * R * Qa

If the x coordinate of the point P is equal to R, that means that the signature is valid, otherwise it’s not.

Pretty simple, huh? now let’s see why and how… and this is going to require some mathematics to verify :

We have :

P = S^-1*z*G + S^-1 * R *Qa

but Qa = dA*G, so:

P = S^-1*z*G + S^-1 * R * dA*G = S^-1 (z + dA* R) * G

But the x coordinate of P must match R and R is the x coordinate of k * G, which means that :

k*G = S^-1 (z + dA * R) *G

we can simplify by removing G which gives us :

k = S^-1(z + dA * R)

by inverting k and S, we get :

S = k^-1 (z + dA *R)

and that is the equation used to generate the signature.. so it matches, and that is the reason why you can verify the signature with it.

You can note that you need both ‘k‘ (random number) and ‘dA‘ (the private key) in order to calculate S, but you only need R and Qa (public key) to validate the signature. And since R=k*G and Qa = dA*G and because of the trap door function in the ECDSA point multiplication (explained above), we cannot calculate dA or k from knowing Qa and R, this makes the ECDSA algorithm secure, there is no way of finding the private keys, and there is no way of faking a signature without knowing the private key.

The ECDSA algorithm is used everywhere and has not been cracked and it is a vital part of most of today’s security.

Now I’ll discuss on how and why the ECDSA signatures that Sony  used in the PS3 were faulty and how it allowed us to gain access to their private key.

So you remember the equations needed to generate a signature.. R = k*G and S= k^-1(z + dA*R) mod p.. well this equation’s strength is in the fact that you have one equation with two unknowns (k and dA) so there is no way to determine either one of those. However, the security of the algorithm is based on its implementation and it’s important to make sure that ‘k‘ is randomly generated and that there is no way that someone can guess, calculate, or use a timing attack or any other type of attack in order to find the random value ‘k‘. But Sony made a huge mistake in their implementation, they used the same value for ‘k‘ everywhere, which means that if you have two signatures, both with the same k, then they will both have the same R value, and it means that you can calculate k using two S signatures of two files with hashes z and z’ and signatures S and S’ respectively :

S – S’ = k^-1 (z + dA*R) – k^-1 (z’ + da*R) = k^-1 (z + da*R – z’ -dA*R) = k^-1 (z – z’)

So : k = (z – z’) / (S – S’)

Once you know k, then the equation  for S because one equation with one unknown and is then easily resolved for dA :

dA = (S*k – z) / R

Once you know the private key dA, you can now sign your files and the PS3 will recognize it as an authentic file signed by Sony. This is why it’s important to make sure that the random number used for generating the signature is actually “cryptographically random”.  This is also the reason why it is impossible to have a custom firmware above 3.56, simply because since the 3.56 version, Sony have fixed their ECDSA algorithm implementation and used new keys for which it is impossible to find the private key.. if there was a way to find that key, then the security of every computer, website, system may be compromised since a lot of systems are relying on ECDSA for their security, and it is impossible to crack.

Finally! I hope this makes the whole algorithm clearer to many of you.. I know that this is still very complicated and hard to understand. I usually try to make things easy to understand for non technical people, but this algorithm is too complex to be able to explain in any simpler terms. After all that’s why I prefer to call it the MFET algorithm (Mathematics For Extra Terrestrials) :)

But if you are a developer or a mathematician or someone interested in learning about this because you want to help or simple gain knowledge, then I’m sure that this contains enough information for you to get started or to at least understand the concept behind this unknown beast called “ECDSA”.

That being said, I’d like to thank a few people who helped me understand all of this, one particularly who wishes to remain anonymous, as well as the many wikipedia pages I linked to throughout this article, and Avi Kak thanks to his paper explaining the mathematics behind ECDSA, and from which I have taken those graph images aboves.

P.s: In this article, I used ’20 bytes’ in my text to talk about the ECDSA signature because that’s what is usually used as it matches the SHA1 hash size of 20 bytes and that’s what the PS3 security uses, but the algorithm itself can be used with any size of numbers. There may be other inaccuracies in this article, but like I said, I’m not an expert, I just barely learned all of this in the past week.

by kakaroto at February 01, 2012 02:05 AM

January 29, 2012

Summary of GStreamer Hackfest

Christian Fredrik Kalager Schaller

So as I talked about in my last blog post we had a great GStreamer hackfest. A lot of things got done and quite a few applications got an initial port over to 0.11. For instance Edward Hervey ended up working on porting the Totem video player, or rather trying to come up with a more optimized design for the Clutter-gst as the basis port was already done.

Another cool effort was by Philippe Normand from Igalia who put a lot of effort into porting WebKit to use 0.11. His efforts where rewarded with success as you can see in this screenshot.

Jonathan Matthew had flown up all the way from Australia and made great progress in porting Rhythmbox over to the 0.11 API, a port which became hugely more useful after Wim Taymans and Tim-Phillip Muller fixed a bug that caused mp3 playback not to work :) .

Peteris Krisjanis made huge strides in porting Jokosher to 0.11. Although like Jason DeRose from Novacut and myself on Transmageddon he did end up spending a lot of time on debugging issues related to gobject-introspection. The challenge for non-C applications like Jokosher, Novacut, Transmageddon and PiTiVi is a combination of the API having changed quite significantly due to the switch to gobject-introspection generated bindings, some general immaturity challenges with the gobject-introspection library and finally missing or wrong annotations in the GStreamer codebase. So once all these issues are sorted things should look a lot brighter for language bindings, but as we discovered there is a lot of heavy lifting to get there. For instance I thought I had Transmageddon running quite smoothly before I suddenly triggered this gobject-introspection bug.

There was a lot of activity around PiTiVi too, with Jean-François Fortin Tam, Thibault Saunier and Antigoni Papantoni working hard on porting PiTiVi to 0.11 and the GStreamer Editing Services library. And knowing Jean-François Fortin I am sure there will soon be a blog with a lot more details about that :) .

Thomas Vander Stichele, who also wrote a nice blog entry about the event, was working with Andoni Morales Alastruey, both from Flumotion, on porting Flumotion to 0.11, but due to some of the plugins needed not having been ported yet most of their effort ended up being on porting the needed plugins in GStreamer and not so much application porting, but for those of you using plugins such as multifdsink, this effort will be of great value and Andoni also got started on porting some of the non-linux plugins, like the directsoundsink for Windows.

Josep Torra from Fluendo ended up working with Edward Hervey on hammering out the design for the clutter-gst sink at the conference, but he also found some time to do a port of his nice little tuner tool as you can see from the screenshot below.

Tuner tool for GStreamer 0.11

George Kiagiadakis kept hammering away at the qtGStreamer bindings, working both on a new release of the bindings for the GStreamer 0.10 series, but also some preparatory work for 0.11.

In addition to the application work, Wim Taymans, Tim-Phillip Müller and Sebastian Dröge from Collabora did a lot of core GStreamer clean ups and improvements in addition to providing a lot of assistance, bugfixing and advice for the people doing application porting. All critical items are now sorted in 0.11 although there are some nice to have’s still on the radar, and Wim plans on putting out some new releases next week, to kickstart the countdown to the first 1.0 release.

As for my own little pet project Transmageddon, it is quite far along now, with both manually configured re-encodes and profile re-encodes working. Still debugging remuxing though and I am also waiting for the deinterlacer to get ported to re-enable deinterlacing in the new version. For a screenshot take a look at the one I posted in my previous blogpost.

by uraeus at January 29, 2012 05:17 PM

January 26, 2012

GStreamer Hackfest in Malaga update

Christian Fredrik Kalager Schaller

Things have been going really well here at the GStreamer Hackfest in Malaga. Thanks to the help of Ara and Yaiza from Nido Malaga, we have a great venue in downtown Malaga and they have also helped us greatly with sorting out food.
We have a great group of people here and are making great progress, and by tomorrow I hope we will have screenshots of quite a few applications running with GStreamer 0.11, for instance both Rhythmbox and Jokosher for instance is already screen shootable, if not fully functional :)

GStreamer Hackfest Malaga 2012

GStreamer Hackfest Malaga 2012

Also making good progress on Transmageddon, even if the move to GObject Introspection bindings are making things a bit more complicated. Screenshot below of the progress so far.

Transmageddon at Hackfest in Malaga 2012

Transmageddon at Hackfest in Malaga 2012

Also a big thanks to Fluendo who is sponsoring the lunches at the hackfest and Collabora who is sponsoring tonight’s dinner. Ensuring that no hacker is left hungry during this hackfest.

Update: Yaiza took these photos from the hackfest

by uraeus at January 26, 2012 02:12 PM

January 17, 2012

AdaCamp and Haecksen

Danielle Madeley

AdaCamp

Saturday was the very first AdaCamp, an unconference set up by the Ada Initiative to talk about issues facing women in open stuff, here in Melbourne.

The event was well attended by awesome feminists from a breadth of fields and some geographic diversity within Australia (including a surprising number of people from Perth). There were open source people, Wikimedia people, librarians and academics.

There were 21 sessions in total, in 3 streams, so I don’t know everything awesome that was discussed. I think the most important session I attended was the first one of the morning on Imposter syndrome. Also the session on getting women involved with open source, where I talked about the GNOME Outreach Programme for Women.

[Unfortunately I forgot to bring a notebook with me, and tiredness has caused my memory to let me down.]

Brianna and I supplied the morning and afternoon tea, which was various combinations of vegan, gluten-free and fructose-free. Some people asked for the recipes used, which I blogged here.

Haecksen

I drove to Ballarat yesterday to give my panel at the Haecksen miniconf at linux.conf.au1 on gender-focused outreach in open source.

It was, in my opinion, an extremely good panel. I was originally daunted by appearing on a panel with Pia Waugh, Leslie Hawthorn and Selena Deckmann (all of whom are amazingly talented), but I think I did okay. We covered a breadth of mentoring from school age girls, to open source mentoring with Summer of Code and specific programmes like GOPW with a little bit about mentoring in organisations. Unfortunately I feel like the panel ran out of time just when it was getting interesting, and I didn’t get to talk about optimising your mentoring and the differences I’ve found in mentoring people from high-context culture vs low-context cultures [1, 2].

We also talked a bit about improving your diversity (which was a bit off-topic, more of a carry-on from the talk preceding the panel). Key point for me was: people from minorities tend not to ask (or apply), if you want diversity, you have to seek it out. Furthermore, people tend to know people who are like them, to seek out diversity, ask for recommendations from people who are least like the people you already have.

~

Unfortunately, I’m not at linux.conf.au itself, so if you’re looking for me, I’m not there. If you want to talk to a Collaboran about what we can do for you or how awesome it is to work for us, you can find Daniel Stone or Dario Freddi at the conference.

  1. Many thanks to my employer Collabora for letting me be there.

by Danielle at January 17, 2012 12:29 AM

January 16, 2012

GNOME Beer event at FOSDEM 2012

Guillaume Desmottes

Despite what some stats may say, my biggest contribution to GNOME is not in bugs or code but in the organization of beer related events!

So I'm pleased to announce that, like each year, we'll have a GNOME Beer party on the Saturday night of FOSDEM (4th Feb). People seemed happy of the location of last year, so we decided to stay at "La Bécasse" in the city center. Feel free to add yourself to the wiki if you are planning to attend.

See you at FOSDEM!

by Guillaume Desmottes at January 16, 2012 03:58 PM

PulseAudio vs. AudioFlinger: Fight!

Arun Raghavan

I’ve been meaning to try this for a while, and we’ve heard a number of requests from the community as well. Recently, I got some time here at Collabora to give it a go — that is, to get PulseAudio running on an Android device and see how it compares with Android’s AudioFlinger.

The Contenders

Let’s introduce our contenders first. For those who don’t know, PulseAudio is pretty much a de-facto standard part of the Linux audio stack. It sits on top of ALSA which provides a unified way to talk to the audio hardware and provides a number of handy features that are useful on desktops and embedded devices. I won’t rehash all of these, but this includes a nice modular framework, a bunch of power saving features, flexible routing, and lots more. PulseAudio runs as a daemon, and clients usually use the libpulse library to communicate with it.

In the other corner, we have Android’s native audio system — AudioFlinger. AudioFlinger was written from scratch for Android. It provides an API for playback/recording as well as a control mechanism for implementing policy. It does not depend on ALSA, but instead allows for a sort of HAL that vendors can implement any way they choose. Applications generally play audio via layers built on top of AudioFlinger. Even if you write a native application, it would use OpenSL ES implementation which goes through AudioFlinger. The actual service runs as a thread of the mediaserver daemon, but this is merely an implementation detail.

Note: all my comments about AudioFlinger and Android in general are based on documentation and code for Android 4.0 (Ice Cream Sandwich).

The Arena

My test-bed for the tests was the Galaxy Nexus running Android 4.0 which we shall just abbreviate to ICS. I picked ICS since it is the current platform on which Google is building, and hopefully represents the latest and greatest in AudioFlinger development. The Galaxy Nexus runs a Texas Instruments OMAP4 processor, which is also really convenient since this chip has pretty good support for running stock Linux (read on to see how useful this was).

Preparations

The first step in getting PulseAudio on Android was deciding between using the Android NDK like a regular application or integrate into the base Android system. I chose the latter — even though this was a little more work initially, it made more sense in the long run since PulseAudio really belongs to the base-system.

The next task was to get the required dependencies ported to Android. Fortunately, a lot of the ground work for this was already done by some of the awesome folks at Collabora. Derek Foreman’s androgenizer tool is incredibly handy for converting an autotools-based build to Android–friendly makefiles. With Reynaldo Verdejo and Alessandro Decina’s prior work on GStreamer for Android as a reference, things got even easier.

The most painful bit was libltdl, which we use for dynamically loading modules. Once this was done, the other dependencies were quite straightforward to port over. As a bonus, the Android source already ships an optimised version of Speex which we use for resampling, and it was easy to reuse this as well.

As I mentioned earlier, vendors can choose how they implement their audio abstraction layer. On the Galaxy Nexus, this is built on top of standard ALSA drivers, and the HAL talks to the drivers via a minimalist tinyalsa library. My first hope was to use this, but there was a whole bunch of functions missing that PulseAudio needed. The next approach was to use salsa-lib, which is a stripped down version of the ALSA library written for embedded devices. This too had some missing functions, but these were fewer and easy to implement (and are now upstream).

Now if only life were that simple. :) I got PulseAudio running on the Galaxy Nexus with salsa-lib, and even got sound out of the HDMI port. Nothing from the speakers though (they’re driven by a TI twl6040 codec). Just to verify, I decided to port the full alsa-lib and alsa-utils packages to debug what’s happening (by this time, I’m familiar enough with androgenizer for all this to be a breeze). Still no luck. Finally, with some pointers from the kind folks at TI (thanks Liam!), I got current UCM configuration files for OMAP4 boards, and some work-in-progress patches to add UCM support to PulseAudio, and after a couple of minor fixes, wham! We have output. :)

(For those who don’t know about UCM — embedded chips are quite different from desktops and expose a huge amount of functionality via ALSA mixer controls. UCM is an effort to have a standard, meaningful way for applications and users to use these.)

In production, it might be handy to write light-weight UCM support for salsa-lib or just convert the UCM configuration into PulseAudio path/profile configuration (bonus points if it’s an automated tool). For our purposes, though, just using alsa-lib is good enough.

To make the comparison fair, I wrote a simple test program that reads raw PCM S16LE data from a file and plays it via the AudioTrack interface provided by AudioFlinger or the PulseAudio Asynchronous API. Tests were run with the brightness fixed, wifi off, and USB port connected to my laptop (for adb shell access).

All tests were run with the CPU frequency pegged at 350 MHz and with 44.1 and 48 kHz samples. Five readings were recorded, and the median value was finally taken.

Round 1: CPU

First, let’s take a look at how the two compare in terms of CPU usage. The numbers below are the percentage CPU usage taken as the sum of all threads of the audio server process and the audio thread in the client application using top (which is why the granularity is limited to an integer percentage).

44.1 kHz 48 kHz
AF PA AF PA
1% 1% 2% 0%

At 44.1 kHz, the two are essentially the same. Both cases are causing resampling to occur (the native sample rate for the device is 48 kHz). Resampling is done using the Speex library, and we’re seeing minuscule amounts of CPU usage even at 350 MHz, so it’s clear that the NEON optimisations are really paying off here.

The astute reader would have noticed that since the device’ native sample rate is 48 kHz, the CPU usage for 48 kHz playback should be less than for 44.1 kHz. This is true with PulseAudio, but not with AudioFlinger! The reason for this little quirk is that AudioFlinger provides 44.1 kHz samples to the HAL (which means the stream is resampled there), and then the HAL needs to resample it again to 48 kHz to bring it to the device’ native rate. From what I can tell, this is a matter of convention with regards to what audio HALs should expect from AudioFlinger (do correct me if I’m mistaken about the rationale).

So round 1 leans slightly in favour of PulseAudio.

Round 2: Memory

Comparing the memory consumption of the server process is a bit meaningless, because the AudioFlinger daemon thread shares an address space with the rest of the mediaserver process. For the curious, the resident set size was: AudioFlinger — 6,796 KB, PulseAudio — 3,024 KB. Again, this doesn’t really mean much.

We can, however, compare the client process’ memory consumption. This is RSS in kilobytes, measured using top.

44.1 kHz 48 kHz
AF PA AF PA
2600 kB 3020 kB 2604 kB 3020 kB

The memory consumption is comparable between the two, but leans in favour of AudioFlinger.

Round 3: Power

I didn’t have access to a power monitor, so I decided to use a couple of indirect metrics to compare power utilisation. The first of these is PowerTOP, which is actually a Linux desktop tool for monitoring various power metrics. Happily, someone had already ported PowerTOP to Android. The tool reports, among other things, the number of wakeups-from-idle per second for the processor as a whole, and on a per-process basis. Since there are multiple threads involved, and PowerTOP’s per-process measurements are somewhat cryptic to add up, I used the global wakeups-from-idle per second. The “Idle” value counts the number of wakeups when nothing is happening. The actual value is very likely so high because the device is connected to my laptop in USB debugging mode (lots of wakeups from USB, and the device is prevented from going into a full sleep).

44.1 kHz 48 kHz
Idle AF PA AF PA
79.6 107.8 87.3 108.5 85.7

The second, similar, data point is the number of interrupts per second reported by vmstat. These corroborate the numbers above:

44.1 kHz 48 kHz
Idle AF PA AF PA
190 266 215 284 207

PulseAudio’s power-saving features are clearly highlighted in this comparison. AudioFlinger causes about three times the number of wakeups per second that PulseAudio does. Things might actually be worse on older hardware with less optimised drivers than the Galaxy Nexus (I’d appreciate reports from running similar tests on a Nexus S or any other device with ALSA support to confirm this).

For those of you who aren’t familiar with PulseAudio, the reason we manage to get these savings is our timer-based scheduling mode. In this mode, we fill up the hardware buffer as much as possible and go to sleep (disabling ALSA interrupts while we’re at it, if possibe). We only wake up when the buffer is nearing empty, and fill it up again. More details can be found in this old blog post by Lennart.

Round 4: Latency

I’ve only had the Galaxy Nexus to actually try this out with, but I’m pretty certain I’m not the only person seeing latency issues on Android. On the Galaxy Nexus, for example, the best latency I can get appears to be 176 ms. This is pretty high for certain types of applications, particularly ones that generate tones based on user input.

With PulseAudio, where we dynamically adjust buffering based on what clients request, I was able to drive down the total buffering to approximately 20 ms (too much lower, and we started getting dropouts). There is likely room for improvement here, and it is something on my todo list, but even out-of-the-box, we’re doing quite well.

Round 5: Features

With the hard numbers out of the way, I’d like to talk a little bit about what else PulseAudio brings to the table. In addition to a playback/record API, AudioFlinger provides mechanism for enforcing various bits of policy such as volumes and setting the “active” device amongst others. PulseAudio exposes similar functionality, some as part of the client API and the rest via the core API exposed to modules.

From SoC vendors’ perspective, it is often necessary to support both Android and standard Linux on the same chip. Being able to focus only on good quality ALSA drivers and knowing that this will ensure quality on both these systems would be a definite advantage in this case.

The current Android system leaves power management to the audio HAL. This means that each vendor needs to implement this themselves. Letting PulseAudio manage the hardware based on requested latencies and policy gives us a single point of control, greatly simplifying the task of power-management and avoiding code duplication.

There are a number of features that PulseAudio provides that can be useful in the various scenarios where Android is used. For example, we support transparently streaming audio over the network, which could be a handy way of supporting playing audio from your phone on your TV completely transparently and out-of-the-box. We also support compressed formats (AC3, DTS, etc.) which the ongoing Android-on-your-TV efforts could likely take advantage of.

Edit: As someone pointed out on LWN, I missed one thing — AudioFlinger has an effect API that we do not yet have in PulseAudio. It’s something I’d definitely like to see added to PulseAudio in the future.

Ding! Ding! Ding!

That pretty much concludes the comparison of these two audio daemons. Since the Android-side code is somewhat under-documented, I’d welcome comments from readers who are familiar with the code and history of AudioFlinger.

I’m in the process of pushing all the patches I’ve had to write to the various upstream projects. A number of these are merely build system patches to integrate with the Android build system, and I’m hoping projects are open to these. Instructions on building this code will be available on the PulseAudio Android wiki page.

For future work, it would be interesting to write a wrapper on top of PulseAudio that exposes the AudioFlinger audio and policy APIs — this would basically let us run PulseAudio as a drop-in AudioFlinger replacement. In addition, there are potential performance benefits that can be derived from using Android-specific infrastructure such as Binder (for IPC) and ashmem (for transferring audio blocks as shared memory segments, something we support on desktops using the standard Linux SHM mechanism which is not available on Android).

If you’re an OEM who is interested in this work, you can get in touch with us — details are on the Collabora website.

I hope this is useful to some of you out there!

by Arun at January 16, 2012 12:22 PM

Last updated:
February 05, 2012 11:46 AM
All times are UTC.

Subscriptions