Friday, February 17, 2012

Crypto is Not Broken

By Patrick Ball

On 14 February 2012, the New York Times reported that a Swiss team had found a weakness in a key algorithm used to make secure connections online. We were worried because the algorithm (called RSA) is also a central part of Martus, our self-encrypting database that backs itself up to a network of servers.

The bottom line: We've consulted with cryptographers and studied the Martus code, and we do not believe that there is a weakness affecting Martus users. The flaw turns out to be related to a design error in the implementation of RSA in specific "embedded" devices, specifically firewalls and routers. It's not a general problem with RSA, and there's no current risk to Martus users. The way this flaw emerged has motivated us to review Martus's security model, and we are pleased with how well it has stood up.

I've organized the detailed discussion as a series of questions.


What exactly is the problem?

In public key cryptography, there are two keys that complement each other. If one of the pair is used to encrypt a message, a website, or a Martus bulletin, the other key can be used to decrypt it. This is useful because a user can freely distribute one of her keys (called the public key) so that others can encrypt information that only the other key (called her private key) can decrypt. In Martus, sometimes users share bulletins by giving each other copies of their public keys (which we call "headquarters keys"). For websites and secure email, millions of public keys are published online so that people can make secure connections (Martus keys are not published, but they are exchanged directly among users who want to collaborate).

The kind of public key math most commonly used for most applications, including Martus, is called RSA. In this algorithm, the public key consists of two very large numbers (e, n), and the private key is three very large numbers (d, p, q), where n = pq. P and q are prime numbers, and much of the strength of the cryptography depends on the mathematical fact that even if you know n, it is hard for a computer to figure out what p and q are. This means that even if n is public (which it is because it's part of the public key), the private key (including p and q) is still a secret.

Calculating p and q from n is called factoring n, and it is very difficult. A key with an n with 232 digits (768 bits) was factored in 2009 using many years of computing time and dozens of computers spread around the world. The smallest keys in use now, 1024 bits, are approximately one thousand times harder to factor than 768 bit keys. Most secure projects use much larger keys; Martus keys are 2048 bits long.

However, in order for the keys to be secure, they must be impossible to guess; being very large is only part of the security. That means that p and q must be chosen randomly. On Tuesday, a paper by researchers from a Swiss team (Lenstra et al.) showed that thousands of public keys published on the Internet in key servers shared a divisor, that is, p or q. This represents thousands of keys out of the millions tested: approximately 0.2% of the keys they harvested shared either p or q. If two public keys can be found to share a divisor, it's easy to figure out the other divisor, and thereby have factored both keys' n's. If this can be done, neither key is secure, and that's very bad news.

On 15 February 2012, another team (Heninger et al.) released similar results. However, the Heninger et al. results were focused on keys generated by very specific models of electronic devices, especially routers and firewalls. This contained the beginning of an explanation of what the Lenstra et al. study had really found.

How did this happen?

Heninger et al. showed that specific kinds of devices produced keys for which p and q were chosen from a limited range. Although p and q are very, very large numbers hundreds of digits long, only a relative small range of possible numbers were being used to choose p and q. Experts we've talked with have told us that the analysis by Heninger et al. is of essentially the same data that Lenstra et al. are examining. This means the findings from both projects are pointing to the same weakness.

Both Heninger et al. and Lenstra et al. determined that many public keys published online include n's that shared p or q. This was detectable, and the keys were factored to yield their p and q (there are good, not-so-technical explanations of how this works at BoingBoing and at Ben Adida's blog).

The problem comes back to the selection of p and q, and it involves how computers choose random numbers. There are two parts of serious, computer-based random number generation: a "seed," or starting point, and a pseudo-random number generator. I'll take the latter first.

A pseudo-random number generator (PRNG) takes some number (the seed) and makes it into a sequence of new numbers (the random numbers we want to use). The trick is that a good PRNG produces a sequence of numbers such that even if you know the numbers it has generated recently, the next number is unpredictable.

However, PRNGs are deterministic. If you start from exactly the same seed, they will generate the same sequence of numbers. If you start from a very slightly different seed, the sequence will be totally different. Nonetheless, an identical starting seed will produce an identical sequence.

Therefore, we need to start from a really hard to predict place, and here's where the experts think those little devices failed. The seed for random number generation is usually created by looking at parts of a computer that are under constant change: how the hard disk is reading or writing; keystroke patterns; mouse movements; noise from a microphone or images from a web cam. Mix all this stuff up together in an equation (called a hash function), and you have a seed with high "entropy," stated simply, a lot of randomness.

The little devices Heninger et al. studied didn't get much entropy into their seeds. Their problem was that they were programmed to generate keys the very first time they were unwrapped and booted up. They're not user-driven machines, instead they're designed to be plugged in and basically run themselves. They didn't wait to collect enough churning info to seed their PRNG. On many occasions, they had only the same few bits of information in their seed, and that led to a limited range of results for p and q. We know this because in more than 12,000 cases (out of millions), the devices chose the same p or q values -- leading Lenstra et al. and Heninger et al. to be able to factor the n's of the public keys into the p's and q's of the corresponding private keys.

Does it affect Martus?

In short, this problem does not affect Martus. We use the Java programming language's mechanism for getting a high-entropy seed (SecureRandom) from the operating system, whether it's GNU/Linux, Windows, or Mac OS X. Then we pass the seed through Java's PRNG, and generate the key from the result. (Ok, we don't really do all that, the cryptography library we use -- BouncyCastle -- does most of it for us, but that's the process.) No weak seeds, no repeated p's or q's.

One more detail for security and programmer types: long ago, Java had some problems with seeding the PRNG, though the bug has been fixed since the Java Runtime Engine (JRE) version 1.4.1. We shipped the first production version of Martus (1.0) in December 2002 with JRE version 1.4.1.

What's the big picture?

We think the Times's conclusions, and those of Lenstra et al., are too strong. It's clear that the keys that Heninger et al. and Lenstra et al. identified have weaknesses; solid analysis is becoming available. The flaws come from the way certain devices created their keys, not in the RSA algorithm itself. We do not think that RSA should be discarded in favor of other public key algorithms, as some have suggested. The cryptographic software most commonly used for email security (PGP and gpg) does not seem to be compromised. We think this was a very useful lesson to all of us who publish crypto software ("mind your p's and q's," Heninger wrote), and we are taking the opportunity to review the last ten years of Martus security thinking. It stands up remarkably well, and we're proud of the code: since 1.0, there has not been even one security-related bug. We're pushing ahead, and we look forward to helping human rights groups keep their data safe in more ways and on more platforms.

Acknowledgements

The Martus team is grateful for helpful pointers and insight from cryptographers Phil Zimmermann (Zfone), Ben Adida (Mozilla), Ian Goldberg (University of Waterloo), and Seth Schoen (EFF). Any errors of fact or reasoning that remain are my (Patrick's) responsibility.

Friday, February 10, 2012

Interning in Guatemala on the Archive Project

A guest Beneblog by Max Schneider

People don’t typically associate boisterous merengue music with high-tech statistical analysis. Then again, I shouldn’t have been surprised when I heard the playful notes of a street band wafting through the window while in Guatemala with Benetech's Rights Data Analysis Group (HRDAG). You see, HRDAG is anything but a typical organization.

But first: why was I, a fresh graduate of UCLA, in Guatemala in the first place? (It’s funny, my parents asked the same question.) As an intern with HRDAG, I was part of the team analyzing data drawn from documents in the National Police Archive, a cache of approximately 80 million sheets of paper kept by the police system during the Guatemalan Civil War, a 36-year long conflict that ended in 1996. Our statistical analysis is being used as evidence for an upcoming trial charging former Police Chief Colonel Hector Bol de la Cruz with crimes related to the disappearance of a trade union and student leader in 1984.

Last August, I traveled with HRDAG statistician Dr. Megan Price to the Archive in Guatemala City to prepare the statistical evidence with our partners. The impact that this one report could make is momentous. In October 2010, two former police officers were sentenced to 40 years in prison for this disappearance and the expert testimony of HRDAG statistician Daniel Guzmán was a key factor in the judges’ decision. The court directed the prosecutors to investigate higher up the police’s chain of command and as a result, Bol de la Cruz was arrested last June.
Max in a room full of documents
After returning to Palo Alto, I continued to process and clean the data drawn in our latest sample of documents, which will be used to fortify our analysis for the trial. As my internship ended several weeks ago, I’ve reflected on lessons learned over the last 6 months, chief among them a sensitivity to how data gets handled and the importance of flexibility in this line of work.

My HRDAG internship introduced me to the common pitfalls and mistakes that can surface in data-driven human rights research, such as selection bias and unsubstantiated generalizations. Some of these issues stem from our desire to provide the communities our research serves with immediate conclusions, even those not necessarily supported by the data; however, though its process can be drawn-out, rigorous statistical research is worth its time investment because of the heightened transparency of its methods and accuracy of its results. Indeed, much of our work in Guatemala centered on framing our partners’ substantive questions in a way we could actually answer using Archive data.

Being flexible in finding new solutions was also critical to my success in the internship. On my first day in Guatemala, a freak hard-drive meltdown relegated my work laptop to a paperweight. Luckily, our partners loaned me a computer to use on the trip and despite using a new keyboard and system, I tried to not miss a beat in contributing to our expert testimony. In addition, I’ve learned alternate approaches to resolving the technical challenges that abound in statistical coding from the talented team of HRDAG statisticians I worked alongside. I’ve picked up more tricks of the trade in six months at HRDAG than I did in all of college.

Aside from these valuable lessons for my future in this field, the use of principled and scientific data analysis in securing long-overdue justice for police brutality has personal significance to me. As a Ukrainian Jew, I grew up hearing stories of unfair and often-brutal police discrimination. To help stamp out such practices in another country that’s been devastated by them feels more than meaningful – it feels imperative.

Being able to experience Guatemala during and after my time there with HRDAG helped me understand the effect that the service of justice could bring. When I was in Santiago Atitlán, a small lakeside village where military troops murdered 25 people in 1980, I visited a cemetery for victims of the massacre, where I met a young man who lost his family in the war. Bursting into tears as he described his memories, he exclaimed to me “Soy un guerrillero ahora” – “I am now a guerrilla.” Seeing justice brought forth, even several decades too late, could help soothe scars acquired during the war and bolster trust in the government and police for Guatemalans like him.

I had to bid hasta luego to the HRDAG team and Benetech family in December but am grateful for six months of seriously hands-on work on such a unique project. While I’ve grown exponentially as a statistician and a coder, the biggest benefit has been learning from the fantastic people here. The dedication of my colleagues to find facts about human rights violations that can stand up in both the court of law and peer-reviewed academic journals has inspired me to make statistical analysis for the greater good my life pursuit.

Tuesday, February 07, 2012

Bookshare International Now Serves Thirty Countries

People with print disabilities around the world have a right to high-quality ebooks that they can read with assistive technology. Benetech’s Bookshare library continues to expand its international service providing accessible books and publications to members in more than 30 countries. Our international Bookshare service recently announced new partnerships with three organizations that are reaching out to readers with print disabilities in their home countries. These partners include the Norwegian Library of Talking Books and Braille (NLB), the Hoerbuecherei des OSBV Talking Book Library in Austria, and the Dorina Nowill Foundation in Brazil.
 Benetech looks forward to working with all these groups to provide the latest books, especially textbooks (primarily in English). 
These ebooks can quickly be turned into Braille, large print or be read aloud by a synthetic voice synthesizer.

Bookshare International members now have access to more than 50,000 titles, including books in Spanish, German, French, Hindi, and Tamil, and a collection of textbooks in Afrikaans. Our Bookshare team has also been working on a special project with Qatar’s Mada Assistive Technology Center to add Arabic-language books to the collection. These texts will be available in early 2012.


After books are scanned and uploaded into the digital Bookshare library, they are still used in their printed form by readers around the world. In the rural region of Ranipet in Tamil Nadu, India, 500 Bookshare texts, which had their spines removed for scanning, have been acquired by the Vedavalli Vidyalaya school. Vedavalli Vidyalaya serves more than 20 villages in and around Ranipet and is one of the few schools in a 100-mile radius to deliver high-quality education to students who cannot travel to the nearest city of Chennai. This is the area’s very first library and we are told that the kids there very happy to have access to these books. The texts have all been nicely rebound, thanks to Bookshare’s local book processing partner, Worth Trust.

Saturday, February 04, 2012

Reverse Engineering the Nissan Leaf's Range Display

Geeks are textbook early adopters. We can't resist a new gadget, or pushing the envelope to find out what's really possible!

I'm no exception to this rule. Recently, our family took delivery of a new Nissan Leaf, a completely electric vehicle. I now breeze by gas stations with $4 per gallon prices and think how I'll never pull into one of these places and spend dozens of dollars to fill up.

Of course, the downside to all-electric is that when your battery runs out, you don't have a little gas engine to get you home. This causes range anxiety, which is the fear of being stranded on the side of the road because of exceeding the range of the battery charge.

Nissan Leaf with Benetech license plate

The Leaf range display is somewhat twitchy. It's trying to predict how many miles you have left in your range (as a one-way measurement), depending on battery charge, your driving habits, speed, temperature, use of climate control and so on. As a geek, I knew that the engineers designing the Leaf would design in a safety factor in this display. So, I had a lingering question: how far can you drive a Leaf beyond when it says it's out of juice?

Recently, I decided to test this. My wife and I had been invited to a nice party in Berkeley thrown by a great couple. Score! This was my chance to test out my theories. However, my wife, Virginia, was not enthusiastic. She was feeling that range anxiety and wasn't interested in some of the possible failure modes of this experiment.

Luckily, one of Benetech's board members, Jim Kleckner, and his wife Jennifer was also invited to the same party. Like me, Jim also went to Caltech, and his geek credentials are even stronger than mine. Jim was eager to join me in this geek quest to reverse engineer the display question by experiment. So, a deal was struck. Virginia and Jennifer would go in the Kleckner's Prius (which is a gas-electric hybrid with plenty of range) and the two Jim's would take the Leaf.

The party was great, and we stayed up in Berkeley for the night. The next morning after breakfast, Jim and I set forth to test the Leaf's range. When we started, the range display showed Palo Alto as out of reach. So, we turned off the air conditioner, and drove more carefully to try to stretch the range. Jim found another cool display option for the Leaf's power system, and we were geeking out trying different things to twitch the separate range display.

About halfway to Palo Alto, Jim did a quick calculation with his smartphone using Google Maps: he figured we would run out of juice about at the middle of the Dumbarton Bridge, the southernmost bridge over the San Francisco Bay. "Great!" I exclaimed, "we can coast home from there!"

Soon, the car started all sorts of warnings that we were running out of energy. No chance missing that we were reaching the twilight zone of our range! Low and behold, just after passing the apex of the bridge, the display flashed and our range went dark. The route mapping said we had four more miles to go: would we make it, or would we need to call our doubting spouses and admit defeat?! How much safety margin had the Leaf engineers built in?

We took the least distance route home, trying to minimize our risks. Soon, we passed by my house, where Virginia and Jennifer were still sitting in the Prius, waiting for us to show (they had gone faster because of their lack of range anxiety). We drove around the neighborhood, passing joggers and cyclists (also a group that doesn't suffer much from range anxiety), as we kept track of the miles we'd gone with no range showing on the display. Five, six and seven miles all passed as we circled the area, freely burning electrons. Finally, at the 8.8 miles, the famed "Turtle mode" icon showed up on the dashboard. While in turtle mode, the Leaf barely accelerates, trying to encourage you to seek a friendly power outlet (the car has a special 120V linking cord in the trunk, in case you need a charge from a normal outlet rather than the 240V outlets that are typically used).

I had pushed my luck while circling, and we were half a mile from my house. Would we make it? Would we end up pushing the car at the end? But, the gentle speed of turtle mode conserved the last kiloJoules of electrical energy and we glided to a stop in front of the charging station at my house. We'd made it!

Melodrama aside, many people have asked me how practical an electric car with a range of roughly 100 miles is. And for me, it's turned out great. It's rare for me to drive further away than San Jose, San Francisco or Berkeley. So, I can pretty much count on being able to make a round-trip to these destinations when I'm not taking the train into San Francisco. I've had terrible luck with public charging stations in the area: I think I'm 0 for 5 on attempts to use the existing chargers (wrong connectors, broken, or some kind of payment system other than the one that I got with my Leaf and its home charging station). But, just relying on my home charger, I can pretty much do all the driving I need to do in a day with the overnight charge. I believe electric cars are going to be a great tool for most people, especially for families with at least one other car.