Facebook’s evolutionary search for crashing software bugs
Ars gets the first look at Facebook's fancy new dynamic analysis tool.
With 1.3 billion daily users, the Facebook site and its apps are the most-used pieces of software in the world. Only a handful of software companies have ascended to a similar echelon of ubiquity, including Microsoft, Google, and Apple. For better or worse, that is the world we now live in, where a large percentage of our waking hours is spent interacting with software—and Facebook leads the pack, with the average user spending 50 minutes per day mostly watching videos and liking photos of babies. Television is the only leisure activity in the world that receives more attention than Facebook. And don't forget that Facebook now owns Instagram and WhatsApp, too.

Look, it's really hard to find stock imagery for "evolutionary algorithm." (Image credit: Adobe Stock)
That's why Facebook has some advanced bug-finding tools—including a devilishly clever dynamic analysis tool, initially devised by researchers at University College London and then acquired and further developed by Facebook's London office. This is the first time Facebook has shown the inner workings of this new tool, dubbed Sapienz, to the press.
Static vs. dynamic analysis
Static analysis inspects an app's source code or bytecode without ever running it; dynamic analysis actually executes the program and feeds it inputs to see how it behaves. Each technique serves a different purpose, and a big software company would usually use both. Static analysis is perfect for formally verifying that an algorithm works as intended, or for highlighting bad code that might allow for a buffer overflow or other security vulnerability. Dynamic analysis is better at finding the gnarly edge cases that cause crashes. Humans can manually perform both analyses, of course, but computers are obviously a lot quicker when it comes to testing millions of possible inputs.
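To make the gnarly-edge-case point concrete, here's a minimal sketch in Python of what dynamic analysis boils down to: run the code with lots of generated inputs and watch for crashes. The handle_username function and the fuzzing loop are invented for illustration; they have nothing to do with Facebook's actual tooling.

```python
import random
import string

def handle_username(name: str) -> str:
    """A toy function with a lurking edge case: it assumes the input
    is never empty, so name[0] raises IndexError on ""."""
    return name[0].upper() + name[1:]

def fuzz(iterations: int = 100_000) -> None:
    """Throw random inputs at the function and record any crashes --
    the essence of dynamic analysis, minus all of Sapienz's cleverness."""
    crashes = []
    for _ in range(iterations):
        length = random.randint(0, 8)
        candidate = "".join(random.choices(string.printable, k=length))
        try:
            handle_username(candidate)
        except Exception as exc:  # a crash: remember the input that caused it
            crashes.append((candidate, exc))
    print(f"{len(crashes)} crashing inputs found, e.g. {crashes[:1]}")

if __name__ == "__main__":
    fuzz()
```

A static analyser could flag the unchecked indexing without running anything; the fuzzer finds it by stumbling onto an empty string.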
Facebook's static analyser is called Infer. The company open-sourced the tool in 2015, and a lot of big names (Uber, Spotify, Mozilla) use it. There isn't a whole lot to say about it, other than it seems to be very popular and effective; download it today!
Sapienz
Sapienz has three main tricks up its sleeve. First, it uses a search-based evolutionary algorithm, rather than a random or model-based approach. Second, the fitness function that guides how the algorithm evolves is incredibly complex: it juggles multiple objectives at once, balanced against each other using Pareto optimality, to decide whether each evolutionary change counts as a success. And third, Facebook can run Sapienz on its One World test platform, which lets engineers find crashing bugs on hundreds of different Android devices simultaneously. (Sapienz only supports Android apps currently, though there are plans to expand to other platforms and app types.)
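The article doesn't describe how One World actually works under the hood, so the following is only a hypothetical sketch of the fan-out idea: take a batch of generated test sequences and spread them across many devices in parallel. The device names and the run_test_sequence stub are placeholders, not real Sapienz or One World APIs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: a real harness would drive physical hardware
# or emulators (e.g. via adb); these names just illustrate the pattern.
DEVICES = [f"emulator-{5554 + 2 * i}" for i in range(8)]

def run_test_sequence(device: str, sequence: list[str]) -> dict:
    """Replay one sequence of UI events on one device and report back.
    This stub pretends every run is clean; a real harness would install
    the APK, inject the events, and watch the logs for crashes."""
    return {"device": device, "events": len(sequence), "crashed": False}

def run_on_farm(sequences: list[list[str]]) -> list[dict]:
    """Spread test sequences across all available devices in parallel,
    so a whole generation of tests finishes quickly."""
    with ThreadPoolExecutor(max_workers=len(DEVICES)) as pool:
        futures = [
            pool.submit(run_test_sequence, DEVICES[i % len(DEVICES)], seq)
            for i, seq in enumerate(sequences)
        ]
        return [f.result() for f in futures]

if __name__ == "__main__":
    results = run_on_farm([["tap:login", "type:hello", "swipe:left"]] * 16)
    print(sum(r["crashed"] for r in results), "crashes across", len(results), "runs")
```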

A block diagram of Sapienz. It might make a bit more sense if you finish reading the story, then try to decode this.
The key to a successful evolutionary algorithm is its fitness function. I'm not your college computer science lecturer, so I won't go into too much detail, but a fitness function essentially looks at the result of a test case, and decides how close that result is to a desired outcome/objective. The results that don't fulfil the fitness function are tied up in a burlap sack and thrown in the river; the good ones are bred together, to form the basis of the next round of testing.
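If you've never met an evolutionary algorithm before, a bare-bones sketch may help. The following Python is not Sapienz's implementation, just the evaluate/select/breed loop described above, with a made-up single-objective fitness function and toy "test sequences" standing in for the real thing.

```python
import random

# A "test sequence" here is just a list of event names -- a stand-in for
# the UI event sequences Sapienz actually generates.
EVENTS = ["tap", "swipe", "type", "rotate", "back"]

def random_sequence(max_len: int = 20) -> list[str]:
    return [random.choice(EVENTS) for _ in range(random.randint(1, max_len))]

def fitness(sequence: list[str]) -> float:
    """Toy single-objective fitness: pretend coverage grows with the variety
    of events used. A real fitness function would run the app and measure it."""
    return len(set(sequence)) / len(EVENTS)

def breed(parent_a: list[str], parent_b: list[str]) -> list[str]:
    """One-point crossover plus a small chance of mutation."""
    cut_a, cut_b = random.randint(0, len(parent_a)), random.randint(0, len(parent_b))
    child = parent_a[:cut_a] + parent_b[cut_b:]
    if child and random.random() < 0.1:  # occasionally mutate one event
        child[random.randrange(len(child))] = random.choice(EVENTS)
    return child or random_sequence()

def evolve(generations: int = 50, population_size: int = 40) -> list[str]:
    population = [random_sequence() for _ in range(population_size)]
    for _ in range(generations):
        # Keep the fittest half ("the good ones"), discard the rest.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Breed survivors together to refill the population.
        children = [breed(*random.sample(survivors, 2)) for _ in survivors]
        population = survivors + children
    return max(population, key=fitness)

print(evolve())
```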
According to Facebook's engineers, most of their secret sauce is in Sapienz's fitness function, which has three objectives: to test as many of the app's methods and statements as possible, find as many crashes as possible, and minimise the length of the test sequences required to cause crashes. The first two are all about producing better, crash-free software; the third is to improve the efficiency of the system, so that a decent number of crashes can be found in a reasonable amount of time.
These three objectives are assessed by the fitness function for Pareto efficiency. That is, one objective isn't more important than the others: if the evolutionary algorithm is only producing long test sequences, but they're providing good coverage and finding lots of crashes, then those long tests will be kept alive. Over time the system tries to hit Pareto optimality: the point where it's impossible to improve one objective without negatively impacting another. So, in this case, the algorithm would attempt to reduce the test sequence length without reducing coverage or the number of crashes revealed.
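Here's a hedged sketch of what Pareto dominance looks like with those three objectives. The Fitness tuple and the dominates/pareto_front helpers are illustrative names, not Facebook's code; the point is simply that a long test sequence survives as long as nothing else beats it on every axis at once.

```python
from typing import NamedTuple

class Fitness(NamedTuple):
    """Multi-objective fitness in the Sapienz style: higher coverage and
    more crashes are better, shorter sequences are better."""
    coverage: float      # fraction of methods/statements exercised
    crashes: int         # distinct crashes triggered
    length: int          # number of events in the test sequence

def dominates(a: Fitness, b: Fitness) -> bool:
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one -- the definition of Pareto dominance."""
    at_least_as_good = (
        a.coverage >= b.coverage and a.crashes >= b.crashes and a.length <= b.length
    )
    strictly_better = (
        a.coverage > b.coverage or a.crashes > b.crashes or a.length < b.length
    )
    return at_least_as_good and strictly_better

def pareto_front(population: list[Fitness]) -> list[Fitness]:
    """Keep only the individuals that nothing else dominates."""
    return [p for p in population if not any(dominates(q, p) for q in population)]

# The long test survives because its coverage earns its keep; the third
# test is dominated on all three objectives and gets discarded.
tests = [Fitness(0.62, 3, 180), Fitness(0.60, 3, 40), Fitness(0.35, 1, 200)]
print(pareto_front(tests))
```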

Here are a couple of graphs/plots showing the code coverage of Sapienz versus Monkey and Dynodroid. As you can see, the coverage varies a bit depending on program size.
Sapienz also strays slightly across the border into static analysis: it attempts to reverse-engineer the app (an Android APK in this case) to pull out some strings, which it then uses as natural-language inputs when testing begins. "We found this seeding to be particularly useful when testing apps that require a lot of user-generated content, e.g., Facebook, because it enables Sapienz to post and comment in an apparently more human-meaningful way," say the researchers.
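Since an APK is just a zip archive, a crude version of that string seeding can be sketched in a few lines of Python: scan every file in the archive for runs of printable text, much like the Unix strings utility, and use whatever comes out as candidate inputs. Sapienz's real extraction is surely more sophisticated; the file path and minimum length here are arbitrary placeholders.

```python
import re
import zipfile

def seed_strings_from_apk(apk_path: str, min_len: int = 5) -> list[str]:
    """Pull human-readable strings out of an APK to seed text inputs.
    An APK is a zip archive, so scan each entry for runs of printable
    ASCII, much like the Unix `strings` utility would."""
    printable = re.compile(rb"[ -~]{%d,}" % min_len)
    seeds: set[str] = set()
    with zipfile.ZipFile(apk_path) as apk:
        for entry in apk.namelist():
            data = apk.read(entry)
            for match in printable.findall(data):
                seeds.add(match.decode("ascii"))
    return sorted(seeds)

if __name__ == "__main__":
    # "app-debug.apk" is a placeholder path for whatever APK is under test.
    for text in seed_strings_from_apk("app-debug.apk")[:20]:
        print(text)
```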