In 2016, Yasemin Acar and Sascha Fahl, along with Doowon Kim and Michelle Mazurek, collaborated on the paper “You Get Where You’re Looking For: The Impact of Information Sources on Code Security”, which they presented at the 2016 IEEE Symposium on Security and Privacy. In 2017, the paper won the NSA’s best scientific cybersecurity paper competition for papers published in 2016. Also in 2016, Fahl’s talk on this work at a RISCS workshop on secure development helped formulate the research call that has since led to RISCS’ secure development initiative.

The basic idea behind the paper was to compare the impact of the different information sources on the code developers write. Where do developers look for and find help in solving problems? How good are those sources, and what impact do they have on the security of the finished product?

The work began with the researchers finding that when they searched Google for help writing an app, the top hit was usually Stack Overflow. The idea that developers can help each other by sharing code is great in theory, but in practice someone trying to solve a security-relevant problem such as setting up a secure connection may end up with a code snippet that leaves the connection insecure. This had actually happened to Fahl in 2011: the code he copied and pasted to help implement SSL in an Android app turned off certificate validation. Worse, that one snippet of code had hundreds of thousands of views. Further investigation by reverse-engineering 13,500 Android apps turned up a lot of trust managers that turn off certificate validation. This discovery led to two papers in which Fahl found that HTTPS failed in 17% of the reverse-engineered apps. Of those 1,074 apps, 790 accepted all certificates and 284 accepted all hostnames. One of these papers, “Rethinking SSL Development in an Appified World”, received an Honorable Mention in the NSA’s 2013 paper competition.
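
To make the problem concrete, here is a minimal sketch of the kind of trust manager described above, written against the standard Java `X509TrustManager` interface. It is not the snippet Fahl copied, and the class name `TrustAllManager` is made up for illustration. It is an anti-pattern shown only so the flaw is recognisable, not code to reuse.

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.X509TrustManager;

// ANTI-PATTERN, for illustration only: do not ship this.
// Hypothetical class name; not the snippet discussed in the talk.
class TrustAllManager implements X509TrustManager {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
        // Empty body: never throws, so every client certificate is accepted.
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        // Empty body: never throws CertificateException, so any server
        // certificate (expired, self-signed, attacker-issued) passes.
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return new X509Certificate[0];
    }

    public static void main(String[] args) {
        // Even an empty chain is "trusted": no validation happens at all.
        new TrustAllManager().checkServerTrusted(new X509Certificate[0], "RSA");
        System.out.println("empty chain accepted without any validation");
    }
}
```

Because `checkServerTrusted` signals failure only by throwing, an empty method body silently turns validation off. The app still "works" over HTTPS, which is why the flaw is easy to miss.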

The next question: was this a “unicorn” flaw or is it typical of Stack Overflow use? This was the question the researchers were asking when Acar joined the group. The basis of this paper, therefore, was two-pronged. First, “everyone knows” that copying and pasting code from the internet is bad and developers shouldn’t do it. Second, anecdotal evidence says that insecure workflows can be found in apps that are available in the Play Store. The researchers set out to study how this happens, whether the problem can be isolated, and how widespread it is, hoping to prove the assumption that copying and pasting led to bad results. Because the project was funded by the US National Institute of Standards and Technology (NIST) and involved a US university, the University of Maryland, the project had to observe ethical protocols (“ethics approval” in UK parlance, “IRB” in the US).

The researchers wanted to do a lab study of controlled experiments, but first needed to establish which information sources people use. To set up a study that reasonably matched real-world conditions, they began by surveying Android app developers. Based on the results, they designed tasks reflective of real-world security problems, then carried out several pilot studies and coded the results according to whether the resulting app was functional, whether it compiled, whether it did what it was supposed to, whether it was secure, and whether it was finished. They watched the developers in the lab, analysed what they did when they looked up information, and studied their browsing histories. They also looked at the Stack Overflow results and compared them to the code the developers actually wrote, and then looked for that code in their large dataset of reverse-engineered Android apps.

For the initial online survey, they emailed 50,000 Google Play developers to ask what resources they use when they get stuck in three areas: general encryption, permissions, and HTTPS. The results showed they generally used similar resources. Either directly (69%) or via a search engine (62%), most used Stack Overflow; about a quarter (27.5%) used the official documentation. A small number mentioned books and blogs. As part of the survey, the researchers collected demographic information they knew they would need later in order to compare it to the population they tested: age, occupation, level of experience with writing Android apps, and level of knowledge of infosecurity.

They designed four tasks for developers to solve in the lab. The tasks were presented in randomised order so that boredom, the learning curve, and fatigue would be evenly distributed across the tasks. For practical reasons, they limited the time per task to 20 minutes and chose tasks that could be solved in that time, that could be solved using the given resources, and that are relevant to the real world. Developers used Android Studio with Emulator to write their code into a pre-existing skeleton app, both to save them time and to keep them focused on the important decisions.

The four tasks:

  • Store user credentials locally and persistently. This task was coded “functional” whenever the login data was stored locally and persistently; “secure” if it was locally stored with mode set to private; and “insecure” if the credentials were world-readable.
  • Change HTTP connections to HTTPS, requiring them to change the server certificate from a subdomain to the top-level domain and do custom hostname verification. This task was coded “functional” if the connection worked in HTTPS; “secure” if they implemented a custom hostname verifier and said the certificate should be changed before the connection would work or if they implemented public key pinning; “insecure” if the app accepted all certificates, used a null hostname verifier, or used any other approach that shuts off TLS. This was to test whether the developers would write the problem code Fahl had found in his previous work.
  • Use least privilege in adding the permission necessary to call the hotline when the phone button is clicked. This task was coded “functional” if the dial button worked; “secure” if it adhered to least privilege (that is, using ACTION_DIAL so the user has to specify they want to dial); and “insecure” if the app dialled directly via the phone app using ACTION_CALL.
  • Make an activity callable from all apps by the same developer, a known occasional error. This task was coded “functional” if the developers changed AndroidManifest.xml such that the activity was callable from apps by the same developer; “secure” if the developers implemented a custom permission with protection level signature or signatureOrSystem; “insecure” if they set exported=true without a permission or ended up with an automatic exported=true.
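
One of the approaches coded “secure” for the HTTPS task was public key pinning. The sketch below shows the core of that idea using only standard JDK classes. The class and method names (`PublicKeyPin`, `pinOf`, `isPinned`) are assumptions made for illustration, and freshly generated RSA keys stand in for a real server key and an attacker’s key. It is not the study’s reference solution.

```java
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.PublicKey;
import java.util.Base64;
import java.util.Set;

// Hypothetical helper illustrating public key pinning; names are assumptions.
class PublicKeyPin {
    // Base64-encoded SHA-256 digest of the key's encoded form.
    static String pinOf(PublicKey key) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(key.getEncoded());
        return Base64.getEncoder().encodeToString(digest);
    }

    // True only if the presented key matches one of the pins shipped with the app.
    static boolean isPinned(PublicKey presented, Set<String> expectedPins)
            throws NoSuchAlgorithmException {
        return expectedPins.contains(pinOf(presented));
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        PublicKey serverKey = gen.generateKeyPair().getPublic();   // stands in for the real server key
        PublicKey attackerKey = gen.generateKeyPair().getPublic(); // stands in for a man-in-the-middle key

        Set<String> pins = Set.of(pinOf(serverKey));
        System.out.println("server key pinned:   " + isPinned(serverKey, pins));
        System.out.println("attacker key pinned: " + isPinned(attackerKey, pins));
    }
}
```

In an app, a check like this would typically run inside a trust manager’s `checkServerTrusted`, after the platform’s normal chain validation, on the key returned by the server certificate’s `getPublicKey()`. A mismatch would throw a `CertificateException` rather than being ignored.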

The researchers tested all these parameters in a pilot study with friends.

For the resulting lab study, the researchers recruited Android developers via mailing lists and posters (Germany) and Craigslist (US only). The 54 they eventually recruited included eight women; there were 12 Americans and 42 Germans; 48 were students; ages ranged from 18 to 40. Across the German and American groups there was no difference in gender, ages, or education.

Each of the developers was randomly assigned to one of four groups defined by which resources they were allowed to use:

  • Stack Overflow only;
  • Official documentation only;
  • Books, both print and PDFs (so they were searchable);
  • Free – that is, they could start with Google and use anything they wanted.

In exit interviews, most said they enjoyed the study and were confident that they had found the right answers. The free choice group were the most confident. In terms of the resources, free choice and Stack Overflow won on usability and helpfulness, but books and official documentation won on trust that they were correct. There was, therefore, a clear divide between “helpful and usable but don’t really trust” and “not helpful or usable”.

The results: 67% of the Stack Overflow users and 66% of book users succeeded in making the app functional, as against 52% of the free choice group. Only 40% of those using official documentation succeeded at functionality.

Of those functional solutions, 86% of those using official documentation were secure, against 73% of those using books (people sometimes gave up searching the book). Of the functional apps coded by the free choice group, 66% were secure. Stack Overflow, the resource that produced the most functional solutions, led to the fewest secure implementations: 52%.
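
As a back-of-the-envelope reading of these figures, the two percentages can be multiplied to estimate the share of all solutions that were both functional and secure. This derived figure is not reported in the talk. It assumes the security percentages are taken over the functional solutions only, as the text states, and the class name `CombinedRates` is made up for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic only; the combined figure is not from the paper.
class CombinedRates {
    // Share of all solutions that were both functional and secure, in percent.
    static double functionalAndSecure(double functionalPct, double secureGivenFunctionalPct) {
        return functionalPct * secureGivenFunctionalPct / 100.0;
    }

    public static void main(String[] args) {
        // {functional %, secure % among functional}, as reported in the text.
        Map<String, double[]> groups = new LinkedHashMap<>();
        groups.put("Stack Overflow", new double[] {67, 52});
        groups.put("Books", new double[] {66, 73});
        groups.put("Free choice", new double[] {52, 66});
        groups.put("Official documentation", new double[] {40, 86});

        for (Map.Entry<String, double[]> e : groups.entrySet()) {
            double both = functionalAndSecure(e.getValue()[0], e.getValue()[1]);
            System.out.printf("%-24s %.1f%%%n", e.getKey(), both);
        }
    }
}
```

On this rough estimate, Stack Overflow, free choice, and official documentation all land near 34–35%, while books come out around 48%. Stack Overflow’s lead in functionality is cancelled out by its lower security rate.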

Security correctness graph (Yasemin Acar)

The researchers then studied the developers’ browsing histories, visiting all of the 139 Stack Overflow pages the developers used. Of these, 41 related to the set tasks and 20 had code snippets. Seven had only secure code, three had both secure and insecure code, and ten had only insecure code, though three of those had warnings that coders hopefully might see. However, half of the code snippets were in a thread that had only insecure code; someone landing on that page would have to actively look elsewhere to find secure versions.

For external validation, the researchers performed static code analysis on 200,000 Android apps. Fahl was able to find, in real apps, all the API calls that the lab study participants had used, leading the researchers to conclude that what they found in the lab “somewhat” reflected reality.

Acar went on to talk about the limitations of the study. Both the original study and this one suffered from opt-in bias. The lab participants were slightly younger and had slightly different experience than the developers they emailed; slightly more were students. The loss of some external validity may be unavoidable in lab studies.

The takeaways:

  • Stack Overflow is usable and offers people quick fixes for their problems, but it is also insecure.
  • The official documents are secure but less usable.
  • We need to integrate security and usability in one resource.

Acar went on to suggest methods for achieving this last point:

  • Help developers within their integrated development environment (IDE). For example, catch them when they are writing insecure code and push them into doing it more securely.
  • Improve the official documentation to be more usable.
  • Provide a new security-focused Q&A site. There are some efforts along these lines that are peacefully co-existing.

Since the study concluded in 2015, the researchers have gone on to study the Stack Overflow code snippets inside apps in more detail: “They are everywhere.” They have also built an Android Studio plug-in that recognizes insecure code, issues warnings, and offers quick fixes. Finally, they are working on improving the Android documentation with use cases and usability as priorities. So far, they’ve found that where they could offer usable solutions developers would actually use them.
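
The summary does not say how the plug-in recognises insecure code. Purely to illustrate the idea of warning a developer as they type, the toy checker below flags a few of the insecure choices from the lab tasks by plain substring matching. The rule set and the class name `InsecurePatternCheck` are assumptions made for this sketch. A real tool would analyse the program structure rather than match text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy checker: plain substring matching, nothing like a real static analyser.
class InsecurePatternCheck {
    // Pattern -> warning text. The identifiers are Android/Apache names;
    // the choice of rules is an assumption for this sketch.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Intent.ACTION_CALL", "dials without user confirmation; consider Intent.ACTION_DIAL");
        RULES.put("MODE_WORLD_READABLE", "data readable by other apps; consider MODE_PRIVATE");
        RULES.put("ALLOW_ALL_HOSTNAME_VERIFIER", "hostname verification is disabled");
    }

    // Returns one warning per matching rule per line, with 1-based line numbers.
    static List<String> warnings(String source) {
        List<String> found = new ArrayList<>();
        String[] lines = source.split("\n");
        for (int i = 0; i < lines.length; i++) {
            for (Map.Entry<String, String> rule : RULES.entrySet()) {
                if (lines[i].contains(rule.getKey())) {
                    found.add("line " + (i + 1) + ": " + rule.getKey() + " - " + rule.getValue());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "Intent call = new Intent(Intent.ACTION_CALL);\n"
                + "prefs = getSharedPreferences(\"login\", Context.MODE_PRIVATE);";
        warnings(sample).forEach(System.out::println);
    }
}
```

Pairing each warning with a concrete replacement mirrors the “quick fix” idea: the developer is told what to use instead, not just that something is wrong.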

Helping developers instead of bullying end users, Acar concluded, is “effective and gratifying”. Developers will accept help and teaching gratefully and use the results, and it’s a much smaller, better-educated population to work with, who use a common language and share common goals. “It’s a nice feeling to work with them and help these experts do everything they want to do in a more secure way.”

Questioners asked whether time pressure could have played a role in the results, but Acar noted that in the real world time pressures also exist and developers are highly results-driven. One problem that was raised (a key tenet of the Johnny project) is that developers are increasingly not the homogeneous community they were; Android apps can be and are written by anyone. Acar thinks it’s important to solve the problems on the level with the fewest people and the most expertise. For SSL, one problem was that when certificates cost money the economics pushed developers to test their apps on self-signed certificates and turn off validation. A suggestion of Fahl’s in 2013 led to the decision in 2017 to give developers a mode where they could test their apps on self-signed certificates only on their own device, and avoid damaging certificate validation on anyone else’s. It was a good idea to work on but not easy or trivial to implement. Just fixing Stack Overflow isn’t a solution because the site’s users tend to push their own solutions, and there’s no way to force the secure solutions to the top and deprecate the insecure ones.

This talk was given at the RISCS Community meeting on February 7, 2017.

Wendy M. Grossman

Freelance writer specializing in computers, freedom, and privacy. For RISCS, I write blog posts and meeting and talk summaries.