Ethical Analytics, Please

A couple of weeks ago, The Wall Street Journal ran an article entitled “You Give Apps Sensitive Personal Information. Then They Tell Facebook.”

More accurately, there are some developers who:

  • Use analytics services offered by firms (e.g., Google, Facebook) that some people believe behave unethically
  • Send personal data — such as whether the user is trying to get pregnant — to those analytics services

I hold out some hope that the developers fought these decisions and were overruled by pointy-haired bosses. Alas, I fear that too many people focus on what is being sent to their own Web services and do not think through what is being handed over to analytics services. And, since many places outsource analytics collection and analysis, privacy and security concerns can get magnified.

Fortunately, or perhaps unfortunately, I have not had to deal with analytics much personally. If I were implementing analytics, I would fight tooth and nail to do so ethically. Here are some of the things that I would be advising.

Own the Analytics Server

The biggest problem that people will have with what the Journal reported isn’t that apps collect private information. The problem is that they are perceived to share that private information with the likes of Facebook and Google. Facebook in particular has been slipshod, at best, over the years in terms of data privacy.

While you as a developer might think that Google and Facebook don’t look at your app’s analytics data, and while it’s possible that this is true, the perception is that you’re sharing the analytics data with Google and Facebook. After all, you’re sending it to their servers.

Ideally, analytics wind up being managed by some server that you control directly, most likely one that you provision by one means or another (Docker, VPS, etc.). This implies that you license the analytics server software and host it yourself, which is more complicated than simply outsourcing it. However, that complexity should be manageable and would greatly reduce the privacy concerns that other parties have with your analytics.

A theoretical option — one that I suspect may not exist — would be end-to-end encryption (E2E) of the analytics data. The machines that allow you to examine the analytics, generate reports, and so on do not have to be the same machines that are collecting the analytics from your apps. In principle, the “middleman” servers that collect the analytics could be collecting encrypted payloads of analytics records. A separate “analytics analyzer” that you operate yourself would collect those records, decrypt them, and let you see the results. The “middleman” servers could then be an outsourced service, with the encryption preventing that provider from getting at the actual analytics data itself.
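To make that idea a bit more concrete, here is a minimal sketch of the app-side "seal" step and the analyzer-side "open" step, using hybrid encryption from the standard Java crypto APIs. The class name SealedAnalytics, the Envelope type, and the choice of RSA-OAEP plus AES-GCM are all my own assumptions for illustration, not the design of any existing analytics service. Key distribution, key rotation, and the wire format are left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: the "middleman" only ever stores Envelope contents.
class SealedAnalytics {
    private static final String RSA_TRANSFORM = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES_TRANSFORM = "AES/GCM/NoPadding";

    /** Opaque bytes that an outsourced collector could hold without reading them. */
    static final class Envelope {
        final byte[] wrappedKey; // per-record AES key, encrypted with the analyzer's public key
        final byte[] iv;         // random 12-byte GCM nonce
        final byte[] ciphertext; // encrypted record plus 16-byte GCM tag

        Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    /** Stand-in for however you would really create and store the analyzer's key pair. */
    static KeyPair newAnalyzerKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs in the app: only the analyzer's public key ships with the app. */
    static Envelope seal(String record, PublicKey analyzerPublicKey) {
        try {
            KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
            keyGenerator.init(256);
            SecretKey recordKey = keyGenerator.generateKey();

            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher aes = Cipher.getInstance(AES_TRANSFORM);
            aes.init(Cipher.ENCRYPT_MODE, recordKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(record.getBytes(StandardCharsets.UTF_8));

            Cipher rsa = Cipher.getInstance(RSA_TRANSFORM);
            rsa.init(Cipher.ENCRYPT_MODE, analyzerPublicKey);
            byte[] wrappedKey = rsa.doFinal(recordKey.getEncoded());

            return new Envelope(wrappedKey, iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs on the analyzer that you operate: the private key never leaves it. */
    static String open(Envelope envelope, PrivateKey analyzerPrivateKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA_TRANSFORM);
            rsa.init(Cipher.DECRYPT_MODE, analyzerPrivateKey);
            SecretKey recordKey = new SecretKeySpec(rsa.doFinal(envelope.wrappedKey), "AES");

            Cipher aes = Cipher.getInstance(AES_TRANSFORM);
            aes.init(Cipher.DECRYPT_MODE, recordKey, new GCMParameterSpec(128, envelope.iv));
            return new String(aes.doFinal(envelope.ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The collector in the middle would see only wrappedKey, iv, and ciphertext. Without the analyzer's private key, it has nothing useful to mine or to leak.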

Encrypt Data In Motion

Speaking of encryption, ensure that the app’s communications with the analytics server use an adequate level of TLS or other on-the-wire encryption, so that nefarious people cannot sniff your network packets and steal information in transit.
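One cheap safeguard is to refuse to talk to an analytics endpoint that is not HTTPS, and to restrict the protocol versions that you are willing to negotiate. The following is a minimal sketch under my own assumptions: AnalyticsEndpoint and its method names are hypothetical, and I am treating TLS 1.2 and 1.3 as the "adequate level", which you should adjust to your own requirements and to what your platform supports.

```java
import java.net.URI;
import javax.net.ssl.SSLParameters;

// Hypothetical helper: the names here are illustrative, not from any SDK.
class AnalyticsEndpoint {
    /** Rejects any analytics URL that would send data in cleartext. */
    static URI requireHttps(String url) {
        URI uri = URI.create(url);
        if (!"https".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Analytics endpoint must use https: " + url);
        }
        return uri;
    }

    /** Parameters that a TLS socket or engine could be configured with. */
    static SSLParameters modernTlsOnly() {
        SSLParameters parameters = new SSLParameters();
        // Assumption: TLS 1.2+ is the minimum acceptable protocol version.
        parameters.setProtocols(new String[] {"TLSv1.3", "TLSv1.2"});
        return parameters;
    }
}
```

Calling requireHttps() once, wherever the endpoint gets configured, means that a stray http:// URL in a build variant fails fast instead of quietly leaking data.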

Only Log Constants

I suspect that most users would be reasonably comfortable with you recording information about what “screens” (activities, fragments, etc.) the user visits and including that as part of your analytics data. I suspect that most users would be far less comfortable with you recording their location (e.g., GPS fix). Unfortunately, there is no easy way to automatically detect that the app is logging sensitive data.

However, you could try to enforce that you only log constants. For whatever client-side API the analytics service offers, create a Lint rule that will complain if you try logging something that is not a string literal or string constant.

(and hopefully there is a way for Lint rules to detect Kotlin string interpolation…)

Obviously, this would block far more benign things than logging user locations. However:

  • It is safe to say that hard-coded constants are not going to contain user-sensitive data

  • Automatic detection is much better than relying on manual audits that may never happen
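A custom Lint rule is more than I can sketch here. As a stand-in, a similar "constants only" guarantee can be approximated with the type system: wrap the analytics SDK in a facade whose only logging method accepts an enum, so there is no overload to pass an arbitrary string to. This is a different technique than the Lint rule described above, and everything in it (ConstantOnlyAnalytics, Event, Sink) is a hypothetical name of my own rather than part of any real analytics SDK.

```java
import java.util.Objects;

// Hypothetical facade: the rest of the app never touches the SDK directly.
class ConstantOnlyAnalytics {
    /** Every loggable event is a hard-coded constant, reviewed when it gets added. */
    enum Event {
        SCREEN_HOME("screen_home"),
        SCREEN_SETTINGS("screen_settings"),
        EXPORT_TAPPED("export_tapped");

        final String wireName;

        Event(String wireName) {
            this.wireName = wireName;
        }
    }

    /** Stand-in for whatever client-side API the analytics service offers. */
    interface Sink {
        void send(String eventName);
    }

    private final Sink sink;

    ConstantOnlyAnalytics(Sink sink) {
        this.sink = Objects.requireNonNull(sink);
    }

    /** The only way to log: no String parameter, so no interpolated user data. */
    void log(Event event) {
        sink.send(event.wireName);
    }
}
```

This is more restrictive than a Lint rule, since there is no way to suppress it for a one-off case. On the other hand, it also works from Kotlin, where a string template simply would not compile as an Event.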

Opt-In (Or At Least Opt-Out)

My guess is that current (e.g., GDPR) or future legislation will require apps to allow users to control whether analytics get collected or not. Ideally, you “get ahead of the curve” and offer this now. Ideally, it would be an opt-in choice, so the default is that analytics are not collected. At worst, make it an opt-out option in your PreferenceFragment or other settings screen.
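In code, opt-in mostly boils down to a gate that defaults to "off" and that every analytics call has to pass through. Here is a minimal sketch. ConsentGate is a hypothetical class of my own, and I am assuming that the user's choice gets persisted elsewhere (e.g., in your preferences) and is fed back in via setOptedIn() at startup.

```java
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical gate: nothing reaches the analytics sink until the user says yes.
class ConsentGate {
    private final Consumer<String> sink;

    // Opt-in: the default is that analytics are NOT collected.
    private volatile boolean optedIn = false;

    ConsentGate(Consumer<String> sink) {
        this.sink = Objects.requireNonNull(sink);
    }

    /** Call this from your settings screen, and on startup from the saved choice. */
    void setOptedIn(boolean optedIn) {
        this.optedIn = optedIn;
    }

    /** Returns true if the event was forwarded, false if it was dropped. */
    boolean log(String event) {
        if (!optedIn) {
            return false; // dropped on the device, never queued for later
        }
        sink.accept(event);
        return true;
    }
}
```

Note that the sketch drops events outright while the user is opted out, rather than buffering them for upload in case the user opts in later. Buffering would arguably still be collecting.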

Decline Unnecessary Metadata

The analytics client library might provide APIs to automatically collect lots of data about the environment: device model, OS version, screen resolution, and so on. We see this a lot with crash logging, but analytics may offer to collect similar stuff.

Try to minimize this. In particular, try to stop the collection of metadata that you are not going to need.

While fixed values like device model are not user-sensitive, too much metadata does start to make it possible to identify users across devices. The same sort of stuff that Panopticlick uses for Web browsers could be collected by analytics libraries in a native Android app.
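If the library does not let you switch off individual fields, and if it gives you a hook to adjust the payload before it is sent (an assumption on my part; not every SDK does), an allowlist is the safer direction to filter in: anything that you have not explicitly decided to keep gets dropped, including fields that a future SDK version adds. A minimal sketch, with MetadataFilter and the field names being purely illustrative:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical filter: keep only the metadata that you have decided you need.
class MetadataFilter {
    /** Returns a copy of the collected metadata containing only allowlisted keys. */
    static Map<String, String> keepAllowed(Map<String, String> collected, Set<String> allowedKeys) {
        Map<String, String> kept = new TreeMap<>(collected);
        // Removing from the key set removes the corresponding entries from the copy.
        kept.keySet().retainAll(allowedKeys);
        return kept;
    }
}
```

For example, if you only ever report on OS version, passing a set containing just "os_version" would strip device model, screen resolution, and anything else that the library volunteers.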

Use an Open Source Client Library (and Vet It)

Try to use services that open source their SDKs. This allows you or some consultant to examine the library and see if it is doing something that you or your users might regret.

With luck, somebody else has already performed that analysis and has published a report that you can use. Just bear in mind that any such report will be for a specific version of the SDK, and so periodically you will need to find a newer report or vet the updated SDK yourself.

I know that there are some open source analytics options, and so most of what I recommend here should be possible. And, I will admit that this is more work than a lot of organizations will want to deal with. With luck, an ethical analytics service will emerge that emphasizes these sorts of features, and perhaps more, to help you avoid charges of invasions of privacy.

But, in general, treat your analytics data the same way that you treat your “real” data. And, treat your users the same, from a privacy and security standpoint, for both types of data. Do not consider analytics privacy and security to be something that you can ignore… unless you elect to skip analytics outright.
