Leaky APIs are Leaky

For years, I have been advising developers against doing the sort of thing seen in this Stack Overflow question, where the developer is hacking arbitrary SQL into a query() call on a ContentResolver. This approach is deeply risky, for two reasons:

  1. It assumes that the ContentProvider is using a SQL-compliant database, such as SQLite, and can process arbitrary SQL queries

  2. It assumes a particular way that the ContentProvider is assembling the SQL statement to execute, in terms of how the various query() parameters get concatenated or otherwise combined
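To make the second assumption concrete, here is a minimal sketch in plain Java. The two builder methods, the table name, and the column name are all hypothetical; they are not how MediaStore or any real provider assembles its SQL. They only illustrate why a selection string that smuggles in a GROUP BY depends entirely on the provider's internal concatenation.

```java
class LeakyQueryDemo {
    // Hypothetical strategy A: the provider wraps the selection in parentheses.
    static String buildWrapped(String table, String selection) {
        return "SELECT * FROM " + table + " WHERE (" + selection + ")";
    }

    // Hypothetical strategy B: the provider appends the selection as-is.
    static String buildBare(String table, String selection) {
        return "SELECT * FROM " + table + " WHERE " + selection;
    }

    public static void main(String[] args) {
        // A "selection" that is really a WHERE clause plus a smuggled GROUP BY.
        // The unbalanced parentheses only make sense if the provider adds its own.
        String hacked = "1) GROUP BY (bucket_id";

        // Strategy A happens to yield balanced, valid-looking SQL.
        System.out.println(buildWrapped("images", hacked));

        // Strategy B yields unbalanced parentheses, which a SQL parser would reject.
        System.out.println(buildBare("images", hacked));
    }
}
```

The hacked string works only for as long as the provider keeps using something like strategy A. Nothing in the query() contract promises that, so a change in assembly strategy breaks the caller without any change to the documented API.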

In this case, apparently, the rules changed with Android Q, and the previously hacked-in SQL is no longer valid.

Tactically, the problem lies with the developers who try hacking in SQL. Strategically, the problem lies with the developers of MediaStore and other ContentProvider implementations. If you allow this sort of hack, this sort of hack is going to get used.

In other words, your API is the realm of the possible, which may or may not line up with what you think your API is. Any capabilities that your API leaks may get ruthlessly exploited by developers. If consumers of your component are doing something, and it works, that “something” is part of your API, whether you like it or not, because you leaked that capability.

We see this sort of thing all the time in the Android SDK, such as all the reflection games that developers played to access private classes, methods, and fields. Today, though, I’d like to focus on how the publisher of the API should be dealing with this sort of thing. There are several steps that you should consider taking:

  • Document Your Expectations: The documentation for your API should explain, in detail, what you are expecting and supporting for the inputs to that API. In the case of a ContentProvider, you should be documenting what the valid values are for things like the group-by clause and the order-by clause.

  • Sanitize Your Inputs: Where practical, add runtime support to validate that the inputs match what you documented and reject things that do not (e.g., throw an IllegalArgumentException). Make sure that your form of rejection explains what the developer is doing wrong, and ideally explains how to fix it.

  • Monitor For Transgressions: Keep tabs on how your API is used, by watching for references to it in developer support sites, whether those sites are your own (e.g., support board, GitHub issues) or not (e.g., Stack Overflow). If you see developers misusing your API, point it out, and make notes as to how they are misusing it so you can take further steps.

  • Document (or Support) the Anti-Patterns: Where you find those transgressions, document them, explaining why they are not supported. Or, find some way of supporting what the developers are trying to do, either via their approach or something else, and document that. One way or another, you need an official response to the misuse of your API, lest that misuse spread to other developers, fueled by these support sites.

  • Write Lint Rules: In some cases, you cannot sanitize the inputs at runtime (e.g., the validation would add too much overhead). In those cases, if your API is in the form of an Android library, consider adding a Lint rule to it. Timber users are familiar with this, as Timber ships with Lint rules, such as the legendary “Using ‘Log’ instead of ‘Timber’” message. This can help steer developers away from known anti-patterns proactively.

  • Break Things, Carefully: At some point, you may decide to change your API in a way that breaks some hacks, such as what changed in Android Q to MediaStore. Where possible, try to have custom error messages that explain the change and steer the developer towards supported alternatives, rather than just failing with some generic error message (e.g., near "GROUP": syntax error) that might cause developers to think that your API itself is broken.
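As one illustration of the "sanitize" and "break things, carefully" steps, here is a minimal sketch of validating an order-by value before it gets anywhere near a SQL statement. It is plain Java with no Android dependencies. The SortOrderValidator class, the column names, and the default sort are all assumptions made up for this example, not part of any real provider.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class SortOrderValidator {
    // Hypothetical set of columns that the documentation says are sortable.
    private static final Set<String> SORTABLE =
            new LinkedHashSet<>(Arrays.asList("title", "date_added", "size"));

    // Hypothetical documented default, used when the caller passes nothing.
    private static final String DEFAULT_SORT = "date_added DESC";

    // Returns a normalized sort order, or throws with an explanation.
    static String validate(String sortOrder) {
        if (sortOrder == null || sortOrder.trim().isEmpty()) {
            return DEFAULT_SORT;
        }

        String[] parts = sortOrder.trim().split("\\s+");
        boolean directionOk = parts.length == 1
                || (parts.length == 2
                    && (parts[1].equalsIgnoreCase("ASC")
                        || parts[1].equalsIgnoreCase("DESC")));

        if (!SORTABLE.contains(parts[0]) || !directionOk) {
            // Say what was rejected, what is accepted, and why.
            throw new IllegalArgumentException(
                    "Unsupported sortOrder \"" + sortOrder + "\". Use one of "
                    + SORTABLE + ", optionally followed by ASC or DESC. "
                    + "Arbitrary SQL is not supported in this parameter.");
        }

        return parts.length == 1
                ? parts[0]
                : parts[0] + " " + parts[1].toUpperCase(Locale.ROOT);
    }
}
```

With this in place, a call such as SortOrderValidator.validate("title LIMIT 5") fails with a message that names the offending value and lists the supported ones, instead of surfacing a generic syntax error from the database. Whether an allow-list this strict is practical depends on how many legitimate inputs your API needs to accept.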

The worst thing to do is to ignore all of this. That’s largely what has happened with the system-supplied providers. As a result, lots of developers are relying on leaked capabilities (e.g., hacking in arbitrary SQL) and those developers are screwed when the API changes such that the capabilities are no longer leaked.

Playing “API cop” is not the most fun role of a developer advocate, but for prominent APIs with lots of developers using them, it is a necessary role.