Blending Data Sources

A repository is not limited to a single data source.

Sometimes, the repository will use multiple data sources that are holding model objects. For example, the repository might implement a two-level cache (memory plus disk), with the “system of record” being a Web service. For some requests, the repository would need to check some or all of these locations for the data.
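As a rough illustration of that lookup order, here is a minimal plain-Java sketch. It is not taken from the sample project: the LayeredRepository class, its fields, and the use of a Map and a Function as stand-ins for the disk cache and the Web service are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: these names do not appear in the sample project.
final class LayeredRepository<K, V> {
  private final Map<K, V> memoryCache = new ConcurrentHashMap<>();
  private final Map<K, V> diskCache;            // stand-in for a real disk cache
  private final Function<K, V> systemOfRecord;  // stand-in for a Web service call

  LayeredRepository(Map<K, V> diskCache, Function<K, V> systemOfRecord) {
    this.diskCache = diskCache;
    this.systemOfRecord = systemOfRecord;
  }

  V get(K key) {
    // First level: memory
    V value = memoryCache.get(key);
    if (value != null) {
      return value;
    }

    // Second level: disk, falling back to the system of record
    value = diskCache.get(key);
    if (value == null) {
      value = systemOfRecord.apply(key);
      if (value != null) {
        diskCache.put(key, value);
      }
    }

    // Remember whatever we found for the next request
    if (value != null) {
      memoryCache.put(key, value);
    }

    return value;
  }
}
```

Callers only ever see get(); which of the three locations supplied the value is a detail hidden inside the repository.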

Sometimes, while the primary model data resides in one data source, ancillary data resides in another. For example, perhaps the model data contains a URL to an image, or an image URL can be derived from the model data (e.g., an avatar icon). A typical approach is to have the repository ignore that, allowing the UI layer to request the image from an image-loading library (e.g., Picasso, Glide), which serves as its own repository. But if you need to perform transformations on that image, you might find it better to handle that yourself in your own repository. In that case, while most of the model data might come from one place (local database, Web service), other aspects of that model data might come from another place.

The Diceware/Pwned sample project will demonstrate this, by validating our randomly-generated passphrase to confirm that it has not already been “pwned”.

Pwned Passwords

In the parlance of Troy Hunt, a password is “pwned” if it has appeared in a data breach. Sometimes, the breach is of a database where passwords were held in plaintext (inexplicably). Sometimes, the copied database used hashing, but the hash was weak and passwords could be obtained using rainbow tables and other attacks.

Mr. Hunt maintains a database of “pwned passwords”. There are several ways of using this database, including a lightweight Web service API.

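In particular, this API provides a way to check a password against the database without disclosing the password itself to the Web service. Instead, the client computes the SHA-1 hash of the password, sends only the first five characters of that hash to the Web service, and receives back the entries in the database whose hashes start with those five characters.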

That response will be several hundred entries, but that is still small enough to scan for matches against the rest of the password’s hash. This way, all the Web service knows is that you have a password whose hash starts with those five characters; the password itself is never passed to the Web service.
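To make the split between what is sent and what stays on the device concrete, here is a small plain-Java sketch. The RangeQuery class and its field names are hypothetical and are not part of the sample project; only the SHA-1 hashing and the five-character prefix come from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class RangeQuery {
  final String prefix; // first 5 hex characters: the only part sent to the server
  final String suffix; // remaining 35 hex characters: compared locally

  private RangeQuery(String prefix, String suffix) {
    this.prefix = prefix;
    this.suffix = suffix;
  }

  static RangeQuery forPassword(String password) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-1");
      byte[] hash = digest.digest(password.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);

      for (byte b : hash) {
        hex.append(String.format("%02X", b & 0xff)); // two upper-case hex digits per byte
      }

      String full = hex.toString();

      return new RangeQuery(full.substring(0, 5), full.substring(5));
    }
    catch (NoSuchAlgorithmException e) {
      // SHA-1 is a required algorithm on the Java platform
      throw new IllegalStateException(e);
    }
  }
}
```

For the password "password", whose SHA-1 hash is 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8, the prefix is 5BAA6 and the suffix is the remaining 35 characters.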

In principle, a diceware app should not need to use this Web service. It is very unlikely that a randomly-generated set of words from decently-long word lists will have been used previously as a password. It is even less likely that it will have been used previously as a password that wound up in Mr. Hunt’s database. However, “very unlikely” is not the same as “impossible”, and since the Web service API is easy to use, it may be worthwhile to check the generated passphrase.

There are other cases where using this Web service API is more important, such as:

PwnedCheck

The PwnedCheck class provides a simple RxJava/OkHttp wrapper around the Web service. The Web service does not use JSON, XML, or other conventional data structures for its response, and so a dedicated REST client API like Retrofit will not help much here.

PwnedCheck has a one-parameter constructor that takes an OkHttpClient instance. If you are already using OkHttp, pass in your existing OkHttpClient instance, so you can share the pools that OkHttp maintains (threads, connections, etc.). Or, just create a new OkHttpClient() and pass that in.

PwnedCheck exposes two methods that supply Observable responses related to passphrase validation: score() and validate().

score()

The core one is score(). This will return an Integer — via an Observable — representing the number of times the supplied passphrase appears in the database, or 0 if it is not in the database at all:

  Observable<Integer> score(String passphrase) {
    return Observable.just(passphrase)
      .map(PwnedCheck::getSha1Hex)
      .flatMap(this::fetchCandidates)
      .map(PwnedCheck::findCount);
  }

Here, we start an Observable chain by calling just() on the passphrase. Then, we use map() to convert that passphrase to its SHA-1 hash, using getSha1Hex():

  // based on https://stackoverflow.com/a/33260623/115145

  private static String getSha1Hex(String original) throws Exception {
    MessageDigest messageDigest=MessageDigest.getInstance("SHA-1");

    messageDigest.update(original.getBytes("UTF-8"));

    byte[] bytes=messageDigest.digest();
    StringBuilder buffer=new StringBuilder();

    for (byte b : bytes) {
      buffer.append(Integer.toString((b & 0xff)+0x100, 16).substring(1));
    }

    return buffer.toString();
  }
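The expression inside that loop is compact enough to be worth unpacking. The following sketch isolates it in a hypothetical HexPadding class (not part of the sample project): masking with 0xff converts the signed byte to a value from 0 to 255, adding 0x100 forces a three-digit hex string, and substring(1) drops the leading 1, leaving exactly two digits with any needed zero padding.

```java
// Hypothetical class, used only to isolate the per-byte conversion.
final class HexPadding {
  static String toHex(byte b) {
    int unsigned = b & 0xff; // 0..255, regardless of the byte's sign

    // unsigned + 0x100 is in 0x100..0x1ff, so it always renders as three hex
    // digits starting with "1"; dropping that digit leaves two padded digits
    return Integer.toString(unsigned + 0x100, 16).substring(1);
  }
}
```

Without the padding, a byte such as 0x0a would render as the single character a, and the resulting hash string would be shorter than 40 characters.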

Then, we flatMap() to get a new Observable, one that wraps around OkHttp to make the REST request of the Web service, via fetchCandidates():

  private Observable<FetchResult> fetchCandidates(String sha1) throws IOException {
    String url="https://api.pwnedpasswords.com/range/"+sha1.substring(0, 5);
    Request request=new Request.Builder().url(url).build();

    return Observable.fromCallable(
      () -> new FetchResult(okHttpClient.newCall(request).execute(), sha1));
  }

The URL used for the Web service is simply a particular base URL with the first five characters of the SHA-1 hash appended as a path segment.

fetchCandidates() returns a FetchResult, wrapped in an Observable. A FetchResult is a simple POJO wrapping our hash and the Response object from OkHttp:

  private static class FetchResult {
    final Response response;
    final String sha1;

    private FetchResult(Response response, String sha1) {
      this.response=response;
      this.sha1=sha1;
    }
  }

score() completes the chain by converting the FetchResult into the count of occurrences of the passphrase, using findCount():

  private static int findCount(FetchResult fetch) throws IOException {
    String candidates=fetch.response.body().string();
    String suffix=fetch.sha1.substring(5).toUpperCase();

    for (String line : candidates.split("\r\n")) {
      if (line.startsWith(suffix)) {
        return(Integer.parseInt(line.split(":")[1]));
      }
    }

    return 0;
  }

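The response from the server is a series of lines. Each line contains the trailing 35 characters of the SHA-1 hash, after the common five-character prefix. Each line also has the number of occurrences of that hash in the database, separated from the hash suffix via a colon. So, findCount() reads the response body, upper-cases the trailing 35 characters of our own hash, and scans the lines for one that starts with that suffix. If it finds one, it returns the count after the colon; otherwise, it returns 0.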

So, the chain set up by score() results in you getting that score: the number of times the passphrase appears in the database, or 0 if it is not in the database at all.
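The parsing rule can be tried without any network access. The sketch below is a standalone variant written for this purpose, not the sample's findCount(): the RangeResponse class and countFor() method are hypothetical names, and it compares the whole suffix field rather than using startsWith().

```java
import java.util.Locale;

// Hypothetical standalone parser for a response body in "SUFFIX:COUNT" lines.
final class RangeResponse {
  static int countFor(String body, String suffix) {
    String wanted = suffix.toUpperCase(Locale.ROOT);

    for (String line : body.split("\\r?\\n")) {
      String[] parts = line.trim().split(":");

      if (parts.length == 2 && parts[0].equals(wanted)) {
        return Integer.parseInt(parts[1].trim());
      }
    }

    return 0; // suffix not listed, so the hash is not in the database
  }
}
```

Given a made-up body such as "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n1E4C9B93F3F0682250B6CF8331B7EE68FD8:42" (the counts here are invented for illustration), asking for the second suffix yields 42, while asking for a suffix that is not listed yields 0.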

Note that the Observable chain set up by score() performs network I/O, so clients will want to use subscribeOn() (e.g., with Schedulers.io()) to ensure that the work is performed on a background thread.

validate()

Most of the time, though, you do not need the score. You just need to know if the passphrase appears in the database.

One way to do that would be to have an Observable of Boolean, with false indicating that the passphrase is invalid (i.e., it is in the database and therefore is pwned). Another approach is to have an Observable chain that throws an exception for invalid passphrases. This approach can then be used as part of RxJava’s “retry” options — in the case of this app, we can generate another random set of words and try again.

So, validate() wraps the score() Observable and yields one of two outcomes: the passphrase itself, if its score is 0, or a PwnedException, if the score is positive:

  Observable<String> validate(final String passphrase) {
    return score(passphrase).map(score -> {
      if (score>0) {
        throw new PwnedException();
      }

      return passphrase;
    });
  }

  private static class PwnedException extends RuntimeException {

  }

Adding OkHttp and INTERNET

To make all of this work, the project has the OkHttp dependency, plus it has the INTERNET permission requested in the manifest.

Integrating PwnedCheck

Given the PwnedCheck class, our Repository can now blend it into its work for generating random passphrases based on word lists.

Modifying the Model

In the earlier sample, we considered the “model” to be a list of strings, of a UI-determined length. The UI was responsible for combining those into a passphrase using some delimiter (in this case, a space).

However, the Pwned Passwords API wants a simple passphrase as a String. So, we need to modify the app to have the model be a single String, not a list. And, we will need to have the Repository create the combined string.

This requires a few changes to consumers of the Repository.

PassphraseViewModel now has a MutableLiveData of String:

  private final MutableLiveData<String> livePassphrase=new MutableLiveData<>();

This also affects the method used to get that LiveData, now renamed to be passphraseStream():

  LiveData<String> passphraseStream() {
    return(livePassphrase);
  }

When PassphraseFragment observes that LiveData, it no longer needs to use TextUtils to join the words into a single string. Instead, it gets the already-joined words and can just put them into the TextView:

  @Override
  public void onViewCreated(View view, Bundle state) {
    super.onViewCreated(view, state);

    passphrase=view.findViewById(R.id.passphrase);
    viewModel=ViewModelProviders
      .of(this, new PassphraseViewModel.Factory(getActivity(), state))
      .get(PassphraseViewModel.class);
    updateMenu();
    viewModel.passphraseStream().observe(this,
      newPhrase -> passphrase.setText(newPhrase));
  }

Validating the Passphrase

The original sample’s Repository had a getWords() method that randomly selected the words out of the given source:

  Single<List<String>> getWords(Uri source, final int count) {
    return(getWordsFromSource(source)
      .map(strings -> (randomSubset(strings, count))));
  }

That method is now getPassphrase(), and it involves a slightly longer Observable chain:

  Observable<String> getPassphrase(Uri source, final int count) {
    return(getWordsFromSource(source)
      .map(strings -> (randomSubset(strings, count)))
      .map(pieces -> TextUtils.join(" ", pieces))
      .flatMap(checker::validate)
      // retry(3) caps the retries; retryWhen(errors -> errors.retry(3)) would
      // resubscribe on every error without limit
      .retry(3));
  }

The first two steps in the chain — getWordsFromSource() and the map() for randomSubset() — are what we had originally.

Next, we use another map() to join() the words here, rather than in the PassphraseFragment as before.

Then, we can use flatMap() to pull in our PwnedCheck instance, held in a field named checker:

  private final PwnedCheck checker=new PwnedCheck(new OkHttpClient());

At this point, our stream is now an Observable of the passphrase itself… unless it fails validation, in which case a PwnedException is thrown. So, our final step uses RxJava’s retry(), which resubscribes to the entire chain from the start when an error occurs, up to 3 times (as we are calling retry(3)). Each resubscription picks a new random set of words.
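For readers less familiar with RxJava, the effect of retry(3) can be approximated in plain Java. This is only an analogy under stated assumptions, not how RxJava implements the operator: RetrySketch and withRetries() are hypothetical names, and a Supplier stands in for "subscribe to the chain and wait for its result".

```java
import java.util.function.Supplier;

// Hypothetical plain-Java analogy for retry(n): one initial attempt, then up
// to maxRetries further attempts, rethrowing the last failure if all fail.
final class RetrySketch {
  static <T> T withRetries(Supplier<T> attempt, int maxRetries) {
    if (maxRetries < 0) {
      throw new IllegalArgumentException("maxRetries must be >= 0");
    }

    RuntimeException last = null;

    for (int tries = 0; tries <= maxRetries; tries++) {
      try {
        return attempt.get(); // success ends the loop immediately
      }
      catch (RuntimeException e) {
        last = e; // remember the failure and go around again
      }
    }

    throw last; // every attempt failed
  }
}
```

Two consequences carry over to the real chain: a count of 3 means at most four attempts in total, and any error (a network failure as much as a PwnedException) counts against that limit.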

So, if you get a passphrase from the Observable returned by getPassphrase(), it has been checked against the Pwned Passwords Web service and does not appear in that database. More importantly, other than the change in data type from a list of strings to a single string, nothing outside of the repository knows or cares about the details of how the random passphrase is generated.



This book is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.