Blending Data Sources
A repository is not limited to a single data source.
Sometimes, the repository will use multiple data sources that hold model objects. For example, the repository might implement a two-level cache (memory plus disk), with the “system of record” being a Web service. For some requests, the repository would need to check some or all of these locations for the data.
Sometimes, while the primary model data resides in one data source, ancillary data resides in another. For example, perhaps the model data contains a URL to an image, or an image URL can be derived from the model data (e.g., an avatar icon). A typical approach is to have the repository ignore that, allowing the UI layer to request the image from an image-loading library (e.g., Picasso, Glide), which serves as its own repository. But if you need to perform transformations on that image, you might find it better to handle that yourself in your own repository. In that case, while most of the model data might come from one place (local database, Web service), other aspects of that model data might come from another place.
The Diceware/Pwned sample project demonstrates this by validating our randomly generated passphrase to confirm that it has not already been “pwned”.
Pwned Passwords
In the parlance of Troy Hunt, a password is “pwned” if it has appeared in a data breach. Sometimes, the breach is of a database where passwords were held in plaintext (inexplicably). Sometimes, the copied database used hashing, but the hash was weak and passwords could be obtained using rainbow tables and other attacks.
Mr. Hunt maintains a database of “pwned passwords”. There are several ways of using this database, including a lightweight Web service API.
In particular, this API provides a way to check a password against the database without disclosing the password itself to the Web service. Instead, the client:
- Creates a SHA-1 hash of the password
- Submits the first five characters of that hash
- Receives in response the suffixes of all of the hashes of all of the passwords in the database that start with those five characters
That response will be several hundred entries, but that is still small enough to scan for matches against the rest of the password’s hash. This way, all the Web service knows is that you have a password whose hash starts with those five characters; neither the password nor its full hash is ever passed to the Web service.
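The client-side half of that exchange can be sketched in plain Java. This is only an illustration of the hash-and-split step, not the sample project's code; the class name RangeQuerySketch and its method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes a candidate password and splits the hash into
// the five-character prefix (sent to the server) and the 35-character suffix
// (kept on the client and compared against the response).
class RangeQuerySketch {
  static String sha1Hex(String password) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-1");
      byte[] bytes = digest.digest(password.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : bytes) {
        hex.append(String.format("%02X", b)); // two upper-case hex digits per byte
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-1 is not available", e);
    }
  }

  // The only part of the hash that leaves the device.
  static String prefix(String sha1Hex) {
    return sha1Hex.substring(0, 5);
  }

  // The part that is matched locally against the returned suffixes.
  static String suffix(String sha1Hex) {
    return sha1Hex.substring(5);
  }
}
```

For the well-known example password "password", the prefix would be 5BAA6, and only that prefix would appear in the request.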
In principle, a diceware app should not need to use this Web service. It is very unlikely that a randomly-generated set of words from decently-long word lists will have been used previously as a password. It is even less likely that it will have been used previously as a password that wound up in Mr. Hunt’s database. However, “very unlikely” is not the same as “impossible”, and since the Web service API is easy to use, it may be worthwhile to check the generated passphrase.
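To put a rough number on “very unlikely”, here is a small arithmetic sketch. It assumes a 7,776-word list (the size of the classic Diceware list) and a six-word passphrase; both figures are assumptions for illustration, since the app's word lists and word count may differ, and the class name is hypothetical.

```java
import java.math.BigInteger;

// Hypothetical helper for estimating the size of a diceware passphrase space.
class DicewareOddsSketch {
  // Number of distinct passphrases: listSize raised to the wordCount power.
  static BigInteger combinations(int listSize, int wordCount) {
    return BigInteger.valueOf(listSize).pow(wordCount);
  }

  // Equivalent entropy in bits: wordCount * log2(listSize).
  static double entropyBits(int listSize, int wordCount) {
    return wordCount * (Math.log(listSize) / Math.log(2));
  }
}
```

Under those assumptions, combinations(7776, 6) is roughly 2.2 × 10^23 and entropyBits(7776, 6) is roughly 77.5 bits, which is why a collision with a breached password is improbable, though not impossible.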
There are other cases where using this Web service API is more important, such as:
- User-supplied passwords for local data
- User-supplied passwords for an account to be created on your Web service (where the Web service itself is not checking that password for pwnage)
- Pwnage checking integration into a password safe
PwnedCheck
The PwnedCheck class provides a simple RxJava/OkHttp wrapper around the Web service. The Web service does not use JSON, XML, or other conventional data structures for its response, and so a dedicated REST client API like Retrofit will not help much here.
PwnedCheck has a one-parameter constructor that takes an OkHttpClient instance. If you are already using OkHttp, pass in your existing OkHttpClient instance, so you can share the pools that OkHttp maintains (threads, connections, etc.). Or, just create a new OkHttpClient() and pass that in.
PwnedCheck exposes two methods that supply Observable responses related to passphrase validation: score() and validate().
score()
The core one is score(). This will return an Integer, via an Observable, representing the number of times the supplied passphrase appears in the database, or 0 if it is not in the database at all:
Observable<Integer> score(String passphrase) {
  return Observable.just(passphrase)
    .map(PwnedCheck::getSha1Hex)
    .flatMap(this::fetchCandidates)
    .map(PwnedCheck::findCount);
}
Here, we start an Observable chain by calling just() on the passphrase. Then, we use map() to convert that passphrase to its SHA-1 hash, using getSha1Hex():
// based on https://stackoverflow.com/a/33260623/115145
private static String getSha1Hex(String original) throws Exception {
  MessageDigest messageDigest=MessageDigest.getInstance("SHA-1");
  messageDigest.update(original.getBytes("UTF-8"));
  byte[] bytes=messageDigest.digest();
  StringBuilder buffer=new StringBuilder();
  for (byte b : bytes) {
    buffer.append(Integer.toString((b & 0xff)+0x100, 16).substring(1));
  }
  return buffer.toString();
}
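The per-byte expression in that loop is a zero-padding trick: masking with 0xff gives a value from 0 to 255, adding 0x100 forces a three-digit hex string, and substring(1) drops the leading 1. The following sketch isolates that step next to a naive variant that loses leading zeros. The class and method names are hypothetical and exist only for this illustration.

```java
// Hypothetical demo of the hex-padding step used by getSha1Hex().
class HexPaddingSketch {
  // Same conversion as in getSha1Hex(): always two lower-case hex digits per byte.
  static String toHex(byte[] bytes) {
    StringBuilder buffer = new StringBuilder();
    for (byte b : bytes) {
      buffer.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
    }
    return buffer.toString();
  }

  // Naive variant, for contrast only: bytes below 0x10 yield a single digit,
  // so the result can be shorter than two characters per byte.
  static String toHexUnpadded(byte[] bytes) {
    StringBuilder buffer = new StringBuilder();
    for (byte b : bytes) {
      buffer.append(Integer.toString(b & 0xff, 16));
    }
    return buffer.toString();
  }
}
```

Without the padding, a 20-byte SHA-1 digest would not reliably produce 40 characters, and the five-character prefix and 35-character suffix would no longer line up with what the Web service expects.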
Then, we flatMap() to get a new Observable, one that wraps around OkHttp to make the REST request of the Web service, via fetchCandidates():
private Observable<FetchResult> fetchCandidates(String sha1) throws IOException {
  String url="https://api.pwnedpasswords.com/range/"+sha1.substring(0, 5);
  Request request=new Request.Builder().url(url).build();
  return Observable.fromCallable(
    () -> new FetchResult(okHttpClient.newCall(request).execute(), sha1));
}
The URL used for the Web service is simply a particular base URL with the first five characters of the SHA-1 hash appended as a path segment.
fetchCandidates() returns a FetchResult, wrapped in an Observable. A FetchResult is a simple POJO wrapping our hash and the Response object from OkHttp:
private static class FetchResult {
  final Response response;
  final String sha1;
  private FetchResult(Response response, String sha1) {
    this.response=response;
    this.sha1=sha1;
  }
}
score() completes the chain by converting the FetchResult into the count of occurrences of the passphrase, using findCount():
private static int findCount(FetchResult fetch) throws IOException {
  String candidates=fetch.response.body().string();
  String suffix=fetch.sha1.substring(5).toUpperCase();
  for (String line : candidates.split("\r\n")) {
    if (line.startsWith(suffix)) {
      return(Integer.parseInt(line.split(":")[1]));
    }
  }
  return 0;
}
The response from the server is a series of lines. Each line contains the trailing 35 characters of the SHA-1 hash, after the common five-character prefix. Each line also has the number of occurrences of that hash in the database, separated from the hash suffix via a colon. So, findCount():
- Retrieves the entire response as a String
- Splits that into lines
- Checks to see if each line starts with the trailing 35 characters of the candidate passphrase’s SHA-1 hash
- If a line does, picks out the count and returns it
So, the chain set up by score() results in you getting that score: the number of times the passphrase appears in the database, or 0 if it is not in the database at all.
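To make the response format concrete, here is a standalone sketch of the same parsing idea that works on a plain String instead of an OkHttp Response, so it can be tried without a network call. The class name and the sample response body used with it are made up for illustration; the counts shown are not real data.

```java
import java.util.Locale;

// Hypothetical, network-free variant of the findCount() logic.
class RangeResponseSketch {
  // body: the text returned for a five-character prefix, one SUFFIX:COUNT per line.
  // sha1Hex: the full 40-character hash of the candidate passphrase.
  static int findCount(String body, String sha1Hex) {
    String suffix = sha1Hex.substring(5).toUpperCase(Locale.ROOT);
    for (String line : body.split("\r?\n")) {
      String[] parts = line.trim().split(":");
      if (parts.length == 2 && parts[0].equalsIgnoreCase(suffix)) {
        return Integer.parseInt(parts[1].trim());
      }
    }
    return 0; // suffix not listed, so the passphrase was not found
  }
}
```

Given an invented body such as "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\r\n" and the hash 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8, this sketch would return 42; for a hash whose suffix is not listed, it would return 0.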
Note that the Observable chain set up by score() performs network I/O, so clients will want to use subscribeOn() (for example, with Schedulers.io()) to ensure that the work is performed on a background thread.
validate()
Most of the time, though, you do not need the score. You just need to know if the passphrase appears in the database.
One way to do that would be to have an Observable of Boolean, with false indicating that the passphrase is invalid (i.e., it is in the database and therefore is pwned). Another approach is to have an Observable chain that throws an exception for invalid passphrases. This approach can then be used as part of RxJava’s “retry” options. In the case of this app, we can generate another random set of words and try again.
So, validate() wraps the score() Observable and yields one of two outcomes:
- You get an Observable that gives you your passphrase back, so you do not necessarily need to hold onto it elsewhere, or
- You get a PwnedException, if the score() is a positive number
Observable<String> validate(final String passphrase) {
  return score(passphrase).map(score -> {
    if (score>0) {
      throw new PwnedException();
    }
    return passphrase;
  });
}
private static class PwnedException extends RuntimeException {
}
Adding OkHttp and INTERNET
To make all of this work, the project has the OkHttp dependency, plus it has the INTERNET permission requested in the manifest.
Integrating PwnedCheck
Given the PwnedCheck class, our Repository can now blend it into its work for generating random passphrases based on word lists.
Modifying the Model
In the earlier sample, we considered the “model” to be a list of strings, of a UI-determined length. The UI was responsible for combining those into a passphrase using some delimiter (in this case, a space).
However, the Pwned Passwords API wants a simple passphrase as a String. So, we need to modify the app to have the model be a single String, not a list. And, we will need to have the Repository create the combined string.
This requires a few changes to consumers of the Repository.
PassphraseViewModel now has a MutableLiveData of String:
private final MutableLiveData<String> livePassphrase=new MutableLiveData<>();
This also affects the method used to get that LiveData, now renamed to be passphraseStream():
LiveData<String> passphraseStream() {
  return(livePassphrase);
}
When PassphraseFragment observes that LiveData, it no longer needs to use TextUtils to join the words into a single string. Instead, it gets the already-joined words and can just put them into the TextView:
@Override
public void onViewCreated(View view, Bundle state) {
  super.onViewCreated(view, state);
  passphrase=view.findViewById(R.id.passphrase);
  viewModel=ViewModelProviders
    .of(this, new PassphraseViewModel.Factory(getActivity(), state))
    .get(PassphraseViewModel.class);
  updateMenu();
  viewModel.passphraseStream().observe(this,
    newPhrase -> passphrase.setText(newPhrase));
}
Validating the Passphrase
The original sample’s Repository had a getWords() method that randomly selected the words out of the given source:
Single<List<String>> getWords(Uri source, final int count) {
  return(getWordsFromSource(source)
    .map(strings -> (randomSubset(strings, count))));
}
That method is now getPassphrase(), and it involves a slightly longer Observable chain:
Observable<String> getPassphrase(Uri source, final int count) {
  return(getWordsFromSource(source)
    .map(strings -> (randomSubset(strings, count)))
    .map(pieces -> TextUtils.join(" ", pieces))
    .flatMap(checker::validate)
    // on an error (e.g., PwnedException), resubscribe to the whole chain, at most 3 times
    .retry(3));
}
The first two steps in the chain (getWordsFromSource() and the map() for randomSubset()) are what we had originally.
Next, we use another map() to join() the words here, rather than in the PassphraseFragment as before.
Then, we can use flatMap() to pull in our PwnedCheck instance, held in a field named checker:
private final PwnedCheck checker=new PwnedCheck(new OkHttpClient());
At this point, our stream is now an Observable of the passphrase itself… unless it fails validation, in which case a PwnedException is thrown. So, our final step uses RxJava’s retry(), which will try the entire chain from the start if directed to, up to 3 times (as we are calling retry(3)).
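The retry semantics can be illustrated without RxJava. The sketch below is a plain-Java analogue, not the sample's code: it makes one initial attempt plus up to maxRetries further attempts, which mirrors how retry(3) resubscribes at most three times after the first failure. The class, exception, and parameter names are hypothetical.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical, RxJava-free analogue of "generate, validate, retry up to N times".
class RetrySketch {
  static class RejectedException extends RuntimeException {
  }

  // generator: produces a fresh candidate passphrase on each call.
  // isPwned: returns true if the candidate should be rejected.
  // maxRetries: number of additional attempts allowed after the first one.
  static String generateWithRetry(Supplier<String> generator,
                                  Predicate<String> isPwned,
                                  int maxRetries) {
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      String candidate = generator.get();
      if (!isPwned.test(candidate)) {
        return candidate; // first acceptable candidate wins
      }
    }
    throw new RejectedException(); // every attempt was rejected
  }
}
```

With maxRetries set to 3, the generator runs at most four times. If every candidate is rejected, the failure still reaches the caller, just as an error would still reach the subscriber once the retries are exhausted.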
So, if you get a passphrase from the Observable returned by getPassphrase(), it has been checked against the Pwned Passwords Web service and was not found in the database at the time of the check. More importantly, other than the change in data type from a list of strings to a single string, nothing outside of the repository knows or cares about the details of how the random passphrase is generated.
This book is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.