The following is the first few sections of a chapter from The Busy Coder's Guide to Android Development, plus headings for the remaining major sections, to give you an idea about the content of the chapter.

Paging Beyond Room

Room’s support of the Paging library makes for a gentle way of getting into using paging to display large data sets. You just wrap what Room gives you in a LivePagedListBuilder and use the resulting PagedList with things like PagedListAdapter and a RecyclerView.

Often, though, life is more complicated than that.

In this chapter, we will stick with Room as our source of data, but explore other wrinkles with the Paging library that might be more applicable to your scenario.

Writing Your Own DataSource

In the end, the Paging library is based on implementations of DataSource. As the name suggests, DataSource is a common API around sources of data, in particular data that can be paged.

At the present time, there are three major DataSource implementations to use as a foundation. Two of these, PageKeyedDataSource and ItemKeyedDataSource, are designed for network APIs (REST, GraphQL, etc.), where we have to adapt to the API offered by the Web service.

PositionalDataSource is the third DataSource foundation. It is for cases where the DataSource has full random access of the underlying data. The client using the PositionalDataSource can ask for M items starting at position N, for arbitrary values of M and N, and the PositionalDataSource should be able to fulfill that request.

Room, under the covers, uses a PositionalDataSource. The DataSource.Factory that you get back from Room will create PositionalDataSource implementations on demand, backed by the database query that you specified in your @Dao.

The Problem: Life Is More Than Entities

However, Room’s support for PositionalDataSource has a key limitation: it only works with things that Room knows about. You can have it return classes annotated with @Entity, or you could have it return other POJOs for which the columns in the query map cleanly to the fields in the POJOs.

However, that’s pretty much it. Instantiating these objects is Room’s responsibility. You do not get control at any point (other than the constructor), up until the objects show up in your RecyclerView for binding to the rows, cells, or other visual representation.

This approach works for many simpler scenarios. However, it falls down for more complex situations.

Data Beyond Room

Suppose that the database table that you wish to page through represents particular locations of interest: name, description, latitude/longitude, etc. However, what you want to display is not only from that table, but also from a network source, such as the weather forecast for that location.

Room has no way to fetch a weather forecast from a network source of weather data.

The clean way of constructing an app with this sort of combined data source is to use the repository pattern, as will be discussed in an upcoming chapter. The repository is responsible for getting the data from Room, blending it with data from the network, and returning the results. But, in this case, we cannot readily use Room’s PositionalDataSource, as we do not get a chance, in our repository, to make the network calls to get the forecasts.
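As a rough illustration of that blending step, here is a minimal plain-Java sketch. Every name in it (Location, LocationWithForecast, ForecastRepository, and the two loader functions) is hypothetical: the loaders stand in for a Room DAO call and a Web service call, and none of this is code from the sample app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical row from the database table of locations
final class Location {
  final String name;
  final double latitude;
  final double longitude;

  Location(String name, double latitude, double longitude) {
    this.name=name;
    this.latitude=latitude;
    this.longitude=longitude;
  }
}

// Hypothetical blended result that the UI layer would display
final class LocationWithForecast {
  final Location location;
  final String forecast;

  LocationWithForecast(Location location, String forecast) {
    this.location=location;
    this.forecast=forecast;
  }
}

// Hypothetical repository: loads one page of locations, then looks up
// a forecast for each location in that page
final class ForecastRepository {
  // Stands in for a DAO method taking (position, size)
  private final BiFunction<Integer, Integer, List<Location>> pageLoader;
  // Stands in for a blocking Web service call
  private final Function<Location, String> forecastLoader;

  ForecastRepository(BiFunction<Integer, Integer, List<Location>> pageLoader,
                     Function<Location, String> forecastLoader) {
    this.pageLoader=pageLoader;
    this.forecastLoader=forecastLoader;
  }

  // Blocking call, so a real app would need to run it on a background thread
  List<LocationWithForecast> loadPage(int position, int size) {
    List<LocationWithForecast> result=new ArrayList<>();

    for (Location location : pageLoader.apply(position, size)) {
      result.add(new LocationWithForecast(location, forecastLoader.apply(location)));
    }

    return result;
  }
}
```

A method shaped like loadPage() is the hook that Room's own DataSource does not give us: it is where the network call can be made for just the items in the requested page.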

Models, Not Entities

Entities and other Room-capable POJOs have certain rules that they need to follow, such as having a suitable constructor or setter methods. That may or may not be the API that you want to expose to the UI layer. For example, you may want your UI layer to work with immutable objects, and Room does not work well with the leading immutability implementation for Java: AutoValue.

Or, perhaps your data comes from multiple sources naturally. For example, the real source of data is the network, but you also support offline caching, whether in a Room-fronted SQLite database or in some other form of cache. However, the way that the network API may want to represent the data may not match the way that your caching solution wants to represent the data. You might have one set of POJOs that Retrofit or Apollo-Android uses and a different set of POJOs that Room uses. Your UI layer should be working with a single representation, independent of where the data comes from… and that might be some third set of data structures (e.g., immutable objects). Your repository can handle the work to normalize the data from the disparate sources… but then you need to offer your own DataSource, not one directly from Room.
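To make the entity-versus-model split concrete, here is a small plain-Java sketch of a mutable, Room-style class being converted into an immutable object for the UI layer. NoteEntity, NoteModel, and NoteMapper are hypothetical names chosen for illustration; they are not classes from the sample app, and the entity omits Room annotations so that the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Room-style class: mutable fields and a no-argument constructor
class NoteEntity {
  String id;
  String title;
  boolean done;
}

// Hypothetical immutable model that the UI layer works with
final class NoteModel {
  private final String id;
  private final String title;
  private final boolean done;

  NoteModel(String id, String title, boolean done) {
    this.id=id;
    this.title=title;
    this.done=done;
  }

  String id() { return id; }
  String title() { return title; }
  boolean done() { return done; }
}

// Hypothetical mapper that a repository could use to normalize its data
final class NoteMapper {
  static NoteModel toModel(NoteEntity entity) {
    return new NoteModel(entity.id, entity.title, entity.done);
  }

  static List<NoteModel> toModels(List<NoteEntity> entities) {
    List<NoteModel> result=new ArrayList<>();

    for (NoteEntity entity : entities) {
      result.add(toModel(entity));
    }

    // Callers get a read-only view, in keeping with the immutable models
    return Collections.unmodifiableList(result);
  }
}
```

A second mapper could convert network-response POJOs into the same NoteModel, so the UI layer sees one representation regardless of where the data came from.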

Derived Data

Room’s direct-to-Paging solution also does not give you a chance to perform any work on the data that comes from Room, prior to that data arriving in your RecyclerView. Perhaps part of what you need to show is derived from Room data but requires some amount of calculation on your part, where that calculation may be slow and not good for the main application thread. In some cases, you can perform the calculation in the database and have it be part of the query. In other cases, the calculation may require more expressive power than you get from SQLite’s available functions, or it may need data that is held in your app, not in SQLite.

Once again, a repository can handle all of this calculation work, but it means that you need your own DataSource, not just the one that you get from Room.

What Doesn’t Work: Decorator/Wrapper

One might think that these scenarios could be handled via the decorator pattern. Create a DataSource wrapper that delegates to the wrapped DataSource and performs conversions or additional work as needed. For example, if Room’s DataSource is returning some form of entity objects, you could create a ModelDataSource that wraps Room’s DataSource and converts the entities to models.

It is conceivable that this could be done, but when the author of this book went down that path, it seemed to be impractical, at least with the current alpha5 implementation of the Paging library.

Partly, that is because some classes and methods that we need are not public.

Partly, that is because the Paging implementation involves callbacks calling callbacks calling callbacks. This “callback-ception” approach makes wrapping difficult, as we cannot necessarily get control at the proper points to provide wrappers for all of the nested callbacks.

It is possible that future editions of the Paging library will offer greater flexibility here.

Inside a Custom Data Source

On the plus side, creating a custom PositionalDataSource is not that difficult. The API is fairly small, so it is not like there are dozens of methods that you have to implement.

In this section, we will examine some classes that build up to a concrete custom PositionalDataSource, for a series of to-do items. While the to-do items are stored in a ToDoDatabase in a ToDoEntity-defined table, the objects we want the UI layer to work with are ToDoModel instances. So, to provide paging with ToDoModel objects, we have a ToDoModelDataSource.

This data source will be used to populate a RecyclerView with the roster of to-do items, as part of an extensive upcoming sample app.

The sample app builds up a ToDoModelDataSource using two other custom ancestor classes: BaseDataSource and SnapshotDataSource.


There are two abstract methods to be implemented on a PositionalDataSource subclass: loadInitial() and loadRange(). loadRange() loads a specified number of items starting at a specified position, while loadInitial() loads some initial page’s worth of content.

If you use paging with Room and have a @Dao method return a DataSource.Factory, the generated code uses an internal class named LimitOffsetDataSource to perform the SQLite operations and fulfill the PositionalDataSource contract. BaseDataSource is based on that LimitOffsetDataSource and converts the loadInitial() and loadRange() methods into countItems() and loadRangeAtPosition() methods:


import android.arch.paging.PositionalDataSource;
import android.support.annotation.NonNull;
import java.util.Collections;
import java.util.List;

abstract class BaseDataSource<T> extends PositionalDataSource<T> {
  // Total number of items currently available from the backing store
  abstract protected int countItems();
  // Up to size items, starting at position
  abstract protected List<T> loadRangeAtPosition(int position, int size);

  @Override
  public void loadInitial(@NonNull LoadInitialParams params,
    @NonNull LoadInitialCallback<T> callback) {
    int total=countItems();

    if (total==0) {
      callback.onResult(Collections.emptyList(), 0, 0);
    }
    else {
      final int position=computeInitialLoadPosition(params, total);
      final int size=computeInitialLoadSize(params, position, total);
      List<T> list=loadRangeAtPosition(position, size);

      if (list!=null && list.size()==size) {
        callback.onResult(list, position, total);
      }
      else {
        // The data no longer matches the count, so mark this source as stale
        invalidate();
      }
    }
  }

  @Override
  public void loadRange(@NonNull LoadRangeParams params,
                        @NonNull LoadRangeCallback<T> callback) {
    List<T> list=loadRangeAtPosition(params.startPosition, params.loadSize);

    if (list!=null) {
      callback.onResult(list);
    }
    else {
      invalidate();
    }
  }
}

countItems() needs to return the current number of items that are available for whatever data source the BaseDataSource uses. Subclasses will need to do something, like query a database, to find out the value. loadInitial(), therefore, starts off by getting the count and calling the callback’s onResult() method with an empty list if there are no items.

Assuming that there are items that could be returned, loadInitial() leverages the computeInitialLoadPosition() and computeInitialLoadSize() helper methods, supplied by PositionalDataSource, to determine the position and size to pass to loadRangeAtPosition(). loadRangeAtPosition() takes simple position and size values and is responsible for loading those on demand. If we got a valid list back, loadInitial() passes the list to the callback’s onResult() method; otherwise, we mark our source as invalidated, so it should no longer be used.
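To give a feel for the kind of arithmetic those helpers perform, here is a simplified plain-Java stand-in. It is a sketch under stated assumptions rather than the library's actual code: the InitialLoadMath class and its parameters are hypothetical, whereas the real helpers take a LoadInitialParams object and may also align the position to page boundaries.

```java
// Hypothetical, simplified stand-in for the initial-load helper methods
final class InitialLoadMath {
  // Clamp the requested start so that a full initial load still fits
  // before the end of the data, and so that it is never negative
  static int initialPosition(int requestedStart, int requestedLoadSize, int total) {
    int latestStart=Math.max(0, total-requestedLoadSize);

    return Math.max(0, Math.min(requestedStart, latestStart));
  }

  // Load the requested size, or whatever remains if that is smaller
  static int initialSize(int requestedLoadSize, int position, int total) {
    return Math.min(total-position, requestedLoadSize);
  }
}
```

With 100 items, a requested start of 95, and a requested load size of 30, this sketch yields a position of 70 and a size of 30. The list.size()==size check in loadInitial() then confirms that the data source really could supply that many items.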

loadRange() is even simpler: it merely converts the supplied params into a position and size and calls loadRangeAtPosition(). Once again, the results are passed to the callback’s onResult() method if we have a list, or else we invalidate() ourselves.

Both loadInitial() and loadRange() should be called on background threads (e.g., via a LivePagedListBuilder), and therefore countItems() and loadRangeAtPosition() will be called on background threads.

So, the result of BaseDataSource is to replace one API with a simpler one.


Room’s LimitOffsetDataSource uses the position and size as values for OFFSET and LIMIT clauses in SQL statements.
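As a sketch of that idea, the following plain-Java helper wraps an arbitrary source query in COUNT, LIMIT, and OFFSET clauses. The LimitOffsetSql class and its method names are hypothetical, and a real implementation would normally bind the position and size as query arguments rather than concatenating them; the point here is only how position and size map onto the SQL.

```java
// Hypothetical helper showing how position and size map to OFFSET and LIMIT
final class LimitOffsetSql {
  // Query that counts the rows the source query would return
  static String countQuery(String sourceQuery) {
    return "SELECT COUNT(*) FROM ( "+sourceQuery+" )";
  }

  // Query that returns size rows of the source query, starting at position
  static String pageQuery(String sourceQuery, int position, int size) {
    return "SELECT * FROM ( "+sourceQuery+" ) LIMIT "+size+" OFFSET "+position;
  }
}
```

For example, asking for 20 items starting at position 40 of "SELECT * FROM todos ORDER BY description" produces a statement ending in "LIMIT 20 OFFSET 40".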

SnapshotDataSource extends BaseDataSource and takes a different approach: load in all of the primary keys in one shot, then use those keys to load in the real data as needed. This allows us to map between positions and primary keys at any point in time, at pure-memory speed. The downside is that it consumes more memory, since we have all of the keys (long values, UUIDs, etc.) in memory at once, in addition to any fully-populated models. This should not be a huge issue in practice, as even a Paging-enabled list needs to be reasonable in length. Users will not be swiping through tens of thousands of items; they will throw their device against a wall and attack the developer with a machete first. Any UI needs to keep the number of items in the list at a reasonable level, and so SnapshotDataSource should not introduce a major memory burden.

SnapshotDataSource, like BaseDataSource, is abstract, mapping the countItems() and loadRangeAtPosition() methods to loadKeys() and loadForIds() methods:


import java.util.Collections;
import java.util.List;

public abstract class SnapshotDataSource<T, PK> extends BaseDataSource<T> {
  // All primary keys, in display order
  protected abstract List<PK> loadKeys();
  // The real objects for one slice of those keys
  protected abstract List<T> loadForIds(List<PK> pks);

  private volatile List<PK> keys=null;

  public int findPositionForKey(PK key) {
    if (keys==null) {
      throw new IllegalStateException("Attempted to find position for key without having keys loaded");
    }

    return keys.indexOf(key);
  }

  public PK findKeyForPosition(int position) {
    if (keys==null) {
      throw new IllegalStateException("Attempted to find key for position without having keys loaded");
    }

    return keys.get(position);
  }

  @Override
  protected List<T> loadRangeAtPosition(int position, int size) {
    initKeys();

    return loadForIds(keys.subList(position, position+size));
  }

  @Override
  protected int countItems() {
    initKeys();

    return keys.size();
  }

  // Load the snapshot of keys once, on first use
  synchronized private void initKeys() {
    if (keys==null) {
      keys=loadKeys();
    }
  }
}

Subclasses can then use a Room DAO (or whatever) to load all of the primary keys, reflecting whatever filtering is desired. Plus, the subclasses will use the same backing store to get the real objects based on a slice of the overall primary key list.

SnapshotDataSource also provides helper methods to convert between positions and primary key values (findPositionForKey() and findKeyForPosition()). Those fail fast if, for some strange reason, you try to use them before you have really started using the SnapshotDataSource and do not yet have the keys loaded.
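The snapshot-of-keys idea can be tried out without Android at all. The following plain-Java sketch is a hypothetical stand-in (the KeySnapshot class and its backing Map are assumptions for illustration, not part of the sample app): the Map plays the role of the database table, and the key list is captured once, on first use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the snapshot approach, backed by a Map
final class KeySnapshot {
  private final Map<String, String> rows; // key -> value, stands in for a table
  private List<String> keys=null;         // captured once, on first use

  KeySnapshot(Map<String, String> rows) {
    this.rows=rows;
  }

  private synchronized void initKeys() {
    if (keys==null) {
      keys=new ArrayList<>(rows.keySet());
    }
  }

  int countItems() {
    initKeys();

    return keys.size();
  }

  // Look up the real values for one slice of the captured keys
  List<String> loadRangeAtPosition(int position, int size) {
    initKeys();

    List<String> result=new ArrayList<>();

    for (String key : keys.subList(position, position+size)) {
      result.add(rows.get(key));
    }

    return result;
  }

  int findPositionForKey(String key) {
    initKeys();

    return keys.indexOf(key);
  }
}
```

Because the keys are captured once, rows added to the Map afterwards do not change the count or the position-to-key mapping for this instance, which is the "snapshot" part of the name.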


So now we can create a SnapshotDataSource subclass that bridges between this constructed DataSource protocol and a @Dao, in this case in the form of a ToDoEntity.Store class.

That Store class has Room @Query methods to support what a SnapshotDataSource needs:

    @Query("SELECT id FROM todos ORDER BY description")
    abstract List<String> allKeys();

    @Query("SELECT id FROM todos WHERE isCompleted=:isCompleted ORDER BY description")
    abstract List<String> filteredKeys(boolean isCompleted);

    @Query("SELECT * FROM todos WHERE id IN (:ids) ORDER BY description")
    abstract List<ToDoEntity> forIds(List<String> ids);

The sample app that uses this ToDoModelDataSource has a filtering feature. Users can choose from three filter settings: all items, only completed items, or only outstanding items.

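Our ToDoModelDataSource will need to take the current filter mode into account. To that end, the Store has allKeys() for the unfiltered roster, filteredKeys() for the completed or outstanding items, and forIds() to load the entities for a set of keys.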

All three methods apply our sort order, so that the keys come back in the desired order and so that SQLite and Room return our entities (for a given page) in that same order.

ToDoModelDataSource then is a SnapshotDataSource subclass that wraps a ToDoEntity.Store and uses those new methods to implement loadKeys() (based on FilterMode) and loadForIds() (converting from entities to models along the way):


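import java.util.ArrayList;
import java.util.List;

class ToDoModelDataSource extends SnapshotDataSource<ToDoModel, String> {
  private final ToDoEntity.Store todoStore;
  private final FilterMode filterMode;

  ToDoModelDataSource(ToDoDatabase db, FilterMode filterMode) {
    // Assumption: the constructor body is missing from this preview, and
    // todoStore() is a guess at the name of the accessor on ToDoDatabase
    this.todoStore=db.todoStore();
    this.filterMode=filterMode;
  }

  @Override
  protected List<String> loadKeys() {
    if (filterMode==FilterMode.ALL) {
      return todoStore.allKeys();
    }
    else if (filterMode==FilterMode.COMPLETED) {
      return todoStore.filteredKeys(true);
    }

    return todoStore.filteredKeys(false);
  }

  @Override
  protected List<ToDoModel> loadForIds(List<String> strings) {
    ArrayList<ToDoModel> result=new ArrayList<>();

    for (ToDoEntity entity : todoStore.forIds(strings)) {
      // Assumption: the loop body is missing from this preview, and
      // toModel() is a guess at the entity-to-model conversion call
      result.add(entity.toModel());
    }

    return result;
  }
}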

FilterMode is an enum, used by this sample app, to denote the particular filter that the user has requested. FilterMode.ALL, FilterMode.COMPLETED, and FilterMode.OUTSTANDING values determine which of the ToDoEntity.Store methods we call to get our keys and what parameters we supply.
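For readers who want to experiment with that dispatch outside of the sample app, here is a self-contained plain-Java sketch. The three enum values come from the description above; the KeyStore interface and KeyLoader class are hypothetical stand-ins for the two key-loading Store methods and for loadKeys(), and are not the sample app's actual classes.

```java
import java.util.List;

// The three filter settings described above
enum FilterMode { ALL, COMPLETED, OUTSTANDING }

// Hypothetical stand-in for the key-loading methods on the Store
interface KeyStore {
  List<String> allKeys();
  List<String> filteredKeys(boolean isCompleted);
}

// Hypothetical helper that picks the Store method based on the filter mode
final class KeyLoader {
  static List<String> loadKeys(KeyStore store, FilterMode mode) {
    switch (mode) {
      case ALL:
        return store.allKeys();

      case COMPLETED:
        return store.filteredKeys(true);

      default: // OUTSTANDING
        return store.filteredKeys(false);
    }
  }
}
```

A switch is used here purely as a matter of taste; it is equivalent to the if/else chain and keeps all three cases visible in one place.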

Since the API methods on BaseDataSource should be called on background threads, the methods on ToDoModelDataSource should be called on those same background threads, so it is safe for us to make synchronous queries against the database.

LivePagedListBuilder and Custom DataSources

There is one thing missing with the previous section’s depiction of a custom DataSource: its Factory. Room does not return a DataSource — it returns a DataSource.Factory. LivePagedListBuilder does not use a DataSource — it uses a DataSource.Factory.

There’s a reason for this approach: invalidation.

DataSource Invalidation

Room not only provides data to you up front, but if you use appropriate reactive types (e.g., LiveData, Observable), it can also provide you with updates to the data over time, as that data changes. So, for example, if you have a @Query method on your @Dao class that returns a LiveData for some query, not only will you get the current values, but if you use Room to modify the database, you will get the updated values delivered to you automatically.

With Observable and LiveData, you simply get the new results, and it is up to you to deal with those results as appropriate. Under the covers, the DataSource.Factory returned by Room does something different: it invalidates the previous DataSource returned by the Factory.

The base DataSource class is mostly for invalidation-related methods: invalidate(), isInvalid(), addInvalidatedCallback(), and removeInvalidatedCallback().

When a DataSource is invalid, in principle, it should stop loading any data. Effectively, “invalid” is used to mean “stale”; invalidating a DataSource tells interested parties that they should get a fresh DataSource… such as from the DataSource.Factory that was used to create the current DataSource.

As a result, when the data changes in your database and your existing DataSource may reflect old results, Room will invalidate that DataSource. A consumer of that DataSource — such as a LivePagedListBuilder — can register an InvalidatedCallback, find out when the current DataSource becomes invalid, and get a fresh DataSource from the associated Factory.
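The flow just described can be modeled in a few lines of plain Java. This is a simplified, hypothetical stand-in and not the Paging library's implementation: InvalidatableSource mimics only the invalidation-related behavior of a data source, a Supplier plays the role of the Factory, and SourceConsumer is an invented consumer that swaps in a fresh source whenever the current one is invalidated.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical stand-in for the invalidation-related part of a data source
class InvalidatableSource {
  private final List<Runnable> callbacks=new CopyOnWriteArrayList<>();
  private volatile boolean invalid=false;

  void addInvalidatedCallback(Runnable callback) {
    callbacks.add(callback);
  }

  boolean isInvalid() {
    return invalid;
  }

  // Only the first call has any effect; later calls are ignored
  void invalidate() {
    if (!invalid) {
      invalid=true;

      for (Runnable callback : callbacks) {
        callback.run();
      }
    }
  }
}

// Hypothetical consumer: replaces its source with a fresh one from the
// factory each time the current source reports that it is invalid
final class SourceConsumer {
  private final Supplier<InvalidatableSource> factory;
  private InvalidatableSource current;
  private int sourcesCreated=0;

  SourceConsumer(Supplier<InvalidatableSource> factory) {
    this.factory=factory;
    refresh();
  }

  private void refresh() {
    current=factory.get();
    sourcesCreated++;
    current.addInvalidatedCallback(this::refresh);
  }

  InvalidatableSource current() {
    return current;
  }

  int sourcesCreated() {
    return sourcesCreated;
  }
}
```

This sketch ignores threading on the consumer side; it is meant only to show why the consumer needs the factory, and not just a single source, in order to react to stale data.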

Crafting a DataSource.Factory

A DataSource.Factory needs but one public method: create(). It takes no parameters and returns a DataSource.

However, the important thing is that it needs to be able to do this repeatedly, with a fresh, not-yet-invalidated DataSource each time. This implies that the Factory holds onto the objects necessary to create the desired DataSource. In the case of ToDoModelDataSource, that means that its Factory would need to hold onto the ToDoDatabase and FilterMode necessary to create a fresh ToDoModelDataSource.
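The shape of such a factory can be sketched in plain Java. All of the types below are hypothetical stand-ins, defined only so that the sketch compiles without the Paging library or the sample app: SourceFactory plays the role of DataSource.Factory, Database plays ToDoDatabase, Filter plays FilterMode, and ModelSource plays ToDoModelDataSource.

```java
// Stand-in for DataSource.Factory: one method, no parameters
interface SourceFactory<S> {
  S create();
}

// Stand-ins for ToDoDatabase and FilterMode
final class Database {}

enum Filter { ALL, COMPLETED, OUTSTANDING }

// Stand-in for ToDoModelDataSource
final class ModelSource {
  final Database db;
  final Filter filter;

  ModelSource(Database db, Filter filter) {
    this.db=db;
    this.filter=filter;
  }
}

// The factory holds onto everything needed to build a source, so that it
// can hand out a brand-new instance on every create() call
final class ModelSourceFactory implements SourceFactory<ModelSource> {
  private final Database db;
  private final Filter filter;

  ModelSourceFactory(Database db, Filter filter) {
    this.db=db;
    this.filter=filter;
  }

  @Override
  public ModelSource create() {
    return new ModelSource(db, filter);
  }
}
```

Each call to create() returns a distinct ModelSource built from the same database and filter, which is the property that matters when a previous source has been invalidated.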


The Paging invalidation pattern assumes that the driver of new data is Room or some other DataSource.Factory provider. Changes in data are detected at low levels and push new results to the UI.

That may or may not be appropriate. In particular, it may be tricky to integrate that with your desired GUI architecture.

For example, the sample app employing ToDoModelDataSource and the other classes shown here is an example of the Model-View-Intent (MVI) GUI architecture. There is a specific flow dictated by that architecture. While underlying data changes might necessitate UI updates, that needs to be done as part of the overall GUI architecture. Having it as some secondary channel might lead to conflicts.

As a result, ToDoModelDataSource does not use invalidation.

There is no requirement for a DataSource to use the invalidation mechanism. This is an available communications channel between a DataSource, its Factory, and consumers of those objects. However, “available” does not mean “required”, and if you are getting your UI updates by some other means, that is perfectly fine.

However, in this case, it means that LivePagedListBuilder is overkill. It will be waiting for signs of invalidation that never arrive. As a result, this sample app does not use LivePagedListBuilder to set up its PagedListAdapter for its RecyclerView. Instead, it uses a separate class, called PagedList.Builder.


The preview of this section was stepped on by Godzilla.