Applying FTS to Room

The FTS module of the book’s primary sample project demonstrates the use of FTS4 in Room.

The app is a trivial e-book reader. In assets/, it contains a copy of the Project Gutenberg edition of H. G. Wells’ “The Invisible Man”. This is saved in plain text files, with blank lines serving as paragraph separators. The reader itself simply displays the book in its entirety, one paragraph per row in a RecyclerView:

FTS Sample App, As Initially Launched
FTS Sample App, As Initially Launched

It also has a SearchView, and if you search on a term, you will be given snippets of the book showing where that word was used.

This chapter will focus on the BookRepository and its associated Room classes that handle storing the book content and conducting the searches.

Creating the FTS Table and Entity

While the book is packaged with the app in assets/ as plain text files, SQLite and FTS have no way of searching that content. So, the app has a SQLite table and associated Room entity for the text. Since the prose is divided into paragraphs, we have a ParagraphEntity that models those:

package com.commonsware.room.fts

import androidx.room.Entity
import androidx.room.PrimaryKey

@Entity(tableName = "paragraphs")
data class ParagraphEntity(
  @PrimaryKey(autoGenerate = true) val id: Long,
  val sequence: Int,
  val prose: String
)

Here, sequence is an integer showing where the paragraph appears in the book, with lower numbers meaning earlier appearances. prose is the text of the paragraph itself.

There are a few SQLite strategies for creating and maintaining an FTS index. This book goes with a parallel-table approach, where we not only have an entity and table for the data, but a separate entity and table for the FTS index.

To that end, we not only have ParagraphEntity, but we have ParagraphFtsEntity:

package com.commonsware.room.fts

import androidx.room.Entity
import androidx.room.Fts4

@Fts4(contentEntity = ParagraphEntity::class)
@Entity(tableName = "paragraphsFts")
data class ParagraphFtsEntity(val prose: String)

The FTS entity has the same properties as the main entity, for those columns that we want to be indexed by FTS. In this case, that is only the prose.

The @Fts4 annotation indicates that this is an entity representing an FTS4 “shadow table” for some other table. The contentEntity property of the annotation points Room to the main entity (ParagraphEntity in this case).

Under the covers, Room and SQLite’s FTS4 will:

As a result, while we will query using both tables, we will only insert content using ParagraphEntityParagraphFtsEntity will be handled for us.

Querying Using FTS

Our DAO is BookStore:

package com.commonsware.room.fts

import androidx.room.Dao
import androidx.room.Insert
import androidx.room.Query

@Dao
abstract class BookStore {
  @Insert
  abstract suspend fun insert(paragraphs: List<ParagraphEntity>)

  @Query("SELECT prose FROM paragraphs ORDER BY sequence")
  abstract suspend fun all(): List<String>

  @Query("SELECT snippet(paragraphsFts) FROM paragraphs JOIN paragraphsFts "+
      "ON paragraphs.id == paragraphsFts.rowid WHERE paragraphsFts.prose "+
      "MATCH :search ORDER BY sequence")
  abstract suspend fun filtered(search: String): List<String>
}

As noted above, our only data-manipulation function — insert() — works with ParagraphEntity rather than ParagraphFtsEntity. And, we are welcome to query ParagraphEntity as normal, without having to deal with FTS, as we do in the all() function to retrieve all the prose in sequential order.

However, we can also use the MATCH operator for queries against our FTS4 shadow table. MATCH takes a search expression, which could be as simple as a word to find in the indexed content.

For the moment, let’s pretend that the filtered() function on BookStore looked like this:

  @Query("SELECT prose FROM paragraphs JOIN paragraphsFts "+
      "ON paragraphs.id == paragraphsFts.rowid WHERE paragraphsFts.prose "+
      "MATCH :search ORDER BY sequence")
  abstract suspend fun filtered(search: String): List<String>

The SELECT prose FROM paragraphs and the ORDER BY sequence parts are simple SQL. Combined, they make up the query used in all(), to return all the prose, order by sequence.

However, filtered() also uses JOIN to connect the paragraphsFts table with the paragraphs table. The id of our ParagraphEntity will match the rowid of the corresponding ParagraphsFtsEntity in its shadow table.

Finally, we use WHERE paragraphsFts.prose MATCH :search to apply an FTS4 search against the FTS4-indexed prose in paragraphsFts. Through the JOIN, we can get columns back from paragraphs for whatever rows in paragraphsFts matched the FTS4 search expression.

As a result, calling this version of filtered() with an FTS4 search expression would give us all the complete text of all the paragraphs that matched that search expression.

Getting Snippets

However… the filtered() function shown above is not actually what BookStore has. Instead, it has:

  @Query("SELECT snippet(paragraphsFts) FROM paragraphs JOIN paragraphsFts "+
      "ON paragraphs.id == paragraphsFts.rowid WHERE paragraphsFts.prose "+
      "MATCH :search ORDER BY sequence")
  abstract suspend fun filtered(search: String): List<String>

Here, instead of retrieving prose from paragraphs, we have snippet(paragraphFts).

When you use Internet search engines like Google, DuckDuckGo, etc., usually their search results are not just links, but snippets of text showing you the context around your search term, often with a search keyword highlighted:

DuckDuckGo Search for CommonsWare, with Keyword Highlights
DuckDuckGo Search for CommonsWare, with Keyword Highlights

snippet() gives you the same effect. SQLite’s FTS engine will return to you a portion of the text, with search keywords in boldface. The value passed to the snippet() function is the name of the FTS table (in this case, paragraphFts).

The RecyclerView.ViewHolder renders that HTML using HtmlCompat:

package com.commonsware.room.fts

import android.view.View
import android.widget.TextView
import androidx.core.text.HtmlCompat
import androidx.recyclerview.widget.RecyclerView

class RowHolder(root: View) : RecyclerView.ViewHolder(root) {
  private val prose = root.findViewById<TextView>(R.id.prose)

  fun bind(paragraph: String) {
    prose.text = HtmlCompat.fromHtml(paragraph, HtmlCompat.FROM_HTML_MODE_COMPACT)
  }
}

The effect is that the user’s chosen search term shows up in bold in the search results:

FTS Sample App, Showing Search Results for chair
FTS Sample App, Showing Search Results for ‘chair’

You do not have to use this feature, if you do not want, but it is available to you if it would be of use to your users.

Populating the Database

Our BookDatabase does not need anything special for having FTS be a part of its entity and DAO mix. However, ParagraphFtsEntity is an entity, so it needs to be listed in the @Database annotation:

package com.commonsware.room.fts

import android.content.Context
import androidx.room.Database
import androidx.room.Room
import androidx.room.RoomDatabase

private const val DB_NAME = "book.db"

@Database(
  entities = [ParagraphEntity::class, ParagraphFtsEntity::class],
  version = 1
)
abstract class BookDatabase : RoomDatabase() {
  abstract fun bookStore(): BookStore

  companion object {
    fun newInstance(context: Context) =
      Room.databaseBuilder(context, BookDatabase::class.java, DB_NAME).build()
  }
}

Technically, BookRepository needs to know even less about FTS. However, somebody has to have the job of adding the book to the database on first run, and in this case, that job falls on BookRepository. Its all() function will be called when the app launches, to populate the RecyclerView to show the book contents. So, we can see if all() on BookStore returns an empty list — if it does, we can surmise that the database is empty and we need to fill it:

  suspend fun all(): List<String> {
    val result = db.bookStore().all()

    return if (result.isEmpty()) {
      import()
      db.bookStore().all()
    } else {
      result
    }
  }

The import() function iterates over all of the book chapters (which are numbered so their order can be maintained) and generates ParagraphEntity objects for each of their paragraphs. The resulting List of entities then gets inserted into the database via insert() on BookStore:

  private suspend fun import() = withContext(Dispatchers.IO) {
    val assets = context.assets

    val entities = assets.list(BOOK_DIR).orEmpty()
      .sorted()
      .map { paragraphs(assets.open("$BOOK_DIR/$it")) }
      .flatten()
      .mapIndexed { index, prose -> ParagraphEntity(0, index, prose) }

    db.bookStore().insert(entities)
  }
}

// inspired by https://stackoverflow.com/a/10065920/115145

private fun paragraphs(stream: InputStream) =
  paragraphs(stream.reader().readLines())

private fun paragraphs(lines: List<String>) =
  lines.fold(listOf<String>()) { roster, line ->
    when {
      line.isEmpty() -> roster + ""
      roster.isEmpty() -> listOf(line)
      else -> roster.take(roster.size - 1) + (roster.last() + " ${line.trim()}")
    }
  }
    .map { it.trim() }
    .filter { it.isNotEmpty() }

Here, BOOK_DIR identifies the directory within assets/ where the chapters reside:

private const val BOOK_DIR = "TheTimeMachine"

Dealing With Errors

It is possible that the user will type in an invalid search expression. SQLite has a specific search language, and the user’s search expression might not adhere to that.

In this case, our filtered() DAO function, being a suspend function, will throw the exception. We can catch that and handle it how we see fit.

The app has a SearchViewModel that calls filtered() on BookRepository and surfaces the result via a LiveData:

package com.commonsware.room.fts

import android.content.Context
import android.util.Log
import androidx.lifecycle.LiveData
import androidx.lifecycle.ViewModel
import androidx.lifecycle.liveData

class SearchViewModel(
  search: String,
  repo: BookRepository,
  private val context: Context
) : ViewModel() {
  val paragraphs: LiveData<List<String>> = liveData {
    try {
      emit(repo.filtered(search))
    } catch (t: Throwable) {
      Log.e("FTS", "Exception with search expression", t)
      emit(listOf(context.getString(R.string.search_error, search)))
    }
  }
}

We wrap our filtered() call in a try/catch. If we get an exception, we log its details to Logcat, and we emit() a result with an error message for the user. In this case, we are taking advantage of the fact that our UI is already displaying text to the user — we just replace the normal search results with an error message in this case.

Alternatively, we could have paragraphs be a LiveData of some sealed class with Content and Error implementations. Content could hold the paragraphs, while Error could hold the error message. This would give the code that displays the results (SearchFragment) more flexibility in displaying the error, albeit with somewhat greater code complexity.

Why Are We Bothering?

Of course, you might wonder what all the fuss is about. In this sample, we are loading the entire book into memory, via the all() functions on our BookStore and BookRepository. If we want to search, we could do so purely in memory, without any FTS stuff.

There are three reasons why we are implementing FTS here:

  1. While we could do a simple keyword search easily enough ourselves, FTS supports richer search expressions than that, and writing a whole search expression parser and executor is going to be a lot of work
  2. We would not have to load the entire book into memory, if we did not want to — we will examine that more in a later chapter
  3. It would be silly to have a chapter on FTS and then not show it

Prev Table of Contents Next

This book is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.