Applying FTS to Room
The FTS
module of the book’s primary sample project demonstrates the use of FTS4 in Room.
The app is a trivial e-book reader. In assets/
, it contains a copy of the Project Gutenberg edition of H. G. Wells’ “The Invisible Man”. This is saved in plain text files, with blank lines serving as paragraph separators. The reader itself simply displays the book in its entirety, one paragraph per row in a RecyclerView
:
It also has a SearchView
, and if you search on a term, you will be given snippets of the book showing where that word was used.
This chapter will focus on the BookRepository
and its associated Room classes that handle storing the book content and conducting the searches.
Creating the FTS Table and Entity
While the book is packaged with the app in assets/
as plain text files, SQLite and FTS have no way of searching that content. So, the app has a SQLite table and associated Room entity for the text. Since the prose is divided into paragraphs, we have a ParagraphEntity
that models those:
package com.commonsware.room.fts
import androidx.room.Entity
import androidx.room.PrimaryKey
@Entity(tableName = "paragraphs")
data class ParagraphEntity(
@PrimaryKey(autoGenerate = true) val id: Long,
val sequence: Int,
val prose: String
)
Here, sequence
is an integer showing where the paragraph appears in the book, with lower numbers meaning earlier appearances. prose
is the text of the paragraph itself.
There are a few SQLite strategies for creating and maintaining an FTS index. This book goes with a parallel-table approach, where we not only have an entity and table for the data, but a separate entity and table for the FTS index.
To that end, we not only have ParagraphEntity
, but we have ParagraphFtsEntity
:
package com.commonsware.room.fts
import androidx.room.Entity
import androidx.room.Fts4
@Fts4(contentEntity = ParagraphEntity::class)
@Entity(tableName = "paragraphsFts")
data class ParagraphFtsEntity(val prose: String)
The FTS entity has the same properties as the main entity, for those columns that we want to be indexed by FTS. In this case, that is only the prose
.
The @Fts4
annotation indicates that this is an entity representing an FTS4 “shadow table” for some other table. The contentEntity
property of the annotation points Room to the main entity (ParagraphEntity
in this case).
Under the covers, Room and SQLite’s FTS4 will:
- Add a standard
rowid
column to the shadow table, matching theINTEGER
ID of the main table - Add triggers, such that whenever changes are made to the main table, Room can update the shadow table to match
As a result, while we will query using both tables, we will only insert content using ParagraphEntity
— ParagraphFtsEntity
will be handled for us.
Querying Using FTS
Our DAO is BookStore
:
package com.commonsware.room.fts
import androidx.room.Dao
import androidx.room.Insert
import androidx.room.Query
@Dao
abstract class BookStore {
@Insert
abstract suspend fun insert(paragraphs: List<ParagraphEntity>)
@Query("SELECT prose FROM paragraphs ORDER BY sequence")
abstract suspend fun all(): List<String>
@Query("SELECT snippet(paragraphsFts) FROM paragraphs JOIN paragraphsFts "+
"ON paragraphs.id == paragraphsFts.rowid WHERE paragraphsFts.prose "+
"MATCH :search ORDER BY sequence")
abstract suspend fun filtered(search: String): List<String>
}
As noted above, our only data-manipulation function — insert()
— works with ParagraphEntity
rather than ParagraphFtsEntity
. And, we are welcome to query ParagraphEntity
as normal, without having to deal with FTS, as we do in the all()
function to retrieve all the prose in sequential order.
However, we can also use the MATCH
operator for queries against our FTS4 shadow table. MATCH
takes a search expression, which could be as simple as a word to find in the indexed content.
For the moment, let’s pretend that the filtered()
function on BookStore
looked like this:
@Query("SELECT prose FROM paragraphs JOIN paragraphsFts "+
"ON paragraphs.id == paragraphsFts.rowid WHERE paragraphsFts.prose "+
"MATCH :search ORDER BY sequence")
abstract suspend fun filtered(search: String): List<String>
The SELECT prose FROM paragraphs
and the ORDER BY sequence
parts are simple SQL. Combined, they make up the query used in all()
, to return all the prose, order by sequence
.
However, filtered()
also uses JOIN
to connect the paragraphsFts
table with the paragraphs
table. The id
of our ParagraphEntity
will match the rowid
of the corresponding ParagraphsFtsEntity
in its shadow table.
Finally, we use WHERE paragraphsFts.prose MATCH :search
to apply an FTS4 search against the FTS4-indexed prose in paragraphsFts
. Through the JOIN
, we can get columns back from paragraphs
for whatever rows in paragraphsFts
matched the FTS4 search expression.
As a result, calling this version of filtered()
with an FTS4 search expression would give us all the complete text of all the paragraphs that matched that search expression.
Getting Snippets
However… the filtered()
function shown above is not actually what BookStore
has. Instead, it has:
@Query("SELECT snippet(paragraphsFts) FROM paragraphs JOIN paragraphsFts "+
"ON paragraphs.id == paragraphsFts.rowid WHERE paragraphsFts.prose "+
"MATCH :search ORDER BY sequence")
abstract suspend fun filtered(search: String): List<String>
Here, instead of retrieving prose
from paragraphs
, we have snippet(paragraphFts)
.
When you use Internet search engines like Google, DuckDuckGo, etc., usually their search results are not just links, but snippets of text showing you the context around your search term, often with a search keyword highlighted:
snippet()
gives you the same effect. SQLite’s FTS engine will return to you a portion of the text, with search keywords in boldface. The value passed to the snippet()
function is the name of the FTS table (in this case, paragraphFts
).
The RecyclerView.ViewHolder
renders that HTML using HtmlCompat
:
package com.commonsware.room.fts
import android.view.View
import android.widget.TextView
import androidx.core.text.HtmlCompat
import androidx.recyclerview.widget.RecyclerView
class RowHolder(root: View) : RecyclerView.ViewHolder(root) {
private val prose = root.findViewById<TextView>(R.id.prose)
fun bind(paragraph: String) {
prose.text = HtmlCompat.fromHtml(paragraph, HtmlCompat.FROM_HTML_MODE_COMPACT)
}
}
The effect is that the user’s chosen search term shows up in bold in the search results:
You do not have to use this feature, if you do not want, but it is available to you if it would be of use to your users.
Populating the Database
Our BookDatabase
does not need anything special for having FTS be a part of its entity and DAO mix. However, ParagraphFtsEntity
is an entity, so it needs to be listed in the @Database
annotation:
package com.commonsware.room.fts
import android.content.Context
import androidx.room.Database
import androidx.room.Room
import androidx.room.RoomDatabase
private const val DB_NAME = "book.db"
@Database(
entities = [ParagraphEntity::class, ParagraphFtsEntity::class],
version = 1
)
abstract class BookDatabase : RoomDatabase() {
abstract fun bookStore(): BookStore
companion object {
fun newInstance(context: Context) =
Room.databaseBuilder(context, BookDatabase::class.java, DB_NAME).build()
}
}
Technically, BookRepository
needs to know even less about FTS. However, somebody has to have the job of adding the book to the database on first run, and in this case, that job falls on BookRepository
. Its all()
function will be called when the app launches, to populate the RecyclerView
to show the book contents. So, we can see if all()
on BookStore
returns an empty list — if it does, we can surmise that the database is empty and we need to fill it:
suspend fun all(): List<String> {
val result = db.bookStore().all()
return if (result.isEmpty()) {
import()
db.bookStore().all()
} else {
result
}
}
The import()
function iterates over all of the book chapters (which are numbered so their order can be maintained) and generates ParagraphEntity
objects for each of their paragraphs. The resulting List
of entities then gets inserted into the database via insert()
on BookStore
:
private suspend fun import() = withContext(Dispatchers.IO) {
val assets = context.assets
val entities = assets.list(BOOK_DIR).orEmpty()
.sorted()
.map { paragraphs(assets.open("$BOOK_DIR/$it")) }
.flatten()
.mapIndexed { index, prose -> ParagraphEntity(0, index, prose) }
db.bookStore().insert(entities)
}
}
// inspired by https://stackoverflow.com/a/10065920/115145
private fun paragraphs(stream: InputStream) =
paragraphs(stream.reader().readLines())
private fun paragraphs(lines: List<String>) =
lines.fold(listOf<String>()) { roster, line ->
when {
line.isEmpty() -> roster + ""
roster.isEmpty() -> listOf(line)
else -> roster.take(roster.size - 1) + (roster.last() + " ${line.trim()}")
}
}
.map { it.trim() }
.filter { it.isNotEmpty() }
Here, BOOK_DIR
identifies the directory within assets/
where the chapters reside:
private const val BOOK_DIR = "TheTimeMachine"
Dealing With Errors
It is possible that the user will type in an invalid search expression. SQLite has a specific search language, and the user’s search expression might not adhere to that.
In this case, our filtered()
DAO function, being a suspend
function, will throw the exception. We can catch that and handle it how we see fit.
The app has a SearchViewModel
that calls filtered()
on BookRepository
and surfaces the result via a LiveData
:
package com.commonsware.room.fts
import android.content.Context
import android.util.Log
import androidx.lifecycle.LiveData
import androidx.lifecycle.ViewModel
import androidx.lifecycle.liveData
class SearchViewModel(
search: String,
repo: BookRepository,
private val context: Context
) : ViewModel() {
val paragraphs: LiveData<List<String>> = liveData {
try {
emit(repo.filtered(search))
} catch (t: Throwable) {
Log.e("FTS", "Exception with search expression", t)
emit(listOf(context.getString(R.string.search_error, search)))
}
}
}
We wrap our filtered()
call in a try
/catch
. If we get an exception, we log its details to Logcat, and we emit()
a result with an error message for the user. In this case, we are taking advantage of the fact that our UI is already displaying text to the user — we just replace the normal search results with an error message in this case.
Alternatively, we could have paragraphs
be a LiveData
of some sealed class
with Content
and Error
implementations. Content
could hold the paragraphs, while Error
could hold the error message. This would give the code that displays the results (SearchFragment
) more flexibility in displaying the error, albeit with somewhat greater code complexity.
Why Are We Bothering?
Of course, you might wonder what all the fuss is about. In this sample, we are loading the entire book into memory, via the all()
functions on our BookStore
and BookRepository
. If we want to search, we could do so purely in memory, without any FTS stuff.
There are three reasons why we are implementing FTS here:
- While we could do a simple keyword search easily enough ourselves, FTS supports richer search expressions than that, and writing a whole search expression parser and executor is going to be a lot of work
- We would not have to load the entire book into memory, if we did not want to — we will examine that more in a later chapter
- It would be silly to have a chapter on FTS and then not show it
Prev Table of Contents Next
This book is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license.