The following is the first few sections of a chapter from The Busy Coder's Guide to Android Development, plus headings for the remaining major sections, to give you an idea about the content of the chapter.
Android 6.0 introduced the concept of the device “assistant”. The assistant can be triggered by a long-press of the HOME button or via a spoken phrase (if the user has always-on keyphrase detection enabled). An assistant is a special app that has access to the content of the foreground activity and other visible windows, much like an accessibility service does.
For the vast majority of users of Google Play ecosystem devices running Android 6.0 or higher, the “assistant” is known as Now On Tap. On some devices, such as the Google Pixel series, this assistant is known simply as the “Google Assistant”. This is marketed as an extension of the Google Now UI, where Now On Tap/Google Assistant will take the data from the foreground activity and use that to find other relevant things for the user to do based upon that data.
(For the purposes of this chapter, this Google-supplied assistant will be referred to as “Now On Tap”, to distinguish Google’s assistant from assistants that others might write using these APIs.)
For example, suppose the user receives a text message, suggesting dinner at a particular restaurant. The restaurant is merely named, with no URL, and so the text messaging client would just display the name of the restaurant as part of the message. If the user invokes Now On Tap, Google will take the contents of this message (and anything else on the screen) and presumably send it to Google’s servers, which would send back things like details about the restaurant (e.g., URL to Web site, Google’s scanned reviews of the restaurant, link to Google Maps for driving directions). Google’s search engine technology would scan the data from the app, recognize that the restaurant name appears to be something significant, and give Now On Tap details of what to offer the user.
As with many things from Google, Now On Tap is very compelling and very much a privacy problem. Now On Tap is automatically installed and enabled on Android 6.0 devices — users have to go through some work to disable it. Users and app developers have limited ability to control Now On Tap, in terms of what data it collects and what it does with that data. On the other hand, certain apps (for which there are no privacy considerations) might wish to provide more data to Now On Tap, beyond what is visible in widgets, to help provide more context for Now On Tap to help users.
In this chapter, we will explore the Assist API, in terms of:
Understanding this chapter requires that you have read the core chapters of this book.
Quite a bit of data is made available to Now On Tap or other assistants through the Assist API alone, as will be explored in this section.
Assistants are welcome to use other APIs as well, subject to standard Android permissions and such. So, for example, an app might not show the device’s location, and therefore an assistant could not get the location from the Assist API, but the assistant could use LocationManager or the Play Services location API to find out the device’s location.
There is also a risk of pre-installed assistants using undocumented means of getting at data beyond what the normal Android SDK would allow.
All that being said, assistants will get a lot of information about the currently-visible UI, just from what the Assist API provides.
Assistants can get a screenshot of the current screen contents — minus the status bar — when the user activated the assistant (e.g., long-pressed HOME). Developers can block this for select activities or other windows. Hence, an assistant cannot assume that it will get a screenshot, though frequently it will.
Presumably, the “vision” here is to use computer vision and other image recognition techniques on the screenshot to find things of interest. For example, the user might bring up Now On Tap for some activity that is showing a photo of a monument. The activity might not be showing any other details about the monument, such as its name. However, Google’s servers might well recognize what monument it is and therefore give the user links to Wikipedia pages about the monument, a map of where the monument is located, etc.
By far the largest dump of data that the assistant gets comes in the form of the view structure. This is represented by a tree of AssistStructure.ViewNode objects, one per widget or container within a window. These provide information similar to what one gets from the accessibility APIs. For most assistants, the key data is the text or content description in the widget. In the case of text, this is available as a CharSequence and so may contain additional information (e.g., hyperlinks represented in URLSpan objects) beyond the words visible to the user.
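As a rough illustration of how an assistant might walk such a tree, here is a minimal sketch in plain Java. SketchNode is a hypothetical stand-in for AssistStructure.ViewNode (it borrows the getText(), getContentDescription(), getChildCount(), and getChildAt() method names so that the traversal reads the same way), and ViewTreeText is likewise an invented helper. Neither is part of the Android SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AssistStructure.ViewNode, so the sketch can run
// on a plain JVM. The accessor names mirror the real class, but this type is
// an assumption made for illustration only.
final class SketchNode {
    private final CharSequence text;
    private final CharSequence contentDescription;
    private final List<SketchNode> children = new ArrayList<>();

    SketchNode(CharSequence text, CharSequence contentDescription) {
        this.text = text;
        this.contentDescription = contentDescription;
    }

    // Appends a child and returns this node, to allow chained tree building.
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }

    CharSequence getText() { return text; }
    CharSequence getContentDescription() { return contentDescription; }
    int getChildCount() { return children.size(); }
    SketchNode getChildAt(int index) { return children.get(index); }
}

// Hypothetical helper that gathers the user-visible labels from a tree.
final class ViewTreeText {
    private ViewTreeText() {}

    static List<String> collect(SketchNode root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    // Depth-first walk: a node's own text (or, if the text is null, its
    // content description) is recorded before its children are visited.
    private static void walk(SketchNode node, List<String> out) {
        CharSequence label = node.getText() != null
                ? node.getText()
                : node.getContentDescription();
        if (label != null && label.length() > 0) {
            out.add(label.toString());
        }
        for (int i = 0; i < node.getChildCount(); i++) {
            walk(node.getChildAt(i), out);
        }
    }
}
```

Given a root container with one child whose text is “Dinner at Luigi's?” and another whose content description is “Send”, collect() returns those two strings in that order. A real assistant would start from the root node of each window in the AssistStructure rather than from a hand-built tree.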
Developers can restrict what widgets and containers are disclosed, but that is something developers have to do explicitly. In other words, making data available to assistants is something a developer has to opt out of, not opt into.
In addition to the view structure and a largely-undocumented
the other piece of data supplied to the assistant is the
Here is where an app can provide some additional context about
the foreground activity.
Specifically, the app can provide:
An Intent that represents the activity, replacing the Intent that was used to start the activity, if there is a better one for long-term use (e.g., the activity was started via a Notification action and you want to route the user through a different Intent for other scenarios)
A Uri that points to some Web page of relevance for this activity
Assistants can use this directly (e.g., offer a link to the
supplied in this content) or indirectly (e.g., using the schema.org
JSON to find places where the user can purchase related content).
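To make the schema.org option a bit more concrete, here is a minimal sketch, in plain Java, of building a small schema.org JSON-LD document by hand. StructuredDataSketch, describe(), and the restaurant values used with it are all hypothetical. On a device, an app would more likely build the JSON with org.json.JSONObject and hand the resulting string to AssistContent.setStructuredData() from its onProvideAssistContent() override.

```java
// Hypothetical helper, for illustration only; not part of the Android SDK.
final class StructuredDataSketch {
    private StructuredDataSketch() {}

    // Escapes the characters that must not appear raw inside a JSON string.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds a flat JSON-LD object with a schema.org type, a name, and a URL.
    static String describe(String schemaType, String name, String url) {
        return "{\"@context\":\"http://schema.org\","
                + "\"@type\":\"" + escape(schemaType) + "\","
                + "\"name\":\"" + escape(name) + "\","
                + "\"url\":\"" + escape(url) + "\"}";
    }
}
```

For example, describe("Restaurant", "Luigi's", "https://example.com/luigi") yields a one-line JSON object whose @type is Restaurant. An assistant receiving such a string could parse it and look up the named entity, rather than guessing at what the on-screen text refers to.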