Enabling voice search in Cobalt

Cobalt enables voice search through either:

  1. A subset of the MediaRecorder Web API
  2. A subset of the Speech Recognition Web API

Only one of the two can be used at a time. We recommend the MediaRecorder API, because the Speech Recognition API is deprecated as of Starboard 13.

In both approaches, web apps call the MediaDevices.enumerateDevices() Web API function to decide whether to enable voice search. Within that call, Cobalt in turn calls a subset of the Starboard SbMicrophone API.
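The check described above might look like the following on the web app side. This is a minimal sketch, not the YouTube app's actual code: the interfaces DeviceInfoLike and MediaDevicesLike and the helpers hasAudioInput and shouldEnableVoiceSearch are hypothetical names introduced for illustration. The only parts taken from the Web API are enumerateDevices() and the "audioinput" value of a device's kind field.

```typescript
// Minimal shape of the entries returned by enumerateDevices(); only the
// `kind` field matters for this check. (Hypothetical helper type.)
interface DeviceInfoLike {
  kind: string; // "audioinput", "audiooutput", or "videoinput"
}

// Anything exposing enumerateDevices(), e.g. navigator.mediaDevices.
// (Hypothetical helper type.)
interface MediaDevicesLike {
  enumerateDevices(): Promise<DeviceInfoLike[]>;
}

// Returns true when at least one audio input device is listed.
function hasAudioInput(devices: DeviceInfoLike[]): boolean {
  return devices.some((device) => device.kind === "audioinput");
}

// Resolves to true if the app should offer voice search.
function shouldEnableVoiceSearch(
  mediaDevices: MediaDevicesLike
): Promise<boolean> {
  return mediaDevices.enumerateDevices().then(hasAudioInput);
}
```

In a browser-like environment, a caller would pass navigator.mediaDevices to shouldEnableVoiceSearch and show or hide the mic icon based on the resolved value.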

Partners can add microphone support and microphone gesture options using the optional SoftMicPlatformService, detailed below.

MediaRecorder API

To enable the MediaRecorder API in Cobalt, the complete SbMicrophone API must be implemented, and SbSpeechRecognizerIsSupported() must return false.

Speech Recognition API - Deprecated

The Speech Recognition API is deprecated as of Starboard 13.

To support this API, platforms must implement the Starboard SbSpeechRecognizer API as well as a subset of the SbMicrophone API.

Specific instructions to enable voice search

  1. Implement SbSpeechRecognizerIsSupported() to return true, and implement the SbSpeechRecognizer API.

  2. Implement the following subset of the SbMicrophone API:

    • SbMicrophoneGetAvailable()
    • SbMicrophoneCreate()
    • SbMicrophoneDestroy()

    In particular, SbMicrophoneCreate() must return a valid microphone. It is okay to stub out the other functions, e.g. have SbMicrophoneOpen() return false.

  3. The YouTube app will display the mic icon on the search page when it detects valid microphone input devices using MediaDevices.enumerateDevices().

  4. With SbSpeechRecognizerIsSupported() implemented to return true, Cobalt will use the platform's Starboard SbSpeechRecognizer API implementation, and it will not actually read directly from the microphone via the Starboard SbMicrophone API.
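The requirements for the two paths can be summarized as a small decision function. This is only a model of the rules stated in this document, written as a sketch to make them easier to compare. It is not Cobalt code, and the names MicrophoneApiLevel, PlatformVoiceConfig, VoiceApi, and selectVoiceApi are hypothetical.

```typescript
// How much of the SbMicrophone API the platform implements.
// "subset" stands for SbMicrophoneGetAvailable(), SbMicrophoneCreate(),
// and SbMicrophoneDestroy() only. (Hypothetical labels.)
type MicrophoneApiLevel = "none" | "subset" | "complete";

interface PlatformVoiceConfig {
  // The value the platform's SbSpeechRecognizerIsSupported() returns.
  speechRecognizerSupported: boolean;
  microphoneApi: MicrophoneApiLevel;
}

type VoiceApi = "MediaRecorder" | "SpeechRecognition" | "none";

// Models which voice search path the documented rules lead to.
function selectVoiceApi(config: PlatformVoiceConfig): VoiceApi {
  if (config.speechRecognizerSupported) {
    // Deprecated path: needs at least the three-function subset.
    return config.microphoneApi === "none" ? "none" : "SpeechRecognition";
  }
  // Recommended path: needs the complete SbMicrophone API.
  return config.microphoneApi === "complete" ? "MediaRecorder" : "none";
}
```

For example, a platform that reports no speech recognizer support and implements the complete SbMicrophone API falls on the MediaRecorder path in this model.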

Differences from versions of Cobalt <= 11

In previous versions of Cobalt, there was no way to dynamically disable speech support other than modifying common Cobalt code to stub out the Speech Recognition API when the platform did not support microphone input. This is no longer necessary: web apps should now rely on MediaDevices.enumerateDevices() to determine whether voice support is enabled.

Speech Recognition API is deprecated in Starboard 13

Web applications are expected to use the MediaRecorder API. This in turn relies on the SbMicrophone API as detailed above.

SoftMicPlatformService

In starboard/linux/shared/soft_mic_platform_service.cc there is an example stub implementation of the SoftMicPlatformService. Platforms can optionally implement this [CobaltPlatformService](https://cobalt.dev/gen/cobalt/doc/platform_services.html) to specify whether they support the soft mic, the hard mic, or both for voice search. The soft mic refers to software activation of the microphone through the UI microphone button on the YouTube Web Application search page. The hard mic refers to activation of the microphone through a hardware button.

Platforms can also specify the optional micGesture, which sets the type of UI prompt the YouTube Web Application displays to guide the user to start voice search. The options are:

  • an empty or null value for no prompt
  • "TAP" to prompt the user to tap the soft mic and/or hard mic
  • "HOLD" to prompt the user to hold the soft mic and/or hard mic

Messages from the Web Application to the platform are single strings, enclosed in quotation marks to make them JSON compliant:

"\"notifySearchActive\""


"\"notifySearchInactive\""

These messages notify the platform when the user enters or exits the YouTube Web Application search page. The platform sends only a synchronous true or false response to confirm that the message was correctly received and parsed.
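The quoting of these messages is what JSON serialization of a plain string produces. The sketch below illustrates that encoding and a matching decoder, covering the two messages above and the getMicSupport message described next. The names SoftMicMessage, encodeSoftMicMessage, and decodeSoftMicMessage are hypothetical and are not part of Cobalt. How the encoded string is actually delivered to the platform service is outside the scope of this example.

```typescript
// Message names taken from this section.
type SoftMicMessage =
  | "notifySearchActive"
  | "notifySearchInactive"
  | "getMicSupport";

const KNOWN_SOFT_MIC_MESSAGES: SoftMicMessage[] = [
  "notifySearchActive",
  "notifySearchInactive",
  "getMicSupport",
];

// Wraps the bare name in quotation marks so the payload is valid JSON.
function encodeSoftMicMessage(message: SoftMicMessage): string {
  return JSON.stringify(message);
}

// Reverses the encoding. Returns null for malformed JSON or unknown names.
function decodeSoftMicMessage(payload: string): SoftMicMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null; // Not valid JSON, e.g. the enclosing quotes are missing.
  }
  if (typeof parsed !== "string") {
    return null;
  }
  const index = KNOWN_SOFT_MIC_MESSAGES.indexOf(parsed as SoftMicMessage);
  return index === -1 ? null : KNOWN_SOFT_MIC_MESSAGES[index];
}
```

A decoder that rejects unquoted or unknown input corresponds to the false response the platform would send when a message cannot be parsed.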

"\"getMicSupport\""

The platform responds with a similar synchronous true or false to confirm that the message was correctly received and parsed. It also sends an asynchronous string-encoded JSON object containing the microphone preferences described above:

"{
    'hasHardMicSupport' : boolean,
    'hasSoftMicSupport' : boolean,
    'micGesture' : string,
}"
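A receiver of this object could validate it along the following lines. This is a sketch under stated assumptions: the schema above is schematic, and the example assumes the payload that actually arrives is standard JSON with double-quoted keys, since JSON.parse rejects single quotes. The names MicGesture, MicSupport, and parseMicSupport are hypothetical, and treating an unrecognized micGesture as "no prompt" is a choice made for this example only.

```typescript
// null stands for "no prompt" (empty, null, or missing micGesture).
type MicGesture = "TAP" | "HOLD" | null;

interface MicSupport {
  hasHardMicSupport: boolean;
  hasSoftMicSupport: boolean;
  micGesture: MicGesture;
}

// Parses the string-encoded preferences. Returns null when the payload is
// not valid JSON or when either boolean field is missing or mistyped.
function parseMicSupport(payload: string): MicSupport | null {
  let raw: unknown;
  try {
    raw = JSON.parse(payload);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) {
    return null;
  }
  const fields = raw as { [key: string]: unknown };
  const hard = fields["hasHardMicSupport"];
  const soft = fields["hasSoftMicSupport"];
  if (typeof hard !== "boolean" || typeof soft !== "boolean") {
    return null;
  }
  // Assumption: any value other than "TAP" or "HOLD" means no prompt.
  const gesture = fields["micGesture"];
  let micGesture: MicGesture = null;
  if (gesture === "TAP") {
    micGesture = "TAP";
  } else if (gesture === "HOLD") {
    micGesture = "HOLD";
  }
  return { hasHardMicSupport: hard, hasSoftMicSupport: soft, micGesture };
}
```

The returned micGesture can then drive the UI prompt: "TAP" or "HOLD" selects the corresponding hint, and null shows none.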