# Enabling voice search in Cobalt
Cobalt enables voice search through either:
1. A subset of the [MediaRecorder Web API](https://www.w3.org/TR/mediastream-recording/#mediarecorder-api)
2. A subset of the [Speech Recognition Web API](https://w3c.github.io/speech-api/#speechreco-section)
Only one or the other can be used; we recommend the MediaRecorder API, as the
Speech Recognition API is deprecated as of Starboard 13.
In both approaches, web apps check whether to enable voice control by calling
the [MediaDevices.enumerateDevices()](https://www.w3.org/TR/mediacapture-streams/#dom-mediadevices-enumeratedevices%28%29)
Web API function, within which Cobalt in turn calls a subset of the
[Starboard SbMicrophone API](../../starboard/microphone.h).
Partners can add microphone support and microphone gesture options using the
optional SoftMicPlatformService, detailed below.
## MediaRecorder API
To enable the MediaRecorder API in Cobalt, the complete
[SbMicrophone API](../../starboard/microphone.h) must be implemented, and
`SbSpeechRecognizerIsSupported()` must return `false`.
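As a minimal sketch (assuming a Starboard version in which the
[SbSpeechRecognizer API](../../starboard/speech_recognizer.h) still exists),
the speech-recognizer side of this configuration could look like:
```c++
#include "starboard/speech_recognizer.h"

// Reporting no platform speech recognizer steers Cobalt toward the
// MediaRecorder path. The complete SbMicrophone API must still be
// implemented separately for audio capture to work.
bool SbSpeechRecognizerIsSupported() {
  return false;
}
```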
## Speech Recognition API - Deprecated
**The Speech Recognition API is deprecated as of Starboard 13.**
To support this API, platforms must implement the
[Starboard SbSpeechRecognizer API](../../starboard/speech_recognizer.h) as well
as a subset of the [SbMicrophone API](../../starboard/microphone.h).
### Specific instructions to enable voice search
1. Implement `SbSpeechRecognizerIsSupported()` to return `true`, and implement
the [SbSpeechRecognizer API](../../starboard/speech_recognizer.h).
2. Implement the following subset of the
[SbMicrophone API](../../starboard/microphone.h):
- `SbMicrophoneGetAvailable()`
- `SbMicrophoneCreate()`
- `SbMicrophoneDestroy()`
In particular, `SbMicrophoneCreate()` must return a valid microphone. It is
okay to stub out the other functions, e.g. have `SbMicrophoneOpen()`
return `false`; see the sketch after this list.
3. The YouTube app will display the mic icon on the search page when it detects
valid microphone input devices using `MediaDevices.enumerateDevices()`.
4. With `SbSpeechRecognizerIsSupported()` implemented to return `true`, Cobalt
will use the platform's
[Starboard SbSpeechRecognizer API](../../starboard/speech_recognizer.h)
implementation, and it will not actually read directly from the microphone
via the [Starboard SbMicrophone API](../../starboard/microphone.h).
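Below is a minimal sketch of the SbMicrophone subset from step 2. It is
illustrative only; check the exact signatures and `SbMicrophoneInfo` fields
against [starboard/microphone.h](../../starboard/microphone.h) for your
Starboard version.
```c++
#include "starboard/microphone.h"

// Platforms define SbMicrophonePrivate themselves; an empty struct is enough
// for a placeholder handle in this sketch.
struct SbMicrophonePrivate {};

namespace {
SbMicrophonePrivate g_placeholder_microphone;
}  // namespace

int SbMicrophoneGetAvailable(SbMicrophoneInfo* out_info_array,
                             int info_array_size) {
  if (out_info_array && info_array_size > 0) {
    // A real implementation fills |out_info_array[0]| with the device's id,
    // type, and supported sample rates from the platform's audio input stack.
  }
  // Report one available microphone so MediaDevices.enumerateDevices() can
  // detect an audio input device.
  return 1;
}

SbMicrophone SbMicrophoneCreate(SbMicrophoneId id,
                                int sample_rate_in_hz,
                                int buffer_size_bytes) {
  // Must return a valid (non-null) microphone.
  return &g_placeholder_microphone;
}

void SbMicrophoneDestroy(SbMicrophone microphone) {}

bool SbMicrophoneOpen(SbMicrophone microphone) {
  // Stubbing out the remaining functions is acceptable here, since audio is
  // captured through the SbSpeechRecognizer implementation rather than read
  // directly from the microphone.
  return false;
}
```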
### Differences from versions of Cobalt <= 11
In previous versions of Cobalt, there was no way to dynamically disable
speech support other than modifying common Cobalt code to stub out the
Speech Recognition API when the platform does not support microphone input.
This is no longer necessary; web apps should now rely on
`MediaDevices.enumerateDevices()` to determine whether voice support is
enabled.
### Speech Recognition API is deprecated in Starboard 13
Web applications are expected to use the MediaRecorder API. This in turn relies
on the SbMicrophone API as detailed above.
## SoftMicPlatformService
In `starboard/linux/shared/soft_mic_platform_service.cc` there is an example
stub implementation of the SoftMicPlatformService. Platforms can optionally
implement this
[CobaltPlatformService](https://cobalt.dev/gen/cobalt/doc/platform_services.html)
to specify whether they support the `soft mic` and/or the `hard mic` for voice
search. The `soft mic` refers to software activation of the microphone for
voice search through the UI microphone button on the YouTube Web Application
search page. The `hard mic` refers to hardware button activation of the
microphone for voice search. Platforms can also specify the optional
`micGesture`, which tells the YouTube Web Application what type of UI prompt
to display to guide the user to start voice search. The options are an empty
or `null` value for no prompt, `"TAP"` to prompt the user to tap the
`soft mic` and/or `hard mic` to start voice search, or `"HOLD"` to prompt the
user to hold the `soft mic` and/or `hard mic` to start voice search.
The Web Application's messages to the platform are single strings, encoded
with enclosing quotation marks to make them JSON compliant:
```
"\"notifySearchActive\""
"\"notifySearchInactive\""
```
These messages notify the platform when the user is entering or exiting the YouTube
Web Application search page. Only a synchronous `true` or `false` response is sent
from the platform to confirm that the message was correctly received and parsed.
```
"\"getMicSupport\""
```
A similar synchronous `true` or `false` response is sent from the platform,
confirming the message was correctly received and parsed. The platform also
sends an asynchronous string-encoded JSON object with the above-mentioned
microphone preferences:
```
"{
'hasHardMicSupport' : boolean,
'hasSoftMicSupport' : boolean,
'micGesture' : string,
}"
```
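For example, a platform that supports only the `soft mic` with a tap-to-start
prompt might send a response like the following (hypothetical values; the
actual response depends on the platform's capabilities):
```
"{\"hasHardMicSupport\": false, \"hasSoftMicSupport\": true, \"micGesture\": \"TAP\"}"
```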