Voice Input
Problem

A method must be provided to control some or all of the functions of the mobile device, or provide text input, without handling the device.

Solution

Voice Input has for decades promised to relieve users of all sorts of systems from executing complex commands in unnatural or distracting ways. Some specialized, expensive products have met these goals in fields like aviation, while on the desktop voice input has yet to live up to its promise or gain wide acceptance.

Mobile, however, is uniquely positioned to exploit voice as an input and control mechanism, and has an unusually strong demand for such a feature. The ubiquity of the device means many users with low vision or poor motor function (and therefore poor keypad entry) require alternative methods of input. Near-universal use, along with contextual requirements such as safety (for example, using navigation devices while operating a vehicle), demands eyes-off and hands-off control methods.

Lastly, many mobile devices are (or are based on) mobile handsets, so they already have speakers and microphones designed for voice-quality communication, as well as voice processing embedded in the device chipset.

Since most mobile devices are now connected, or are only useful when connected to the network, an increasingly practical option is for a remote server to perform all of the speech recognition. This can even be used for fairly core functions, such as dialing the handset, as long as a network connection is required for the function to be performed anyway. For mobile handsets, the use of the voice channel is especially advantageous, as no special effort must be made to gather or encode the input audio.
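
For illustration only, the Kotlin sketch below (assuming an Android-style device) uploads a locally recorded audio clip to a recognition server and reads back a plain-text transcript. The endpoint URL, audio format, and response format are hypothetical placeholders for whatever protocol the actual recognition service defines, and the network call would have to run off the UI thread.

{{{
import java.io.File
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical server-side recognition: record audio locally, upload it, and
// let the server return a transcript. Endpoint and formats are placeholders.
fun recognizeOnServer(recordedAudio: File): String {
    val connection = URL("https://speech.example.com/recognize")
        .openConnection() as HttpURLConnection
    connection.requestMethod = "POST"
    connection.doOutput = true
    connection.setRequestProperty("Content-Type", "audio/amr") // assumed capture format
    connection.outputStream.use { out ->
        recordedAudio.inputStream().use { audio -> audio.copyTo(out) }
    }
    // Assume the server answers with the transcript as plain text.
    return connection.inputStream.bufferedReader().use { it.readText() }
}
}}}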

Variations

Voice Command uses voice to input a limited number of pre-set commands. The commands must be spoken exactly as the device expects, and cannot interpret arbitrary commands. These can be considered akin to Accesskeys, as they are sort of shortcuts to control the device. The command set is generally very large, offering the entire control domain. Often, this is enabled at the OS level, and the entire handset can be used without any button presses.
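
As a minimal sketch of this variation, the Kotlin snippet below matches a recognized utterance against a fixed command vocabulary; the phrases and the actions they trigger are invented for the example, not drawn from any particular device.

{{{
// Voice Command sketch: the utterance must match a pre-set phrase exactly;
// arbitrary speech is not interpreted. Phrases and actions are hypothetical.
val commands: Map<String, () -> Unit> = mapOf(
    "call voicemail" to { println("dialing voicemail") },
    "open calendar"  to { println("opening calendar") },
    "volume up"      to { println("raising volume") }
)

fun handleUtterance(utterance: String) {
    val action = commands[utterance.trim().lowercase()]
    action?.invoke() ?: println("command not recognized; prompt the user to retry")
}
}}}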

Speech to Text or "speech recognition" enables the user to type arbitrary text by talking to the mobile device. Though methods vary widely, and there are limits, generally the user can speak any word, phrase or character and expect it to be recognized with reasonable accuracy.
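
On Android, for example, free-form dictation can be requested from the platform's built-in recognizer roughly as sketched below; the request code is arbitrary and the prompt text is just an example. Other platforms expose equivalent facilities through their own APIs.

{{{
import android.app.Activity
import android.content.Intent
import android.speech.RecognizerIntent

const val REQUEST_DICTATION = 1  // arbitrary request code for this example

// Ask the platform recognizer for arbitrary (free-form) speech rather than a
// command grammar, and request several candidate transcripts.
fun Activity.startDictation() {
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                 RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak now")
        putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5)
    }
    startActivityForResult(intent, REQUEST_DICTATION)
}
}}}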

Note that "voice recognition" implies user dependent input, meaning the user must set up their voice profile before working. User-independent systems are strongly preferred for general use, as they can be employed without setup by the end user. Only build user voice profiles when this would be acceptable to the user, such as when a setup process already exists and is expected.

A detailed discussion of the methods used for recognizing speech is beyond the scope of this book, and is covered in detail in a number of other sources.

Interaction Details

Mobile devices usually rely on key or touch input and visual output, so voice input generally has to be initiated from one of these methods. To support low-vision users or eyes-off use cases, assign a hardware key or key combination; a common choice is a long press of a key already associated with audio, such as the speakerphone key, as in the sketch below.
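
One way to wire this up on an Android-style device is sketched below, using a long press on a hardware key already associated with audio (here the headset hook button); startVoiceInput() is a placeholder for whatever listening mode the application provides.

{{{
import android.app.Activity
import android.view.KeyEvent

class VoiceEntryActivity : Activity() {

    override fun onKeyDown(keyCode: Int, event: KeyEvent): Boolean {
        if (keyCode == KeyEvent.KEYCODE_HEADSETHOOK) {
            event.startTracking()  // request long-press callbacks for this key
            return true
        }
        return super.onKeyDown(keyCode, event)
    }

    override fun onKeyLongPress(keyCode: Int, event: KeyEvent): Boolean {
        if (keyCode == KeyEvent.KEYCODE_HEADSETHOOK) {
            startVoiceInput()  // a long press, not a tap, enters voice mode
            return true
        }
        return super.onKeyLongPress(keyCode, event)
    }

    private fun startVoiceInput() {
        // Placeholder: play the activation tone and begin listening.
    }
}
}}}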

When voice input becomes active, the device should play a Tone or give a voice readback reminding the user of its state (e.g. "Say a command"). After this, the system accepts input.
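
A simple way to announce the listening state, sketched here for Android, is a short tone followed by a spoken reminder; the TextToSpeech instance is assumed to be initialized elsewhere, and recognition should only begin once the prompt has finished so the prompt itself is not captured as input.

{{{
import android.media.AudioManager
import android.media.ToneGenerator
import android.speech.tts.TextToSpeech

// Announce that voice input is active: a brief tone, then a spoken reminder.
// `tts` is assumed to be an already initialized TextToSpeech instance.
fun announceListening(tts: TextToSpeech) {
    val tones = ToneGenerator(AudioManager.STREAM_NOTIFICATION, 80)
    tones.startTone(ToneGenerator.TONE_PROP_BEEP, 150)  // 150 ms beep
    tts.speak("Say a command", TextToSpeech.QUEUE_ADD, null, "voice-prompt")
    // Release the ToneGenerator and start recognition once the prompt completes.
}
}}}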

When input is complete, the system should usually read back what was entered.

During this readback, much as pen input provides a correction period, saying "no" should clear the entry or allow the user to select the intended interpretation from a list of candidates.
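
When the recognizer returns several candidate transcripts, a rejected first interpretation can be replaced by letting the user pick from the alternatives, roughly as below (Android again, assuming dictation was started with startActivityForResult() as in the earlier sketch).

{{{
import android.app.Activity
import android.app.AlertDialog
import android.content.Intent
import android.speech.RecognizerIntent

// Offer the recognizer's alternative transcripts as a correction list when
// the user rejects the first interpretation.
fun Activity.showCorrectionList(data: Intent, onChosen: (String) -> Unit) {
    val candidates = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS) ?: return
    AlertDialog.Builder(this)
        .setTitle("Did you mean...")
        .setItems(candidates.toTypedArray()) { _, which -> onChosen(candidates[which]) }
        .show()
}
}}}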

For Voice Command, as much interactivity as practical should be provided. When controlling the device OS, it must be possible to perform all of the basic functions, by offering controls such as Directional Entry and the ability to activate menus. This may also mean that a complete scroll-and-select style focus assignment system is required, even for devices that otherwise rely purely on touch or pen input.

Provide an easy method to abandon the Voice Input function, and return to keyboard or touch screen entry, without abandoning the entire current process. The best method for this will reverse the command used to enter the mode, such as the press-and-hold speakerphone key.

Presentation Details

Voice input should also have a visual component, to support glancing at the device, or completing the task by switching to a hands-on, eyes-on mode.

Hints for activating voice input should be provided on screen; use common shorthand icons when possible. When space permits, such as for text input into a single field (e.g. a search field), provide additional on-screen instructions so first-time users may become accustomed to the functionality.

Antipatterns

Audio systems and processing cannot be relied on to be full duplex, so do not get in the way of the user's speech with prompts or readback that respond too quickly; wait until the user has finished speaking.

Examples
