Sorry, I have been away on vacation (I took a break from augmented reality to actually enjoy reality too :)). But now I am back with a new tutorial showing you how to use Dictation in your HoloLens apps. Simply put: how you can use speech that converts to text in your apps (speech-to-text).
The Mixed Reality Toolkit makes this very easy for us with the help of the DictationInputManager class.
The MR Toolkit examples package even comes with an example scene called DictationTest if you want to try it out.
Do the basics of creating a new Unity project and importing the MR Toolkit.
Delete the Main Camera and add the MixedRealityCameraParent.prefab from the MR Toolkit.
Change the camera settings to the HoloLens-preferred values:
1. Skybox to Solid Color
2. Background to 0,0,0,0
Save the scene.
So in this tutorial, we will have a cube in the scene which will be our visual feedback that recording has started and ended.
We will use speech input to say “Start recording”, after which the cube changes color from white to red. The dictated text appears below the cube. While dictation is ongoing, both the dictated text and the cube stay red.
When dictation has ended, the color of both the cube and the text changes to green.
First let’s accomplish the part where we get the speech done:
1. Get a white cube in the scene
2. Add speech input to the scene
3. Add voice command “Start recording” which changes color of the cube to red
Since I already covered speech input in a previous tutorial, I won’t cover it in detail here and will walk you through the speech part briskly.
1. Add a ‘Cube‘ GameObject in the Unity scene
2. Add an empty GameObject and call it ‘Managers‘ or something relevant. We will add all the manager objects into this as child objects (let’s stay organized :-))
3. Add an empty child GameObject under ‘Managers‘ and call it ‘SpeechGameObject‘ or something relevant. To this, add the ‘Speech Input Source‘ script from the MR Toolkit via ‘Add Component‘ in the Inspector (see my Voice Input tutorial for screenshots and complete descriptions)
4. We just have one keyword called “Start recording“. So let’s add that in the Speech Input Source Inspector variables
5. Now we need to add the Speech Input Handler to the ‘Cube‘ object. Click on the ‘Add Component‘ part of the Cube and search and add ‘Speech Input Handler’ from the MR Toolkit. Check the ‘Is Global Listener’ flag on the Speech Input Handler.
6. Before we assign keywords, let’s write the code which changes the color of the cube. For this click on the ‘Cube‘ and via ‘Add Component‘ in the Inspector, add a C# script and call it ‘ColorandDictationManipulation‘ or something relevant. This will also be the script where our Dictation logic will go.
7. Double-click the ‘ColorandDictationManipulation‘ script (or right-click it and choose Open) to open it up in Visual Studio
We will write a function called OnVoiceCommand(), which will be called on speech input and which contains the code that changes the color of the cube. For this we need a Renderer reference, which we initialize in the Awake() function.
We also need an Inspector variable so that we can refer to the object whose color we want to change. I call this variable objectToBeManipulated. You can drag and drop whichever object you want to manipulate into the objectToBeManipulated field.
Now get the Renderer component of objectToBeManipulated (in this example, the cube) into the renderer variable, then change the material color on the renderer (i.e. the cube) to red.
Your script so far should look like the above code
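In case the screenshot is hard to read, here is a minimal sketch of the script at this stage. The names objectToBeManipulated and OnVoiceCommand() match the tutorial; I use objectRenderer for the Renderer variable initialized in Awake() (pick whatever name you prefer):

```csharp
using UnityEngine;

public class ColorandDictationManipulation : MonoBehaviour
{
    // Drag the object to change (the Cube) onto this field in the Inspector
    public GameObject objectToBeManipulated;

    private Renderer objectRenderer;

    private void Awake()
    {
        // Cache the Renderer once so OnVoiceCommand() can recolor it later
        objectRenderer = objectToBeManipulated.GetComponent<Renderer>();
    }

    // Wired to the "Start recording" keyword via the Speech Input Handler
    public void OnVoiceCommand()
    {
        objectRenderer.material.color = Color.red;
    }
}
```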
8. Now go back to the Cube Inspector, and add the keywords and the methods on the Speech Input Handler
Select dropdown and choose “Start Recording” in the Speech Input Handler
Choose the object Cube and select the OnVoiceCommand method in the Response() section
Choose the Cube as the Object To Be Manipulated in the ColorandDictationManipulation script
10. Go to Edit -> Project Settings -> Player and under Publishing Settings -> Capabilities, make sure Microphone is checked
11. Adjust the cube in the scene so that the camera can see it.
My values for the MRCamera and the Cube are shown in the screenshots
12. Now for the last bit before we try whether the Speech part works fine. If you remember the Speech Input tutorial, we need to add the InputManager in the MR Toolkit also to the scene. In the Project window, search for InputManager prefab and drag and drop it as a child under Managers.
Let’s see if this works so far… Build and deploy!
At this stage, if you say “Start Recording”, then your cube should turn red.
Now let’s move on to the Dictation part:
1. We first add a 3D Text object as a child of the Cube. This is to display the dictation text. Call it DictationOutput or something relevant. Position this text below the cube (or anywhere you want the dictation text to appear)
I changed the scale on the Cube to 0.5,0.5,0.5
Since we want to prompt the user to say “Start Recording” to start the dictation function, let’s change the text on the DictationOutput to Say “Start Recording”
Change the scale on the DictationOutput object to 0.02, 0.02, 0.02 with Font Size 70. It just looks nicer and is more visible. You can of course adjust it to whatever suits you
Change the Anchor to Middle Center
Change Alignment to Center
Change Font Style to Bold
(all of the above are just my preference, for a better-looking UI :))
2. Now we need to add the Dictation Input Manager script from the MR Toolkit to the scene for the dictation logic to work. Create an empty GameObject under Managers and call it DictationManager. On it, click ‘Add Component’ and add the Dictation Input Manager script
3. Let’s go back to the Cube’s ColorandDictationManipulation script and add the logic for Dictation
The class needs to implement the IDictationHandler interface, which declares 4 methods that must all be implemented for proper dictation. So first add IDictationHandler to the class declaration
IDictationHandler belongs to the HoloToolkit.Unity.InputModule namespace, so add a using directive for it
In Visual Studio, IntelliSense will show an error on IDictationHandler (see my screenshot) and recommend potential fixes. Simply click on ‘Show potential fixes’ and you’ll see an ‘Implement Interface’ option.
Click on that to add all 4 methods into your code instantly. The error on IDictationHandler should disappear
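‘Implement Interface’ generates stubs roughly like the sketch below (exact formatting depends on your Visual Studio version). Note that the generated bodies throw NotImplementedException until you fill them in:

```csharp
using HoloToolkit.Unity.InputModule;
using UnityEngine;

public class ColorandDictationManipulation : MonoBehaviour, IDictationHandler
{
    // Stubs generated by Visual Studio's 'Implement Interface' fix
    public void OnDictationHypothesis(DictationEventData eventData)
    {
        throw new System.NotImplementedException();
    }

    public void OnDictationResult(DictationEventData eventData)
    {
        throw new System.NotImplementedException();
    }

    public void OnDictationComplete(DictationEventData eventData)
    {
        throw new System.NotImplementedException();
    }

    public void OnDictationError(DictationEventData eventData)
    {
        throw new System.NotImplementedException();
    }
}
```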
Now let’s add our logic into the interfaces. Let’s break it down:
a) We want the dictation text to be saved into the DictationOutput text object and displayed on the screen while we’re talking
b) We want the color of the cube and the DictationOutput text to change to red when we are dictating
c) We want the color of the cube and the DictationOutput text to change to green when we stop dictating
We need a TextMesh variable to be initialized; I called it dictationOutputText
When dictation is finished, the result is stored in eventData.DictationResult. This needs to be assigned to the dictationOutputText variable, and the interface method OnDictationComplete() is where we do it. This is also where we change the colors of the text mesh and the cube to green, since dictation is complete. So we achieve c) here as well
If there is an error while dictating, for whatever reason, the text and cube colors should revert to white. We will ensure that by adding that code in the OnDictationError() method
For b) we will just add an extra line to change the text color to red in the OnVoiceCommand() method
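Putting a), b) and c) together, the relevant parts of the class might look like this sketch. Here dictationOutputText is the TextMesh Inspector variable described above, and objectRenderer is my name for the Renderer initialized in Awake(); adjust to your own variable names:

```csharp
// TextMesh of the DictationOutput object, assigned in the Inspector
public TextMesh dictationOutputText;

// Called when the "Start recording" keyword is recognized
public void OnVoiceCommand()
{
    objectRenderer.material.color = Color.red;
    dictationOutputText.color = Color.red; // b) the text turns red too
}

public void OnDictationComplete(DictationEventData eventData)
{
    // a) show the final recognized text
    dictationOutputText.text = eventData.DictationResult;
    // c) green signals that dictation is complete
    dictationOutputText.color = Color.green;
    objectRenderer.material.color = Color.green;
}

public void OnDictationError(DictationEventData eventData)
{
    // On an error, fall back to the neutral white state
    dictationOutputText.color = Color.white;
    objectRenderer.material.color = Color.white;
}
```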
4. Now we need to add the logic that calls the StartRecording() and StopRecording() functions of the Dictation Input Manager.
(I reused the script logic in the DictationRecord.cs in the Dictation Test scene which came with the MR Toolkit’s examples package and modified it for this example)
I wrote a function called ToggleRecording(): if recording is currently ongoing, we call StopRecording(); otherwise we call StartRecording(). Add an isRecording flag and initialize it accordingly. We need this flag to indicate whether the recording is still ongoing or has completed.
We then call this ToggleRecording() function in our OnVoiceCommand() method
You’ll notice that the StartRecording() has three parameters.
5f is the initialSilenceTimeout, i.e. it listens for any dictation input for 5 s and gives a timeout message if nothing is heard
20f is the auto-silence timeout: if the user stops speaking for 20 s during the recording, a timeout occurs there too
10 is the recording time in seconds.
You can provide whatever values you want; these are just what I used for this example
We need to set the isRecording flag appropriately in the interface methods of the IDictationHandler
The code for the ToggleRecording() method is shown in the screenshot
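As a sketch (based on my reading of the DictationRecord.cs logic in the examples package), ToggleRecording() can look like this. The three parameters follow the order described above; note that in newer MR Toolkit versions StartRecording()/StopRecording() take an additional listener GameObject and are coroutines, in which case you wrap the calls in StartCoroutine():

```csharp
// True while a dictation recording session is ongoing
private bool isRecording;

private void ToggleRecording()
{
    if (isRecording)
    {
        // A session is already running: stop it
        isRecording = false;
        DictationInputManager.StopRecording();
    }
    else
    {
        // initialSilenceTimeout = 5 s, autoSilenceTimeout = 20 s, recordingTime = 10 s
        isRecording = true;
        DictationInputManager.StartRecording(5f, 20f, 10);
    }
}
```

OnVoiceCommand() then simply calls ToggleRecording() after setting the colors.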
Add isRecording = false in the OnDictationError() method, since upon error the recording did not complete. You also need to call the StopRecording() function there, since dictation did not occur
We also need to implement the OnDictationHypothesis() and OnDictationResult() methods, or an exception will be thrown. I assigned eventData.DictationResult to the dictationOutputText object in both of these methods.
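These two methods, plus the error handling just described, can be sketched as follows (dictationOutputText, objectRenderer and isRecording named as in this tutorial; adjust to your own variable names):

```csharp
public void OnDictationHypothesis(DictationEventData eventData)
{
    // Live, still-changing guess while the user is speaking
    dictationOutputText.text = eventData.DictationResult;
}

public void OnDictationResult(DictationEventData eventData)
{
    // A finalized phrase within the ongoing session
    dictationOutputText.text = eventData.DictationResult;
}

public void OnDictationError(DictationEventData eventData)
{
    // The recording did not complete: reset the flag and stop the session,
    // then revert to the neutral white state
    isRecording = false;
    DictationInputManager.StopRecording();
    dictationOutputText.color = Color.white;
    objectRenderer.material.color = Color.white;
}
```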
The final code in the ColorandDictationManipulation.cs script is shown in the screenshots
5. That’s it, the dictation part is also done. Before you try it, do not forget to enable the ‘InternetClient‘ capability under Edit -> Project Settings -> Player -> Publishing Settings -> Capabilities
Now build and deploy onto the Emulator. Say “Start Recording” and let me know if this works.
I have seen that the microphone on the HoloLens works much better for dictation than the laptop microphone used by the Emulator. I was screaming into the laptop microphone and my dog was wondering what the hell was going on 😉 Nevertheless, it works.
You can say “Start Recording” multiple times to start the recording sequence all over again. Notice that the cube and text turn green when recording is complete, and revert to red when you say “Start Recording” again
Possible Errors and solutions
Error: Dictation recognizer failed to start!
Solution: I racked my brain for several hours over this, only to realize that InternetClient is a must-have capability for dictation to work properly. Along with the other Player Settings tweaks, make sure that both Microphone and InternetClient are checked under Capabilities
BOTH capabilities are must-haves for the dictation to be successfully recognized
Also make sure to connect the HoloLens to WiFi
Error: NullReferenceException in OnVoiceCommand()
Solution: Make sure that all Inspector values in the ColorandDictationManipulation script are assigned
Error: NotImplementedException: The method or operation is not implemented.
Solution: One of the interface methods is not implemented properly.
Hope this tutorial helped you 🙂 Let me know via the comments section