Dictation in HoloLens apps

Use Dictation in your HoloLens apps using the MR Toolkit

Sorry, I have been away on vacation (I took a break from augmented reality to actually enjoy reality too :)). But now I am back with a new tutorial to show you how to use Dictation in your HoloLens apps. Simply put, how you can convert speech to text in your apps (speech-to-text).

The Mixed Reality Toolkit makes it very easy for us to do this with the help of a DictationInputManager class.

The MR Toolkit examples package even comes with an example scene called DictationTest if you want to try it out.

Do the basics of creating a new Unity project and importing the MR Toolkit.

Delete the Main Camera and add the MixedRealityCameraParent.prefab from the MR Toolkit.

Change the camera settings to the HoloLens-preferred values:

1. Clear Flags from Skybox to Solid Color

2. Background to 0,0,0,0

Save the scene.

Objective

So in this tutorial, we will have a cube in the scene which acts as our visual feedback that recording has started and ended.

We will use speech input to say “Start Recording”, after which the cube changes color from white to red. The dictated text appears below the cube. While dictation is ongoing, both the dictated text and the cube stay red.

When dictation has ended, the color of both the cube and the text changes to green.

First, let’s get the speech part working:

1. Get a white cube in the scene

2. Add speech input to the scene

3. Add the voice command “Start Recording”, which changes the color of the cube to red

Since I already covered speech input in a previous tutorial, I won’t cover it in detail and will walk you through the speech part briskly.

1. Add a ‘Cube’ GameObject to the Unity scene

2. Add an empty GameObject and call it ‘Managers’ or something relevant. We will add all the manager objects into this as child objects (let’s stay organized :-))

3. Add an empty child GameObject under ‘Managers’ and call it ‘SpeechGameObject’ or something relevant. To this, we will add the ‘Speech Input Source’ script from the MR Toolkit via ‘Add Component’ in the Inspector (see my Voice Input tutorial for screenshots and complete descriptions)

4. We have just one keyword, “Start Recording”. So let’s add that to the keywords list in the Speech Input Source Inspector

5. Now we need to add the Speech Input Handler to the ‘Cube’ object. Click ‘Add Component’ on the Cube, then search for and add ‘Speech Input Handler’ from the MR Toolkit. Check the ‘Is Global Listener’ flag on the Speech Input Handler.

6. Before we assign keywords, let’s write the code which changes the color of the cube. For this, click on the ‘Cube’ and, via ‘Add Component’ in the Inspector, add a C# script and call it ‘ColorandDictationManipulation’ or something relevant. This will also be the script where our dictation logic will go.

7. Right-click the ‘ColorandDictationManipulation’ script and open it in Visual Studio

We will write a function called OnVoiceCommand(), which will be called on speech input and which contains the code to change the color of the cube. For this we need a Renderer reference, which we initialize in the Awake() function.

We also need an Inspector variable so that we can refer to the object (our Cube) whose color is to be changed. I call this variable objectToBeManipulated.

You can drag and drop whichever object you want to manipulate into the objectToBeManipulated field.

Now get the Renderer component of the objectToBeManipulated (in this example, the cube) into the renderer variable

Change the material color on the renderer (i.e. now the cube) to red.

Your script so far should look something like the code below.
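A minimal sketch based on the steps above (I named the Renderer field objectRenderer rather than renderer, so it does not clash with the inherited Unity property of that name):

```csharp
using UnityEngine;

public class ColorandDictationManipulation : MonoBehaviour
{
    // Drag the object you want to manipulate (the Cube) onto this field in the Inspector
    public GameObject objectToBeManipulated;

    // Cached Renderer of the object above
    private Renderer objectRenderer;

    private void Awake()
    {
        objectRenderer = objectToBeManipulated.GetComponent<Renderer>();
    }

    // Wired up to the "Start Recording" keyword via the Speech Input Handler
    public void OnVoiceCommand()
    {
        objectRenderer.material.color = Color.red;
    }
}
```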

8. Now go back to the Cube’s Inspector and add the keyword and the method on the Speech Input Handler

Select the keyword dropdown and choose “Start Recording” in the Speech Input Handler

In the Response() section, choose the Cube object and select the OnVoiceCommand method

Choose the Cube as the Object To Be Manipulated in the ColorandDictationManipulation script

9. Go to Edit -> Project Settings -> Player and, under Publishing Settings -> Capabilities, make sure Microphone is checked

10. Adjust the cube’s position in the scene so that the camera can see it.

My values for the MRCamera and the Cube are shown in the screenshots.

11. Now for the last bit before we check whether the speech part works. If you remember the Speech Input tutorial, we also need to add the InputManager prefab from the MR Toolkit to the scene. In the Project window, search for the InputManager prefab and drag and drop it as a child under Managers.

Let’s see if this works so far… Build and deploy!

At this stage, if you say “Start Recording”, then your cube should turn red.

Now let’s move on to the Dictation part:

1. First, add a 3D Text object as a child of the Cube. This will display the dictation text. Call it DictationOutput or something relevant. Position the text below the cube (or wherever you want to see the dictation text)

I changed the scale on the Cube to 0.5,0.5,0.5

Since we want to prompt the user to say “Start Recording” to start the dictation function, let’s change the text on the DictationOutput to: Say “Start Recording”

Change the scale on the DictationOutput object to 0.02, 0.02, 0.02 with Font Size 70. It just looks nicer and is more visible. You can of course adjust it to whatever suits you

Change the Anchor to Middle Center

Change Alignment to Center 

Change Font Style to Bold

(just my preference :))

(all of the above are used for better UI)

2. Now we need to add the Dictation Input Manager script from the MR Toolkit to the scene so that the dictation logic can work. Let’s create an empty GameObject under Managers and call it DictationManager. On this object, click ‘Add Component’ and add the Dictation Input Manager script

3. Let’s go back to the Cube’s ColorandDictationManipulation script and add the logic for dictation

The class needs to implement IDictationHandler, which defines 4 methods that must all be implemented for dictation to work properly. So first add IDictationHandler to the class declaration

IDictationHandler lives in the HoloToolkit.Unity.InputModule namespace, so add the corresponding using directive

In Visual Studio, if you are using IntelliSense, it will show an error on IDictationHandler (see my screenshot) and also recommend potential fixes. Simply click on ‘Show potential fixes’ and you’ll see an ‘Implement Interface’ option.

Click on that to instantly add all 4 method stubs to your code. The error on IDictationHandler should disappear
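For reference, the generated stubs should look roughly like this (the DictationEventData parameter type comes from HoloToolkit.Unity.InputModule; the exact shape may vary slightly with your toolkit version):

```csharp
public void OnDictationHypothesis(DictationEventData eventData)
{
    throw new System.NotImplementedException();
}

public void OnDictationResult(DictationEventData eventData)
{
    throw new System.NotImplementedException();
}

public void OnDictationComplete(DictationEventData eventData)
{
    throw new System.NotImplementedException();
}

public void OnDictationError(DictationEventData eventData)
{
    throw new System.NotImplementedException();
}
```

These stubs are also where the NotImplementedException mentioned in the errors section at the end comes from, if any of them is left as-is.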

Now let’s add our logic into the interfaces. Let’s break it down:

a) We want the dictation text to be saved into the DictationOutput text object and displayed on the screen while we’re talking

b) We want the color of the cube and the DictationOutput text to change to red when we are dictating

c) We want the color of the cube and the DictationOutput text to change to green when we stop dictating

For a),

We need a TextMesh variable to be initialized; I called it dictationOutputText.

When dictation is finished, the result is stored in eventData.DictationResult. This needs to be assigned to the dictationOutputText variable. The interface method OnDictationComplete() is where we do this. This is also where we change the colors of the text mesh and the cube to green, since dictation is complete. So we achieve c) here as well

If there is an error while dictating for some reason, the text color and cube color should revert to white. We ensure that by adding that code to the OnDictationError() method

For b), we just add an extra line in the OnVoiceCommand() method to change the text color to red
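Putting a), b) and c) together, a sketch of the relevant parts might look like this (dictationOutputText is the TextMesh mentioned above, assumed here to be assigned in the Inspector; objectRenderer is the cached Renderer from the speech part):

```csharp
// TextMesh that displays the dictated text
// (assumption: the DictationOutput object is assigned to this field in the Inspector)
public TextMesh dictationOutputText;

public void OnDictationComplete(DictationEventData eventData)
{
    // a) Show the final dictated text below the cube
    dictationOutputText.text = eventData.DictationResult;

    // c) Dictation is complete: turn both the text and the cube green
    dictationOutputText.color = Color.green;
    objectRenderer.material.color = Color.green;
}

public void OnDictationError(DictationEventData eventData)
{
    // On an error, fall back to white for both the text and the cube
    dictationOutputText.color = Color.white;
    objectRenderer.material.color = Color.white;
}

// b) Called on the "Start Recording" keyword: turn the cube and the text red
public void OnVoiceCommand()
{
    objectRenderer.material.color = Color.red;
    dictationOutputText.color = Color.red;
}
```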

4. Now we need to add the logic where the StartRecording() and StopRecording() functions of the Dictation Input Manager are called.

(I reused the script logic from DictationRecord.cs in the DictationTest scene that comes with the MR Toolkit’s examples package and modified it for this example.)

I wrote a function called ToggleRecording(): if recording is currently in progress, we call StopRecording(); otherwise we call StartRecording(). Add an isRecording flag and initialize it accordingly. We need this flag to indicate whether the recording is still ongoing or has completed.

We then call this ToggleRecording() function in our OnVoiceCommand() method

You’ll notice that the StartRecording() has three parameters.

5f is the initialSilenceTimeout, i.e. it listens for any dictation input for 5 seconds, and if nothing is spoken, it gives a timeout message

20f is the silence timeout during recording: if the user stops speaking for 20 seconds in the middle of a recording, a timeout occurs here too

10 is the recording time in seconds.

You can provide whatever values you want; these are the ones I used for this example

We need to set the isRecording flag appropriately in the interface methods of the IDictationHandler

The code for the ToggleRecording() method is shown in the screenshot
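A minimal sketch of ToggleRecording(), assuming the three StartRecording() parameters described above and that StartRecording()/StopRecording() are run as coroutines, as in the toolkit’s example scene (the exact signature can differ between toolkit versions, so compare against yours):

```csharp
// Tracks whether a dictation recording is currently in progress
private bool isRecording;

private void ToggleRecording()
{
    if (isRecording)
    {
        // A recording is in progress: stop it
        isRecording = false;
        StartCoroutine(DictationInputManager.StopRecording());
    }
    else
    {
        // Start a new recording:
        // 5f  = initial silence timeout (seconds)
        // 20f = silence timeout while recording (seconds)
        // 10  = maximum recording time (seconds)
        isRecording = true;
        StartCoroutine(DictationInputManager.StartRecording(5f, 20f, 10));
    }
}
```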

Add isRecording = false in the OnDictationError() method since, upon an error, the recording did not succeed. You also need to call the StopRecording() function there, since dictation did not complete

We also need to implement the OnDictationHypothesis() and OnDictationResult() methods, otherwise the auto-generated stubs will throw an exception. I assigned eventData.DictationResult to the dictationOutputText object in both of these methods.
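With the error handling from the previous step included, a sketch of the remaining handler methods might look like this:

```csharp
public void OnDictationHypothesis(DictationEventData eventData)
{
    // Show the partial (hypothesis) text while the user is still speaking
    dictationOutputText.text = eventData.DictationResult;
}

public void OnDictationResult(DictationEventData eventData)
{
    // Show each recognized phrase as it is finalized
    dictationOutputText.text = eventData.DictationResult;
}

public void OnDictationError(DictationEventData eventData)
{
    // Recording did not succeed: reset the flag, stop recording and fall back to white
    isRecording = false;
    StartCoroutine(DictationInputManager.StopRecording());
    dictationOutputText.color = Color.white;
    objectRenderer.material.color = Color.white;
}
```

As noted above, set the isRecording flag appropriately in the other interface methods too, for example back to false in OnDictationComplete() so that the next “Start Recording” begins a fresh session.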

The final code in the ColorandDictationManipulation.cs script is shown in the screenshots.

5. That’s it, the dictation part is also done. Before you try it, do not forget to enable the ‘InternetClient’ capability under Edit -> Project Settings -> Player -> Publishing Settings -> Capabilities.

Now build and deploy onto the Emulator. Say “Start Recording” and let me know if this works.

I have seen that the microphone on the Hololens works much better for dictation than the one on the laptop for the Emulator. I was screaming into the laptop microphone and my dog was wondering what the hell was going on 😉  Nevertheless, it works.

You can say “Start Recording” multiple times to initiate the recording sequence all over again. Notice that the cube and the text turn green when recording is complete, and when you say “Start Recording” again, they revert to red.

Possible Errors and solutions

Error: Dictation recognizer failed to start!

Solution: I racked my brain for several hours over this, only to realize that InternetClient is a must-have capability for dictation to work properly. Along with the other Player Settings tweaks, make sure that both Microphone and InternetClient are checked under Capabilities.

BOTH capabilities are must-haves for the dictation to be successfully recognized

Also make sure the HoloLens is connected to WiFi.

Error: NullReferenceException in OnVoiceCommand()

Solution: Make sure that all Inspector fields in the ColorandDictationManipulation script are assigned.

Error: NotImplementedException: The method or operation is not implemented.

Solution: One of the interface methods is not implemented properly.

Hope this tutorial helped you 🙂 Let me know via the comments section

11 thoughts on “Use Dictation in your HoloLens apps using the MR Toolkit”

  1. Hello and thank you for the tutorials. The voice command tutorial helped me a lot. I was wondering if you could do a tutorial on text to speech. This topic seems to have very little info available. The only tutorial I could find is no longer valid since many things keep changing in the Mixed Reality Toolkit. Please state whether text to speech is supported in the Unity editor. I get a message in the debug window stating that it’s not and have read online that it’s not. I haven’t been able to test it on the HoloLens or emulator at the moment. And please mention any inheritance or basically the correct way to get it working. You do a great job explaining the proper steps. Thank you so much. Glad you are back!

  2. Hey Jonathan, thanks for your feedback! Sure, text to speech is already on my backlog. I had done TextToSpeech a while ago with the older version of the HoloToolkit. I am yet to upgrade it to the MR Toolkit. I’m sure that Unity and the MR Toolkit have the TextToSpeech component because I used it last year. Will post the tutorial soon enough. Thanks and have a nice weekend! 🙂

  3. Hello, and thank you for the tutorials. Somehow the StartCoroutine function is not working; I am getting the NotImplementedException. Can you please tell me where the error could be and how to fix it?

  4. Hello Nischita :), somehow the StartCoroutine function is not working. I am getting the NotImplementedException. Can you please tell me where the error could be and how to fix it?

  5. Hello Nischita 🙂
    StartCoroutine is not working. I am getting the NotImplementedException error. Could you please help me fix it?

  6. Hi Ahmad, could you send me a screenshot of the output where you get the exception to codeholo@gmail.com?
    Also make sure you are implementing the IDictationHandler interface and using its namespace correctly.
    The NotImplementedException occurs when one of the methods isn’t implemented.

  7. Hello, is it possible to implement dynamic commands? For instance, I have a list of game objects, let’s say of shapes. Could I use a command like Open or Show followed by the game object’s name, so that it finds the object that most closely matches the name and opens it? Or do I need to use a switch case and hardcode each command?

  8. Hi Nischita, I’m having the error “Dictation completed unsuccessfully: Canceled.” or “Dictation completed unsuccessfully: UnknownError”.
    So it basically works when I start the application, but when the application loses focus or after some time has passed, it automatically stops working. This is a real problem, as my application needs to listen to user input all the time.

    Do you know what I can do to fix this?
