After a long vacation, I’m finally back and eager to bring you this article on voice and text annotations. I think many of us who use the HoloLens, especially for business purposes, have this use-case of annotating something in and around our workspaces.
This tutorial is a combination of what we have learnt in the Dictation Example and the Speech to Text example in the blog series.
To start off with, I’ve performed a set of upgrades (it is especially important in the Mixed Reality world to stay on top of updates because things keep changing at a crazy pace!)
P.S.: Microsoft recently sent an email to all its subscribed users that a mandatory HoloLens update is coming soon. It should also add a bunch of features. 🙂
1. I’ve upgraded to the latest Unity version, 2018.2.13f1. Create a new project.
2. I’m using the latest stable release of the HoloToolkit, 2017.4.2.0. Import it into your project.
3. Do the usual tweaks:
a. Delete the Main Camera. Add the MixedRealityCameraParent.prefab to the Hierarchy. This will serve as the Main Camera.
b. Change the Camera Settings to the HoloLens-preferred values:
Skybox to Solid Color
Background to 0,0,0,0
In the MixedRealityCameraManager script which is attached to the MixedRealityCamera object, change the Clear Flags value to Solid Color
Ensure that the Background Color is black
c. Go to Edit -> Project Settings -> Player -> Other Settings and change the Scripting Runtime Version to .NET 4.x Equivalent and the Scripting Backend to .NET.
Also make sure that in XR Settings, Virtual Reality Supported is checked.
We need both the InputManager and the DefaultCursor in the scene
4. InputManager
Drag and drop the InputManager.prefab into your scene
5. DefaultCursor
Drag and drop the DefaultCursor.prefab into your scene. Then, in the InputManager’s Inspector, drag and drop the DefaultCursor into the Cursor field of the SimpleSinglePointerSelector script.
d. Save the scene
How the app is designed:
There are two cases: One where the user creates annotations from scratch and the other where the user plays an existing annotation.
So let’s imagine that we want to walk around and leave some annotations in the room. We would start this off with the user speech input for a command, let’s say “Annotate”. Once the user says “Annotate”, we will instantiate an object to denote that we have made an annotation.
Then we start dictation by tapping on the instantiated object, at which point there will be some form of color coding. For example, while it is in dictation mode, i.e. while the user is still recording his/her voice, the color of the text will be red. The user taps on it again to stop recording, and the text changes to a different color, in this case green.
Then when we walk around and tap on the objects, it plays out the annotation.
So let’s get the speech part first:
We need to be able to add a voice command and call a method in our Annotate script. But let’s not get ahead of ourselves.
First, for the speech part as explained in detail in the article in the blog series, we add the corresponding SpeechInputSource.cs and SpeechInputHandler.cs scripts to our scene.
Since we are going to have a lot of manager scripts, we will organise all of them under an empty GameObject called “Managers“. Create an empty GameObject called SpeechManager under it, and drag the InputManager under Managers as well.
Click on the AddComponent on the SpeechManager and add the SpeechInputHandler.cs script.
Before we go on to add the keyword to the SpeechInputHandler, we need to add the SpeechInputSource.cs to the scene.
Click on the AddComponent on the SpeechManager and add the SpeechInputSource.cs script to it.
Click on the plus sign and add the keyword “Annotate”
We will come back shortly to add the keyword Annotate and the logic to it.
For that, we need to move on to the Dictation part:
So we must be able to add a GameObject to represent an annotation that has been added.
For this purpose, the GameObject I’ve chosen to instantiate whenever I say “Annotate” is a red sticky note from the Asset Store: https://assetstore.unity.com/packages/3d/props/office-supplies-71580
I have deleted all the other assets from the office-supplies package since I don’t need them. You can simply use a cube in its place if you want.
So the logic is like this: when I use the voice command “Annotate”, a red sticky note is instantiated, denoting that it is in recording mode. I also want to add a 3D Text object above the sticky note, which will be the text container for our dictated text. For this we first need to use the concept of prefabs. Let’s get around to creating one.
First, drag and drop the Sticky_note_red from your Assets folder into the scene. Add a 3D Text object as a child, and call it SpeechToTextOutput or something similar. Modify the SpeechToTextOutput object to whatever font size, style and color you want. Included is a screenshot of my SpeechToTextOutput object. Add some sensible starting text, such as: Tap to start recording, tap again to stop.
We also need to be able to tap on the Sticky_note_red prefab to start recording, so add a Box Collider to it. Otherwise, tapping won’t work!
Now click on the Sticky_note_red object and Add Component. Call the new script AnnotateScript.cs or something similar, and edit it in Visual Studio.
First, let’s write a method called AnnotateOnSpeech(). This method will be called whenever we use the voice command “Annotate”.
In this method, we will refer to the object to be instantiated simply as objectToBeInstantiated. Declare it at the top as a private variable. Also include a public GameObject variable called prefabObject. We will use the Instantiate() function to instantiate prefabObject, in this case the Sticky_note_red along with its text (we will create the prefab shortly), at the position of the cursor (ideally where the user is looking).
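As a sketch, the method body boils down to a single Instantiate() call; the field names objectToBeInstantiated, prefabObject and cursor are the ones introduced above, and the complete version appears in the full script further down:

```csharp
using UnityEngine;

public class AnnotateScript : MonoBehaviour
{
    private GameObject objectToBeInstantiated;
    public GameObject prefabObject;   // assigned in the Inspector (the New Prefab)
    public GameObject cursor;         // assigned in the Inspector (the DefaultCursor)

    // Called by the SpeechInputHandler when the user says "Annotate"
    public void AnnotateOnSpeech()
    {
        // Spawn the sticky-note prefab at the cursor position, facing the user
        objectToBeInstantiated = Instantiate(prefabObject,
            cursor.transform.position,
            Camera.main.transform.rotation);
    }
}
```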
Next, in your Assets folder (it can also live in a subfolder), right-click and Create -> Prefab. Drag and drop the Sticky_note_red (along with its child) onto the prefab object you just created. If done correctly, you will see the Sticky_note_red object in the prefab.
In the Inspector of the Sticky_note_red, drag and drop whatever prefab we just created onto prefabObject, so it can be instantiated. In this case, I will drag the New Prefab onto it.
The cursor field is also public; drag our DefaultCursor object onto it.
Now going back to the AnnotateScript.cs:
Include the instantiation code I have provided in the AnnotateOnSpeech() method.
Now let’s complete our Speech part:
Go to the SpeechManager object and, in the SpeechInputHandler Inspector, drag and drop the Sticky_note_red object and, under Methods, look for AnnotateOnSpeech(). Also check Persistent Keywords in the SpeechInputSource.
Also make sure that under Capabilities, Microphone is checked.
Now build and deploy onto HoloLens to test at this intermediate stage
A minor change has been made by Microsoft regarding the scripting backend for UWP: .NET is now deprecated and will be removed in the future. I tried switching the Scripting Backend to IL2CPP, however the build took FOREVER!! So until the support is completely dropped, simply ignore this warning and build the normal way.
(On a sidenote, I hope Unity changes its mind or comes up with a better solution for this.)
So when I built this, everything went fine. But when I deployed this, I encountered a really nasty error. One that took me a lot of time and talking to a lot of people to figure out. (You can also check this in Errors and Solutions at the bottom of this article)
Deployment threw a bunch of errors showing Unity TextMeshPro as the culprit. So I just went over to Unity and removed the package from Package Manager.
Go to Unity -> Window -> Package Manager and a popup appears. Click on TextMeshPro and Remove. While you are at it, also remove the Analytics Library, Ads and In App Purchasing. We don’t need any of these, and it will speed up the build process. (OPTIMIZE!!!)
Now build and deploy and let’s see what happens.
When the app is deployed, you will see a popup asking you to enable the Microphone. Tap on Yes. Now whenever you say “Annotate”, you should see a new Sticky_note_red along with a 3D Text object being instantiated.
The speech part is good.
Let’s go over to the dictation part:
For this, the HoloToolkit again provides us the IDictationHandler interface, which we can implement.
Make the class implement IDictationHandler.
Your Unity console is going to complain about a missing namespace. Add using HoloToolkit.Unity.InputModule;
There will be some more errors on your Unity console saying that interface members are not implemented. Let’s go over to the script in VS. IDictationHandler requires implementing four different methods, as shown in the screenshot. The errors on your console will then disappear.
Also create an audioSource field, and attach an AudioSource to your Sticky_note_red prefab via Add Component. This will be used later to play back the dictation. Initialize the field in the AnnotateOnSpeech() method.
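In code, this is one extra line at the end of AnnotateOnSpeech(), grabbing the AudioSource that sits on the freshly spawned clone (a fragment; it appears in context in the full script below):

```csharp
// Cache the AudioSource component of the newly instantiated sticky note,
// so the recorded dictation clip can be played back from it later.
audioSource = objectToBeInstantiated.GetComponent<AudioSource>();
```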
Now we need to assign to speechToTextOutput the result of the dictation. Do this in all four methods of the IDictationHandler.
We also assign the color green to speechToTextOutput in the OnDictationComplete() method, to denote that recording is complete; in case of an error, we keep it white in the OnDictationError() method.
Then we save the dictation in an audio clip which we want to play back later, so we create a private variable called dictationAudioClip or something similar. The IDictationHandler events expose the recorded clip as eventData.DictationAudioClip. Assign this to dictationAudioClip in the OnDictationComplete() method only; we don’t want to save the dictation in case there was an error.
That takes care of the part where the speech text changes its color accordingly. (Scroll down for the full script.)
How do we go about starting dictation recording then?
We now add the tap logic as described in the initial part of the article. Since we are using tapping, we use IInputClickHandler. (For a detailed explanation of how tapping is done, please see the tapping-at-objects tutorial.) In its OnInputClicked() method, i.e. whenever a tap is recognized on a clone of the New Prefab object, we simply call a function called Recording().
In the Recording() function we call the DictationInputManager’s StartRecording function and pass it four arguments. The first is a listener object; you can pass null here, since the DictationInputManager takes care of the listener itself. The next three arguments are:
an initial silence timeout – how long the dictation listener will wait for you to start speaking in the current session before timing out
an auto silence timeout – how long the listener will wait after you stop speaking before timing out automatically due to lack of audio input
the recording time in seconds
Provide whatever values seem comfortable to you (the timeouts are floats).
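The call itself looks like this; the values 5, 20 and 10 are the ones used in the full script below, so pick your own:

```csharp
// StartRecording is a coroutine provided by HoloToolkit's DictationInputManager.
StartCoroutine(DictationInputManager.StartRecording(
    null,  // listener - DictationInputManager takes care of this itself
    5f,    // initial silence timeout, in seconds
    20f,   // auto silence timeout, in seconds
    10));  // recording time, in seconds
```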
Don’t forget to set the color of speechToTextOutput to red, because it is in recording mode.
Make sure you do the initializations in the Awake() method; otherwise you will get a NullReferenceException. (Note that AudioClip is not a Component, so dictationAudioClip can simply start out as null; it is assigned in OnDictationComplete().)
(Scroll down for full script)
Before building, don’t forget to enable the capability ‘InternetClient‘. This is required for Dictation to work. Also make sure your HoloLens is connected to the Internet.
Let’s try building and deploying until this stage to check if dictation works.
So if you get a NullReferenceException at this stage, that’s because I sheepishly forgot to add a DictationInputManager.cs, which the InputManager is screaming at me for.
We need to create a dummy object to attach this script to. Let’s call it DictationManager. Add Component and attach the DictationInputManager script to it. (The Dictation tutorial is explained in detail in this article: https://codeholo.com/2018/03/17/dictationexamplehololens/)
Build and try again. Remember: you tap on the sticky note to start recording, at which point the text turns red; then you tap on it again to stop recording, and the text turns green.
Let’s add some more logic. If recording mode is already on, we need to ensure that tapping stops the recording, so we need a check for this in the Recording() method. A simple bool, isRecording, will do. Don’t forget to initialize it to false in the Awake() method. It also needs to be set to false when there is an error in dictation, i.e. in the OnDictationError() method, where we also call the StopRecording() function.
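Put together, the toggle looks like this (identical to the Recording() method in the full script below):

```csharp
private void Recording()
{
    if (isRecording)
    {
        // Second tap: stop recording and reset the text color
        isRecording = false;
        StartCoroutine(DictationInputManager.StopRecording());
        speechToTextOutput.color = Color.white;
    }
    else
    {
        // First tap: start recording and mark the text red
        isRecording = true;
        StartCoroutine(DictationInputManager.StartRecording(null, 5f, 20f, 10));
        speechToTextOutput.color = Color.red;
    }
}
```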
Now for the part where we play back the dictation clips:
We will finally use the audioSource which we initialized in the AnnotateOnSpeech() method.
First we initialize it in the Awake() method
Next, we assign dictationAudioClip to audioSource.clip when dictation is complete; this stores the audio clip for playback.
We also want to make sure that if any audio is playing when we tap on the sticky note, that audio is stopped. We also want to ensure that audio is only played back when the text is green; otherwise, tapping should not play the audio, it should call the Recording() function. Include this check in the OnInputClicked() method.
We will also write a PlayAudio() function which is called in the event that the tapped object’s text is green.
We need a flag to check if audio is already playing. Create a bool field called audioState and initialize it to false in the Awake() method. It is reset to false when the tapped object’s text is green (before playback starts), and set to true once the audio is playing.
Lastly, we also add an Update() function which constantly checks audioSource and audioState. When audioSource is not null, the audio has finished playing, and audioState is still true, we stop the audioSource and reset audioState to false.
Full Script AnnotateScript.cs:
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using HoloToolkit.Unity.InputModule;
public class AnnotateScript : MonoBehaviour, IDictationHandler, IInputClickHandler
{
//Initialization variables
private GameObject objectToBeInstantiated;
public GameObject prefabObject;
public GameObject cursor;
public TextMesh speechToTextOutput;
private AudioSource audioSource;
private AudioClip dictationAudioClip;
private bool isRecording;
private bool audioState;
private void Awake()
{
// AudioClip is not a Component, so GetComponent cannot fetch it;
// dictationAudioClip is assigned later in OnDictationComplete()
dictationAudioClip = null;
audioSource = GetComponent<AudioSource>();
isRecording = false;
audioState = false;
}
public void Update()
{
if (audioSource != null && !audioSource.isPlaying && audioState)
{
audioSource.Stop();
audioState = false;
return;
}
}
public void AnnotateOnSpeech()
{
//Instantiation code
objectToBeInstantiated = Instantiate(prefabObject, cursor.transform.position, Camera.main.transform.rotation);
objectToBeInstantiated.transform.position = cursor.transform.position; // Instantiate object at cursor position
audioSource = objectToBeInstantiated.GetComponent<AudioSource>();
}
//Methods that need to be implemented for the IDictationHandler
public void OnDictationComplete(DictationEventData eventData)
{
speechToTextOutput.text = eventData.DictationResult;
speechToTextOutput.color = Color.green;
dictationAudioClip = eventData.DictationAudioClip;
audioSource.clip = dictationAudioClip;
}
public void OnDictationError(DictationEventData eventData)
{
speechToTextOutput.text = eventData.DictationResult;
speechToTextOutput.color = Color.white;
isRecording = false;
StartCoroutine(DictationInputManager.StopRecording());
}
public void OnDictationHypothesis(DictationEventData eventData)
{
speechToTextOutput.text = eventData.DictationResult;
}
public void OnDictationResult(DictationEventData eventData)
{
speechToTextOutput.text = eventData.DictationResult;
}
public void OnInputClicked(InputClickedEventData eventData)
{
//Check that the tapped object's child TextMesh exists before using it
if (eventData.selectedObject.GetComponentInChildren<TextMesh>() == null) { Debug.Log("Text mesh is null"); return; }
if (eventData.selectedObject.GetComponentInChildren<TextMesh>().color == Color.green)
{
audioSource.Stop();
audioState = false;
PlayAudio();
}
else
{
Recording();
}
}
public void PlayAudio()
{
if (audioSource == null || audioSource.isPlaying)
{
Debug.Log("Audio source is null or is playing");
return;
}
audioSource.Play();
audioState = true;
}
private void Recording()
{
if (isRecording)
{
isRecording = false;
StartCoroutine(DictationInputManager.StopRecording());
speechToTextOutput.color = Color.white;
}
else
{
isRecording = true;
StartCoroutine(DictationInputManager.StartRecording(null, 5f, 20f, 10));
speechToTextOutput.color = Color.red;
}
}
}
That’s it. End of a longggg tutorial. Build and deploy. It helps to have a lot of debugging statements in between so that you can test if each function is reached. Let me know how it goes 🙂
Here’s a video of the output:
Errors and solutions
Error: Unity TextMeshPro! Deployment fails with a bunch of errors.
Solution: This package is now automatically bundled into Unity projects. It is of no use for this project, so simply remove it; the build and deployment then work.
Go to Unity -> Window -> Package Manager and a popup appears. Click on TextMeshPro and Remove.
Error: Dictation Recognizer does not start
Solution: For the Dictation Recognizer to work correctly, enable ‘InternetClient’ under Capabilities. Also make sure your HoloLens is connected to the Internet.
Error: Tapping won’t work on the Prefab.
Solution: Make sure you have added a Box Collider to the Sticky_note_red. Also ensure that you are checking for the right name in the code: in this case, the tapped object is New Prefab(Clone).
Error: Null reference exception of Dictation Manager.
Solution: You need to add an empty GameObject to the scene to which the DictationInputManager.cs script can be attached. The InputManager looks for this in order for dictation to work. (The Dictation tutorial is explained in detail in this article: https://codeholo.com/2018/03/17/dictationexamplehololens/)
You are the light bringer:)
amazing tutorial… once again you managed to hit right in the super-needed MR element.
Thanks and keep up the amazing work!
Hi Ariel, thank you so much for the positive words 🙂 will be glad to also learn where else I could focus my articles on.
Hi:).
Well… I’m part of a startup that builds AR tutorials for industrial employees… that’s kind of a big thing these days (you can see it in all the big guns’ HW use cases… MS, HTC, ML, PTC).
So a good thing to focus on is things related to CAD handling + the UX aspects, on which you already posted a few great blogs before (solver system / scene loading / Object Manipulation).
But! If I really need to point at something that would be awesome to learn, it is how to handle spawned objects that play well with save and load features (instantiate / resource load / json / cloud services).
It seems there is a major shortage of tutorials regarding these issues, especially on HoloLens 🙂 thanks and I hope this was helpful ;)
Ariel
Absolutely! Thanks for the tips. I am currently informing myself about Azure and how it could provide some cool things for HoloLens- Let’s see if I find some more time to explore the topics you have suggested. 🙂
Hello Nischita,
I’m doing this tutorial, but I can’t spawn the object at the cursor position; it always gives me 0,0,0. I’m using the cursor with feedback. I did not test on the HoloLens yet, but in Unity it always spawns at 0,0,0. How can I fix it?
Thanks for all the amazing tutorials.
Hi Rafael,
Welcome! Glad they are helping you. For this very example, it’s best to try with the HoloLens, if you get your hands on one since the object is instantiated at the cursor position. Have you included this code when you want to instantiate your annotation object?
objectToBeInstantiated = Instantiate(prefabObject, cursor.transform.position, Camera.main.transform.rotation);
objectToBeInstantiated.transform.position = cursor.transform.position; // Instantiate object at cursor position
In the Unity editor, it probably would still spawn at 0,0,0, since without the device there is no real gaze and the cursor just stays at 0,0,0. Therefore, I urge you to try this on the HoloLens and let me know!
I’m using this for now:
var headPosition = Camera.main.transform.position;
var gazeDirection = Camera.main.transform.forward;
RaycastHit hitInfo;
if (Physics.Raycast(headPosition, gazeDirection, out hitInfo))
{
objectToBeInstantiated = Instantiate(prefabObject);
objectToBeInstantiated.transform.position = hitInfo.point; // Instantiate object at cursor position
audioSource = objectToBeInstantiated.GetComponent<AudioSource>();
}
but i will test the one that you made in Hololens this week
Hi Rafael, The code looks fine. So it should essentially work on HoloLens.
Hello Nischita,
I have written you an email. I tried to do your tutorial. It’s great, and the first tutorial on dictation works fine, but I have real problems with the annotation.
My first problem is that I sometimes can’t figure out where I should add some components: to the prefab, or to my cube/sticky note in the scene. Is the sticky note still in the hierarchy after the build? The second problem is that I get some errors in Visual Studio: the audiosource = getcomponent() lines are not allowed to be empty. This issue pops up at different places in the code, 4 times if I remember rightly.
Do you have a GitHub account with the program, so that I can figure out why I can’t get the tutorial done?
All the best
Sascha
Hi Sascha, Yeah I responded to your email already yesterday. I will set up the GitHub account tomorrow and send you the link.
Hi Nischita,
Thanks very much for the tutorial. I’m curious if you have any thoughts on how this annotation would be altered to stick to virtual objects? For example if I annotated a certain part of a model, then moved it, the annotation would stay connected to the same piece of the model.
Hi James, Interesting question. For annotations to “stick” to any virtual object (whole or part), I would create an annotation as a child of the virtual object as a very simple solution. This way, whenever the part is moved, the annotation moves with it as well. Let me know if you have cooler ideas to do so 🙂
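A minimal sketch of that parenting idea (annotation and partTransform are illustrative names, not from the tutorial):

```csharp
// Attach the annotation to the Transform of the model piece it describes.
// With worldPositionStays = true it keeps its current world position and
// from then on follows the part wherever it is moved.
annotation.transform.SetParent(partTransform, true);
```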
Hello Nischita,
Thanks a lot for such an awesome blog. I would like to know is it possible to annotate 3D models (prefabs) using HoloLens 2 and then store those annotations with prefab?
Could you please answer this?
Moreover, I have already sent you an email so could you please take a look into it..
Hi Ruchir, Thanks, I’m glad it helps you 🙂
I guess with the HoloLens 2 it’s the same thing. First, create a TextMeshPro prefab (TextMesh is no longer recommended, as it will soon be deprecated), so you’ll need to import the TextMeshPro package. Then at runtime, change the TMPro Text field with whatever you want to annotate. Now, I don’t know what exactly the use case is, but if you are looking for tooltips to annotate different parts of a 3D model, that is also done very easily: https://microsoft.github.io/MixedRealityToolkit-Unity/Documentation/README_Tooltip.html I’m going to start writing articles on the HoloLens 2 features this weekend. Stay tuned!
Hey, i gave some comments on your reply via email. So if you get time, kindly reply.
Thanks,