3 Dialogue with the interlocutor
Voice recognition (ASR, automatic speech recognition) lets the interlocutor take part in the dialogue with the user, rather than always waiting for the user to initiate communication.
Voice recognition uses Groq's AI by default, but you can change the provider in:
Parameters > Home
IMPORTANT: We currently use an external button to trigger voice recognition. It helps the interlocutor start a conversation and spares the user from clicking a button. It also reduces bugs in noisy environments such as hospitals.
If you don't have an external button, click on the microphone icon in the IGOOR software to start voice recognition, and click again to stop it:

The button triggers the transcription of the interlocutor's words.
See also 2 - External button for voice recognition
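The click-to-start, click-again-to-stop behaviour of the microphone icon can be pictured as a simple toggle. The sketch below is only an illustration: the class name MicrophoneToggle and its fields are hypothetical and are not taken from the IGOOR software.

```python
class MicrophoneToggle:
    """Hypothetical model of the microphone icon (not IGOOR's actual code)."""

    def __init__(self):
        self.listening = False  # voice recognition is off at start
        self.events = []        # history of actions, kept for illustration

    def click(self):
        # Each click flips the state: the first starts listening, the next stops.
        self.listening = not self.listening
        action = "start" if self.listening else "stop"
        self.events.append(action)
        return action


toggle = MicrophoneToggle()
first = toggle.click()   # first click starts voice recognition
second = toggle.click()  # second click stops it
```

An external button would presumably drive the same two states, only from a physical switch instead of the on-screen icon.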
THE AI TRANSCRIBES YOUR SENTENCE, "REASONS" AND GENERATES PHRASE PREDICTIONS
TRANSCRIPTION
Your choice of speech recognition model (see Local or Cloud ASR) can affect the quality of the transcription. However, language models often overcome transcription "hallucinations" by using context, for example, the ongoing conversation.
Here is an example of a sentence spoken by the interlocutor and perfectly transcribed by the AI:
REASONING
The AI "reasons" before generating responses. This process is invisible to the user, but it helps to understand how the AI works and how it takes the user's preferences into account (in this case, dietary preferences):
We need to consider Igor's preferences: prefers fish, soups, light, soft foods, dislikes heavy, likes fish, likes miso, foie gras? He likes foie de veau, foie de morue, but not heavy? He likes fish, soups, light, avoids heavy meat. He also has difficulty chewing and swallowing, prefers soft foods, fish, soups, non-spicy Asian soups. He doesn't like rice cake. So for options: fish (good), foie gras (maybe heavy and rich, but could be okay?), meat (likely heavy, not preferred), spaghettis (hard to chew, not ideal), fruits (soft, okay). So we can give positive responses for fish and maybe fruits, neutral for foie gras, negative for meat and spaghettis. Need between 3 and 6 answers total, spread across columns: left (positive), center (mitigated/alternative/ironic), right (negative). Left can have 1-2 answers, center 1-2, right 1-2. Must have at least 3 total. Provide up to 6.
We also need to incorporate Igor's ironic style at most 1 in 6. So maybe one answer is ironic, placed maybe in center.
NOTE: Reasoning is not supported on all models. Reasoning can also slightly lengthen response generation time, but it generally improves the quality of predictions.
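The constraints mentioned in the reasoning trace above (3 to 6 answers in total, 1 or 2 per column, at most one ironic answer) can be expressed as a small check. This is a minimal sketch under assumptions: the function name check_answers, the dictionary keys, and the sample sentences are all invented for illustration, and "at most 1 in 6" is read here as "at most one ironic answer per set".

```python
def check_answers(answers):
    """Return a list of rule violations; an empty list means the set is acceptable.

    Each answer is a dict with hypothetical keys: "text", "column", "ironic".
    """
    problems = []
    if not 3 <= len(answers) <= 6:
        problems.append("total must be between 3 and 6")
    for column in ("left", "center", "right"):
        count = sum(1 for a in answers if a["column"] == column)
        if not 1 <= count <= 2:
            problems.append(f"{column} column must hold 1 or 2 answers")
    if sum(1 for a in answers if a["ironic"]) > 1:
        problems.append("at most one ironic answer")
    return problems


# Invented sample set, loosely following the meal example in the trace.
sample = [
    {"text": "Fish, please.", "column": "left", "ironic": False},
    {"text": "Some fruit would be nice.", "column": "left", "ironic": False},
    {"text": "Foie gras, maybe a little.", "column": "center", "ironic": False},
    {"text": "No meat, thank you.", "column": "right", "ironic": False},
    {"text": "Not spaghetti.", "column": "right", "ironic": False},
]
violations = check_answers(sample)
```

How IGOOR actually enforces these rules is not described here; they may simply be instructions given to the model rather than checks performed in code.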
GENERATION AND DISPLAY OF RESPONSES
The responses are then structured visually in three columns.
The principle is the same as with daily needs:
- On the left, we have "positive" responses (in this case, the acceptance of fish or fruit);
- On the right, we have "negative" responses (in this case, foods he refuses);
- In the center, we have a more nuanced response.
Several sentences can be selected and are then spoken by text-to-speech one after the other. In this case, the user could choose several to indicate all his preferences for the meal.
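The three-column layout and the sequential reading of selected sentences can be sketched as follows. The function names, the polarity labels ("positive", "nuanced", "negative"), the speak callback, and the sample sentences are hypothetical and serve only to illustrate the principle.

```python
def layout_columns(responses):
    """Group (text, polarity) pairs into the three display columns."""
    column_for = {"positive": "left", "nuanced": "center", "negative": "right"}
    columns = {"left": [], "center": [], "right": []}
    for text, polarity in responses:
        columns[column_for[polarity]].append(text)
    return columns


def speak_selection(selected, speak):
    """Pass each selected sentence to the text-to-speech callback, in order."""
    for sentence in selected:
        speak(sentence)


# Invented sample responses for the meal example.
responses = [
    ("Fish, please.", "positive"),
    ("Foie gras, maybe a little.", "nuanced"),
    ("No meat, thank you.", "negative"),
    ("Some fruit would be nice.", "positive"),
]
columns = layout_columns(responses)

spoken = []  # stands in for a real text-to-speech engine
speak_selection(["Fish, please.", "No meat, thank you."], spoken.append)
```

Here the spoken list simply records the sentences in the order they would be read aloud; a real text-to-speech engine would take the place of spoken.append.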
TIPS FOR INTERLOCUTORS
- Use SIMPLE SENTENCES
- Speak with CLEAR PRONUNCIATION
- Provide INFORMATION THAT IS AS COMPLETE AS POSSIBLE, TO SIMPLIFY THE AI'S LEARNING AND STRENGTHEN THE RELEVANCE OF THE RESPONSES