3 Dialogue with the interlocutor

Voice recognition (or ASR, automatic speech recognition) now lets you bring the interlocutor into the dialogue with the user, instead of always waiting for the user to initiate communication.

By default, voice recognition uses Groq's AI, but you can change the provider in:

Parameters > Home

IMPORTANT: We currently use an external button to trigger voice recognition. It makes it easier for the interlocutor to start a conversation and spares the user from having to click a button. It also reduces errors in noisy environments such as hospitals.

If you don't have an external button, click the microphone icon in the IGOOR software to start voice recognition, and click it again to stop:

ASR button

The button triggers the transcription of the interlocutor's words.

THE AI TRANSCRIBES YOUR SENTENCE, "REASONS", AND GENERATES PHRASE PREDICTIONS

TRANSCRIPTION

Your choice of speech recognition model (see Local or Cloud ASR) can affect the quality of the transcription. However, language models can often compensate for transcription errors ("hallucinations") by using context, such as the ongoing conversation.

Here is an example of a sentence spoken by the interlocutor and perfectly transcribed by the AI:

REASONING

The AI "reasons" before generating responses. This process is invisible to the user, but it is worth understanding: it shows how the system works and how it takes the user's preferences into account (here, dietary preferences):

We need to consider Igor's preferences: prefers fish, soups, light, soft foods, dislikes heavy, likes fish, likes miso, foie gras? He likes foie de veau, foie de morue, but not heavy? He likes fish, soups, light, avoids heavy meat. He also has difficulty chewing and swallowing, prefers soft foods, fish, soups, non-spicy Asian soups. He doesn't like rice cake. So for options: fish (good), foie gras (maybe heavy and rich, but could be okay?), meat (likely heavy, not preferred), spaghettis (hard to chew, not ideal), fruits (soft, okay). So we can give positive responses for fish and maybe fruits, neutral for foie gras, negative for meat and spaghettis. Need between 3 and 6 answers total, spread across columns: left (positive), center (mitigated/alternative/ironic), right (negative). Left can have 1-2 answers, center 1-2, right 1-2. Must have at least 3 total. Provide up to 6. We also need to incorporate Igor's ironic style at most 1 in 6. So maybe one answer is ironic, placed maybe in center.

NOTE: Not all models support reasoning. Reasoning can also slightly increase response generation time, but it generally improves the quality of predictions.

GENERATION AND DISPLAY OF RESPONSES

The responses are then displayed in three columns.

The principle is the same as for daily needs:

On the left, "positive" responses (here, accepting fish or fruit);
In the center, more nuanced responses (mitigated, alternative, or ironic);
On the right, "negative" responses (here, the foods he refuses).
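The reasoning example in this section mentions the rules the AI follows when filling the columns: 3 to 6 answers in total, 1 or 2 per column, and at most one ironic answer. The Python below is a minimal, hypothetical model of those rules, not IGOOR's actual code; the names `Suggestion` and `layout_columns` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical model: "left" = positive, "center" = nuanced, "right" = negative.
Column = Literal["left", "center", "right"]


@dataclass
class Suggestion:
    text: str
    column: Column
    ironic: bool = False


def layout_columns(suggestions: list[Suggestion]) -> dict[str, list[str]]:
    """Group suggestions into the three display columns, checking the stated rules."""
    if not 3 <= len(suggestions) <= 6:
        raise ValueError("expected between 3 and 6 suggestions in total")
    if sum(s.ironic for s in suggestions) > 1:
        raise ValueError("at most one ironic suggestion is allowed")

    columns: dict[str, list[str]] = {"left": [], "center": [], "right": []}
    for s in suggestions:
        columns[s.column].append(s.text)

    # Each column must hold 1 or 2 suggestions.
    for name, items in columns.items():
        if not 1 <= len(items) <= 2:
            raise ValueError(f"column {name!r} must hold 1 or 2 suggestions")
    return columns


# Usage with the meal example from this section (wording is illustrative).
example = [
    Suggestion("Yes, fish please.", "left"),
    Suggestion("Some fruit would be nice.", "left"),
    Suggestion("Foie gras, maybe a small portion.", "center"),
    Suggestion("No meat, thanks.", "right"),
    Suggestion("No spaghetti, too hard to chew.", "right"),
]
print(layout_columns(example))
```

Checking the per-column limit also guarantees the 3 to 6 total, but the explicit total check mirrors the rule as the AI states it.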

Several sentences can be selected and spoken one after the other through text-to-speech: here, the user could select several of them to state all of his preferences for the meal.
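Multi-sentence selection can be pictured as a queue. This is a minimal sketch, assuming the chosen sentences are kept in the order they were selected and then handed one by one to a text-to-speech engine; `SelectionQueue` and the `speak` callback are hypothetical names, not part of IGOOR.

```python
from typing import Callable


class SelectionQueue:
    """Hypothetical model: selected sentences are queued, then spoken in order."""

    def __init__(self) -> None:
        self._queue: list[str] = []

    def select(self, sentence: str) -> None:
        # Keep the order in which the user picked the sentences.
        self._queue.append(sentence)

    def speak_all(self, speak: Callable[[str], None]) -> list[str]:
        # `speak` stands in for the text-to-speech engine (an assumption).
        spoken = list(self._queue)
        for sentence in spoken:
            speak(sentence)
        self._queue.clear()  # start fresh for the next reply
        return spoken


# Usage: print stands in for the voice output.
q = SelectionQueue()
q.select("Yes, fish please.")
q.select("Some fruit would be nice.")
q.speak_all(print)
```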

TIPS FOR INTERLOCUTORS

Use SIMPLE SENTENCES
Speak with CLEAR PRONUNCIATION
Provide AS MUCH INFORMATION AS POSSIBLE to make the AI's task easier and improve the relevance of its responses