Large Language Models (LLMs) like ChatGPT, Claude, and Gemini have been trained on enormous amounts of language data in order to have natural-sounding exchanges. But using them to teach language learners is not as easy as saying, “Hey! Talk to this learner in Spanish!”

To develop AI-powered features like Video Call with Lily, we can't just let the model roam freely. Instead, we use targeted instructions and a predictable structure to make sure every call with Lily brings delight and sass—and, of course, the opportunity for speaking practice.

How we design each Video Call

When designing the perfect call experience, we have a lot of priorities to balance:

  • It needs to be at the appropriate CEFR level.
  • It needs to have a purpose—like telling you a story, asking your opinion, teaching you something, or simply having a chat.
  • It needs to feel like it’s coming from Lily, a sarcastic emo teenage girl, not from a generic AI chatbot.

To achieve the right balance, we create a prompt (or set of instructions) for the LLM. You can think of the prompt like a conversation involving three characters:

  • Assistant: This is Lily, the AI bot who reacts to what you, the User, say in accordance with the instructions from the System.
  • System: This is like the Assistant’s coach. Duolingo Learning Designers write the instructions that the System says to the Assistant (Lily) about how to act and what to say.
  • User: This is you—the learner who interacts with the Assistant (Lily).

In all our calls, we provide the System with a robust set of instructions for how Lily should behave. These instructions include information about Lily’s personality and backstory, they tell her how to help you if you’re stuck, they ensure she speaks at the correct level, and more. 

Also, very importantly, we include a basic blueprint for the conversation. Though each conversation you have with Lily is unique, they all follow a similar format:

Part 1: Opener
The System tells Lily what to say first. This is almost always a greeting in the target language. Our engineers have built a cycle of greetings that Lily will go through for each CEFR level.

Part 2: First Question
This sets the scene for what the call will be about. Lily might ask you something new about yourself, she might revisit a previous topic, or she might say that she has information to share about your target language’s culture. 

Part 3: Conversation
Lily and you can then go back and forth freely through the conversation. The System has instructed Lily to react to what you say and then to continue the conversation naturally. 

Part 4: Closer
After a certain number of back-and-forths, engineers have created a program where the System jumps in and whispers in Lily’s ear “Psst! Say it’s time to go.” This prevents the call from going on forever.

Behind the scenes

Lily's memory

If you’ve done several Video Calls, you may wonder, “How does Lily remember that about me?!” when she brings up information from previous calls. This is because after Lily hangs up, we take the call’s transcript, show it to the LLM, and ask “What important information have we learned about the User?” The information gleaned is then added to a List of Facts. The updated list becomes part of the instructions that the System gives to Lily during your next call. 

That is to say, before Lily begins talking, the System says “Remember this User? Here’s a List of Facts: They said they have two dogs, they’re studying architecture, and their favorite food is tacos.” That way, Lily might mention “How are your dogs doing?” or “Have you tried any good tacos lately?” to make the call seem personalized and magical.

Creating the first question

The first question is an important launchpad for the conversation. We want to get it right: It needs to be relevant to what you’re learning, it needs to be the right difficulty, and it needs to set the scene for a good conversation. With all these criteria, we have to write detailed instructions just for how to write a good starting question!

In fact, when your Video Call is ringing, that's when the System is formulating the first question.

Conversation Prep
An illustration of Duo wearing glasses and sitting in front of a computer Hey, LLM! You need to write a question that the Assistant (Lily) can ask the learner.
  • The question needs to be appropriate for the learner's CEFR level.
  • The question needs to use these words: music, like.
  • The question needs to… [etc]
An illustration of Lily on a Video Call Here's the question!

What kind of music do you like listening to?

We then take this question from the Conversation Prep and feed it into the Main Conversation, where the System instructs Lily on how to lead the conversation with you:

Main Conversation
An illustration of Duo wearing glasses and sitting in front of a computer You’re Lily, here’s some information about you:
  • You’re a teenage girl.
  • You’re very sarcastic.
  • You’re an introvert.
You’re talking to a learner who’s at A2 CEFR level. Here’s some information about this learner:
  • They have two dogs.
  • They’re studying architecture.
  • Their favorite food is tacos.
Begin the conversation with this opener: “Hey!

Then say this first question: “What kind of music do you like listening to?
An illustration of Lily on a Video Call I understand. I’m going to start the conversation with the learner now.

Hey!

As Video Call has developed and evolved, we've learned that it's important for the LLM to write the first question separately. When we include the instructions for the first question with the instructions for the rest of the call, we can often overload the LLM and get undesirable results—like sentences that are overly complex or missing the vocabulary provided in the Conversation Prep. It’s kind of similar for humans: If you’re told to do fifty tasks at the beginning of the day, you’ll probably forget to do some of them—or maybe you’ll do all fifty in a half-baked way. And since we want everything fully baked, we prepare the first question on its own.

Evaluating conversations

The first question isn't the only one that matters—we want Lily to react dynamically throughout the call, at the drop of a hat!

Earlier this year, we saw that sometimes, learners didn’t want to talk about the subject that Lily was instructed to focus on. You’d say “You won’t believe it, Lily! I just completed the entire Spanish course!” and Lily would respond “That’s nice. Have you heard about Swiss folk music?” 🫣

In order to let learners lead the conversation, we've since added an extra check that says “Does it seem like the learner wants to lead this conversation? If yes, ignore what you originally were going to talk about.” We have high hopes for these mid-call evaluations, as the LLM is always working—even during the Video Call itself—to ensure a great experience. 

In the mid-call evaluation, the System looks at what you've said and asks Lily questions to keep the conversation engaging and on track:

Mid-Call Evaluation
An illustration of Duo wearing glasses and sitting in front of a computer Hey, Lily! Consider the following for what the learner just said to you:
  • Did the learner talk about something that Lily loves? If yes, act excited!
  • Did the learner talk about something inappropriate? If yes, hang up now!
  • Does the learner seem confused? If yes, rephrase what you just said!
An illustration of Lily on a Video Call Got it! The learner just mentioned that they are really good at playing the guitar. Here’s my reaction to what they said:

Wow, I'm actually impressed. What’s your favorite song to play?

The most intelligent speaking practice around!

All this may seem complicated—because it is! As our team continues to tinker and as AI continues to forge new paths, we're teaching Lily to meet you at your level and allow you to practice speaking without fear.