Conversational skills are among the most important to learners, but getting the hang of speaking in a new language takes a lot of practice! At Duolingo, we're building new tools to give you the confidence to know what to say and how to say it, whether you'll be using the language on FaceTime with your cousins or chatting with locals at a pub in another country. In this post, we’re focusing on a new project to improve how we teach and assess speaking skills.

When you do a speaking exercise in your lesson, your speech has until now been graded by third-party speech recognizers and simple text comparison tools that are provided by phone manufacturers (e.g. Apple for iPhone, Google for Android). In the past, Duolingo has not saved or used learner speech, but this data contains a lot of information about how second language learners actually learn speaking skills, which could help unlock significant improvements in the learning experience — if we can access it.

So last month, we began asking a subset of learners if they are willing to share their recorded speech with us, in order to better understand their learning process. We only collect speech data from learners who have given their permission, and we ensure that the speech data is anonymized to protect privacy. Collecting and analyzing speech data will help us develop new features to help you improve your speaking skills, such as:

  • Giving tips on pronunciation, word by word, sound by sound
  • Picking speaking exercises that focus on areas where you need the most practice
  • Grading beginners' speech more leniently, to reduce frustration
  • Improving how the app understands speech

This speech data could also improve Duolingo’s courses for under-represented languages like Gaelic, Swahili, and Navajo. Third-party speech recognizers for these languages are not available on all of our platforms, so learners in these courses can’t practice speaking skills with our app today. By collecting anonymized speech data from learners in these courses, we can build our own speech technologies for these under-represented languages.

How do we protect learner privacy?

Protecting our learners’ privacy is a top priority for Duolingo, so we’ve taken many steps to ensure the data we collect can never be tied to an individual learner.

As our first line of defense, we:

  • Do not collect any uniquely identifying information (e.g. name, ID) or any information about when the data was received alongside the speech data
  • Do not store speech data from child users (see our privacy policy)

We also work with speech data only in aggregate, never at the level of an individual learner. So as our second line of defense, we:

  • Only collect data from frequently used exercises — to ensure larger numbers of learners are generating speech and avoid any chance of identifying an individual based on a particular exercise
  • Only access the data after enough has been collected that none can be tied back to any particular learning moment

By following these precautions, we ensure that no learner can ever be identified by their speech data.
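To make the second line of defense concrete, here is a minimal sketch of how the two thresholds might work. It is an illustration only, not our production code: the function names and the threshold values are hypothetical, and the real numbers are not stated in this post.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
MIN_LEARNERS_PER_EXERCISE = 1000  # an exercise must be this widely used to be collected from
MIN_RECORDINGS_PER_BATCH = 100    # a batch must be this large before anyone may access it


def collectable_exercises(learner_counts, min_learners=MIN_LEARNERS_PER_EXERCISE):
    """Return the IDs of exercises used by enough learners to collect speech from."""
    return {
        exercise_id
        for exercise_id, count in learner_counts.items()
        if count >= min_learners
    }


def releasable_batches(recordings, min_recordings=MIN_RECORDINGS_PER_BATCH):
    """Group recordings by exercise and release only the groups that are large enough.

    Each recording is an (exercise_id, audio) pair. Note that there is
    no learner ID and no timestamp anywhere in the record.
    """
    batches = defaultdict(list)
    for exercise_id, audio in recordings:
        batches[exercise_id].append(audio)
    return {
        exercise_id: clips
        for exercise_id, clips in batches.items()
        if len(clips) >= min_recordings
    }
```

In this sketch, a rarely used exercise never enters the collection set, and a small batch stays inaccessible until more recordings arrive.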

How do we get permission?

Before we collect any speech data from a learner, we explain why we want to collect the data and we ask for their permission.

Learners who have speaking exercises disabled will not receive this prompt, and new learners will see it only after they’ve completed 15 lessons.

Learners can always change their mind later and turn off this feature in the Settings screen.

Note that we only collect speech data over WiFi so we don’t use learners’ cellular data.
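The permission rules above can be summarized as a short sketch. The class and function names here are hypothetical and exist only for illustration. The 15-lesson threshold and the WiFi requirement come from this post, while checking for child users at upload time is an assumption about where that rule is enforced.

```python
from dataclasses import dataclass

MIN_LESSONS_BEFORE_PROMPT = 15  # stated in this post


@dataclass
class Learner:
    speaking_exercises_enabled: bool
    lessons_completed: int
    is_child: bool = False
    sharing_enabled: bool = False  # the toggle in the Settings screen


def should_show_prompt(learner: Learner) -> bool:
    """Show the permission prompt only to learners who use speaking
    exercises and have completed enough lessons."""
    return (
        learner.speaking_exercises_enabled
        and learner.lessons_completed >= MIN_LESSONS_BEFORE_PROMPT
    )


def may_upload_speech(learner: Learner, on_wifi: bool) -> bool:
    """Upload speech only with permission and only over WiFi.

    Assumption: speech from child users is excluded at this step.
    """
    return learner.sharing_enabled and not learner.is_child and on_wifi
```

Because the upload check reads the Settings toggle every time, a learner who turns sharing off stops contributing speech from that point on.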

Improved speaking experiences coming to the Duolingo language app near you!

Duolingo is hard at work using this speech data to improve your learning experience. Our mission is to develop the best education in the world and make it universally available, and part of our commitment to being the best is doing so in a way that is thoughtful, responsible, and protective of your privacy.

These changes will allow us to improve how learners use speech in their courses and to develop new features and technologies that make language learning more effective. So stay tuned for more speech features that will make learning with Duolingo even more fun!