Several years ago, we began to see the democratizing effect that Duolingo had on education. We heard from countless learners — from janitors in India to high schoolers in Indiana — about how our flagship language-learning app helped them develop language skills otherwise unavailable to them, in turn helping them get jobs, fulfill life goals, or save thousands of dollars in tuition. English is the most popular language to learn on Duolingo, and many learners also asked if we could certify their English skills formally, in order to help them gain access to higher education and better job opportunities.

This struck a chord with our co-founders, who are both US immigrants and had to deal with high costs and other hurdles that plague traditional English proficiency tests. In fact, when our CEO Luis von Ahn decided to apply to American universities, he was unable to schedule an English test in his home country of Guatemala. So his family spent more than $1,200 to fly him to El Salvador to take one.

We set out to develop the Duolingo English Test as a modern approach to testing: providing fast, high-quality, secure language assessments to anyone, anywhere, at any time over the Internet. Today, the test is accepted by more than 2,000 institutions, including Columbia, McGill, NYU, University College London, and Williams. It has helped many thousands of students worldwide who had no other English testing options (because of price or location, for example). And in the wake of test center closings due to the COVID-19 pandemic, the Duolingo English Test is, in fact, the only major English testing option available worldwide right now.

None of this would be possible without artificial intelligence (AI). This week, the first peer-reviewed paper about our test, “Machine Learning Driven Language Assessment”, is published in Transactions of the Association for Computational Linguistics, the top journal for AI and natural language processing (NLP). It is the first of many journal articles to come on the science behind our approach.

The Duolingo English Test is the first and only high-stakes test to use AI and machine learning end-to-end at every step of the process. In particular, we use these technologies to:

  1. Create thousands of test items automatically;
  2. Assess the language ability required for each test item, namely by automatically aligning it to a proficiency level in the Common European Framework of Reference (CEFR);
  3. Adaptively administer items for every test, which makes it both shorter and more secure;
  4. Automatically grade test-taker answers, which can be complex and are not simple multiple-choice items;
  5. Synthesize all of this into a final test score; and
  6. Help make the human review and proctoring stage more stable and efficient before test scores are certified and released.
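
To make step 3 in the list above more concrete, here is a minimal sketch of adaptive item selection. It is an illustration only, not our production system: it assumes a simple Rasch (one-parameter logistic) response model, and the function names, the item difficulties, and the fixed update step are all hypothetical choices made for the example.

```python
import math


def p_correct(theta, difficulty):
    """Rasch (1PL) probability that a test taker of ability `theta` answers correctly.

    Assumption: a Rasch model stands in for whatever model a real test uses.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of a Rasch item at ability `theta`: p * (1 - p)."""
    p = p_correct(theta, difficulty)
    return p * (1.0 - p)


def pick_next_item(theta, difficulties, administered):
    """Return the index of the not-yet-used item that is most informative at `theta`."""
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))


def update_ability(theta, difficulty, correct, step=0.5):
    """Take one gradient step on the log-likelihood of the observed response.

    `step` is a hypothetical learning rate chosen for illustration.
    """
    observed = 1.0 if correct else 0.0
    return theta + step * (observed - p_correct(theta, difficulty))


# Hypothetical five-item pool on an arbitrary ability scale.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0
administered = set()

first = pick_next_item(theta, difficulties, administered)
administered.add(first)
theta = update_ability(theta, difficulties[first], correct=True)
second = pick_next_item(theta, difficulties, administered)
```

Under this model an item is most informative when its difficulty is close to the current ability estimate, so a correct answer moves the estimate up and the next item is harder. Because each test taker follows a different path through the pool, the test can be shorter and harder to memorize.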

Our paper describes how we tackle applications 1-5 above. For example, the ML/NLP models we use to assess the ability level for test items agree significantly with CEFR judgments made by human experts (PhD-level linguists with ESL teaching experience).

Now that the test has been available for a few years, we also have much more evidence for the validity, reliability, and security of the test scores in the wild. As you can see from the graphs below, the correlations of Duolingo English Test scores with scores from two traditional English tests, TOEFL and IELTS, are high, and in fact on par with the correlation of those two tests with each other (.73, according to this research report). This suggests that scores from all three tests are equally well-suited to the purpose of aiding university admissions decisions.
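
For readers who want to see what such a correlation measures, the sketch below computes a Pearson correlation coefficient between two lists of paired scores. This is a generic textbook formula, not the analysis pipeline behind the numbers above, and the function name `pearson_r` is our own label for the example. A value near 1 means that test takers who score high on one test tend to score high on the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of paired scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)
```

To use it, pass the scores of the same people on two tests in the same order, for example `pearson_r(scores_on_test_a, scores_on_test_b)`, where both lists are placeholders for real paired data.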

Duolingo English Test scores are also highly reliable, meaning they are consistent across test items and test retakes, which is important for any high-stakes test. Furthermore, the fact that we use AI to generate items for a computer-adaptive test means that (1) the pool of items to draw from is very large, so (2) no two test administrations are alike. This is important for test security, and in fact, we show that test-takers would have to take the test about 1,000 times before seeing the same test item again, on average.
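
One simplified way to build intuition for that kind of figure is sketched below. It assumes every test draws its items uniformly at random from the pool, which an adaptive test does not do, and the pool and test sizes are hypothetical numbers chosen only to make the arithmetic easy to follow. Under that assumption, a given item shows up in any one test with probability `items_per_test / pool_size`, so the expected wait until it shows up again is the reciprocal of that probability.

```python
import random


def expected_tests_until_repeat(pool_size, items_per_test):
    """Mean number of later tests until one specific item reappears.

    Assumes uniform random sampling without replacement within each test,
    which is a simplification and not how an adaptive test selects items.
    """
    if items_per_test <= 0 or items_per_test > pool_size:
        raise ValueError("items_per_test must be between 1 and pool_size")
    return pool_size / items_per_test


def simulate_tests_until_repeat(pool_size, items_per_test, trials, seed=0):
    """Monte Carlo check of the closed form above, using a seeded generator."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tests_taken = 1
        # Keep taking tests until item 0 is drawn again.
        while 0 not in rng.sample(range(pool_size), items_per_test):
            tests_taken += 1
        total += tests_taken
    return total / trials


# Hypothetical sizes: a 25,000-item pool and 25 items per test.
closed_form = expected_tests_until_repeat(25_000, 25)

# A much smaller hypothetical pool keeps the simulation fast.
small_exact = expected_tests_until_repeat(50, 5)
small_simulated = simulate_tests_until_repeat(50, 5, trials=10_000)
```

The point of the sketch is the scaling: the larger the pool is relative to the length of a single test, the longer a test taker has to wait before any one item comes around again.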

We are actively working on additional research as we continue to develop the Duolingo English Test, including how we use other AI technologies (like computer vision and biometrics) to help ensure test security and integrity. Throughout, making an accessible, reliable, and secure test remains our first priority. It often takes more than a year for academic journal papers to be reviewed and published, so even this “new” research paper is already out of date! The test today is even more advanced than the results in the paper show: we now grade more nuanced, open-ended speaking and writing exercises, and test scores are even more reliable. We are also excited to announce more research-driven Duolingo English Test features later this summer!

Duolingo is a mission-driven company, and we created the Duolingo English Test to break down barriers to higher education. As a result, we've learned that an online, personalized approach to testing is not only important for increasing access — it's an essential innovation that is reshaping the education system as we know it, and we are excited to be leading the way.