Recently, we profoundly refactored the engine that drives Duolingo lessons. This post talks about our engineering choices, experiences, and the pain points of rewriting a highly complex system.

Highlights:

• Redesigned architecture
• Refactored code from Python to Scala
• Latency dropped from 750ms to 14ms
• Engine uptime increased from 99.9% to 100%

## Meet Duolingo’s engine, the Session Generator

Duolingo is the world’s most popular language learning app with more than 150 million users (at the time of this writing). At the core of the Duolingo experience, users learn in bite-sized lessons which consists of several interactive exercises (internally, we call them “challenges”).

So, for any given lesson, which exercises should a user see and in what order? This is the responsibility of Session Generator, our backend module which gets data from one of our 88 language courses (and counting!) in the Duolingo Incubator, sprinkles some machine learning magic, and proceeds to serve a sequence of exercises tailored to the needs of each of our millions of users.

## Overview of the rewrite

Rewriting code is a complicated, but necessary process. Even though it halts the development of new features and may take several months, eventually, the technical debt that has built up must be addressed.

In case you’re not familiar with the term, technical debt is like financial debt. You "borrow money" by making engineering decisions that let you develop something quickly. In the long term, though, development starts to stall because of these accumulated shortcuts, at which point it’s time to pay it off.

Session Generator has existed since day one of Duolingo. It has lived through all the pains that come with the rapid growth of a startup, and, unsurprisingly, had built up a lot of technical debt. In the next section, we’ll talk about some of the indicators that made us decide the time had come to rewrite, and the decisions we made to address these issues.

### System redesign overview

In the course of the Session Generator history, we often added a dependency to a shared resource because we needed some piece of data from some data store, or needed to cache an expensive computation. These decisions ultimately evolved into an architecture resembling the following:

There were many problems with this architecture. There are numerous "hard dependencies," marked in red: Session Generator would fail if any of these (or their respective network connections) failed. In addition, they all represent additional network calls which added to the total latency.

When redesigning the architecture, our main guiding principle was to remove as many shared resources as possible, resulting in a design that is simple and robust:

Under this new architecture, course data (which needs more processing steps but is shared among users in the same course) is processed off-line and serialized into files in AWS S3, Amazon’s cloud storage service. So we only need to fetch these files from a stable and cheap data store, and cache them.

On the other hand, the user data we leverage for personalization (which requires fewer processing steps but can’t be shared) is fetched through the API servers, and then injected into the request sent to Session Generator.

### Codebase refactor overview

One major decision we made was to rewrite Session Generator in Scala.

As with most of Duolingo’s backend, Session Generator was originally written in Python. Python is easy to understand and especially useful if you have developers from all sorts of backgrounds.

It does however come with drawbacks:

1. Performance. Python is considerably slower than C or Java, for example.

2. Memory management. Python is generally not thread-safe. This prevents developers from using in-memory caches to their full potential.

3. Dynamic typing. While only an actual problem for complex systems, dynamic typing can increase the number of runtime bugs, since the definition of interfaces between modules is weaker and the interpreter is not able to catch type-related bugs. This results in an excessive number of "deploy → find bug → fix" iterations, which in turn slows down engineering efforts.

We decided then to port the codebase to Scala, a statically-typed functional programming language built on top of the Java Virtual Machine (which means you can use existing Java libraries). Scala has been widely used in "big data" applications, which have some of the same high complexity characteristics as Session Generator. It seemed like a good fit.

One of the main concerns with Scala is its steep learning curve. However, in highly complex systems where the learning curve of the system is steeper than that of the programming language itself, improving readability and maintainability has a higher priority. In this regard, Scala does a good job, as we’ll present in the next section.

## Putting it all together

Let’s now dive into more specific parts of Session Generator’s implementation and development process in Scala.

### Functional programming in Scala

Aside from learning a new syntax, the main challenge of learning Scala from Python or Java backgrounds is getting used to functional programming.

We’ll mention some things developers should know in order to get started with functional programming. Note that this is not a Scala tutorial by any means, and that you are able to write imperative code with Scala since it’s multi-paradigm – although you might not be using it to its full potential.

Referential transparency and side effects

One of the pillars of functional programming is referential transparency. A referentially transparent function is one where the output is always the same for the same input, like an algebraic function. And it does nothing else.

So what’s an example of a function that is not referentially transparent? An IO function. For example, if you fetch a user from the database, you won’t get the same data all the time because the user might have changed their e-mail. Or if you print something, your code will be doing something other than computing your output. This is called a side effect.

If your method returns a Unit type (which is the Scala equivalent for void), then you certainly have a side effect, since all computation is done in order to fulfill a side-effected procedure that returns nothing.

Immutability

Referential transparency goes hand in hand with immutability. If you can’t have side effects, you can’t mutate state. Here are some ideas for dealing with immutability in a Scala code.

Idea #1: Use val instead of var when defining a variable, since a val can’t be reassigned.

>>> val x = 1
>>> x = 2
error: reassignment to val
x = 2
^


In particular, use case classes. An instance of a case class has non-reassignable attributes (all of them are val), is hashable and allows nice things such as pattern matching – all of that straight out of the box.

Idea #2: Use immutable collections. They are in the package scala.collection.immutable and are the default collections in Scala. Therefore you need to explicitly import from scala.collection.mutable if you want to use the corresponding mutable collection.

>>> val x = Set(1, 2, 3) // immutable
>>> x += 4
error: value += is not a member of scala.collection.immutable.Set[Int]
x += 4
^

>>> val x = scala.collection.mutable.Set(1, 2, 3) // mutable
>>> x += 4
>>> x
res0: scala.collection.mutable.Set[Int] = Set(1, 2, 3, 4)


Idea #3: Use control structures which do not need a mutable state. In particular, avoid while loops, and prefer to use transformation methods such as map, flatten, flatMap, filter, collect, fold, etc.

>>> val x = List(1, 2, 3)
>>> x.map(_ + 1) // Creates a new immutable list
res0: scala.collection.immutable.Set[Int] = Set(2, 3, 4)


Idea #4: If you can’t write your loop with the control structures above, try a recursive approach!

The name "monad" comes from category theory. We don’t want to go into mathematical formalities, so instead we’ll start with an example that illustrates the usage of the Option monad.

The Option monad is a wrapper for some object or a null. In the example below, we first try to fetch values for "x" and "y" from the map, yielding Some(x + y) if there existed entries for both, or None otherwise. In short, we’re gracefully dealing with a NullPointerException.

val someMap: Map[String, Int] = ???
val optionalX: Option[Int] = stringToIntMap.get("x")
val optionalY: Option[Int] = stringToIntMap.get("y")
for {
x: Int <- optionalX
y: Int <- optionalY
} yield {
x + y // yields Some(x + y) or None
}


Pure functional programming requires us to separate evaluation from execution of the code. Monads express code as chains of data transformations (evaluation) that are ultimately run by the middleware (execution), so they’re particularly useful for achieving purity for IO operations, which are inherently side-effected.

Let’s see an example with the Future monad below. We’re trying to fetch some data from the data access layer asynchronously, so futureX and futureY hold the state of the async call (waiting, fetched or error) for now. The for-yield then chains the sum operation, such that when we receive the response, we yield Future { x + y } if both futureX and futureY were successful, or propagate the error otherwise.

val futureX: Future[Int] = dal.getAsync("x")
val futureY: Future[Int] = dal.getAsync("y")
for {
x: Int <- futureX
y: Int <- futureY
} yield {
x + y // yields Future.value(x + y) or Future.exception(...)
}


### Developing in Scala

Scala is a language that seems to have learned from the pain points of other languages, and proposes to solve some of them.

Less verbosity

Scala does a number of things to remove verbosity. For example, it infers types whenever possible, so you can write val x = 1 instead of val x: Int = 1.

Its functional nature also allows developers to write one-liners in many situations. You can write a list comprehension as List(1, 2, 3).map(_ + 1). You can even write entire class definitions, such as class Person(val name: String), where getters and setters for name are created under the hood.

One last thing that is worth mentioning is that Scala does many things implicitly (if told to do so). In the example below, we’re passing an argument implicitly so that we don’t have to write it every time.

>>> def greet(name: String)(implicit greeting: Greeting): Unit =
>>>   println(s"${greeting.value},$name!")

>>> implicit val hello = new Greeting(value = "Hello")
>>> greet("John")
Hello, John!

>>> greet("Mary")
Hello, Mary!


Less boilerplate

Scala removes a lot of boilerplate from code. For example, it provides tools so that developers don’t need to write error-prone control structures, such as while loops with mutating states.

In addition, Scala code does not need to handle trivial edge cases everywhere. For example, you can treat an Option monad only where it makes sense to do so, instead of treating null pointers all over the place. The same goes for exceptions and the Future monad.

Fewer bugs

Static typing allows the compiler to catch a number of errors as compile time exceptions. These are further enhanced by the fact that Scala has a more powerful generic typing system than Java.

Moreover, by using Option and Future monads, the most common runtime exceptions are explicitly treated in the code.

Finally, less verbosity and less boilerplate ultimately make complex code easier to read, since code tends to be more representative of the actual application logic. Together with immutability and referential transparency, code is easier to reason about and to debug.

### Stack and infrastructure

The stack we chose uses Finatra as the HTTP server, with Guice for the dependency injection framework and Mockito for the testing framework.

This stack is deployed on AWS Elastic Beanstalk, Amazon’s orchestration service which automatically handles rolling deployment, load balancing, autoscaling, monitoring, and so on.

### Testing

The mixture of functional programming, less boilerplate, and a stack that offers dependency injection and unit/integration/feature tests out of the box results in less time spent writing application code logic and more time writing tests.

For the rewrite, we were able to write tests for virtually every single method, which improves stability and serves as documentation. A thorough test suite also helped us seamlessly integrate components that were developed separately from each other.

## Launching the new Session Generator

After launch, the rewrite obtained a decrease in average latency by 98%, going from 750ms in the old Session Generator to 14ms in the rewrite for Duolingo lessons. Most of the time was spent downloading course data from S3 whenever the in-memory cache expires.

The rewrite was also considerably more stable. Degraded performance in the old infrastructure was 0.1% (around 2 hours per quarter); with the rewrite, it dropped to zero downtime during the first couple of months. We obviously do not expect to keep 100% availability for the service forever, but this number indicates that the new infrastructure is much more robust.

An additional positive result is that not only did the number of bugs go down, but developers are now more confident that they are deploying code without breaking the site, thanks to an improved testing suite and the ability to detect compile-time errors during development.

## Final thoughts

With this rewrite, we were able to produce a faster and more reliable Session Generator, with fewer bugs and a cleaner codebase. We initially thought that it would take a long time to onboard developers, but in practice, the time taken has been considerably less than anticipated.

The two main pain points in this process were (1) documentation for certain Scala libraries, which was often either missing or scattered around across the web, and (2) some trouble with Java integrations. For example, some integration would not support some nice Scala-specific features, such as using the Option and Future monads, supporting optional parameters, and so on.

Finally, it’s still too early to say how well the rewrite will help reduce future technical debt – but so far things are looking good!