This article describes how Duolingo migrated its first synchronous (sync) Python microservice to asynchronous (async) Python.
The structure of this article follows questions we needed to answer as the project progressed:
- How do we build consensus to try the migration?
- How do we architect async versions of sync libraries?
- How do we structure async code?
Should you try async?
Before diving into the migration itself, I’d like to share some unfortunate realities of an async Python migration and some background on why we tried it.
Supporting async Python, or allowing developers to write async Python microservice code, is about as hard as supporting a new programming language. We had to reconsider almost every aspect of our service design to support async Python. This included but was not limited to:
- Rewriting our HTTP clients
- Rewriting our auth clients
- Reintegrating observability tooling
- Rearchitecting our Python backends
- Adding async APIs to in-house tools
Supporting async Python is difficult; however, migrating a service from sync to async Python is fairly straightforward once support is in place. At Duolingo, we have a lot of sync Python service code. Async Python provides a significant performance benefit for services (we measured that an async service can handle 40% more requests per instance than its sync counterpart), at a lower cost of migration than rewriting that service in a more performant language like Go.
There are many reasons to perform an async Python migration, but service performance seemed the easiest to measure and sell.
Another benefit of migrating to async Python over a more performant language is the potential for an incremental migration.
For our first async Python migration, however, we did not migrate incrementally: we wanted to maximize the performance of the async service immediately to justify the work. Now that support is mostly in place, we will perform future migrations incrementally.
How do we build consensus to try the migration?
Before starting our async Python migration, we had to convince our organization it was worth trying.
We found it hard to sell our organization on an async Python migration for the following reasons:
- The technology is hard to explain
- It is unclear how to break the project into small pieces
To solve these problems, we decided to structure the project in the following way:
- Focus on savings
- Migrate by route
Focus on savings
When we proposed an async Python migration, we focused on savings. We had a hard time explaining and selling “an efficient I/O paradigm based on an event loop managing non-blocking coroutines.” However, “more money” did not need an explanation and sold itself.
We did not need to explain async programming in-depth to get some time to test it – a very high-level explanation was sufficient given that we were focused on a clear and intuitively valuable metric.
Example: Our pitch
Our pitch was as follows: “We think async Python will let us pay AWS less money to host our Python services. It will do this by letting our computers spend less time waiting. We think the savings could be large.”
That pitch was enough to buy one person two weeks to try the technology. Two weeks is too short to migrate a sync codebase to async Python, but it is long enough to make a measurable working program.
Migrate by route
When we migrated our first async service, we migrated one route at a time. Migrating one route at a time worked well as we always had a working and fully-async program to measure. If the measurements against this program were promising, we would have a clear justification for our work and could use them to buy more time to perform the migration.
Migrating to async Python takes a long time — if we did not have a way to confirm that the work was worth doing as we did it, it would be easy for other things to prevent the project from finishing.
Example: Migrating a route and testing
Using the two weeks we bought with the savings pitch mentioned in the preceding section, we migrated a single route in a small service. We cut out some features we didn’t think we’d have time to support (observability tooling, internal HTTP policies, etc.) and got a pared-down async version of one route working.
We then deployed this route to a staging server, used a feature flag to direct a small amount of traffic to it, and measured how many requests an instance of the service could handle while serving this one migrated route. Next, we deployed the production version of the service to the staging server with the same sections of the program turned off and measured how many requests an instance of the production service could handle.
The async service handled ~10,000 requests per instance and the sync service handled ~7,000 requests per instance. The async service could handle ~40% more requests per instance, suggesting we could save ~30% on AWS EC2 spending per service we migrated from sync to async Python. We used this result to buy more time to migrate a larger portion of the service. We performed tests like this periodically throughout the project, each covering a larger portion of the service.
Once the entire service was rewritten in async Python, we did a final test where we directed 50% of production traffic to the async service and 50% to the sync service. Our final results were close to our initial readings.
The value of our work was clear throughout the project; we didn’t need to wait until the end of the project to prove that we were doing something valuable. External stakeholders were excited about the progress of the work and didn’t question the investment they made in budgeting time for us to try it.
How do we architect async versions of sync libraries?
As we performed our async Python migration, we produced lots of async code that looked like our sync code. This was fine in services, as when we migrated to async Python we deleted the sync Python code. However, when we needed to provide an async version of a sync library, it was hard to avoid ending the project with two copies of our business logic.
We did not find a silver bullet for sharing code between async and sync libraries, but we tried the following strategies with varying degrees of success:
- Core libraries
- Sans I/O
In addition to the preceding strategies, the appendix below lists some ideas for structuring async/sync library code that we plan to test soon.
Core libraries
A core library is a sync library containing I/O agnostic business logic.
I/O agnostic code is software uncoupled from any specific network I/O paradigm. The I/O agnostic code in a core library can be neither blocking nor async. Blocking code is an instruction that takes an indeterminate amount of time to complete and prevents the program from proceeding until it does.
You cannot put blocking code in a core library because async programs must not run blocking code. You cannot put async code in a core library because sync code cannot run async code without creating an event loop.
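As a minimal illustration (hypothetical code, not from our libraries), the only way for plain sync code to run a coroutine is to stand up an event loop, which is exactly what a core library must not do:

import asyncio

async def fetch_value() -> int:
    # Async code: can only run inside an event loop.
    await asyncio.sleep(0.1)
    return 42

def sync_caller() -> int:
    # Sync code must create an event loop to run the coroutine. Calling
    # this from inside an already-running loop raises a RuntimeError,
    # so neither style is safe to bake into a core library.
    return asyncio.run(fetch_value())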
To migrate a sync library feature to async Python using a core library, decompose the feature into I/O dependent and I/O agnostic sections. Put the I/O agnostic section in a core library named my-feature-core. Update the initial sync implementation of your feature to use the core library. The sync implementation should only contain code for sync network I/O.
Next, update your async implementation to also depend on my-feature-core. After the update, your async implementation should contain no logic related to my-feature-core aside from mapping the models of my-feature-core onto async networking APIs.
Example: HTTP metrics
To illustrate this, I’ll show how we migrated our HTTP metrics collection code to share an async and sync implementation using a core library.
We created a library called duo_metrics_core that, given information about an HTTP request, tracked the request using Prometheus.
To write duo_metrics_core, we first defined a representation of an HTTP request and response that was tied to neither the aiohttp package (async network I/O) nor the requests package (sync network I/O). The code for it looks something like this:
from dataclasses import dataclass

@dataclass
class MetricsResponse:
    status_code: int
    elapsed: float

@dataclass
class MetricsRequest:
    method: str
Next, we created a class in duo_metrics_core to track a request in Prometheus, given a request and response object. It looks like this:
class MetricsCore:
    ...

    def track_request(self, request: MetricsRequest, response: MetricsResponse) -> None:
        # Implementation for how we track requests in Prometheus.
        ...
Next, we took these classes from duo_metrics_core and used them to add a Prometheus integration to the async HTTP client defined in our duo_requests_async library. The code for that looks like this:
import time
from contextlib import asynccontextmanager
from typing import AsyncGenerator

import aiohttp

class MicroserviceClient:
    ...

    @asynccontextmanager
    async def request(
        self, method: str, url: str
    ) -> AsyncGenerator[aiohttp.ClientResponse, None]:
        start = time.time()
        async with self._session.request(method, url) as response:
            elapsed = time.time() - start
            self.metrics_core.track_request(
                response=MetricsResponse(response.status, elapsed),
                request=MetricsRequest(method=method),
            )
            yield response
Notice that the duo_requests_async library does not need to know how to track a request in Prometheus. Instead, it only needs to know how to do the following things:
- Convert an aiohttp.ClientResponse object into a duo_metrics_core.MetricsResponse object
- Convert the parameters of its MicroserviceClient.request method into a duo_metrics_core.MetricsRequest object
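For symmetry, here is a hypothetical sketch of the matching integration in a sync, requests-based client (our real sync client is more involved). The same MetricsCore is reused; only the I/O layer is swapped:

import time

import requests

class SyncMicroserviceClient:
    ...

    def request(self, method: str, url: str) -> requests.Response:
        start = time.time()
        response = self._session.request(method, url)  # sync network I/O
        elapsed = time.time() - start
        self.metrics_core.track_request(
            response=MetricsResponse(response.status_code, elapsed),
            request=MetricsRequest(method=method),
        )
        return response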
Sans I/O
The sans I/O programming style is something we tried but ultimately decided was not right for us.
Sans I/O is a style of programming where you implement the logic of an I/O-based application but defer the actual I/O operations to a caller. In practice, sans I/O applications often include a core library (see the preceding section) containing a state machine that models an action, plus an event-based interface implemented by I/O-dependent libraries. A state machine describes a problem as a finite set of states, where one state transitions to another given an input. An event-based interface is one in which two pieces of software communicate by exchanging objects that represent the outcome of an action, and acting on those objects.
The sans I/O programming style is primarily used in programs specified as large flow charts with intermediary I/O operations, such as network protocols and consensus algorithms.
In the following section, we discuss why we tried sans I/O and why we stopped using it.
To learn more about sans I/O, see this website and this talk.
To learn more about state machines, see this Wikipedia article.
To view an event-based interface in a sans I/O state machine, see this class.
Example: Microservice client
We used the sans I/O programming style in our initial async Microservice Client implementation. The Microservice Client aimed to provide a request method that abstracted away details of our internal observability and auth systems and provided defaults for some HTTP client configurations.
Our Microservice Client needed to encode the following actions:
- Authenticate a request with one or more auth providers if it was not already authenticated
- Apply default retry behavior
- Record Prometheus Metrics for a request
- Update and use a circuit breaker
- Log the results of the request to AWS Cloudwatch
The issue with the preceding feature set is that the following features require performing network I/O and occur within the core library method call:
- Authenticating a request with an auth service
- Logging the result of the request to AWS Cloudwatch
To solve the preceding problem, we created a sans I/O state machine for making a request and put it in a core library.
I/O-dependent code would create event objects and feed them into the state machine. The state machine would then return new event objects, and the I/O-dependent code would perform actions based on them.
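To make the shape of the pattern concrete, here is a heavily simplified, hypothetical sketch (the real state machine handled auth, retries, logging, and more, which is where the complexity came from):

from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class NeedsAuth:
    # Event returned to the caller: fetch a token (network I/O) and feed it back.
    provider: str

@dataclass
class SendRequest:
    # Event returned to the caller: perform the HTTP request with these headers.
    headers: dict

class RequestStateMachine:
    # I/O-free core: decides what should happen next but never performs I/O.
    def __init__(self) -> None:
        self._auth_token: Optional[str] = None

    def start(self) -> Union[NeedsAuth, SendRequest]:
        if self._auth_token is None:
            return NeedsAuth(provider="default")
        return SendRequest(headers={"Authorization": self._auth_token})

    def handle_token(self, token: str) -> SendRequest:
        self._auth_token = token
        return SendRequest(headers={"Authorization": token})

async def drive(machine: RequestStateMachine) -> None:
    # I/O-dependent driver: performs the I/O the events ask for.
    # fetch_token_async and send_async are hypothetical async I/O helpers.
    event = machine.start()
    if isinstance(event, NeedsAuth):
        token = await fetch_token_async(event.provider)
        event = machine.handle_token(token)
    await send_async(event.headers)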
This worked and let us share a Microservice Client implementation between our sync and async libraries, but the code became very complicated. We ended up defining a lot of event objects for the different states our system could be in. The implementation grew to ~1,000 lines and was hard to understand.
We reimplemented the code with an I/O-dependent implementation; it shrank to ~300 lines and was easier to understand. We duplicated logic, but given that the sans I/O solution was more than double the code and much trickier, picking up some duplication made sense.
If your logic doesn’t fit into a single method and must be modeled through a sans I/O-based state machine, don’t be afraid to duplicate it if it makes your overall library ecosystem less complicated. Sans I/O is a wonderful tool, but be careful about the complexity it can introduce.
If you are interested in more strategies for structuring async/sync libraries, see the appendix below.
How do we structure async code?
Now that we had a strategy to architect our async code relative to our sync code, it was time to start writing some async Python code.
The following are strategies we found to be effective at structuring async Python code:
- Make everything a context manager
- Use Context Vars
The strategies in this section apply to both library and service code.
Make everything a context manager
We use async Context Managers as the API by which we initialize our async Python classes.
Python __init__ methods are sync, meaning they cannot directly contain async code. This means that if your async class needs to perform any async startup actions, you must define a separate method for those startup actions.
Async context managers are a standardized place to put async startup code and, if necessary, async cleanup code. We structure libraries and applications as a series of nested async context managers, ensuring that each async context manager handles only its own initialization, not the initialization of the classes it depends upon.
An async Python task can be cancelled at any await statement. If an async class needs to perform cleanup, make sure to put that cleanup in a context manager. There isn’t a way to guarantee that cleanup logic will run unless it’s in a context manager or finally block, and even then there are issues around task scheduling that you must consider.
To learn more about using async context managers as the API for initializing async classes, see this talk.
To learn more about async context managers themselves, see PEP-492.
Example: Migrating two classes
To demonstrate the coding style discussed in the preceding section, we will migrate two classes from sync Python to async Python using async context managers.
Say you have the following sync Python code:
class Math:
    def __init__(self, a: int):
        self._a = a
        # Do not perform network I/O in an __init__ method.
        # This is only here for demonstration purposes.
        self._int_from_web: int = get_int_from_web()  # sync I/O

    def add(self, b: int):
        return self._a + b + self._int_from_web

class Display:
    def __init__(self, math: Math):
        self._math = math
        self._joke: str = get_joke_from_web()  # sync I/O

    def show_with_joke(self):
        print(f"{self._math.add(b=2)}, {self._joke}")
You can run the Display.show_with_joke method in the preceding code snippet as follows:
display = Display(Math(1))
display.show_with_joke()
You can migrate this to async Python like this:
class Math:
    def __init__(self, a: int):
        self._a = a
        self._int_from_web: int

    async def __aenter__(self):
        self._int_from_web = await get_int_from_web_async()  # async I/O
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb): ...

    async def add(self, b: int):
        return self._a + b + self._int_from_web

class Display:
    def __init__(self, math: Math):
        self._math = math

    async def __aenter__(self):
        self._joke: str = await get_joke_from_web_async()  # async I/O
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb): ...

    async def show_with_joke(self):
        math_value = await self._math.add(b=2)
        print(f"{math_value}, {self._joke}")
You can run the Display.show_with_joke method as follows:
import asyncio

async def main():
    async with Math(1) as m:
        async with Display(m) as d:
            await d.show_with_joke()

asyncio.run(main())
Notice that Display takes a fully initialized Math object as an argument. Display does not enter the Math object’s context manager; the caller should have entered the Math context before passing the Math object to Display.
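When the chain of dependencies gets deep, the standard library’s contextlib.AsyncExitStack can flatten the nesting. A sketch using the classes above:

import asyncio
from contextlib import AsyncExitStack

async def main():
    async with AsyncExitStack() as stack:
        m = await stack.enter_async_context(Math(1))
        d = await stack.enter_async_context(Display(m))
        await d.show_with_joke()
        # The stack exits the contexts in reverse order on the way out.

asyncio.run(main())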
Use Context Vars
We use Context Variables (Context Vars) to manage global state in async Python applications.
Context Vars provide a mechanism for managing globally scoped objects relative to an asyncio.Task. They let us work with global objects while avoiding the issues that come from manipulating a global object from multiple places at once in a program.
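As a minimal illustration of the mechanics (hypothetical code): each asyncio task gets its own copy of the context, so concurrent tasks can set the same Context Var without interfering with each other:

import asyncio
from contextvars import ContextVar

REQUEST_ID: ContextVar[str] = ContextVar("REQUEST_ID", default="unset")

async def handle(request_id: str) -> None:
    REQUEST_ID.set(request_id)  # only affects this task's copy of the context
    await asyncio.sleep(0)      # yield to the other task
    print(REQUEST_ID.get())     # still prints this task's own value

async def main() -> None:
    await asyncio.gather(handle("a"), handle("b"))

asyncio.run(main())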
The following section shows a problem we solved with Context Vars.
To learn more about Context Vars, see PEP-567.
Example: Authentication decorator
In our sync base library, we have a decorator called requires_auth that performs the following actions:
- Determines if a user has an authentication token
- Checks if the authentication token is valid
- Rejects requests if the user has an invalid token or no token
The code for the decorator looks like this:
from functools import wraps
from typing import Optional

import flask

AUTH_CLIENT = AuthClient()

def requires_auth(func):
    @wraps(func)
    def decorate(*args, **kwargs) -> Optional[flask.Response]:
        try:
            token = get_auth_token_from_flask_context()
            AUTH_CLIENT.authenticate(token)
            return func(*args, **kwargs)
        except AuthError:
            return unauthenticated_response_flask()

    return decorate
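Usage looks like this (a hypothetical route on a Flask app):

app = flask.Flask(__name__)

@app.route("/profile")
@requires_auth
def profile() -> flask.Response:
    return flask.jsonify({"user": "authenticated"})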
We wanted to provide a decorator with an identical interface in async Python to help services migrate from sync to async Python.
When we tried to convert this to async Python, we ran into the following issue: how can we get an instantiated async AuthClient in our decorator?
To solve this problem, we added an initialization component to our requires_auth decorator. The initialization component is called at service start and is responsible for populating a Context Var with an initialized async AuthClient instance.
The code for the async requires_auth decorator looks like this:
from contextvars import ContextVar, Token
from functools import wraps
from typing import Optional, Self

import quart

CurrentInitializer: ContextVar[Optional["RequiresAuthInitializer"]] = ContextVar(
    "CurrentInitializer", default=None
)

class RequiresAuthInitializer:
    def __init__(self, auth_api: AuthAPI) -> None:
        self.auth_api = auth_api
        self._token: Optional[Token] = None

    async def __aenter__(self) -> Self:
        self._token = CurrentInitializer.set(self)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        assert self._token is not None
        CurrentInitializer.reset(self._token)
        self._token = None

def _get_current_initializer() -> RequiresAuthInitializer:
    current_initializer = CurrentInitializer.get()
    if current_initializer is None:
        raise AssertionError()
    return current_initializer

def requires_auth(func):
    @wraps(func)
    async def decorate(*args, **kwargs) -> Optional[quart.Response]:
        try:
            token = get_auth_token_from_quart_context()
            initializer = _get_current_initializer()
            await initializer.auth_api.authenticate(token)
            return await func(*args, **kwargs)
        except AuthError:
            return unauthorized_response_quart()

    return decorate
We create an instance of our RequiresAuthInitializer in our web framework’s startup hooks.
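How the initializer plugs into startup hooks depends on the framework and on how it propagates context to request-handling tasks, but the idea looks roughly like this (run_app_forever is a hypothetical stand-in for handing control to the framework):

async def serve() -> None:
    async with RequiresAuthInitializer(AuthAPI()) as _initializer:
        # While this context is open, CurrentInitializer is set, so request
        # handlers running in (a copy of) this context can resolve it via
        # _get_current_initializer().
        await run_app_forever()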
Acknowledgments
Thank you Bryan C Mills for your massive contributions throughout this project. Bryan’s insane concurrent programming and API design skills made this work. Bryan pulled us out of, and helped us avoid, more pitfalls than I can count. Bryan even went so far as to fix upstream async libraries so that the project moved smoothly.
Appendix: Architecture next steps
This section describes some strategies we plan to experiment with in architecting async implementations of sync libraries. We haven’t tried these yet, but we hope some of them could make our libraries easier to maintain and use.
We have seen some friction in the core library pattern in that it requires multiple pull requests to make what could be straightforward changes (one PR for core, one for sync, and one for async). We are considering combining libraries and using package extras to separate async and sync package dependencies. This would enable us to include the async, sync, and core code within one package, allowing us to change our libraries in a single PR.
We are also interested in a code-sharing strategy where sync implementations defer to async implementations running in a class-managed event loop. Each sync version of an async class would create an event loop when it starts and manage that loop for its lifetime. This would add one thread per sync class that defers to an async class.
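A rough sketch of the idea (untested; AsyncImpl stands in for any async class we would wrap):

import asyncio
import threading

class SyncFacade:
    def __init__(self) -> None:
        # Dedicated event loop running on a background thread for the
        # lifetime of this instance.
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()
        self._impl = AsyncImpl()

    def do_work(self) -> str:
        # Submit the coroutine to the background loop and block on the result.
        future = asyncio.run_coroutine_threadsafe(self._impl.do_work(), self._loop)
        return future.result()

    def close(self) -> None:
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()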
As an additional code-sharing strategy, we have begun using (synchronous) non-blocking context managers to provide shared core functionality. Rather than one entry point to an I/O-agnostic code block, you get two or more: one before the I/O-dependent code, one after the I/O-dependent code, and potentially one in the middle of your I/O-dependent code if you return a non-blocking callable from your __enter__ method.
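For example (a hypothetical sketch reusing the metrics classes from earlier), a sync, non-blocking context manager can wrap the I/O-dependent code of both the sync and async clients, because it never blocks between its entry points:

import time
from contextlib import contextmanager

@contextmanager
def track_timing(metrics_core, method: str):
    # Entry point 1: runs before the I/O-dependent code.
    start = time.time()
    record = {}
    yield record  # the I/O-dependent caller fills in response details
    # Entry point 2: runs after the I/O-dependent code.
    metrics_core.track_request(
        request=MetricsRequest(method=method),
        response=MetricsResponse(record["status_code"], time.time() - start),
    )

# An async client method could use it without blocking the event loop:
async def request(self, method: str, url: str):
    with track_timing(self.metrics_core, method) as record:
        async with self._session.request(method, url) as response:
            record["status_code"] = response.status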