Getting your computer set up to write code at Duolingo was once a rite of passage that took several hours—often days—of installing, configuring, and troubleshooting. Now it takes just a few clicks and a few minutes!
As we’ve migrated our codebase from a monolith to a microservice architecture with hundreds of different environments, it’s become even more important to offer our software developers a fast and easy onboarding experience. Our latest approach is to move our development process into the cloud with GitHub Codespaces.
"Really really love Codespaces, got a new laptop and was ready to develop in like 3 minutes"
—Isaac A., Software Engineer on our Content Tooling team
Why code in the cloud?
Developers traditionally install and run tools like integrated development environments (IDEs), libraries, and language runtimes on their local workstations. This approach makes sense at first but has some downsides as an organization grows larger:
- Everyone must spend time carefully following long lists of setup instructions every time they work on a new project or get a new device.
- Build speeds are limited by laptops’ processing power. Docker is fast on Linux but much (much!) slower on Mac.
- Managing multiple versions of Python, Node.js, and other runtimes on the same laptop gets pretty hairy, requiring tools like nodenv.
- Making local builds accessible to coworkers for testing takes significant time and effort.
- Some required tools may not even work on all laptops, e.g. Docker for Mac has some problems unique to M1.
Moving the development process into the cloud solves all these problems. A simple analogy: local development is to remote development as Microsoft Word is to Google Docs.
Why we chose Codespaces
We first began looking into remote development in 2020, and there weren’t many mature options out there. Rather than investing in a homegrown solution to what must be a common problem, we decided to wait for an off-the-shelf product from an established player. Such a product did arrive the following year from GitHub, now part of Microsoft.
The most popular general-purpose editor at Duolingo has long been VS Code, and we’ve always hosted our code on GitHub. We figured that neither of those two Microsoft products is ever likely to integrate as seamlessly with alternatives like AWS Cloud9 and Gitpod as it does with Codespaces.
Keeping things DRY
Code duplication is the root of many evils in software development, so we’ve configured and scripted Codespaces to avoid it as much as possible.
We started by baking common tools like pre-commit and the AWS CLI into a Docker image that we host on GitHub’s container registry and use as the base environment for each repository’s codespaces. We do maintain a few separate image tags corresponding to different Python versions, but apart from that we try to keep this base image completely repo-agnostic.
This image is used only at codespace creation time. What if we want to add a new tool or behavior to all existing codespaces without requiring that they be destroyed and recreated? We use a multi-layered system of hook scripts. In each repo’s codespace config file, we specify an executable Bash script included in our base image as the postStartCommand to run when spinning up a codespace. That script in the base image calls a corresponding postStart script bundled in our self-updating duo CLI, which performs some actions like starting Tailscale before further calling a repo-specific .devcontainer/postStart script if one exists.
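The layering described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Bash scripts, which this post does not show. The function name run_hook, the shared_steps parameter, and the injectable runner are hypothetical; only the .devcontainer/postStart path and the order of calls come from the description.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# A shared step is a (name, callable) pair, e.g. ("tailscale", start_tailscale).
Step = Tuple[str, Callable[[], None]]


def run_hook(
    hook: str,
    repo_root: Path,
    shared_steps: Iterable[Step] = (),
    runner=subprocess.run,  # injectable so the chain can be exercised without executing anything
) -> List[str]:
    """Run org-wide steps first, then the repo-specific hook script if one exists.

    Returns labels for everything that ran, in order.
    """
    ran: List[str] = []

    # Layer 1: behavior shared by every codespace (hypothetical placeholder steps).
    for name, step in shared_steps:
        step()
        ran.append(f"shared:{name}")

    # Layer 2: optional per-repo script, e.g. <repo>/.devcontainer/postStart.
    repo_hook = Path(repo_root) / ".devcontainer" / hook
    if repo_hook.is_file():
        runner([str(repo_hook)], cwd=str(repo_root), check=True)
        ran.append(f"repo:{hook}")

    return ran
```

Because the shared layer lives outside the image, changing shared_steps in this sketch would alter the behavior of every existing codespace on its next start, which is the property the hook system is after.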
Thanks to these careful arrangements, we’re able to declare (via Pulldozer, our tool for editing hundreds of repos at a time) the same lean and consistent codespace config file across the vast majority of our repos:
{
  "features": {"docker-in-docker": "latest"},
  "image": "ghcr.io/duolingo/codespaces:python3.9",
  "onCreateCommand": "/onCreate",
  "postAttachCommand": "/postAttach",
  "postCreateCommand": "/postCreate",
  "postStartCommand": "/postStart",
  "remoteUser": "vscode",
  "settings": {"git.autofetch": true},
  "updateContentCommand": "/updateContent"
}
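To illustrate how a single template can stay consistent across many repos, here is a small Python sketch that renders the config above for a given Python image tag and reports which keys in a repo's config have drifted from it. Pulldozer's internals are not described in this post, so render_config and config_drift are hypothetical helpers, not part of any real tool.

```python
import json
from typing import Any, Dict, List

_MISSING = object()  # sentinel so a missing key differs from an explicit null


def render_config(python_version: str) -> Dict[str, Any]:
    """Build the shared codespace config; only the image tag varies per repo."""
    return {
        "features": {"docker-in-docker": "latest"},
        "image": f"ghcr.io/duolingo/codespaces:python{python_version}",
        "onCreateCommand": "/onCreate",
        "postAttachCommand": "/postAttach",
        "postCreateCommand": "/postCreate",
        "postStartCommand": "/postStart",
        "remoteUser": "vscode",
        "settings": {"git.autofetch": True},
        "updateContentCommand": "/updateContent",
    }


def config_drift(actual: Dict[str, Any], expected: Dict[str, Any]) -> List[str]:
    """Return the sorted top-level keys whose values differ between two configs."""
    keys = sorted(set(actual) | set(expected))
    return [k for k in keys if actual.get(k, _MISSING) != expected.get(k, _MISSING)]


if __name__ == "__main__":
    expected = render_config("3.9")
    # A repo that overrode one hook locally would show up as drift.
    actual = dict(expected, postStartCommand="/custom")
    print(json.dumps(expected, indent=2))
    print(config_drift(actual, expected))
```

Keeping every hook command pointed at a script inside the base image is what lets the template stay identical everywhere; repo-specific behavior lives in the optional .devcontainer scripts instead.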
Accessing private resources with Tailscale
Many of our AWS resources are accessible to developers only via the office VPN. In order to make those resources available to Codespaces, we host a Tailscale relay node in our VPC and run the Tailscale client inside each codespace.
As mentioned previously, our postStart scripts automatically start the Tailscale client upon codespace launch. To connect, our developers simply run duo vpn (our wrapper around tailscale up) and sign in with Google SSO. This connection persists across codespace launches and will be ready to go again as soon as you start work the next day, a feature that even our regular VPN doesn’t have!
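A wrapper of this kind might look something like the Python sketch below. The real duo vpn implementation is not shown in this post, so the function names are hypothetical. The --accept-routes flag is a real tailscale up option that lets a Linux client use subnet routes advertised by another node, but whether our wrapper passes it is an assumption made for illustration.

```python
import shutil
import subprocess
from typing import List


def vpn_command(accept_routes: bool = True) -> List[str]:
    """Build the argv for bringing the Tailscale connection up."""
    cmd = ["tailscale", "up"]
    if accept_routes:
        # Assumption: routes to private resources are advertised by the relay node,
        # so the client must opt in to accepting them.
        cmd.append("--accept-routes")
    return cmd


def connect(runner=subprocess.run) -> None:
    """Run `tailscale up`; the CLI itself prompts for sign-in when needed."""
    if shutil.which("tailscale") is None:
        raise RuntimeError("tailscale CLI not found on PATH")
    runner(vpn_command(), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try anywhere.
    print(" ".join(vpn_command()))
```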
Rough edges
GitHub has experienced a few outages over the past year that temporarily prevented us from using Codespaces. Although generally short-lived, these incidents have a more direct impact on productivity than those of most other services we use.
The secret management system provided by Codespaces works pretty well. However, reading those values currently doesn’t require two-factor authentication, unlike access to the rest of a GitHub organization’s resources. Manually uploading secrets into each codespace is a more rigorous alternative, but it does add some friction (especially when rotating secrets).
Results
We still fully support running local development environments, and we’ve never encouraged adoption of Codespaces for its own sake among developers who already have existing local setups. Despite this lack of an aggressive internal push, our usage of Codespaces has shown strong and steady growth over the past year.
Adoption was especially high among our summer interns, who joined Duolingo with fresh laptops and only 12 weeks to make an impact—and they didn’t want to spend most of their summer troubleshooting!
"I 😍 this feature and it is the reason I did not go bald while trying to set up duolingo-web to create a small PR earlier this week"
—Software Engineer on our Data Infrastructure team
Codespaces have made some workflows not just better, but even possible again. Locally building our original monolith repo had eventually become such an intractable problem that most developers simply gave up and deployed straight to staging servers instead. Now it takes just 8 minutes to spin up a monolith codespace that’s practically equivalent to a local environment, with a feedback loop on the order of seconds rather than minutes.
Remote development is a growing industry trend that we’re excited to offer at Duolingo. Join our team and see for yourself!