At Duolingo, we always put learners first, and we know that streaks are an important motivational tool. During a site issue, learners worry about maintaining their streaks, so it's important that we protect them (they take hard work to earn!) while also moving quickly to resolve the issue.
How do we protect streaks, inform learners that their streaks are safe, and bring systems online faster? That's where Big Red Button comes in.
Preparing for the unexpected
We've put a great deal of effort into improving and hardening our underlying infrastructure, but… things happen. Site issues can be caused by internal or external software changes, hardware failures, internet outages, or even unexpected learner behavior (drastically more traffic due to a Duo meme, for example).
These issues can be compounded by sustained higher-than-normal traffic, which can happen when learners, worried about their streaks, keep retrying lessons.
Big Red Button (BRB for short) is an internal project that allows us to pause requests, inform learners of the pause, gradually bring requests back online as problems are resolved, and repair streaks that were interrupted.
Simplicity is key
When deciding how to move forward with BRB, we sought solutions that were simple, yet robust. BRB should have few dependencies, be easy to use and maintain, and be usable even when the site is down.
Originally we planned a microservice, but that introduced many additional dependencies. Instead, we decided on AWS S3, a simple, high-availability storage service with a key feature: access logs. By having learners' clients query for a specific S3 file, we could toggle BRB by modifying that file. Even better: with multiple files, we could control which learners would see BRB, allowing slow rollout and recovery! And with access logs, we would know which users saw BRB and needed their streaks repaired.
Still, no matter how simple and robust our design, the tool may fail. In that case, we didn’t want to make things worse! Failure modes for BRB shouldn't adversely affect the site or learners. To handle this, if clients encounter any errors accessing BRB, then they simply ignore BRB entirely.
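To make the idea concrete, here is a minimal sketch of what a fail-open client check could look like. It is not Duolingo's actual client code: the key layout (`brb/bucket-N.json`), the `active` field, the bucket count, and the function names are all assumptions made for illustration. The `fetch` argument stands in for whatever HTTP call the client uses to read the S3 file.

```python
import hashlib
import json
from typing import Callable

NUM_BUCKETS = 10  # assumption: number of flag files, not stated in the article


def bucket_for(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a learner to a stable bucket so each flag file covers a fixed slice of learners."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def brb_active(user_id: int, fetch: Callable[[str], bytes]) -> bool:
    """Return True only if the learner's flag file clearly says BRB is on.

    Any error (network failure, missing file, malformed body) is treated as
    "BRB is off", so a broken BRB never blocks the app.
    """
    key = f"brb/bucket-{bucket_for(user_id)}.json"  # hypothetical key layout
    try:
        payload = json.loads(fetch(key))
        return payload.get("active") is True
    except Exception:
        # Fail open: ignore BRB entirely when anything goes wrong.
        return False
```

With a layout like this, turning BRB on for one bucket at a time would give the slow rollout and recovery described above, and every read of a flag file would leave a line in the S3 access logs.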
Now we had the capability to use BRB, but we also needed to determine which learners had seen its message so that we could repair their streaks. We could query the access logs with AWS Athena, but that might add a new dependency! Luckily, it wasn't an issue: because the access logs are stored, streak repair can wait until the site and infrastructure have recovered.
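As an illustration of the offline repair step, the sketch below pulls a learner ID and a timestamp out of one access log line. It assumes the standard S3 server access log layout, where the request time appears in square brackets and the request line is quoted. How the learner is identified in the request is not described here, so the `user_id` query parameter is purely hypothetical, as is the function name.

```python
import re
from datetime import datetime
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Matches the bracketed timestamp and the quoted request line of a log entry.
LOG_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\].*?"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*"'
)


def parse_access_log_line(line: str) -> Optional[Tuple[int, datetime]]:
    """Return (user_id, seen_at) for a GET of a flag file, or None if the line doesn't fit."""
    match = LOG_PATTERN.search(line)
    if match is None or match.group("method") != "GET":
        return None
    # Hypothetical: the client sends its learner ID as a `user_id` query parameter.
    query = parse_qs(urlsplit(match.group("uri")).query)
    user_ids = query.get("user_id")
    if not user_ids or not user_ids[0].isdecimal():
        return None
    seen_at = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return int(user_ids[0]), seen_at
```

Because this runs against stored logs, it can be a plain batch script executed after recovery, with no new runtime dependency.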
A peek behind the streak
Streaks are a mechanic designed to encourage daily use of Duolingo. Maintaining a long streak is considered prestigious since it can only be earned with dedication. If a learner loses their streak because of something out of their control (like the issues BRB is meant to protect against), then the learner may feel cheated out of their hard work.
To understand how BRB protects learners’ streaks, we need to understand Duolingo’s streak algorithm. Duolingo tracks every day on which a learner has practiced a lesson. When a learner misses a day, it creates a gap in the contiguous days practiced, and that gap is where the learner loses their streak.
We offer flexibility to allow learners to keep their streak when they do not practice. To keep their streak protected, learners must equip a “streak freeze” in advance, which acts as insurance in case they miss a day of practice. The streak freeze fills in the missing day so that the streak can continue unbroken to the next day. We can use the same logic that fills in missing days to protect streaks with BRB.
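The gap-filling idea can be sketched in a few lines. This is a simplified model rather than Duolingo's real streak code: the function name is made up, and it assumes that a frozen day bridges a gap without adding to the streak count.

```python
from datetime import date, timedelta
from typing import Set


def streak_length(practiced: Set[date], frozen: Set[date], today: date) -> int:
    """Count practiced days in the unbroken run of days ending at `today`.

    A frozen day keeps the run going but (as an assumption here) does not
    add to the count. The run stops at the first day that is neither
    practiced nor frozen.
    """
    length = 0
    day = today
    while day in practiced or day in frozen:
        if day in practiced:
            length += 1
        day -= timedelta(days=1)
    return length


practiced = {date(2021, 10, 4), date(2021, 10, 5), date(2021, 10, 7)}
print(streak_length(practiced, set(), date(2021, 10, 7)))                # 1
print(streak_length(practiced, {date(2021, 10, 6)}, date(2021, 10, 7)))  # 3
```

In the example, the learner missed October 6. Without a freeze the streak restarts at 1. With that day frozen, the earlier days count again.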
Borrowing from the freeze
Freezing a streak is easy once we know the date and time. The only change made to BRB was to log the exact timestamp at which each learner tried to use Duolingo. Later, we freeze that day for every affected learner, so their hard work and dedication are not lost!
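A rough sketch of that last step: turn the logged timestamps into the set of days to freeze for each learner. The function name and the per-learner UTC offset lookup are assumptions for illustration. The only point is that the "target day" depends on the learner's local date, not the server's.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone
from typing import Dict, Iterable, Set, Tuple


def days_to_freeze(
    events: Iterable[Tuple[int, datetime]],
    utc_offset_minutes: Dict[int, int],
) -> Dict[int, Set[date]]:
    """Group BRB sightings into the local calendar days to freeze per learner.

    `events` holds (user_id, timezone-aware timestamp) pairs. Learners with
    no known offset are treated as UTC (an assumption of this sketch).
    """
    frozen: Dict[int, Set[date]] = defaultdict(set)
    for user_id, seen_at in events:
        offset = timedelta(minutes=utc_offset_minutes.get(user_id, 0))
        local_day = seen_at.astimezone(timezone(offset)).date()
        frozen[user_id].add(local_day)  # repeated retries collapse into one day
    return dict(frozen)
```

The resulting days can then be handed to the same logic that applies a streak freeze.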
To date, BRB has protected over 2 million streaks. This has significantly improved the learner experience while lightening the load on our customer support team.
"If you miss your duolingo streak because of a maintenance issue, they preserve your streak." (Vic 🌮, @VicVijayakumar, October 6, 2021)
BRB is currently an all-or-nothing approach. In the future, we could enhance BRB to only disable subsets of features, allowing learners to use portions of the app to continue learning.
We also hope to increase the robustness of this tool by leveraging a second cloud computing provider in parallel, such as Google Cloud Platform.
Interested in software engineering at Duolingo? We’re hiring for full-time and intern roles!