How Duolingo Runs Experiments at Scale

02.17.2021

Severin Hacker is the co-founder and CTO of Duolingo, the world’s most popular language-learning platform and most downloaded education app, with over 500 million users worldwide. Severin joined FirstMark’s Guilds to share how Duolingo has built a test-driven culture and how the company runs experiments at scale.

Why Duolingo Started Testing Everything

Duolingo’s culture centers on one key operating principle: “test everything.” To put this into practice, the company launches 30 new experiments per week and runs hundreds of experiments concurrently.

Early in its lifecycle, Duolingo used a third-party tool to launch and manage product experiments. Eventually, the expense became too great and the team reached the limits of the experimentation the tool could support; in short, they had outgrown it. The company had also determined that testing was so core to the business that it needed to run the infrastructure in-house. In the end, Duolingo decided to build its own platform.

How to Empower Experimentation

  • Hypothesis
  • Expected outcome
  • Links to related work
  • Audience selection
  • Design and interaction specs
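
If the items above are read as the fields of an experiment write-up, they could be captured in a small record type so that incomplete write-ups are easy to spot. The sketch below is only an illustration: the class name `ExperimentSpec`, its field names, and the `missing_fields` helper are hypothetical and are not taken from Duolingo's platform.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ExperimentSpec:
    """Hypothetical experiment write-up; fields mirror the list above."""
    hypothesis: str
    expected_outcome: str
    related_links: List[str]
    audience: str
    design_specs: str

    def missing_fields(self) -> List[str]:
        """Return the names of fields left empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example: a draft that still lacks an expected outcome and related links.
draft = ExperimentSpec(
    hypothesis="A shorter onboarding flow increases new lesson starts",
    expected_outcome="",
    related_links=[],
    audience="New users on mobile",
    design_specs="See attached mockups",
)
print(draft.missing_fields())
```

A check like this could run before an experiment is allowed to launch, which keeps the write-up lightweight while still enforcing that every section is filled in.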

Experiment results are available through a custom dashboard that shows how a given experiment impacts every important company metric vs. a control group — covering conversion, engagement, and monetization. (For Duolingo’s context, these are things like new lesson starts, the total number of lessons consumed, the total amount of lesson time, conversion to paid.)
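
As a rough sketch of what a dashboard row might compute, the function below reports the relative change of each metric in a treatment group against the control group. The metric names and numbers are made up for illustration, and the article does not describe how Duolingo's dashboard calculates its figures.

```python
from typing import Dict


def relative_lift(control: Dict[str, float], treatment: Dict[str, float]) -> Dict[str, float]:
    """Relative change per metric: (treatment - control) / control.

    Metrics with a zero control value are skipped because the ratio is undefined.
    """
    return {
        metric: (treatment[metric] - base) / base
        for metric, base in control.items()
        if base != 0 and metric in treatment
    }


# Hypothetical per-user averages for two groups.
control = {"lesson_starts": 2.0, "lesson_minutes": 11.0, "paid_conversion": 0.05}
treatment = {"lesson_starts": 2.2, "lesson_minutes": 11.0, "paid_conversion": 0.045}

for metric, lift in relative_lift(control, treatment).items():
    print(f"{metric}: {lift:+.1%}")
```

Showing every key metric side by side makes trade-offs visible, for example a change that raises engagement while lowering conversion to paid.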

Benchmark: On average, PMs launch and manage one experiment per week (in addition to experiments launched by other teams, like Engineering).

Moving to Production & Guardrail Metrics

Best practice: Run experiments for longer than you think is necessary. It’s easy to be fooled by statistics or noise and to push something to production as soon as it looks “statistically significant.” Run experiments for at least a few weeks.
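
The risk of shipping at the first "significant" reading can be illustrated with a small simulation. The sketch below runs A/A tests, in which both groups share the same true conversion rate, and compares two decision rules: checking the p-value every day and declaring a winner the first time it drops below the threshold, versus checking only once at the end. All parameters (run count, days, users per day, conversion rate) are arbitrary assumptions, and the two-proportion z-test is a standard textbook choice rather than a method attributed to Duolingo.

```python
import math
import random


def two_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test p-value using the pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(runs=200, days=20, users_per_day=200,
                      rate=0.1, alpha=0.05, seed=0):
    """Return (share of runs significant on any day, share significant on the last day)."""
    rng = random.Random(seed)
    any_day = 0
    last_day = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        crossed = False
        p = 1.0
        for _ in range(days):
            # Both groups draw from the same true rate, so any "win" is noise.
            conv_a += sum(rng.random() < rate for _ in range(users_per_day))
            conv_b += sum(rng.random() < rate for _ in range(users_per_day))
            n += users_per_day
            p = two_sided_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                crossed = True
        if crossed:
            any_day += 1
        if p < alpha:
            last_day += 1
    return any_day / runs, last_day / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"false positives when stopping at first significant day: {peeking_rate:.1%}")
print(f"false positives when checking only at the end:          {fixed_rate:.1%}")
```

Because a run that is significant on the last day is also counted under the daily-check rule, the first rate can never be lower than the second; the gap between them is the extra false-positive risk that comes from stopping early.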

The Pros and Cons of a Testing Culture

Below the surface, some of the other benefits include:

  • Repeatability: Having a repeatable process (essentially, the scientific method) to drive continued product improvement
  • Objectivity: Having an objective system for making decisions around product changes (avoiding alternatives like the “HIPPO,” the highest-paid person’s opinion)
  • Autonomy: Encouraging autonomy, which in turn drives higher product velocity
  • Metric-Driven: Creating a system that can drive improvements in the most important business metrics, while also minimizing the chance of launching catastrophic changes

Of course, no system comes without drawbacks. In this case, a heavy emphasis on testing has the following costs and limitations:

  • Requires Investment: Demands significant investment in infrastructure
  • Incrementalism: Tends to find local rather than global minima/maxima
  • Tech Debt: Can create additional QA overhead
  • Tough Metrics: Can be less equipped to drive certain metrics (say, virality or learning)
  • User Volume: Depends on a high volume of users and data for successful experiments

Building a Testing Culture

While companies should adopt principles that work within their own specific cultures, Duolingo’s can serve as an inspiration to other teams. Their operating principles include:

  • Learners first (aka users first)
  • Take the long view
  • Prioritize ruthlessly
  • Test everything
  • Ship it
  • Strive for excellence
  • Be candid and kind

Bonus: Experimentation culture even permeates the hiring process. For prospective PMs, hiring managers share a current in-flight experiment and have the PM walk them through what changes, if any, they would make given the results.

How to Get Started

  • Make sure you’re at the right scale for A/B testing. If you have only dozens of customers, running A/B tests makes no sense, since you simply don’t have the data. While there’s no hard rule, a soft threshold is roughly 100,000 users or more.
  • Have the right tools to make experimentation easy. If you want to run a lot of experiments, the marginal cost (whether measured in time or dollars) of each experiment should be small — the cost should be “less than 5% of overhead” — for both technical and non-technical employees. You’ll need to invest in the infrastructure to make running tests easy.
  • Document and train relentlessly. If you want to build a culture where nearly every team member can launch an experiment, you will need very precise documentation on how to execute experiments.
  • Know what to measure… and prioritize what matters. Be thoughtful about what you choose to measure when it comes to experiments — only measure the things that actually affect your business. And perhaps even more importantly, make sure your entire team has a shared understanding of what matters the most.
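
The first point above, about having enough users, can be made concrete with a standard sample-size approximation for comparing two conversion rates. The sketch below uses the usual normal-approximation formula for a two-sided test; the baseline rate, the lifts, the 5% significance level, and the 80% power are illustrative assumptions, not figures from the talk.

```python
import math
from statistics import NormalDist


def users_per_group(baseline: float, lift: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in each group to detect a relative lift.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2.
    """
    p1 = baseline
    p2 = baseline * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 5% baseline conversion, detecting 10% vs. 2% relative lifts.
for lift in (0.10, 0.02):
    print(f"relative lift {lift:.0%}: {users_per_group(0.05, lift)} users per group")
```

The required sample grows with the inverse square of the effect size, which is why small product tweaks need far more traffic to evaluate than large ones.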