GitHub Data Reveals Which Countries Lead in Software Complexity

Researchers used GitHub data to measure "software economic complexity" across 163 countries. Germany, Australia, and Canada lead. The resulting index predicts GDP and inequality even after controlling for traditional trade data, and it raises the question of which countries are positioned to win in the AI era.

TL;DR

  • Researchers used GitHub Innovation Graph data to measure "software economic complexity" across 163 countries
  • Germany, Australia, and Canada rank highest; software complexity predicts GDP per capita and income inequality even after controlling for trade, patent, and research data
  • Countries diversify into related technology stacks, just like they do with physical exports
  • Public GitHub data misses proprietary code, likely underestimating complexity in countries with weaker open source cultures

The Big Picture

Trade data has a blind spot. For fifteen years, economists have measured national economic complexity by tracking physical exports, patents, and research papers. These metrics predict growth and inequality remarkably well. But they miss software entirely.

Code doesn't go through customs. It crosses borders via git push, cloud services, and package managers. Four researchers decided to fix that using GitHub's Innovation Graph, which tracks developer activity by country and programming language based on IP addresses.

Their findings, published in Research Policy, show that software complexity surfaces information traditional economic data leaves on the table. Countries that specialize in rare, sophisticated technology stacks — think certified embedded systems for aerospace rather than basic Python scripts — score higher on the Software Economic Complexity Index (ECI). And that score predicts GDP per capita and income inequality even after controlling for trade flows, patents, and research output.

The research team includes Sándor Juhász and Johannes Wachs from Corvinus University of Budapest, Jermain Kaminski from Maastricht University, and César A. Hidalgo from Toulouse School of Economics. Hidalgo created the Observatory of Economic Complexity, which has tracked physical trade data for over a decade. Now they're applying the same lens to software.

How It Works

The core dataset comes from GitHub Innovation Graph: quarterly counts of developers pushing code by economy and programming language for 163 economies and 150 languages from 2020 to 2023. But individual languages aren't the right unit of analysis. Real software uses bundles of languages together. A web app combines HTML, CSS, and JavaScript. A data science project pairs Python with Jupyter Notebook. Systems programming uses C with Assembly.

The researchers queried GitHub's GraphQL API for all active repositories in 2024 to find which languages co-occur within the same repos. They computed cosine similarity between languages based on weighted co-occurrence, with normalization so polyglot repos with twenty languages don't dominate the signal. Hierarchical clustering grouped the 150 languages into 59 "software bundles" — coherent technology stacks.
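The bundling step can be illustrated with a minimal sketch. It is not the paper's implementation: the toy repositories, the 1/n weighting per repository, the average-linkage merge rule, and the 0.5 similarity threshold are all assumptions chosen for illustration. Each language becomes a vector over repositories, a repository with n languages contributes weight 1/n so polyglot repos count for less, and languages are merged greedily while their average cosine similarity stays above the threshold.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each repository is the set of languages it contains.
repos = [
    {"HTML", "CSS", "JavaScript"},
    {"HTML", "CSS", "JavaScript"},
    {"Python", "Jupyter Notebook"},
    {"Python", "Jupyter Notebook"},
    {"C", "Assembly"},
    {"C", "Assembly"},
    {"Python", "JavaScript", "HTML", "CSS", "C"},  # a polyglot repo
]

def language_vectors(repos):
    """Map each language to {repo_id: weight}, with weight 1/n for an n-language repo."""
    vectors = defaultdict(dict)
    for repo_id, langs in enumerate(repos):
        weight = 1.0 / len(langs)  # assumed normalization; down-weights polyglot repos
        for lang in langs:
            vectors[lang][repo_id] = weight
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[r] for r, w in u.items() if r in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)

def cluster(vectors, threshold=0.5):
    """Greedy average-linkage agglomerative clustering on cosine similarity."""
    clusters = [[lang] for lang in sorted(vectors)]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [cosine(vectors[a], vectors[b])
                    for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best_sim:
                best_sim, best_pair = avg, (i, j)
        if best_sim < threshold:  # stop once the closest clusters are too dissimilar
            break
        i, j = best_pair
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

vectors = language_vectors(repos)
bundles = cluster(vectors)
for bundle in bundles:
    print(sorted(bundle))
```

On this toy input the single polyglot repository is not enough to pull the web, data science, and systems languages into one bundle, which is the point of the normalization.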

From there, they applied the standard economic complexity pipeline. Build a country-by-bundle matrix. Compute revealed comparative advantage: does this country have a disproportionate share of developers in this bundle relative to the global average? Binarize it. Run the iterative method to compute the Economic Complexity Index.
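As a rough sketch of that pipeline, the example below computes revealed comparative advantage, binarizes it at 1, and runs an iterative "method of reflections" to get a standardized complexity score. The developer counts, country names A/B/C, the RCA threshold of 1, and the number of iteration rounds are assumptions for illustration, and the paper's exact estimation procedure may differ.

```python
from statistics import mean, pstdev

# Hypothetical developer counts by country and software bundle.
developers = {
    "A": {"web": 100, "data": 80, "embedded": 60},
    "B": {"web": 100, "data": 50, "embedded": 5},
    "C": {"web": 100, "data": 5, "embedded": 2},
}

def rca_matrix(developers):
    """RCA = (country's share of developers in a bundle) / (global share of that bundle)."""
    bundles = sorted({b for row in developers.values() for b in row})
    total = sum(sum(row.values()) for row in developers.values())
    country_total = {c: sum(row.values()) for c, row in developers.items()}
    bundle_total = {b: sum(row.get(b, 0) for row in developers.values()) for b in bundles}
    return {
        c: {b: (row.get(b, 0) / country_total[c]) / (bundle_total[b] / total)
            for b in bundles}
        for c, row in developers.items()
    }

def binarize(rca, threshold=1.0):
    """1 where the country is specialized in the bundle (RCA >= threshold), else 0."""
    return {c: {b: 1 if v >= threshold else 0 for b, v in row.items()}
            for c, row in rca.items()}

def eci_reflections(M, rounds=10):
    """Method of reflections; assumes every country and bundle has at least one 1."""
    bundles = sorted(next(iter(M.values())))
    diversity = {c: sum(M[c].values()) for c in M}             # bundles per country
    ubiquity = {b: sum(M[c][b] for c in M) for b in bundles}   # countries per bundle
    kc, kp = dict(diversity), dict(ubiquity)
    for _ in range(2 * rounds):  # even step count keeps kc on the "diversity" side
        new_kc = {c: sum(M[c][b] * kp[b] for b in bundles) / diversity[c] for c in M}
        new_kp = {b: sum(M[c][b] * kc[c] for c in M) / ubiquity[b] for b in bundles}
        kc, kp = new_kc, new_kp
    mu, sigma = mean(kc.values()), pstdev(kc.values())
    return {c: (v - mu) / sigma for c, v in kc.items()}        # z-scored index

M = binarize(rca_matrix(developers))
eci = eci_reflections(M)
for country in sorted(eci, key=eci.get, reverse=True):
    print(country, round(eci[country], 3))
```

In this toy case country C is specialized only in the bundle that most countries share, so it lands at the bottom, while country A, the only one specialized in the rare embedded bundle, lands at the top.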

Countries that specialize in many non-ubiquitous bundles score high. Countries that only specialize in what everyone does score low. Germany tops the list at 1.739, followed by Australia (1.730) and Canada (1.729). The United States ranks sixth at 1.695.

For the relatedness analysis, they defined proximity between bundles using co-specialization patterns. If countries good at bundle A also tend to be good at bundle B, those bundles are close in software space. Then they tested whether countries are more likely to enter bundles close to their existing specializations. The answer: yes. Countries don't jump randomly between software specializations. They diversify into related technology stacks, just like they do with physical exports.
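One common way to formalize this in the economic complexity literature is sketched below; whether the paper uses exactly these formulas is an assumption. Proximity between two bundles is taken as the number of countries specialized in both, divided by the larger of the two bundles' ubiquities. Relatedness density for a country and a target bundle is the proximity-weighted share of the target's neighbours the country already specializes in. The specialization matrix is hypothetical toy data.

```python
from itertools import combinations

# Hypothetical binary specialization matrix (1 = country is specialized in the bundle).
M = {
    "A": {"web": 1, "mobile": 1, "data": 0, "embedded": 0},
    "B": {"web": 1, "mobile": 1, "data": 0, "embedded": 0},
    "C": {"web": 0, "mobile": 0, "data": 1, "embedded": 1},
    "D": {"web": 1, "mobile": 0, "data": 0, "embedded": 0},
}
bundles = sorted(next(iter(M.values())))

def proximity(M, bundles):
    """phi[p][q] = (# countries specialized in both) / max(ubiquity of p, ubiquity of q).

    Assumes every bundle has at least one specialized country.
    """
    ubiquity = {b: sum(M[c][b] for c in M) for b in bundles}
    phi = {b: {} for b in bundles}
    for p, q in combinations(bundles, 2):
        both = sum(M[c][p] * M[c][q] for c in M)
        phi[p][q] = phi[q][p] = both / max(ubiquity[p], ubiquity[q])
    return phi

def density(M, phi, country, bundle):
    """Proximity-weighted share of the bundle's neighbours the country already has."""
    total = sum(phi[bundle].values())
    if total == 0:
        return 0.0  # the bundle is related to nothing else
    return sum(M[country][q] * w for q, w in phi[bundle].items()) / total

phi = proximity(M, bundles)
for target in ("mobile", "data"):
    print("D ->", target, density(M, phi, "D", target))
```

Country D, which so far specializes only in web, sits much closer to mobile than to data in this toy software space, so the relatedness argument would predict mobile as its more likely next specialization.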

What This Changes For Developers

Countries are highly specialized in the software they produce, and that matters when you're deciding where to relocate. Developers can use the product space representation of software capabilities to see which countries match their skillsets. If you're deep in embedded systems and aerospace tooling, Germany's ecosystem might be a better fit than a country specializing in web development.

For policymakers, software presents an interesting industrial policy target because it depends primarily on highly movable human capital. In principle, it provides a development opportunity that can be incentivized via talent attraction programs. In practice, high mobility cuts both ways. It makes software talent sensitive to consumer protection regulations that make it hard to work with data, or worker protection schemes that distribute innovation risk to small and medium firms.

The researchers predict that countries figuring out how to attract software talent without suffocating it with well-intentioned but poorly designed regulation will pull ahead. Within a decade, they expect software-based economic complexity indices to become standard in the policymaker's toolkit, sitting alongside trade-based measures. The data is open, updates quarterly, and captures something traditional data genuinely can't.

The big unknown is what generative AI does to this picture. If AI coding assistants lower the barrier to working in new programming languages, does relatedness weaken? Do countries diversify faster? Or does AI reinforce existing advantages because countries with the best AI infrastructure benefit most? Johannes Wachs and colleagues have a new paper in Science tracking the global diffusion of AI-assisted coding on GitHub. Within five years, the answer will reshape how we think about digital complexity.

Try It Yourself

Browse the Observatory of Economic Complexity and look up your own country. See what it exports, where it sits in the product space, and think about how software fits in. It's an intuitive way to build the mental model before diving into the math.

For a deeper dive, read César Hidalgo's book The Infinite Alphabet and the Laws of Knowledge, which puts economic complexity in broader context. The full Research Policy paper is available at doi.org/10.1016/j.respol.2026.105422.

If you're working with GitHub data for research, check out the GitHub Innovation Graph Q4 2025 data release, which includes quarterly developer activity by economy and programming language.

The Bottom Line

Use this research if you're a policymaker trying to understand your country's software capabilities beyond anecdotal evidence, or if you're a developer evaluating where your skillset fits in the global market. Skip it if you're looking for tactical coding advice or immediate workflow improvements.

The real opportunity here is that software complexity is measurable, trackable, and predictive in ways we couldn't see before. The risk is that we're only seeing public GitHub activity — proprietary enterprise work remains invisible, which likely underestimates complexity in countries with weaker open source cultures. Four years of data (2020-2023) is enough for cross-sectional analysis but too short to test long-run growth predictions. Economic structures shift over decades, not quarters. The researchers would love twenty years of this data. We'll get there.

Source: GitHub Blog