The Linux Foundation and Harvard’s Lab for Innovation Science Release Census of Most Widely Used Open Source Application Libraries

SAN FRANCISCO (March 2, 2022) - The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced the final release of “Census II of Free and Open Source Software – Application Libraries.” This follows the preliminary release of Census II, “Vulnerabilities in the Core,’ a Preliminary Report and Census II of Open Source Software” and identifies more than one thousand of the most widely deployed open source application libraries found from scans of commercial and enterprise applications. This study informs what open source packages, components and projects warrant proactive operations and security support.

The original Census Project (“Census I”) was conducted in 2015 to identify which software packages in the Debian Linux distribution were the most critical to a Linux server’s operation and security. The goal of the current study (Census II) is to pick up where Census I left off and to identify and measure which open source software is most widely deployed within applications developed by private and public organizations. This Census II allows for a more complete picture of free and open source software (FOSS) adoption by analyzing anonymized usage data provided by partner Software Composition Analysis (SCA) companies Snyk, the Synopsys Cybersecurity Research Center (CyRC), and FOSSA and is based on their scans of codebases at thousands of companies.

“Understanding what FOSS packages are the most widely used in society allows us to proactively engage the critical projects that warrant operations and security support,” said Brian Behlendorf, executive director at Linux Foundation’s Open Source Security Foundation (OpenSSF). “Open source software is the foundation upon which our day-to-day lives run, from our banking institutions to our schools and workplaces. Census II provides the foundational detail we need to support the world’s most critical and valuable infrastructure.”

Census II includes eight rankings of the 500 most used FOSS packages among those reported in the private usage data contributed by SCA partners. These include different slices of the data based on versions, structure, and packaging system. For example, this research enables identification of the top 10 version-agnostic packages available on the npm package manager that were called directly in applications:

  • lodash
  • react
  • axios
  • debug
  • @babel/core
  • express
  • semver
  • uuid
  • react-dom
  • jquery

To review all of the Top 500 lists in their entirety, please visit Data.World.

The study also surfaces these five overall findings that are detailed in the report:

  1. The need for a standardized naming schema for software components so that application libraries can be uniquely identified
  2. The complexities associated with package versioning – SBOM guidance will need to reflect versioning information that is consistent with the public “main” repository for that package, rather than private repositories
  3. Much of the most widely used FOSS is developed by only a handful of contributors – results in one dataset show that 136 developers were responsible for more than 80% of the lines of code added to the top 50 packages
  4. The increasing importance of individual developer account security – the OpenSSF encourages the use of MFA tokens or organizational accounts to achieve greater account security
  5. The persistence of legacy software in the open source space

Census II is authored by Frank Nagle, Harvard Business School; James Dana, Harvard Business School; Jennifer Hoffman, Laboratory for Innovation Science at Harvard; Steven Randazzo, Laboratory for Innovation Science at Harvard; and Yanuo Zhou, Harvard Business School.

“Our goal is to not only identify the most widely used FOSS but also provide an example of how the distributed nature of FOSS requires a multi-party effort to fully understand the value and security of the FOSS ecosystem. Only through data-sharing, coordination, and investment will the value of this critical component of the digital economy be preserved for generations to come,” said Frank Nagle, Assistant Professor, Harvard Business School.

About the Linux Foundation

Founded in 2000, the Linux Foundation and its projects are supported by more than 1,800 members. The Linux Foundation is the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. Linux Foundation projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, Hyperledger, RISC-V, and more. The Linux Foundation’s methodology focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.