Foiegras: Source Code Based Software Composition Analysis For C/C++ Applications

by Gousios, Georgios and Hamer, Philip and Odlund, Camilla and Melo, Leandro and Hejderup, Joseph and Muniraju, Sridhara and Durieux, Thomas

You can get a pre-print version from here.
You can view the publisher's page here.

Abstract

Software Composition Analysis (SCA) identifies third-party components in applications for vulnerability management and license compliance. C and C++ applications often rely on copy-based code reuse without maintaining origin information, rendering precise SCA generation challenging. We present Foiegras, a production SCA system addressing this challenge through advanced code clone detection. The system segments source files into functions, types, and licenses, maintaining cryptographic signatures for exact matching and embeddings for similarity-based detection. Uniquely, we curate an index of 1,700 authoritative open-source projects, constructed through manual annotation by 13 domain experts. This curation addresses the fundamental challenge of distinguishing original sources from copies. We evaluate Foiegras on a file-based synthetic dataset and 28 high-profile open-source C/C++ applications with manually verified ground truth. At the file level, Foiegras identifies exact file versions with 78% average precision. In real-world applications, it achieves mean precision of 0.7 and recall of 0.5 for exact version matching (0.79/0.71 for library name matching), approaching state-of-the-art performance and significantly outperforming a commercial SCA platform. Foiegras is deployed at Endor Labs, processing thousands of SCA requests daily, demonstrating both the feasibility and the necessity of combining automated analysis with expert curation for accurate software composition analysis in ecosystems lacking modern dependency management.
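The exact-matching idea mentioned in the abstract (cryptographic signatures over code segments, looked up in an index of curated projects) can be illustrated with a minimal sketch. This is not the Foiegras implementation. The choice of SHA-256, the regex-based comment stripping, the function names (`normalize`, `signature`, `build_index`, `match`), and the project name `libexample-1.0` are all assumptions made for illustration. Segmenting files into functions and the embedding-based similarity search are out of scope here.

```python
import hashlib
import re


def normalize(segment: str) -> str:
    """Strip C/C++ comments and collapse whitespace.

    Assumed, deliberately naive normalization: the regexes do not
    handle comment markers that appear inside string literals.
    """
    no_block = re.sub(r"/\*.*?\*/", " ", segment, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", " ", no_block)
    return " ".join(no_line.split())


def signature(segment: str) -> str:
    """Cryptographic signature of a normalized segment (SHA-256 assumed)."""
    return hashlib.sha256(normalize(segment).encode("utf-8")).hexdigest()


def build_index(projects: dict[str, list[str]]) -> dict[str, str]:
    """Map each segment signature to the curated project that owns it.

    `projects` is a hypothetical structure: project name -> code segments.
    The first project seen for a signature is kept as its origin.
    """
    index: dict[str, str] = {}
    for name, segments in projects.items():
        for seg in segments:
            index.setdefault(signature(seg), name)
    return index


def match(segments: list[str], index: dict[str, str]) -> set[str]:
    """Return the projects whose signatures exactly match any given segment."""
    return {index[sig] for sig in map(signature, segments) if sig in index}


# Hypothetical curated index with a single project and a single function.
index = build_index(
    {"libexample-1.0": ["int add(int a, int b) { return a + b; }"]}
)

# A copied function that differs only in layout and comments still matches.
copied = "int add(int a, int b) {\n  // sum\n  return a + b;\n}"
unrelated = "int sub(int a, int b) { return a - b; }"
found = match([copied, unrelated], index)
```

In this sketch a copy survives reformatting and comment changes because the signature is computed after normalization. Any change to the code tokens themselves would break the exact match, which is the case the abstract assigns to similarity-based detection with embeddings.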

Bibtex record

@inproceedings{GHOMHMD26,
  title = {{Foiegras}: Source Code Based Software Composition Analysis For {C}/{C++} Applications},
  author = {Gousios, Georgios and Hamer, Philip and Odlund, Camilla and Melo, Leandro and Hejderup, Joseph and Muniraju, Sridhara and Durieux, Thomas},
  booktitle = {Proceedings of the 2026 IEEE/ACM 48th International Conference on Software Engineering: Software Engineering in Practice},
  series = {ICSE-SEIP '26},
  year = {2026},
  month = apr,
  pages = {12--18},
  address = {Rio de Janeiro, Brazil},
  publisher = {ACM},
  doi = {10.1145/3786583.3786878},
  isbn = {979-8-4007-2426-8},
  url = {https://doi.org/10.1145/3786583.3786878}
}