January 23, 2025

Meet our new CTO: Thomas Aynaud

Thomas Aynaud is a recent addition to the Software Heritage team, joining as Chief Technical Officer (CTO) in January 2025. However, his connection with co-founder Roberto Di Cosmo dates back to his student days, when Di Cosmo taught him programming. Aynaud holds a PhD in computer science (his thesis studied the detection of dynamic communities in complex networks) and has worked as a researcher, data scientist, and software engineer. Most recently, he spent six years as senior technical lead at the French search engine company Qwant.
Here, he shares how his previous experiences inform his current approach, along with his insights on open source and artificial intelligence.

Tell us a bit about yourself and how you heard about Software Heritage.

I was lucky to be born in a house with a computer, and I have always been fascinated by these “universal machines.” I studied computer science, completing a theoretical master’s degree at the École normale supérieure de Cachan (ENS Cachan). I then went on to earn a PhD on complex networks.

After that, I wanted to see more concretely how companies build large software systems. Since joining academia is very difficult in France, I worked for several startups (Criteo, Clustree, and Qwant) doing backend development, data engineering, data science, and research. My last position was senior tech lead at Qwant, where I worked on almost every part of the search engine backend, from crawling to data enrichment, indexing, evaluation, architecture, and ranking.

I think I first heard of Software Heritage at a business event in Paris in 2019, where they had a stand. Since Roberto Di Cosmo had been one of my programming teachers, I went over to talk with them for a bit. I was just starting at Qwant, so I wasn’t looking for a new position, but I kept an eye on how the project evolved.

What interests you about Software Heritage?

My primary interest lies in the mission of Software Heritage, which is contributing to the Library of Alexandria of software. There are parallels between this mission and my previous work on a web search engine at Qwant. In both cases, you have to store, enrich, and make accessible a huge amount of knowledge: the web at Qwant, and all publicly available software source code at Software Heritage. Since my PhD, I have been fascinated by the technologies that make this possible. So a place where I can combine that fascination and knowledge with a beautiful mission… I had to try, and I am very glad to help!

You have experience working with Vespa.ai at Qwant, particularly on large-scale deployments. What are the challenges and rewards of working with a web-scale search index?

A very big challenge is managing complexity. In a large-scale search engine, almost everything you take for granted in computer science can become complicated, and every piece of software may (or will!) fail. Often it fails almost silently, producing strange and rare perturbations in the output. When you detect such an issue in the final results, finding the root cause can be daunting, and fixing it… may require reprocessing the entire web. There are also many rewards. On the technical side, you learn a lot, you have access to an amazing dataset, and the technology keeps surprising you. On the social side, you work on something that you and the people you meet actually use. Search engines have become so fundamental to our daily lives that they often raise profound philosophical, societal, or political questions.

What’s your take on artificial intelligence – specifically where it intersects with Software Heritage?

I personally don’t like ‘AI’ as a term for large language models (LLMs). The ‘intelligence’ part is especially misleading, and in a few years we’ll call it something else. Right now, with ChatGPT and other LLMs, what we’ve built is unbelievably good text prediction. It often feels like human intelligence, and we still don’t understand very well why it seems so intelligent.

I’ve worked a lot with AI in my previous jobs, particularly machine learning, a field where you try to make computers learn, from examples, how to reproduce them. In machine learning, you need to know the provenance of your data to be able to reproduce results and understand outcomes. So, from a practitioner’s point of view, Software Heritage (SWH) is a great starting point, because it offers many ways to identify the data.

I don’t want SWH to be a dusty museum. I want it to be a place where we archive all software and where everyone can come and see what’s inside. This includes data scientists and machine learning practitioners, especially if it helps them better understand LLMs.

What’s your interest in open source?

I’ve always liked how you can build on what other people have made and share the result with everyone… It’s a way to build something collective, together, and for everyone. Open source was also fundamental to my day-to-day work. Without it, it would have been impossible for me to do my job. Most companies build a lot of ‘glue’ between open-source components, and they couldn’t function without those components. That’s why I have often asked to contribute back where I could, and I am glad that Software Heritage builds free and open-source software.

How does your research background inform your approach to technical leadership now?

When you do research, you fight a lot of doubt and uncertainty: ‘Is this a good question?’ ‘Is this the right answer?’ ‘Is this really a problem?’ ‘How can I understand it better?’ In a technical endeavor as big as Software Heritage, you face a lot of this kind of uncertainty. You clearly cannot control everything; you have to live with that and keep going while wondering whether you’re on the right path.

The best way to build knowledge and understanding is the scientific method: make hypotheses, reduce your biases, validate your assumptions, and accept it when you were wrong! Being trained in this way of thinking helps a lot in gaining confidence that you and your team are on the right path.

What drew you to moving from the private sector to a non-profit?

With experience, I’ve learned what drives me most in my work: a combination of technical challenges, mission, and people. I’ve worked with amazing people and on great pieces of technology in both the private sector and non-profits. To me, the mission in a non-profit often feels more oriented toward the common good than in the private sector, where shareholders are often treated as kings. It’s an amazing luxury to be free to make this choice.

What are you most interested in contributing to the organization?

Where I can be the most useful, of course! To be more precise, I first need to gain a better understanding of how the organization works, what the difficulties are, what can be tackled, and what we can work around. My expertise is in building large systems, handling massive datasets, and machine learning. Software Heritage has already made a lot of headway in collecting and archiving source code. It is still a work in progress, but the foundations are already solid. Now we need to find better ways to run massive analyses and computations on that archive. It’s also a small organization managing a big service, and I believe my background can bring another point of view for reducing complexity and making good decisions.

Keeping up in the tech world requires constant learning – how do you stay on top of things?

For computer science and technology, I always prefer to read. My sources depend on whether I have a specific subject to study or just want to stay informed. Once I’ve identified a subject I need to study, my approach typically involves reading the project documentation (often multiple times!), reviewing blog articles, research papers, and books (found through sources like Google Scholar, Qwant, or O’Reilly), and conducting thorough research. To stay informed, I mostly rely on blogs from people or companies I trust on a given subject, a few tech news websites, and social networks. But the web is constantly evolving, platforms change, and people stop writing… so you need many diverse sources to stay informed.

Software Heritage