Data Engineer
The Software Heritage project
Software Heritage is a universal software source code archive project that aims to recover, preserve for the long term, and share all publicly available source code with its development history (e.g., as stored in version control systems). Software Heritage, an open source non-profit initiative hosted by French research institute Inria, has already archived over 17 billion source files and 3.6 billion commits from over 266 million software development projects.
The position
We’re looking for an experienced Big Data-oriented software engineer. The ideal candidate will have significant interest and experience in large-scale data processing and exploitation architectures, including storage, indexing, and retrieval.
You can check out a more detailed list of our current projects on the Software Heritage Roadmap 2024: https://docs.softwareheritage.org/devel/roadmap/roadmap-2024.html
Main tasks and activities
– Set up a data processing architecture (à la Spark)
– Design and model Big Data architectures
– Implement solutions based on defined architectures
– Set up Big Data pipelines
Skills
The ideal candidate will have experience in Big Data development and architecture, preferably in an open-source context. We expect self-organization and autonomy skills commensurate with the candidate’s experience. Participation in existing FOSS projects in any capacity (developer, community organizer, technical writer, etc.) is an added advantage.
The following skills are expected:
– Mastery of a large-scale data processing system (e.g. Apache Spark, Flink, or Hadoop)
– Solid software development skills (at least the basics of Rust and Python)
– Good level of English (written and spoken)
– Use of Git
– Use of continuous integration tools (e.g. GitLab and/or Jenkins)
Knowledge and experience in the following will be considered a plus:
– Experience in data processing on a scale of tens of terabytes or even petabytes
– Experience with Cassandra and Kafka
– Knowledge of Java
– Knowledge of Kubernetes
– Data visualization
Software Heritage relies on a complex technical architecture, built on many different technologies, that continues to evolve. We don’t expect candidates to be experts in all of these areas. However, prior knowledge of one or more will be beneficial. We encourage applications from candidates of all experience levels, as a willingness to learn and discover is more important than specific expertise.
Working with us
We’re a team of 15 people, including nine technical staff (five developers and four sysadmins).
Autonomy, transparency, and collaboration are core values of our free and open-source project.
Most of the team is based at the Inria center in Paris, but the position is open to any location in France close to an Inria center (Bordeaux, Lille, Lyon, Grenoble, Rennes, Saclay, Sophia Antipolis, Nancy).
The contract offered by Inria is a two-year renewable full-time fixed-term contract, with the prospect of a permanent position.
– Telecommuting: 90 days/year (average 2 days per week)
– Vacation: 35 days + 10 days RTT
Salary range: €30,000 – €70,000 depending on profile and experience.
Application
Please send your application (resume + cover letter) to hiring@softwareheritage.org