Pierre Mallinjoud

Tech expert, bioinformatics engineer, web and database developer

I'm a web and database developer with a strong passion for science. This interest led me to study genomics and statistics in order to work in biological research. These days, I'm also exploring AI and blockchain through various personal projects.

Experience at EnyoPharma

At EnyoPharma, I worked as a bioinformatics engineer in the R&D team, collaborating closely with biologists to study human-virus protein-protein interactions. The biologists manually curated scientific publications to extract protein-protein interaction data.

To support this effort, I developed two full-stack applications:

  • Drakkar: A database and web interface designed to assist biologists in curating protein-protein interactions. It provides rich web forms to populate a large PostgreSQL database with curated data.
  • Vinland: A read-only, smaller version of the Drakkar database, offering a public web interface for querying protein-protein interactions and visualizing protein interaction networks.

Both applications share the same technology stack:

  • PostgreSQL as the database
  • A custom PHP backend
  • A React frontend
  • Containerized deployment with Docker

Additionally, I developed Perl scripts to transform the Drakkar database into Vinland. This work and the curated data contributed to a published scientific article.
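The transformation itself was done with Perl scripts against PostgreSQL, but the core idea - exporting a curated subset of the source database into a smaller read-only snapshot - can be sketched in Python with SQLite. All table and column names below are hypothetical, for illustration only:

```python
import sqlite3

def export_public_subset(src_path: str, dst_path: str) -> int:
    """Copy curated interactions from a source DB into a smaller snapshot.

    Hypothetical schema for illustration; the real Drakkar-to-Vinland
    transformation used Perl scripts against PostgreSQL.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    # The snapshot keeps only the columns the public interface needs.
    dst.execute(
        "CREATE TABLE IF NOT EXISTS interaction "
        "(protein_a TEXT, protein_b TEXT, publication TEXT)"
    )
    # Only fully curated rows are exported.
    rows = src.execute(
        "SELECT protein_a, protein_b, publication "
        "FROM interaction WHERE curated = 1"
    ).fetchall()
    dst.executemany("INSERT INTO interaction VALUES (?, ?, ?)", rows)
    dst.commit()
    src.close()
    dst.close()
    return len(rows)
```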

Meyniel-Schicklin L, Amaudrut J, Mallinjoud P, et al. Viruses traverse the human proteome through peptide interfaces that can be biomimetically leveraged for drug discovery. Proc Natl Acad Sci U S A. 2024;121(5):e2308776121. doi:10.1073/pnas.2308776121

typescript javascript php perl python sql reactjs docker database postgresql uniprot

Experience at CRCL

At the CRCL (Cancer Research Center of Lyon), I built a database cataloging alternative splicing events of the human and mouse genomes. To do so, I started from messenger RNA sequences available in GenBank, which I aligned to the human and mouse reference genomes.

I then mapped Affymetrix exon array probes onto this annotation. This allowed me to create another database compiling differential splicing expression data from numerous experiments, both public and internal to the lab.
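At its core, mapping probes onto an annotation is an interval-overlap problem. Here is a minimal sketch of that step, assuming both probes and exons are already placed on the same genomic coordinate system (the actual mapping worked from Affymetrix probe sequences aligned to the annotation):

```python
def map_probes_to_exons(probes, exons):
    """Assign each probe to the exons it overlaps.

    probes: list of (probe_id, start, end); exons: list of (exon_id, start, end).
    Coordinates are illustrative 0-based half-open intervals on one
    chromosome strand.
    """
    hits = {}
    for probe_id, p_start, p_end in probes:
        for exon_id, e_start, e_end in exons:
            # Two half-open intervals overlap iff each starts before the other ends.
            if p_start < e_end and e_start < p_end:
                hits.setdefault(probe_id, []).append(exon_id)
    return hits
```

A real implementation would use an interval tree or sorted sweep instead of this quadratic loop, but the overlap test is the same.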

This work contributed to a scientific publication and is available online at FASTERDB.

Mallinjoud P, Villemin JP, Mortada H, et al. Endothelial, epithelial, and fibroblast cells exhibit specific splicing programs independently of their tissue of origin. Genome Res. 2014;24(3):511-521. doi:10.1101/gr.162933.113

perl R sql mysql genbank microarrays affymetrix blast

Fine-tuning ESM3 for generative biology

As part of my exploration into generative AI for biology, I undertook a project to build Mímir, a generative peptide binder model, by fine-tuning ESM3, a 1.4-billion parameter protein language model. The goal was to train the model to generate novel binding sequences based on target protein 3D structures.

Rather than relying on high-level wrappers or basic tutorials, I developed a complete, custom fine-tuning pipeline from scratch. This involved deploying and training the model end-to-end on cloud infrastructure using a Lightning AI H100 GPU. I had to navigate significant engineering challenges to fit a model of this size into memory, implementing techniques like 8-bit AdamW, gradient checkpointing, Flash Attention, and dynamic bucket batching.
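One of these techniques, dynamic bucket batching, can be illustrated without any deep learning dependencies: sort sequences by length so each batch pads to a similar size, and cap the padded token count per batch. This is a simplified sketch of the idea, not the project's actual implementation:

```python
def bucket_batches(lengths, max_tokens):
    """Group sequence indices into batches whose padded size
    (sequences_in_batch * longest_in_batch) stays under max_tokens.

    Sorting by length keeps similar-sized sequences together,
    minimizing the memory wasted on padding tokens.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, batch, longest = [], [], 0
    for i in order:
        longest = max(longest, lengths[i])
        # Flush the current batch if adding one more sequence
        # would exceed the padded-token budget.
        if batch and (len(batch) + 1) * longest > max_tokens:
            batches.append(batch)
            batch, longest = [], lengths[i]
        batch.append(i)
    if batch:
        batches.append(batch)
    return batches
```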

Beyond the engineering, this project allowed me to deeply understand the inner workings of ESM3. I learned how to handle its multi-track input design (sequence, 3D coordinates, and solvent accessibility), how geometric attention processes spatial relationships, and how to design custom loss functions for masked language modeling. Even though the model didn’t fully achieve generalized transfer learning due to the complexities of multi-domain structural representations, going through the entire process - from data pipeline to cloud execution and post-mortem analysis - gave me invaluable hands-on experience with large-scale model training.
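The masked language modeling objective mentioned above can be shown with a toy sketch: the loss is the average negative log-likelihood of the true tokens, computed only at positions the model had to reconstruct. (The real loss operates on tensor logits across ESM3's multiple tracks; this is just the scalar idea.)

```python
import math

def masked_lm_loss(logprobs, targets, mask):
    """Average negative log-likelihood over masked positions only.

    logprobs: per-position dicts mapping token -> log-probability;
    targets: the true token at each position;
    mask: True where the input token was masked.
    Unmasked positions contribute nothing to the loss.
    """
    losses = [-logprobs[i][targets[i]] for i in range(len(targets)) if mask[i]]
    return sum(losses) / len(losses)
```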

The project, including its design document and a detailed post-mortem, is available on its GitHub repository.

python pytorch esm3 deep learning fine tuning llm cloud

Fine-tuning a binary classification model

As part of my exploration into AI, I undertook a side project to learn how to fine-tune a model. I used a dataset of approximately 80,000 manually curated scientific publication abstracts from my past work at EnyoPharma, each labeled according to whether its publication describes protein-protein interactions. I believe this dataset provides a highly relevant real-world example for practicing fine-tuning.

To conduct this study, I used the Hugging Face library to fine-tune a pretrained model into a binary classifier. The goal is not to create a perfect model but to go through the entire fine-tuning process, understand each step involved, and explore potential ways to improve the model.
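One of the first steps in that process is splitting the labeled abstracts into train and test sets while preserving the label balance. The Hugging Face `datasets` library provides this out of the box; here is a dependency-free sketch of the idea (function and parameters are illustrative, not from the project):

```python
import random

def stratified_split(labels, test_fraction=0.1, seed=42):
    """Split example indices into train/test sets, preserving label balance.

    Sampling each class separately ensures a rare class is not
    underrepresented in the test set.
    """
    rng = random.Random(seed)
    by_label = {}
    for i, label in enumerate(labels):
        by_label.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        cut = int(len(indices) * test_fraction)
        test.extend(indices[:cut])
        train.extend(indices[cut:])
    return sorted(train), sorted(test)
```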

The study is documented in its GitHub repository and Jupyter notebooks.

python jupyter deep learning fine tuning huggingface transformers data science

Command-line RAG pipeline for websites

I’m developing RAG-URL, a command-line pipeline that transforms a website into a searchable knowledge base, queried through an interactive agent.

This is a personal project to explore, step by step, the architecture of a fully agentic Retrieval-Augmented Generation (RAG) system - from web scraping to vector search to LLM-powered interaction.

The system runs in four stages:

  • Scrape: Crawls and extracts content into cleaned Markdown files
  • Chunk: Uses Gemini to split content into semantically meaningful sections
  • Embed: Embeds each chunk with Gemini and stores vectors in LanceDB
  • Agent: A CLI chatbot built with PydanticAI, querying the database with Gemini
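Stripped of the Gemini and LanceDB specifics, the retrieval that powers the Agent stage is nearest-neighbor search by cosine similarity over the stored chunk vectors. A minimal dependency-free sketch:

```python
import math

def top_k(query_vec, stored, k=2):
    """Return the ids of the k stored vectors most similar to the query.

    stored: list of (chunk_id, vector) pairs. In RAG-URL the vectors
    come from Gemini embeddings and the search is done by LanceDB;
    the underlying operation is the same cosine-similarity ranking.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

The retrieved chunk ids then map back to the Markdown sections that get stuffed into the LLM prompt.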

The pipeline is hardwired to use Gemini models and is not yet model-agnostic or extensible.

python ai agent rag llm prompting gemini pydantic-ai lancedb

Conversation-driven multi-agent framework

I’m developing MC Architecture (Master of Ceremony), an experimental framework that enables multiple AI agents to engage in natural, turn-based group conversations.

Unlike traditional multi-agent systems where communication is directed by a central controller, MC Architecture lets agents take turns based on shared context - like a group chat where everyone sees the full history, but only one speaks at a time.

The framework is fully agent-agnostic and model-agnostic, serving as a lightweight wrapper around any AI library. I’ve demonstrated integration with PydanticAI, with intelligent participant selection to maintain a coherent dialogue flow.
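The core loop can be sketched in a few lines: every agent sees the same shared transcript, and a selector decides who speaks next. This is a simplified illustration of the conversation-first idea, not the framework's actual API; in practice each agent wraps a real AI library such as PydanticAI.

```python
def run_conversation(agents, select_next, opening, turns=4):
    """Turn-based group conversation over a shared transcript.

    agents: mapping name -> callable(transcript) -> reply;
    select_next: callable(transcript, names) -> name of the next speaker.
    Every agent sees the full history, but only the selected
    one speaks each turn.
    """
    transcript = [("user", opening)]
    for _ in range(turns):
        speaker = select_next(transcript, list(agents))
        reply = agents[speaker](transcript)
        transcript.append((speaker, reply))
    return transcript
```

In MC Architecture the selector is itself intelligent, choosing the participant most likely to keep the dialogue coherent; the round-robin stand-in below is just the simplest possible policy.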

Originally created for creative storytelling and simulation, the architecture also shows potential for collaborative problem-solving where context awareness and conversational dynamics are essential.

This project is an ongoing experiment in conversation-first coordination, moving beyond rigid orchestration toward more natural multi-agent interaction.

python ai agent llm prompting openai anthropic gemini pydantic-ai