Our Story - PetaGene

PetaGene started as a small hint of an idea: that a team of Cambridge University PhDs could together devise a novel approach to the problem of genomic data storage. Through “Project PetaGene,” a collaboration with the Stegle Group at EMBL-EBI, we researched how to compress huge amounts of genomic data without compromising data quality.

Members of Project PetaGene: Dan Greenfield, Alban Rrustemi, Oliver Stegle

Three years later, PetaGene is much more than a research project. We took our idea to London’s Entrepreneur First incubator and graduated with the program’s sixth cohort in 2016. That same year, we won Best of Show for our PetaSuite compression tools at Bio-IT World’s annual conference in Boston, beating out 46 competitors from 190 total exhibiting companies. Now, we’re a full-fledged company with a suite of products that do what we hoped they would when we began our research at the University of Cambridge: compress genomic data up to 6x, without affecting the quality of the data. Beyond the awards and accolades, we’re deeply committed to the goal that drives PetaGene daily: reducing the size and associated storage cost of genomic data, so it can be more readily accessed and analyzed.

Why is accessing genomic data important? Genome sequencing promises abundant medical benefits. It can offer researchers insights into the genetic origins of certain diseases and potentially enable them to uncover new treatments. It can allow patients to practice “precision medicine,” in which understanding their specific genetic composition enables them to make better, more precise decisions about their health. And the resulting data can be stored and analyzed as it accrues, offering long-term research opportunities and solutions to health concerns.

Despite these myriad benefits, the storage cost associated with genome sequencing—which produces a huge volume of data—can be prohibitive. While DNA sequencing has itself become more affordable (less than $1,000 today vs. $300M when it was first introduced in 2001), storage costs have risen. One thousand petabytes of genomic data are already being stored worldwide, and the aggregate cost of genomic storage is expected to grow from $0.5B today to $5B by 2021.

That’s where PetaGene comes in. Our products reduce the footprint of genomic datasets in the FASTQ and BAM formats by up to 6x while preserving genotyping accuracy and reducing hardware storage costs. We’ve worked hard to make integration seamless, with no need for a separate mount or volume. Our fast transfer speeds and open access allow researchers to collaborate easily, sharing PetaGene-compressed files quickly and efficiently.

We’ve created these products so researchers can do the most important work of analyzing genomic data to improve health and medicine for everyone. That’s why we’re committed to doing what we do—and we’ll keep working to make accessing genomic data faster, more efficient, and more cost-effective.

The Co-Founders

Dan Greenfield, Co-Founder and CEO

A Cambridge University PhD with a Master’s in Bioinformatics, Dan has experience leading teams in Silicon Valley to build groundbreaking new products. His PhD dissertation was awarded a prize by the British Computer Society for the top dissertation in the UK.

Vaughan Wittorff, Co-Founder and Business Development Manager

A Cambridge University PhD, Vaughan has significant experience in inventing technology and commercializing it in the private sector. He was formerly a Senior Lecturer in Electrical Engineering at Curtin University of Technology and a long-term Visiting Fellow in Computer Science and Technology at Cambridge University.