Case Study: Top 3 U.S. Children’s Hospital Deploys PetaSuite

Posted on April 27, 2022
Deploying PetaGene’s lossless compression for genomic data at a premier Children’s Hospital in the United States.

Understanding a patient’s genomic information is key to diagnosing many genetic and rare diseases. As a result, genome sequencing approaches such as Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) are increasingly used in clinical and research laboratories, including those at CH.

A critical challenge emerges as genome sequencing scales up: how to manage the increasing cost of storing these data. At CH, the volume of sequencing data is growing rapidly, along with the associated storage costs. CH were looking for a way to reduce the ballooning cost of storing data that must remain accessible for critical research and analysis. To solve this challenge, CH adopted PetaGene’s lossless compression software to tremendous effect:

“Within two months of deploying PetaGene’s compression solution, CH had made a return on investment, recovering the full cost of the PetaGene licence in storage savings. With PetaGene, we now have better control over the growth of our NGS data, allowing us to reduce storage costs while freeing up financial resources for more compute and analysis to further the research and clinical goals of CH. Hospital senior management consider the PetaGene purchase a big success.”

- Infrastructure Manager, CH

File type | # Files   | Input Size (TB) | Output Size (TB) | Savings (TB)
FASTQ.gz  | 663,669   | 854             | 360              | 494 (58%)
BAM       | 849,465   | 2,165           | 879              | 1,286 (59%)
Total     | 1,513,134 | 3,019           | 1,239            | 1,780 (59%)
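
The savings column can be re-derived from the input and output sizes. The short sketch below does that arithmetic in Python, using only the figures from the table; the function name `savings` is just an illustrative choice, and percentages are rounded to the nearest whole number as in the table.

```python
# Figures copied from the table: (input TB, output TB) per file type.
rows = {
    "FASTQ.gz": (854, 360),
    "BAM": (2165, 879),
}

def savings(input_tb, output_tb):
    """Return (TB saved, percent saved rounded to the nearest whole number)."""
    saved = input_tb - output_tb
    return saved, round(100 * saved / input_tb)

# Totals are the column sums of the two file types.
total_in = sum(i for i, _ in rows.values())
total_out = sum(o for _, o in rows.values())

for name, (i, o) in rows.items():
    print(name, savings(i, o))
print("Total", savings(total_in, total_out))
```

Running this reproduces the per-type and total savings reported above.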

“Deploying PetaGene at CH was a straightforward process. Our users do not have to modify any of their tools or workflows since PetaGene’s decompression library transparently serves original uncompressed data to the tools/pipelines. Since users are essentially able to work with the original data except that the data is more than 50% smaller, there are additional benefits such as faster data transfer speeds and analysis times.”

- Principal Systems Engineer, CH

NVIDIA and PetaGene Combine Genomic Technologies to Address Critical Analysis Bottlenecks

Posted on October 6, 2020
PetaGene and NVIDIA announce seamless integration of PetaGene’s PetaSuite tools as a standard part of NVIDIA Clara Parabricks Pipelines. PetaGene’s transparent compression reduces file sizes by 60-90%, and enables Parabricks Pipelines GPU-accelerated genome analysis to run 29% faster.

Cambridge, UK, Oct. 6, 2020: PetaGene and NVIDIA today announce their integrated bioinformatics solution to accelerate genomic analysis and simultaneously reduce data storage. PetaGene’s PetaSuite software decreases the size of genomic data and is integrated into NVIDIA Clara™ Parabricks Pipelines, a GPU-accelerated tool for accurate genomic data analysis. This combination now allows scientists and clinicians to access PetaGene’s genomic compression software within the Parabricks Pipelines compute framework. FASTQ.gz and BAM files compressed by PetaSuite, whether within that framework or elsewhere, can now be analyzed directly with Parabricks Pipelines.

PetaGene and Parabricks clear choices for TGen

The Translational Genomics Research Institute (TGen), a nonprofit medical research institute that is examining the genetic components of common and complex diseases, has confirmed faster end-to-end analysis when using PetaGene-compressed files within Parabricks Pipelines. TGen confirmed that the transparent readback of the compressed files using PetaGene’s user-mode library is now fully compatible with the Parabricks Pipelines environment. Sequencing and compute costs have plummeted, but storage costs have not. With PetaGene and GPU-powered Parabricks Pipelines, genomic analyses can be run faster, and with significant cost savings on storage and compute.

“At TGen, we have a long history of working with a large number of bulky genomic files. As our workflows mature and scale, we have been keen to build our genomics infrastructure from the ground up with the most efficient tools and systems available,” said Dr Glen Otero, VP of Scientific Computing at TGen. “PetaGene and NVIDIA Clara Parabricks Pipelines were independently clear choices for us. Having them interoperable like this is important, and the fact that the combination further accelerates the aggregate performance is fantastic.”

PetaGene gives 29% speedup to Parabricks Pipelines without added complexity

TGen’s benchmarking showed that germline workflows in Parabricks Pipelines run 29% faster with PetaGene-compressed data than with regular genomics files, and generate identical results. This significant speedup is due to the behind-the-scenes I/O savings from PetaGene’s PetaLink user-mode library, which performs just-in-time decompression. Because this library creates virtual FASTQ.gz and BAM files, users never need to interact directly with the compressed files: the virtual files are fully compatible with all existing tools and pipelines. Beyond PetaGene’s compressed data, the Parabricks GPU-accelerated analysis workflow is a clear differentiator compared to CPU-based workflows. GPUs save on overall space and operational costs, as fewer GPU servers than CPU servers are required to run the same analyses.
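
The general idea of lossless compression with just-in-time readback can be illustrated with a toy Python sketch. This is not PetaGene’s implementation: how PetaLink works internally is not described here, and the sketch swaps in Python’s standard `lzma` module as a stand-in compressor. The names `compress_in_place`, `open_transparent`, `count_reads` and the `.xz` suffix are all hypothetical. A real transparent layer would sit beneath the tool so the tool needs no changes; here a small wrapper function plays that role.

```python
import lzma
import os
import tempfile

SUFFIX = ".xz"  # hypothetical marker for "stored compressed on disk"

def compress_in_place(path):
    """Replace `path` with a losslessly compressed copy named path + SUFFIX."""
    with open(path, "rb") as src, lzma.open(path + SUFFIX, "wb") as dst:
        dst.write(src.read())
    os.remove(path)

def open_transparent(path):
    """Open `path` for reading; if only the compressed copy exists,
    decompress on the fly so the caller sees the original bytes."""
    if os.path.exists(path):
        return open(path, "rb")
    return lzma.open(path + SUFFIX, "rb")

def count_reads(fh):
    """Stand-in for a downstream tool: counts FASTQ records (4 lines each)."""
    return sum(1 for _ in fh) // 4

def demo():
    record = b"@read1\nACGTACGT\n+\nIIIIIIII\n"
    original = record * 1000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.fastq")
        with open(path, "wb") as fh:
            fh.write(original)
        compress_in_place(path)
        compressed_size = os.path.getsize(path + SUFFIX)
        # Read back through the wrapper using the ORIGINAL file name.
        with open_transparent(path) as fh:
            restored = fh.read()
        with open_transparent(path) as fh:
            reads = count_reads(fh)
    # (round trip is byte-identical, file got smaller, tool saw every record)
    return original == restored, compressed_size < len(original), reads

print(demo())
```

The byte-for-byte equality check is what "lossless" means in practice: the downstream tool sees exactly the data it would have seen from the uncompressed file.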

NVIDIA Clara Parabricks Product Manager Tim Harkins, Ph.D. commented, “Many scientists and clinicians are working hard to identify those genetic variants that contribute to health and diseases, ultimately providing better therapeutic choices for patients. The amount of data generated on a per individual genome is significant and all trends are pointing to more growth. The compression technologies from PetaGene are going to save the community a significant amount in data storage, and by integrating with Parabricks Pipelines, the amount of time saved will be of an equal contribution.”

Dr Vaughan Wittorff, Co-Founder and Chief Commercial Officer of PetaGene said: “GPU-Powered NVIDIA Clara Parabricks Pipelines appeals to customers who want the fastest analysis speeds for genomics, and those kinds of customers also want the most efficient storage techniques without having to change anything they do and without vendor lock-in. Integrating PetaGene tools with NVIDIA’s genomic tools was therefore very natural. The fact that PetaGene’s transparent compression also makes Clara Parabricks even faster is great for everyone.”

Joint free trial of Parabricks and PetaGene available today

Get Started today. NVIDIA and PetaGene provide a FREE 30-day license to NVIDIA Clara Parabricks which comes with a free trial of PetaGene’s PetaSuite tools to do compression and transparent readback.

About PetaGene

PetaGene was founded in Cambridge, the birthplace of genomics, to address the rapidly growing data management problems of the genomics industry. PetaGene’s software enables compression of huge amounts of genomic data without compromising on access or data quality. The company’s products go beyond regular data reduction techniques and have three times been recognized by Bio-IT World’s Best of Show Award for their industry-leading performance and usability.

AstraZeneca deploys PetaSuite genomic data compression software in core genomics initiative

Posted on October 24, 2019
PetaGene’s PetaSuite compression software and cloud-computing solutions speed up data transfers and reduce storage costs for research projects involving genomics data.

We are pleased to announce that AstraZeneca has selected PetaSuite software to compress the genomics data sets for AstraZeneca’s Centre for Genomics Research (CGR). Using genomics data and state-of-the-art methods for genomic analysis, the CGR investigates underlying genetic causes of disease and aims to integrate genomics across the company’s drug discovery platform. PetaSuite accelerates data transfers for cloud computing and reduces storage costs for any research project involving genomics data.

“Using genomic data for biopharmaceutical targets discovery requires large cohorts with massive multi-petabyte data sets. The time required to transfer these data from sequencers to compute clusters as well as the cost of storage can cripple these large initiatives,” said Vaughan Wittorff, Ph.D., Co-founder and Chief Commercial Officer of PetaGene. “PetaSuite addresses the challenges caused by growing volumes of genomics data and achieves up to 10x reductions in storage costs and transfer times, while adhering to the industry-standard BAM and FASTQ genomics file formats.”

More than 200,000 files processed

To date, AstraZeneca’s CGR has processed more than 200,000 genomics datasets, generating over a petabyte of data. One petabyte of data is equivalent to streaming HD movies for 40 years without a break. At this volume of data, problems in processing time, data transfers and storage size can impact the ability to deliver at scale. PetaGene’s compression software will enable the CGR to compress over 200,000 BAM files in a 24-hour period and will add the compressed data to tiered cloud storage.

AstraZeneca needed to minimize storage footprint & maximize data access
Average data size reduction of 76%

“AstraZeneca’s Centre for Genomics Research has the bold ambition to analyse up to two million genomes by 2026. Minimizing the storage footprint and transfer time of genome data while maximizing data access and compute processing is a necessity to enable us to achieve our ambition,” said Slavé Petrovski, Vice President and Head of Genome Analytics and Bioinformatics, Discovery Sciences, R&D, AstraZeneca.

PetaSuite will enable the CGR to achieve an average data reduction of 76% or a 4x expansion of storage capacity. PetaGene’s transparent, lossless compression of files reduces transfer times to less than a quarter, and PetaGene’s software allows unmodified analysis tools to run more quickly.
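
The link between a percentage size reduction and an effective capacity multiplier is simple arithmetic: if files shrink by a fraction r, the same storage holds 1 / (1 - r) times as much data. The sketch below works this through for the 76% figure above and for the 60% and 90% ends of the range quoted elsewhere in these posts. The function name `expansion_factor` is illustrative, and treating transfer volume as proportional to file size is a simplifying assumption.

```python
def expansion_factor(reduction):
    """Effective capacity multiplier for a fractional size reduction in [0, 1)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)

for pct in (60, 76, 90):
    r = pct / 100
    # Remaining fraction (1 - r) is the share of the original bytes still to move.
    print(f"{pct}% smaller -> {expansion_factor(r):.2f}x capacity, "
          f"{1 - r:.0%} of original transfer volume")
```

At 76% the multiplier is a little over 4x and the remaining volume is 24% of the original, which matches the "4x expansion" and "less than a quarter" statements; a 90% reduction corresponds to the "up to 10x" figure.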

PetaSuite users typically make it an intrinsic part of their cloud or locally hosted analysis pipeline. As data is processed, it is compressed ready for use in the next stage of analysis without it needing to be decompressed later. PetaSuite Cloud Edition allows for the seamless integration of an organization’s own tools and pipelines in the cloud or local environment of their choosing.

Read more

The far-reaching GenomeWeb article of 31 October 2019 about PetaGene includes this news about AstraZeneca (requires premium subscription).

Princess Máxima Center for Pediatric Oncology chooses PetaSuite for genomic oncology data compression

Posted on October 22, 2019

We are pleased to announce that Princess Máxima Center for Pediatric Oncology, the largest pediatric cancer center in Europe, has chosen to use PetaGene’s transparent, lossless genomic data compression software, called PetaSuite, to reduce its data storage costs while accelerating access to the data. Next-generation sequencing plays an integral role in the Center’s diagnostics and research discoveries. These valuable genomic datasets are large, and their volumes are growing. As such the Center sought to find a compression technology that can store genomic data for longer at a much lower cost while removing bottlenecks in genomic sequence analysis.

PetaGene’s PetaSuite software was evaluated by the Center against other compression techniques. Unlike these, PetaSuite met and exceeded the Center’s criteria: simple to implement, high compression performance, and supported to a commercial standard.

Positive evaluation results

Senior Principal Investigator Dr. Patrick Kemmeren at the Princess Máxima Center, describing the process, said: “Our tests with PetaGene’s compression software gave very positive results. We tested whole exome samples, RNA-Seq and whole genome sequencing data for different tumor samples. Implementing the software on our high-performance compute cluster is easy, the compression ratios are larger than what we obtain compared to CRAM compression, and accessing data is actually slightly faster compared to non-compressed BAM files. This on top of the added benefits of not having to switch to a different file format, a perpetual license for decompression and the time gains in not doing the BAM to CRAM conversion/retooling (and vice versa for some tools). As a result, we decided to implement PetaGene’s compression software within our computational infrastructure.”

The right software at the right time

Jos Leendertse, Manager Research IDT at Princess Máxima Center, commented, “By implementing PetaGene’s compression software we are also able to speed up the migration process to our new storage infrastructure. It’s not only the right software but also at the right time.”

Vaughan Wittorff, Ph.D., Co-founder and Chief Commercial Officer at PetaGene added, “During the evaluation process, the researchers found PetaSuite’s transparent access technology particularly compelling since it meant that the compressed data could integrate seamlessly with the bioinformatics structure Princess Máxima Center already had in place. A key challenge with compression is to ensure that end-users can continue working with the compressed files without having to change their existing, optimised workflows. PetaGene has solved this by ensuring that the compressed files are readable to existing tools and pipelines in the compressed state. This means our customers do not have to change any of their tools and pipelines, making it easy to integrate our compression technology within their infrastructure.”

About Princess Máxima Center for Pediatric Oncology

Opened in 2018, the Princess Máxima Center for Pediatric Oncology, based in Utrecht, The Netherlands, consolidated the work of seven different academic centers across the Netherlands into the largest pediatric cancer center in Europe. As both a hospital and a research institute, the Center has a combination of world-class facilities, leading clinicians and researchers all driven by a passion to cure pediatric cancers. By integrating the research facilities with the hospital, the Center is better equipped to implement novel discoveries into clinical care.

Case study: Optimizing genomic data storage for clinical research facilities

Posted on May 1, 2019

In clinical research, next-generation sequencing (NGS) allows genomic data to be produced at an ever-increasing rate. Storing genomic data effectively is critical, and while sequencing costs are falling, the cost of storing the resulting files is increasing. As the amount of sequenced data grows, genomic data storage costs and transfer times can become a blocker to effective research and collaboration.

Clinical Genomics Gothenburg is a translational clinical research facility that performs sequencing and bioinformatics to provide an end-to-end solution for genomic data analysis. Growing storage requirements were about to place their existing infrastructure under considerable strain. Clinical Genomics Gothenburg found a solution to their problem in PetaGene’s PetaSuite genomic data compression software.

Read more about how genomic data compression allowed Clinical Genomics Gothenburg to increase their storage capacity without modifying their workflows in our case study.

PetaSuite genomic data compression allows 60-90% reductions in storage cost and transfer time, increasing effective storage capacity without adding physical storage. PetaGene’s software works with existing tools and pipelines, so it can be used without workflow disruption. Data is quickly and easily accessed whether in the cloud or on-premises.

Compression is an effective method for managing genomic data generated in clinical research. By compressing genomic data, storage costs and transfer times can be reduced, facilitating analysis and saving valuable resources. To find out more, get in touch using our contact form.