PetaGene – Genome UK considerations

Posted on October 8, 2020
This article is about Genome UK and is aimed at its management.

Genome UK is an exciting and ambitious new strategy that builds upon the UK's world-leading excellence in genomics to build "the most advanced genomic healthcare system in the world". We applaud this initiative, and believe this will enable the NHS to leverage precision medicine to improve outcomes and reduce costs, significantly drive research into new treatments and diagnostics, as well as foster an ecosystem for the UK Genomics Industry to thrive.

However, this is all contingent on getting the execution right. We have seen ambitious genomics projects suffer in the past from "Bio-IT" issues, that is, IT problems with the practical handling of biological datasets. Given the scope of this initiative, and drawing on our extensive experience in this domain, we have some recommendations for preventing foreseeable problems and making it a real success.

Our recommendations cover key issues in:

    • data security and privacy, including regional encryption and data minimisation
    • reducing technical barriers to ensure efficient IT and lower costs
    • the need for computational reproducibility, and supporting existing pipelines
    • data integrity and information loss

Our recommendations are available for download as a 7-page slide deck.

NVIDIA and PetaGene Combine Genomic Technologies to Address Critical Analysis Bottlenecks

Posted on October 6, 2020
PetaGene and NVIDIA announce seamless integration of PetaGene’s PetaSuite tools as a standard part of NVIDIA Clara Parabricks Pipelines. PetaGene’s transparent compression reduces file sizes by 60-90%, and enables Parabricks Pipelines GPU-accelerated genome analysis to run 29% faster.
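As a rough illustration of what a 60-90% reduction in file size means for storage, the sketch below computes the remaining footprint of an archive at both ends of that range. The 100 TB archive size and the function name `compressed_size_tb` are assumptions chosen for illustration, not figures or tools from PetaGene or NVIDIA.

```python
def compressed_size_tb(original_tb: float, reduction: float) -> float:
    """Return the size left after removing `reduction` (a fraction in [0, 1])."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return original_tb * (1.0 - reduction)


ARCHIVE_TB = 100.0  # hypothetical archive size, for illustration only

# Evaluate both ends of the 60-90% range quoted in the announcement.
for reduction in (0.60, 0.90):
    remaining = compressed_size_tb(ARCHIVE_TB, reduction)
    print(f"{reduction:.0%} reduction: about {remaining:.0f} TB remaining")
```

The actual reduction for a given dataset depends on the data and is not something this sketch can predict.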

Cambridge, UK, Oct. 6, 2020: PetaGene and NVIDIA today announce their integrated bioinformatics solution to accelerate genomic analysis and simultaneously reduce data storage. PetaGene’s PetaSuite software decreases the size of genomic data and is integrated into NVIDIA Clara™ Parabricks Pipelines, a GPU-accelerated tool for accurate genomic data analysis. This combination allows scientists and clinicians to access PetaGene’s genomic compression software within the Parabricks Pipelines compute framework. FASTQ.gz and BAM files compressed by PetaSuite, whether within that framework or elsewhere, can now be analyzed directly with Parabricks Pipelines.

PetaGene and Parabricks clear choices for TGen

The Translational Genomics Research Institute (TGen), a nonprofit medical research institute that is examining the genetic components of common and complex diseases, has confirmed faster end-to-end analysis when using PetaGene-compressed files within Parabricks Pipelines. TGen also confirmed that the transparent readback of the compressed files using PetaGene’s user-mode library is now fully compatible with the Parabricks Pipelines environment. Sequencing and compute costs have plummeted, but storage costs have not. With PetaGene and GPU-powered Parabricks Pipelines, genomic analyses can be run faster, with significant cost savings on storage and compute.

“At TGen, we have a long history of working with a large number of bulky genomic files. As our workflows mature and scale, we have been keen to build our genomics infrastructure from the ground up with the most efficient tools and systems available,” said Dr Glen Otero, VP of Scientific Computing at TGen. “PetaGene and NVIDIA Clara Parabricks Pipelines were independently clear choices for us. Having them interoperable like this is important, and the fact that the combination further accelerates the aggregate performance is fantastic.”

PetaGene gives 29% speedup to Parabricks Pipelines without added complexity

TGen’s benchmarking showed that germline workflows in Parabricks Pipelines run 29% faster with PetaGene-compressed data than with regular genomics files, and generate identical results. This significant speedup is due to the behind-the-scenes I/O savings from PetaGene’s PetaLink user-mode library, which performs just-in-time decompression. Since this library creates virtual FASTQ.gz and BAM files, users never need to interact directly with the compressed files: the virtual files are fully compatible with all existing tools and pipelines. Beyond the benefits of PetaGene’s compressed data, the Parabricks GPU-accelerated analysis workflow is a clear differentiator compared to CPU-based workflows. GPUs save on overall space and operational costs, as fewer GPU servers than CPU servers are required to run the same analyses.
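The idea of transparent readback can be sketched in a few lines of Python. This is a simplified analogy, not PetaLink's actual mechanism: the standard `gzip` module stands in for a genomics-specific codec, and the helper names `open_transparent` and `count_reads` are hypothetical. The point it illustrates is that a downstream tool reads the same records whether or not the file on disk is compressed, with decompression happening as the data is read.

```python
import gzip
import os
import tempfile


def open_transparent(path: str):
    """Return a text stream of the file's contents, decompressing on read if needed.

    gzip is only a stand-in codec here (an assumption for illustration).
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads(stream) -> int:
    """Stand-in for an existing tool: counts FASTQ records (4 lines each)."""
    return sum(1 for _ in stream) // 4


FASTQ = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "sample.fastq")
    packed = os.path.join(tmp, "sample.fastq.gz")
    with open(plain, "w") as f:
        f.write(FASTQ)
    with gzip.open(packed, "wt") as f:
        f.write(FASTQ)

    # The "tool" is unchanged; only the opener knows about compression.
    with open_transparent(plain) as a, open_transparent(packed) as b:
        plain_count = count_reads(a)
        packed_count = count_reads(b)

print(plain_count, packed_count)
```

In this sketch the caller must use `open_transparent`. A user-mode library as described above would instead present virtual files, so that existing tools need no changes at all.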

NVIDIA Clara Parabricks Product Manager Tim Harkins, Ph.D. commented, “Many scientists and clinicians are working hard to identify those genetic variants that contribute to health and diseases, ultimately providing better therapeutic choices for patients. The amount of data generated on a per individual genome is significant and all trends are pointing to more growth. The compression technologies from PetaGene are going to save the community a significant amount in data storage, and by integrating with Parabricks Pipelines, the amount of time saved will be of an equal contribution.”

Dr Vaughan Wittorff, Co-Founder and Chief Commercial Officer of PetaGene said: “GPU-Powered NVIDIA Clara Parabricks Pipelines appeals to customers who want the fastest analysis speeds for genomics, and those kinds of customers also want the most efficient storage techniques without having to change anything they do and without vendor lock-in. Integrating PetaGene tools with NVIDIA’s genomic tools was therefore very natural. The fact that PetaGene’s transparent compression also makes Clara Parabricks even faster is great for everyone.”

Joint free trial of Parabricks and PetaGene available today

Get started today. NVIDIA and PetaGene provide a free 30-day license for NVIDIA Clara Parabricks, which comes with a free trial of PetaGene’s PetaSuite tools for compression and transparent readback.

About PetaGene

PetaGene was founded in Cambridge, the birthplace of genomics, to address the rapidly growing data management problems of the genomics industry. PetaGene’s software enables compression of huge amounts of genomic data without compromising on access or data quality. The company’s products go beyond regular data reduction techniques and have three times been recognized by Bio-IT World’s Best of Show Award for their industry-leading performance and usability. For more information visit or e-mail