PetaGene’s customers have compressed over 3 million genome files

Posted on April 28, 2022

PetaSuite compression savings continue to grow

We are pleased to announce that we have reached another landmark: PetaGene’s customers have now compressed over three million genome files.

As genomic datasets continue to grow rapidly, PetaGene customers across the full range of genomic research and applications are leveraging PetaSuite's high compression ratios to limit their storage and infrastructure costs.

PetaGene’s software is deployed across the life sciences industry: biopharma, hospitals and research centres around the world. Our customers, old and new, continue to use our compression technology to mitigate rapidly rising storage costs. PetaGene’s compression software uniquely preserves all of the file data through truly lossless compression, guaranteeing our customers that all their data is retained in a much smaller file. Additionally, our compressed files are transparently presented back to all tools and pipelines in the identical original BAM/FASTQ.gz format, which makes integration trivial and ensures compatibility with all tools. The benefits go beyond storage savings: using compressed data with PetaGene’s just-in-time decompression significantly speeds up pipelines and workflows, thanks to the remarkable reduction in data reads and transfers.

As genomic applications in healthcare grow, even more data is expected to be generated, presenting the dual challenge of providing cost-effective storage while also ensuring secure, compliant data management. With that in mind, PetaGene has developed an award-winning platform to encrypt and audit all data, with bespoke region-based encryption for BAM and VCF files; the product is now generally available. These compressed and encrypted files are compatible with all tools, and all data accesses are additionally captured in a tamper-evident cryptographic ledger. To find out more, please contact info@petagene.com.

Case Study: Top 3 U.S. Children’s Hospital Deploys PetaSuite

Posted on April 27, 2022
Deploying PetaGene’s lossless compression for genomic data at a premier Children’s Hospital in the United States.

View the full case study

Understanding a patient's genomic information is key to diagnosing a plethora of genetic and rare diseases. As a result, genome sequencing approaches such as Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) are increasingly used in clinical and research laboratories, such as those at the Children’s Hospital (CH).

A critical challenge emerges as genome sequencing scales up: how to manage the increasing cost of storing the data. At CH, the volume of sequencing data, and with it the associated storage cost, is growing rapidly. CH were looking for ways to curb the ballooning cost of storing data that needed to remain accessible for critical research and analysis. To solve this challenge, CH adopted PetaGene’s lossless compression software, to tremendous effect:

“Within two months of deploying PetaGene’s compression solution, CH had made a return on investment, recovering the full cost of the PetaGene licence in storage savings. With PetaGene, we now have better control over the growth of our NGS data, allowing us to reduce storage costs while freeing up financial resources for more compute and analysis to further the research and clinical goals of CH. Hospital senior management consider the PetaGene purchase a big success.”

- Infrastructure Manager, CH

File type       Files   Input size (TB)   Output size (TB)   Savings (TB)
FASTQ.gz      663,669               854                360      494 (58%)
BAM           849,465             2,165                879    1,286 (59%)
Total       1,513,134             3,019              1,239    1,780 (59%)
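The savings column follows directly from the other two: savings equal input size minus output size, and the percentage is savings divided by input size. A minimal Python sketch that recomputes the Total row and the percentages from the per-file-type figures in the table (values copied as published, rounded to whole TB; the `savings` helper is illustrative only and not part of any PetaGene tool):

```python
# Per-file-type figures copied from the case-study table (sizes in TB).
rows = {
    "FASTQ.gz": {"files": 663_669, "input_tb": 854, "output_tb": 360},
    "BAM": {"files": 849_465, "input_tb": 2_165, "output_tb": 879},
}

def savings(input_tb, output_tb):
    """Hypothetical helper: return (TB saved, percent saved rounded to a whole percent)."""
    saved = input_tb - output_tb
    return saved, round(100 * saved / input_tb)

# Sum the per-type rows to reproduce the "Total" row.
total = {
    key: sum(row[key] for row in rows.values())
    for key in ("files", "input_tb", "output_tb")
}

# Print one line per file type, followed by the Total row.
for name, row in list(rows.items()) + [("Total", total)]:
    saved, pct = savings(row["input_tb"], row["output_tb"])
    print(f"{name:9} {row['files']:>9,} files  {saved:>5,} TB saved ({pct}%)")
```

Running it reproduces the published figures: 494 TB (58%) for FASTQ.gz, 1,286 TB (59%) for BAM, and 1,780 TB (59%) across all 1,513,134 files.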

“Deploying PetaGene at CH was a straightforward process. Our users do not have to modify any of their tools or workflows since PetaGene’s decompression library transparently serves original uncompressed data to the tools/pipelines. Since users are essentially able to work with the original data except that the data is more than 50% smaller, there are additional benefits such as faster data transfer speeds and analysis times.”

- Principal Systems Engineer, CH

Architecting IT puts PetaGene in a Data-Centric Spotlight

Posted on March 30, 2021

Architecting IT's Chris Evans, co-host of the Storage Unpacked podcast, takes to his blog to focus a spotlight on PetaGene's technology.

In this article, part of his ongoing series on data-centric architectures, Chris looks at PetaGene’s PetaSuite Cloud and Protect platform as another way to securely access content and accelerate remote access.

Click here to read the Architecting IT article of 24th March 2021 titled "Data-Centric Spotlight: PetaGene"

Managing NGS Data, a Dell and PetaGene healthcare podcast

Posted on February 15, 2021

Recently, our co-founder Vaughan Wittorff and Phil Sweeney from Dell Technologies sat down to discuss how the use of Next-Generation Sequencing (NGS) is expanding as costs come down, creating an explosion in NGS processing and the resulting data.

Find out how PetaGene can address the demands of that scale of data, in a two-part Dell Healthcare PowerChat podcast.

In Part Two (which we recommend you listen to first), Vaughan recaps and completes his review of PetaGene’s capabilities in addressing NGS data challenges and outlines PetaGene’s product set.

Phil discusses PetaGene and Dell’s partnership and Vaughan shares customer success stories.

Click here to listen to the podcast:
(no account required)

In Part One (which we recommend you listen to second), Phil gives a refresher on NGS and describes some new NGS use cases, and Vaughan describes how he sees NGS use cases evolving.

Phil then discusses the components of successful NGS processing. He outlines the challenges associated with managing NGS data, and Vaughan begins to describe how PetaGene responds to those challenges.

Click here to listen to the podcast:
(no account required)

Technology Networks report on PetaGene’s insights on the challenges faced by Genome UK

Posted on December 14, 2020

PetaGene's Dan Greenfield and Vaughan Wittorff talk to Technology Networks regarding the recent Genome UK announcement and PetaGene's insights.

Dan and Vaughan sat down to talk with science writers Ruairi J MacKenzie and Molly Campbell of Technology Networks to discuss Genome UK and our insights into the endeavour.

Click here to read the Technology Networks article of 4th December 2020 titled "How Will Genome UK Securely Handle 150 Petabases of Genomic Data?"

PetaGene’s customers have now compressed one million genome files

Posted on November 30, 2020

For PetaGene, the one million genome era is underway

We are pleased to announce that we have reached a landmark: PetaGene’s customers have now compressed over one million genome files.

The dramatic drop in the cost of sequencing genomes, together with the numerous applications of this data in tackling critical diseases such as cancer and rare diseases, has led to rapid growth in genomic data.

PetaGene’s software is deployed across the life sciences industry: biopharma, hospitals and research centres around the world. Our customers adopted our compression technology to mitigate rapidly rising storage costs. PetaGene’s compression software uniquely preserves all of the file data through truly lossless compression, guaranteeing our customers that all their data is retained in a much smaller file. Additionally, our compressed files are transparently presented back to all tools and pipelines in the identical original BAM/FASTQ.gz format, which makes integration trivial and ensures compatibility with all tools. Used in this way with PetaGene’s just-in-time decompression, the compressed data actually speeds up pipelines and workflows significantly.

As genomic applications in healthcare grow, even more data is expected to be generated, presenting the dual challenge of providing cost-effective storage while also ensuring secure, compliant data management. With that in mind, PetaGene has developed an award-winning platform to encrypt and audit all data, with bespoke region-based encryption for BAM and VCF files; the product is now generally available. These compressed and encrypted files are compatible with all tools, and all data accesses are additionally captured in a tamper-evident cryptographic ledger. To find out more, please contact info@petagene.com.

PetaGene – Genome UK considerations

Posted on October 8, 2020
This article is about Genome UK and is aimed at its management.

Genome UK is an exciting and ambitious new strategy that builds on the UK's world-leading excellence in genomics to create "the most advanced genomic healthcare system in the world". We applaud this initiative and believe it will enable the NHS to leverage precision medicine to improve outcomes and reduce costs, significantly drive research into new treatments and diagnostics, and foster an ecosystem in which the UK genomics industry can thrive.

However, all of this is contingent on getting the execution right, and we have seen past ambitious genomics projects suffer from "Bio-IT" issues, that is, IT problems with the practical handling of biological datasets. Given the scope of this initiative, and our extensive experience addressing issues in this domain, we have some recommendations for preventing foreseeable problems and making the initiative a real success.

Our recommendations cover key issues in:

    • data security and privacy, including regional encryption and data minimisation
    • reducing technical barriers to ensure efficient IT and lower costs
    • the need for computational reproducibility, and supporting existing pipelines
    • data integrity and information loss

Click here to download the 7-page slide deck.

Alliance Global Appointed Exclusive PetaGene Distributor For Middle East, Africa and Central Asia

Posted on November 29, 2019

We are pleased to announce that PetaGene has signed an agreement appointing Dubai-based Alliance Global (AGBL) as the exclusive distributor of our genomic data management software in the Middle East, Africa, Central Asia, Pakistan, Bangladesh and Sri Lanka.

The number of national population-scale genomics initiatives in the region is growing, and AGBL is a leading distributor of Illumina sequencing technology there.

“AGBL already possesses significant expertise in sequencing technology sales and the wider genomics marketplace, making it an ideal partner for us in this region,” commented Vaughan Wittorff, Ph.D., Cofounder and Chief Commercial Officer of PetaGene. “We are excited to be working with AGBL to bring PetaGene’s software to genomics researchers across Africa, the Middle East and Asia, so that they can access their genomic data faster, more efficiently, and store it more cost-effectively.”

PetaGene’s compression software, called PetaSuite, addresses challenges caused by growing volumes of genomics data. It achieves savings of between 60 and 90 percent in both storage costs and data transfer times compared to BAM and gzipped FASTQ files. PetaSuite transparently integrates with existing storage infrastructure and bioinformatics pipelines, and PetaSuite Cloud Edition enables a user’s software tools and pipelines to seamlessly integrate with a wide variety of cloud platforms without modification.

Speaking for AGBL, Group Commercial Director Dr. Nassim-Marie Hambouz said, “We are delighted to add PetaGene to our portfolio of partners. Their innovative technology will help organizations optimise their on-site or cloud storage costs for the growing volume of genomic data throughout the region.”

To contact AGBL, visit their website.

AstraZeneca deploys PetaSuite genomic data compression software in core genomics initiative

Posted on October 24, 2019
PetaGene’s PetaSuite compression software and cloud-computing solutions speed up data transfers and reduce storage costs for research projects involving genomics data.

We are pleased to announce that AstraZeneca has selected PetaSuite software to compress the genomics data sets for AstraZeneca’s Centre for Genomics Research (CGR). Using genomics data and state-of-the-art methods for genomic analysis, the CGR investigates underlying genetic causes of disease and aims to integrate genomics across the company’s drug discovery platform. PetaSuite accelerates data transfers for cloud computing and reduces storage costs for any research project involving genomics data.

“Using genomic data for biopharmaceutical targets discovery requires large cohorts with massive multi-petabyte data sets. The time required to transfer these data from sequencers to compute clusters as well as the cost of storage can cripple these large initiatives,” said Vaughan Wittorff, Ph.D., Co-founder and Chief Commercial Officer of PetaGene. “PetaSuite addresses the challenges caused by growing volumes of genomics data and achieves up to 10x reductions in storage costs and transfer times, while adhering to the industry-standard BAM and FASTQ genomics file formats.”

More than 200,000 files processed

To date, AstraZeneca’s CGR has processed more than 200,000 genomics datasets, generating over a petabyte of data. One petabyte of data is equivalent to streaming HD movies for 40 years without a break. At this volume of data, problems with processing time, data transfers and storage size can impact the ability to deliver at scale. PetaGene’s compression software will enable the CGR to compress over 200,000 BAM files in a 24-hour period and to add the compressed data to tiered cloud storage.

AstraZeneca needed to minimize storage footprint & maximize data access
Average data size reduction of 76%

“AstraZeneca’s Centre for Genomics Research has the bold ambition to analyse up to two million genomes by 2026. Minimizing the storage footprint and transfer time of genome data while maximizing data access and compute processing is a necessity to enable us to achieve our ambition,” said Slavé Petrovski, Vice President and Head of Genome Analytics and Bioinformatics, Discovery Sciences, R&D, AstraZeneca.

PetaSuite will enable the CGR to achieve an average data reduction of 76%, equivalent to roughly a 4x expansion of storage capacity. PetaGene’s transparent, lossless compression reduces transfer times to less than a quarter of the original, and PetaGene’s software allows unmodified analysis tools to run more quickly.
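The link between "76% reduction" and "4x capacity" is simple arithmetic: if compression removes a fraction r of the data, each stored file occupies (1 − r) of its original size, so the same storage holds 1 / (1 − r) times as much original data. A minimal Python sketch of that relationship (the `capacity_multiplier` name is hypothetical, chosen for illustration):

```python
def capacity_multiplier(reduction):
    """Hypothetical helper: effective storage-capacity multiple for a size reduction.

    A fractional reduction r leaves (1 - r) of the original size, so the
    same storage holds 1 / (1 - r) times as much original data.
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)

# 76% average reduction quoted for the CGR: about 4.2x, i.e. "4x".
print(f"{capacity_multiplier(0.76):.1f}x")
# A 90% reduction corresponds to the "up to 10x" figure quoted above.
print(f"{capacity_multiplier(0.90):.1f}x")
```

Under the added assumption that transfer time scales with the number of bytes moved, the same 76% figure explains "less than a quarter": only 24% of the original bytes remain to be transferred.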

PetaSuite users typically make it an intrinsic part of their cloud or locally hosted analysis pipeline. As data is processed, it is compressed and ready for use in the next stage of analysis, without needing a separate decompression step later. PetaSuite Cloud Edition allows for the seamless integration of an organization’s own tools and pipelines in the cloud or local environment of their choosing.

Click here to read the wide-ranging GenomeWeb article of 31st October 2019 about PetaGene, which includes this news about AstraZeneca (requires premium subscription).

Princess Máxima Center for Pediatric Oncology chooses PetaSuite for genomic oncology data compression

Posted on October 22, 2019

We are pleased to announce that Princess Máxima Center for Pediatric Oncology, the largest pediatric cancer center in Europe, has chosen to use PetaGene’s transparent, lossless genomic data compression software, called PetaSuite, to reduce its data storage costs while accelerating access to the data. Next-generation sequencing plays an integral role in the Center’s diagnostics and research discoveries. These valuable genomic datasets are large, and their volumes are growing. The Center therefore sought a compression technology that could store genomic data for longer at a much lower cost while removing bottlenecks in genomic sequence analysis.

The Center evaluated PetaGene's PetaSuite software against other compression techniques. Unlike those alternatives, PetaSuite met and exceeded the Center's criteria: a solution that is simple to implement, delivers high compression performance and is supported to a commercial standard.

Positive evaluation results

Senior Principal Investigator Dr. Patrick Kemmeren at the Princess Máxima Center describing the process, said: “Our tests with PetaGene’s compression software gave very positive results. We tested whole exome samples, RNA-Seq and whole genome sequencing data for different tumor samples. Implementing the software on our high-performance compute cluster is easy, the compression ratios are larger than what we obtain compared to CRAM compression, and accessing data is actually slightly faster compared to non-compressed BAM files. This on top of the added benefits of not having to switch to a different file format, a perpetual license for decompression and the time gains in not doing the BAM to CRAM conversion/retooling (and vice versa for some tools). As a result, we decided to implement PetaGene’s compression software within our computational infrastructure."

The right software at the right time

Jos Leendertse, Manager Research IDT at Princess Máxima Center, commented, “By implementing PetaGene’s compression software we are also able to speed up the migration process to our new storage infrastructure. It’s not only the right software but also at the right time.”

Vaughan Wittorff, Ph.D., Cofounder and Chief Commercial Officer at PetaGene added, “During the evaluation process, the researchers found PetaSuite’s transparent access technology particularly compelling since it meant that the compressed data could integrate seamlessly with the bioinformatics structure Princess Máxima Center already had in place. A key challenge with compression is to ensure that end-users can continue working with the compressed files without having to change their existing, optimised workflows. PetaGene has solved this by ensuring that the compressed files are readable to existing tools and pipelines in the compressed state. This means our customers do not have to change any of their tools and pipelines, making it easy to integrate our compression technology within their infrastructure.”

About Princess Máxima Center for Pediatric Oncology

Opened in 2018, the Princess Máxima Center for Pediatric Oncology, based in Utrecht, The Netherlands, consolidated the work of seven different academic centers across the Netherlands into the largest pediatric cancer center in Europe. As both a hospital and a research institute, the Center has a combination of world-class facilities, leading clinicians and researchers all driven by a passion to cure pediatric cancers. By integrating the research facilities with the hospital, the Center is better equipped to implement novel discoveries into clinical care. For more information, visit www.prinsesmaximacentrum.nl/en.