Posted on October 24, 2019
PetaGene’s PetaSuite compression software and cloud-computing solutions speed up data transfers and reduce storage costs for research projects involving genomics data.
We are pleased to announce that Astrazeneca has selected PetaSuite software to compress the genomics data sets for AstraZeneca’s Centre for Genomics Research (CGR). Using genomics data and state-of-the-art methods for genomic analysis, the CGR investigates underlying genetic causes of disease and aims to integrate genomics across the company’s drug discovery platform. PetaSuite accelerates data transfers for cloud computing and reduces storage costs for any research project involving genomics data.
“Using genomic data for biopharmaceutical targets discovery requires large cohorts with massive multi-petabyte data sets. The time required to transfer these data from sequencers to compute clusters as well as the cost of storage can cripple these large initiatives,” said Vaughan Wittorff, Ph.D., Co-founder and Chief Commercial Officer of PetaGene. “PetaSuite addresses the challenges caused by growing volumes of genomics data and achieves up to 10x reductions in storage costs and transfer times, while adhering to the industry-standard BAM and FASTQ genomics file formats.”
More than 200,000 files processed
To date, AstraZeneca’s CGR has processed more than 200,000 genomics datasets, generating over a petabyte of data. One petabyte of data is equivalent to streaming HD movies for 40 years without a break. At this volume of data, problems in processing time, data transfers and storage size can impact the ability to deliver at scale. PetaGene’s compression software will enable the CGR to compress over 200,000 BAM files in a 24-hour period and will add the compressed data to tiered cloud storage.
Average data size reduction of 76%
“AstraZeneca’s Centre for Genomics Research has the bold ambition to analyse up to two million genomes by 2026. Minimizing the storage footprint and transfer time of genome data while maximizing data access and compute processing is a necessity to enable us to achieve our ambition.” said Slavé Petrovski, Vice President and Head of Genome Analytics and Bioinformatics, Discovery Sciences, R&D, AstraZeneca.
PetaSuite will enable the CGR to achieve an average data reduction of 76% or a 4x expansion of storage capacity. PetaGene’s transparent, lossless compression of files reduces transfer times to less than a quarter, and PetaGene’s software allows unmodified analysis tools to run more quickly.
PetaSuite users typically make it an intrinsic part of their cloud or locally hosted analysis pipeline. As data is processed, it is compressed ready for use in the next stage of analysis without it needing to be decompressed later. PetaSuite Cloud Edition allows for the seamless integration of an organization’s own tools and pipelines in the cloud or local environment of their choosing.
Click here to read the far-reaching GenomeWeb article of 31st October 2019 about PetaGene which includes this news about AstraZeneca (requires premium subscription).
Posted on October 22, 2019
We are pleased to announce that Princess Máxima Center for Pediatric Oncology, the largest pediatric cancer center in Europe, has chosen to use PetaGene’s transparent, lossless genomic data compression software, called PetaSuite, to reduce its data storage costs while accelerating access to the data. Next-generation sequencing plays an integral role in the Center’s diagnostics and research discoveries. These valuable genomic datasets are large, and their volumes are growing. As such the Center sought to find a compression technology that can store genomic data for longer at a much lower cost while removing bottlenecks in genomic sequence analysis.
PetaGene’s PetaSuite software was evaluated by the Center against other compression techniques and unlike these, PetaSuite met and exceeded the criteria for a simple to implement and high compression performance solution, supported to a commercial standard.
Positive evaluation results
Senior Principal Investigator Dr. Patrick Kemmeren at the Princess Máxima Center describing the process, said: “Our tests with PetaGene’s compression software gave very positive results. We tested whole exome samples, RNA-Seq and whole genome sequencing data for different tumor samples. Implementing the software on our high-performance compute cluster is easy, the compression ratios are larger than what we obtain compared to CRAM compression, and accessing data is actually slightly faster compared to non-compressed BAM files. This on top of the added benefits of not having to switch to a different file format, a perpetual license for decompression and the time gains in not doing the BAM to CRAM conversion/retooling (and vice versa for some tools). As a result, we decided to implement PetaGene’s compression software within our computational infrastructure.
The right software at the right time
Jos Leendertse, Manager Research IDT at Princess Máxima Center, commented “By implementing PetaGene’s compression software we are also able to speed up the migration process to our new storage infrastructure. It’s not only the right software but also at the right time.”
Vaughan Wittorff, Ph.D., Cofounder and Chief Commercial Officer at PetaGene added, “During the evaluation process, the researchers found PetaSuite’s transparent access technology particularly compelling since it meant that the compressed data could integrate seamlessly with the bioinformatics structure Princess Máxima Center’s already had in place. A key challenge with compression is to ensure that end-users can continue working with the compressed files without having to change their existing, optimised workflows. PetaGene has solved this by ensuring that the compressed files are readable to existing tools and pipelines in the compressed state. This means our customers do not have to change any of their tools and pipelines, making it easy to integrate our compression technology within their infrastructure.”
About Princess Máxima Center for Pediatric Oncology
Opened in 2018, the Princess Máxima Center for Pediatric Oncology, based in Utrecht, The Netherlands, consolidated the work of seven different academic centers across the Netherlands into the largest pediatric cancer center in Europe. As both a hospital and a research institute, the Center has a combination of world-class facilities, leading clinicians and researchers all driven by a passion to cure pediatric cancers. By integrating the research facilities with the hospital, the Center is better equipped to implement novel discoveries into clinical care. For more information, visit www.prinsesmaximacentrum.nl/en.