Recently, our co-founder Vaughan Wittorff and Phil Sweeney from Dell Technologies sat down to discuss how the use of Next-Generation Sequencing (NGS) is expanding as costs come down, creating an explosion of NGS processing and the resulting data.
Find out how PetaGene can address the demands of that scale of data, in a two-part Dell Healthcare PowerChat podcast.
In Part Two (which we recommend you listen to first), Vaughan recaps and completes his review of PetaGene’s capabilities in addressing NGS data challenges and outlines PetaGene’s product set.
Phil discusses PetaGene and Dell’s partnership and Vaughan shares customer success stories.
In Part One, Phil discusses the components of successful NGS processing and outlines the challenges associated with managing NGS data, and Vaughan begins to describe how PetaGene responds to those challenges.
PetaGene's Dan Greenfield and Vaughan Wittorff talk to Technology Networks regarding the recent Genome UK announcement and PetaGene's insights.
For PetaGene, the one million genome era is underway
We are pleased to announce that we have reached a landmark: PetaGene’s customers have now compressed over one million genome files.
The dramatic drop in the cost of sequencing genomes and the numerous applications of this data to tackle serious conditions such as cancer and rare diseases have led to rapid growth in genomic data.
PetaGene’s software is deployed across the life sciences industry: biopharma, hospitals and research centres around the world. Our customers adopted our compression technology to mitigate rapidly rising storage costs. PetaGene’s compression software uniquely preserves all of the file data through truly lossless compression, giving our customers the guarantee that all their data is retained in a much smaller file. Additionally, our compressed files are transparently presented back in the identical original BAM/FASTQ.gz format to all tools and pipelines, which makes integration trivial and ensures compatibility. Using the compressed data with PetaGene’s just-in-time decompression in this way actually speeds up pipelines and workflows significantly.
As genomic applications in healthcare grow, even more data is expected to be generated. This presents a dual challenge: providing cost-effective storage while also ensuring secure, compliant data management. With that in mind, PetaGene has developed an award-winning platform to encrypt and audit all data, with bespoke region-based encryption for BAM and VCF files. The product is now in general availability. These compressed and encrypted files are compatible with all tools and, additionally, all data accesses are captured in a tamper-evident cryptographic ledger. To find out more please contact firstname.lastname@example.org.
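For readers unfamiliar with the idea of a tamper-evident ledger, the general principle can be illustrated with a simple hash chain, in which each entry's hash depends on the entry before it. The sketch below is a generic illustration only and is not PetaGene's implementation; the function names, the record fields and the sample file name are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash, record):
    """Hash a record together with the hash of the previous entry."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(ledger, record):
    """Add an access record, chaining it to the last entry in the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "record": record,
        "prev": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify(ledger):
    """Return True only if every entry still matches its stored hash chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical access records, for illustration only.
ledger = []
append(ledger, {"user": "alice", "file": "sample1.bam", "action": "read"})
append(ledger, {"user": "bob", "file": "sample1.bam", "action": "read"})

intact = verify(ledger)                  # chain checked before any change
ledger[0]["record"]["user"] = "mallory"  # simulate editing an old entry
tampered_ok = verify(ledger)             # chain checked after the change
```

Because each hash covers the previous one, changing any earlier record invalidates the chain from that point on, which is what makes after-the-fact edits detectable.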
Genome UK is an exciting and ambitious new strategy that builds upon the UK's world-leading excellence in genomics to build "the most advanced genomic healthcare system in the world". We applaud this initiative, and believe this will enable the NHS to leverage precision medicine to improve outcomes and reduce costs, significantly drive research into new treatments and diagnostics, as well as foster an ecosystem for the UK Genomics Industry to thrive.
However, this is all contingent on getting the execution right, and we have seen past ambitious genomics projects suffer from "Bio-IT" issues, that is, IT problems with the practical handling of biological datasets. Given the scope of this initiative, and drawing on our extensive experience of addressing issues in this domain, we have some recommendations for preventing foreseeable problems and making it a real success.
Our recommendations cover key issues in:
- data security and privacy, including regional encryption and data minimisation
- reducing technical barriers to ensure efficient IT and lower costs
- the need for computational reproducibility, and supporting existing pipelines
- data integrity and information loss
We are pleased to announce that PetaGene has signed an agreement appointing Dubai-based Alliance Global (AGBL) as the exclusive distributor of our genomic data management software in the Middle East, Africa, Central Asia, Pakistan, Bangladesh and Sri Lanka.
The number of national population-scale genomics initiatives in the region is growing, and AGBL is a leading distributor of Illumina sequencing technology there.
“AGBL already possesses significant expertise in sequencing technology sales and the wider genomics marketplace, making it an ideal partner for us in this region,” commented Vaughan Wittorff, Ph.D., Cofounder and Chief Commercial Officer of PetaGene. “We are excited to be working with AGBL to bring PetaGene’s software to genomics researchers across Africa, the Middle East and Asia, so that they can access their genomic data faster, more efficiently, and store it more cost-effectively.”
PetaGene’s compression software, called PetaSuite, addresses challenges caused by growing volumes of genomics data. It achieves savings of between 60 and 90 percent in both storage costs and data transfer times compared to BAM and gzipped FASTQ files. PetaSuite transparently integrates with existing storage infrastructure and bioinformatics pipelines, and PetaSuite Cloud Edition enables a user’s software tools and pipelines to seamlessly integrate with a wide variety of cloud platforms without modification.
Speaking for AGBL, Group Commercial Director, Dr. Nassim-Marie Hambouz said, “We are delighted to add PetaGene to our portfolio of partners. Their innovative technology will help organizations optimise their on-site or cloud storage costs for the growing volume of genomic data throughout the region.”
To contact AGBL, visit their website.
PetaGene’s PetaSuite compression software and cloud-computing solutions speed up data transfers and reduce storage costs for research projects involving genomics data.
We are pleased to announce that AstraZeneca has selected PetaSuite software to compress the genomics data sets for AstraZeneca’s Centre for Genomics Research (CGR). Using genomics data and state-of-the-art methods for genomic analysis, the CGR investigates underlying genetic causes of disease and aims to integrate genomics across the company’s drug discovery platform. PetaSuite accelerates data transfers for cloud computing and reduces storage costs for any research project involving genomics data.
“Using genomic data for biopharmaceutical targets discovery requires large cohorts with massive multi-petabyte data sets. The time required to transfer these data from sequencers to compute clusters as well as the cost of storage can cripple these large initiatives,” said Vaughan Wittorff, Ph.D., Co-founder and Chief Commercial Officer of PetaGene. “PetaSuite addresses the challenges caused by growing volumes of genomics data and achieves up to 10x reductions in storage costs and transfer times, while adhering to the industry-standard BAM and FASTQ genomics file formats.”
More than 200,000 files processed
To date, AstraZeneca’s CGR has processed more than 200,000 genomics datasets, generating over a petabyte of data. One petabyte of data is equivalent to streaming HD movies for 40 years without a break. At this volume of data, problems in processing time, data transfers and storage size can impact the ability to deliver at scale. PetaGene’s compression software will enable the CGR to compress over 200,000 BAM files in a 24-hour period and will add the compressed data to tiered cloud storage.
Average data size reduction of 76%
“AstraZeneca’s Centre for Genomics Research has the bold ambition to analyse up to two million genomes by 2026. Minimizing the storage footprint and transfer time of genome data while maximizing data access and compute processing is a necessity to enable us to achieve our ambition,” said Slavé Petrovski, Vice President and Head of Genome Analytics and Bioinformatics, Discovery Sciences, R&D, AstraZeneca.
PetaSuite will enable the CGR to achieve an average data reduction of 76% or a 4x expansion of storage capacity. PetaGene’s transparent, lossless compression of files reduces transfer times to less than a quarter, and PetaGene’s software allows unmodified analysis tools to run more quickly.
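As a rough check of how these figures relate, the short sketch below converts a percentage size reduction into the fraction of data that remains and the corresponding expansion of effective storage capacity. The function names are hypothetical, and treating transfer time as proportional to file size is a simplifying assumption rather than something stated in the announcement.

```python
def remaining_fraction(reduction_percent):
    """Fraction of the original size left after compression."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 1.0 - reduction_percent / 100.0


def capacity_expansion(reduction_percent):
    """How many times more data fits in the same amount of storage."""
    return 1.0 / remaining_fraction(reduction_percent)


reduction = 76  # average reduction quoted for the CGR, in percent
left = remaining_fraction(reduction)
expansion = capacity_expansion(reduction)

print(f"{left:.0%} of the original size remains")
print(f"about {expansion:.1f}x as much data fits in the same storage")
```

A 76% reduction leaves 24% of the original size, which is just under a quarter, and 1 / 0.24 is a little over 4, consistent with the quoted 4x expansion of storage capacity.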
PetaSuite users typically make it an intrinsic part of their cloud or locally hosted analysis pipeline. As data is processed, it is compressed ready for use in the next stage of analysis without it needing to be decompressed later. PetaSuite Cloud Edition allows for the seamless integration of an organization’s own tools and pipelines in the cloud or local environment of their choosing.
Click here to read the far-reaching GenomeWeb article of 31st October 2019 about PetaGene which includes this news about AstraZeneca (requires premium subscription).
Princess Máxima Center for Pediatric Oncology chooses PetaSuite for genomic oncology data compression
We are pleased to announce that Princess Máxima Center for Pediatric Oncology, the largest pediatric cancer center in Europe, has chosen to use PetaGene’s transparent, lossless genomic data compression software, called PetaSuite, to reduce its data storage costs while accelerating access to the data. Next-generation sequencing plays an integral role in the Center’s diagnostics and research discoveries. These valuable genomic datasets are large, and their volumes are growing. As such the Center sought to find a compression technology that can store genomic data for longer at a much lower cost while removing bottlenecks in genomic sequence analysis.
The Center evaluated PetaGene's PetaSuite software against other compression techniques. Unlike these, PetaSuite met and exceeded the Center's criteria for a solution that is simple to implement, delivers high compression performance, and is supported to a commercial standard.
Positive evaluation results
Describing the process, Senior Principal Investigator Dr. Patrick Kemmeren at the Princess Máxima Center said: “Our tests with PetaGene’s compression software gave very positive results. We tested whole exome samples, RNA-Seq and whole genome sequencing data for different tumor samples. Implementing the software on our high-performance compute cluster is easy, the compression ratios are larger than what we obtain compared to CRAM compression, and accessing data is actually slightly faster compared to non-compressed BAM files. This on top of the added benefits of not having to switch to a different file format, a perpetual license for decompression and the time gains in not doing the BAM to CRAM conversion/retooling (and vice versa for some tools). As a result, we decided to implement PetaGene’s compression software within our computational infrastructure.”
The right software at the right time
Jos Leendertse, Manager Research IDT at Princess Máxima Center, commented, “By implementing PetaGene’s compression software we are also able to speed up the migration process to our new storage infrastructure. It’s not only the right software but also at the right time.”
Vaughan Wittorff, Ph.D., Cofounder and Chief Commercial Officer at PetaGene added, “During the evaluation process, the researchers found PetaSuite’s transparent access technology particularly compelling since it meant that the compressed data could integrate seamlessly with the bioinformatics structure Princess Máxima Center already had in place. A key challenge with compression is to ensure that end-users can continue working with the compressed files without having to change their existing, optimised workflows. PetaGene has solved this by ensuring that the compressed files are readable to existing tools and pipelines in the compressed state. This means our customers do not have to change any of their tools and pipelines, making it easy to integrate our compression technology within their infrastructure.”
About Princess Máxima Center for Pediatric Oncology
Opened in 2018, the Princess Máxima Center for Pediatric Oncology, based in Utrecht, The Netherlands, consolidated the work of seven different academic centers across the Netherlands into the largest pediatric cancer center in Europe. As both a hospital and a research institute, the Center has a combination of world-class facilities, leading clinicians and researchers all driven by a passion to cure pediatric cancers. By integrating the research facilities with the hospital, the Center is better equipped to implement novel discoveries into clinical care. For more information, visit www.prinsesmaximacentrum.nl/en.
We are pleased to announce that PetaGene has signed an agreement appointing Genique Lifesciences as the exclusive distributor of our genomic data management software in India.
The agreement will allow India-based Genique Lifesciences to act as the exclusive sales channel for PetaGene’s genomic data compression software for the growing Indian market. Genique’s founding team has extensive experience with distribution of Next Generation sequencers in India.
“Genique already possesses significant expertise in sequencing technology sales and the consumer DNA testing market, making it an ideal partner for us in India, one of the fastest growing genomics markets,” commented Vaughan Wittorff, Ph.D., Cofounder and Chief Commercial Officer of PetaGene. “We are excited to be working with Genique to bring PetaGene’s software to genomics researchers in India, so that they can access their genomic data faster, more efficiently, and store it more cost-effectively.”
Speaking for Genique Lifesciences, Cofounder and CEO, Abhishek Das said, “We are delighted to represent PetaGene in India. Their innovative technology will help organisations optimise their on-site or cloud storage costs for the growing volume of genomic data in India.”
Significant developments in India include the Department of Biotechnology launching the Genome India project later this year with the target of sequencing the genomes of 10,000 Indian citizens.
To contact Genique, visit their website.
One year ago, Frontline Genomics published Genomic Data 101, its guide to the technology and hardware landscape for genomic data storage and analysis. It proved a valuable primer for anyone looking to find out about compression and general management of genomic data.
The data infrastructure to support genomic research, including compression, has evolved since the original guide, so Frontline Genomics have published a new version called Biodata Analysis and Management – Genome Analytics, Interoperability, and Data Life Cycle. The new guide covers:
- The landscape of compression options and the enhanced benefits of techniques developed specifically for genomics.
- The state of the art.
- Technical considerations when choosing a compression solution.
- Commercial considerations - ROI.
- Specific considerations when storing data in the cloud.
- How different compression techniques integrate with existing and new analysis workflows.
There is new information on the innovations and developments in genomic compression, which include:
- The improved compression ratios now being achieved.
- Data to show how efficient, commercially available compression gives better savings than would be obtained using a free open-source tool.
- How the plateauing of storage costs means it’s no longer possible to rely on the historical trend for reductions in storage costs.
You can download the paper here to discover the latest on compression and other aspects of genomic data management.