Genique Lifesciences appointed exclusive PetaGene distributor for India

Posted on October 3, 2019

We are pleased to announce that PetaGene has signed an agreement appointing Genique Lifesciences as the exclusive distributor of our genomic data management software in India.

The agreement will allow India-based Genique Lifesciences to act as the exclusive sales channel for PetaGene’s genomic data compression software for the growing Indian market. Genique’s founding team has extensive experience with distribution of Next Generation sequencers in India.

“Genique already possesses significant expertise in sequencing technology sales and the consumer DNA testing market, making it an ideal partner for us in India, one of the fastest growing genomics markets,” commented Vaughan Wittorff, Ph.D., Cofounder and Chief Commercial Officer of PetaGene. “We are excited to be working with Genique to bring PetaGene’s software to genomics researchers in India, so that they can access their genomic data faster, more efficiently, and store it more cost-effectively.”

Speaking for Genique Lifesciences, Cofounder and CEO, Abhishek Das said, “We are delighted to represent PetaGene in India. Their innovative technology will help organisations optimise their on-site or cloud storage costs for the growing volume of genomic data in India.”

Significant developments in India include the Department of Biotechnology launching the Genome India project later this year with the target of sequencing the genomes of 10,000 Indian citizens.

To contact Genique, visit their website.

ASHG 2019 in Houston

Posted on July 16, 2019

ASHG Annual Meeting 2019 graphic

We will be among the exhibitors (on booth #609) at the ASHG Annual Meeting 2019 in Houston, TX from 15th to 19th October. Each year the event attracts around 6,500 scientific attendees, plus 250 exhibiting companies. It’s the world’s largest gathering of human genetics professionals. The meeting provides a forum for the presentation and discussion of cutting-edge science in all areas of human genetics.

Make sure you visit us to talk about the challenges you face with storing and managing large genomic datasets.

We’ll also be hosting an Exhibit Hall Education (CoLab) session on Wednesday 16th October from 4:00 to 4:30pm on booth 345. In this session Michael Hultner, PhD, SVP Strategy & GM US Operations will demonstrate how our new encryption and data management capabilities enable organizations to manage access to their genomic data by internal and external teams.

If you’re attending ASHG 2019 in Houston, drop by booth #609. Even better, book a meeting in advance of the show to make sure we fit into your schedule. We look forward to seeing you there.

If you’d like to know more about the event or register to attend, visit the website.

New genomic data storage and analysis guide

Posted on July 11, 2019
Laptop showing data compression report

One year ago, Frontline Genomics published Genomic Data 101, its guide to the technology and hardware landscape for genomic data storage and analysis. It proved a valuable primer for anyone looking to find out about compression and general management of genomic data.

The data infrastructure to support genomic research, including compression, has evolved since the original guide. Frontline Genomics have published a new version. It's called Biodata Analysis and Management – Genome Analytics, Interoperability, and Data Life Cycle.

The new publication addresses key issues:

  • The landscape of compression options and the enhanced benefits of techniques developed specifically for genomics.
  • The state of the art.
  • Technical considerations when choosing a compression solution.
  • Commercial considerations - ROI.
  • Specific considerations when storing data in the cloud.
  • How different compression techniques integrate with existing and new analysis workflows.

There is new information on the innovations and developments in genomic compression, which include:

  • The improved compression ratios now being achieved.
  • Data to show how efficient commercially available compression gives better savings than would be obtained using a free open source tool.
  • How the plateauing of storage costs means it’s no longer possible to rely on the historical trend for reductions in storage costs.

You can download the paper here to discover the latest on compression and other aspects of genomic data management.

HIMSS Europe Conference 2019

Posted on May 24, 2019
Helsinki skyline with HIMSS logo

PetaGene will be attending the HIMSS European Conference in Helsinki, Finland from 11th to 13th June. It’s a great opportunity to meet and tell us about the challenges you face when storing and working with NGS genomic data.

Come along to learn more about how our dramatic compression ratios, combined with the right storage architecture, can help you substantially reduce your storage costs, speed up transfer of genomic data, and enable collaboration through the cloud.

Schedule a meeting during the event by contacting

To find out more about the event, and register to attend, visit the event's website.

Case study: Optimizing genomic data storage for clinical research facilities

Posted on May 1, 2019
Scientist working in laboratory

In clinical research, next generation sequencing (NGS) allows production of genomic data at an ever increasing rate. Storing genomic data effectively is critical, and while sequencing costs are falling, the cost of storage for the resulting files is increasing. As the amount of data sequenced grows, genomic data storage costs and transfer times can be a blocker on effective research and collaboration.

Clinical Genomics Gothenburg is a translational clinical research facility, performing bioinformatics and sequencing to provide an end-to-end solution to genomic data analysis. An increase in storage requirements was about to place their existing infrastructure under considerable strain. Clinical Genomics Gothenburg found a solution to their problem in PetaGene’s PetaSuite genomic data compression software.

Read more about how genomic data compression allowed Clinical Genomics Gothenburg to increase their storage capacity without modifying their workflows in our case study.

PetaSuite genomic data compression allows 60-90% reductions in storage cost and transfer time, increasing effective storage capacity without adding more storage. PetaGene’s software works with existing tools and pipelines, so it can be used without workflow disruption. Data is quickly and easily accessed whether in the cloud or on-premises.

Compression is an effective method for managing genomic data generated in clinical research. By compressing genomic data, storage costs and transfer times can be reduced, facilitating analysis and saving valuable resources.  To find out more, get in touch using our contact form.

If you would like to keep up to date with news from us, please complete the form to subscribe to updates.

Bio-IT World 2019 Best of Show winners

Posted on April 26, 2019
PetaGene celebrate their Best of Show award
The PetaGene team toast their Best of Show award

The latest addition to our product range, PetaSuite Protect, won “Best of Show” earlier this month at Bio-IT World Conference & Expo 2019, the premier conference for IT in Life Sciences. This is our third “Best of Show” win, following wins in 2016 and 2018. This year, 31 new products were considered by an expert panel of judges who awarded PetaSuite Protect the ‘Nailed It’ award.

In the award citation, Phillips Kuhl, President at Cambridge Healthtech Institute said, “Our judges believe this is a new, powerful and highly relevant approach to security, driven by a passionate and invested team”.

PetaSuite Protect provides users with the tools to encrypt their genomic data, manage fine-grained access to it, and demonstrate compliance with applicable regulations. As with the established PetaSuite compression technology, the new encryption and access capabilities are completely transparent to genomic tools and analysis pipelines.

Dan Greenfield, our co-founder and CEO, said, “We’re thrilled to win this illustrious award for the third time. We’re particularly grateful to the judges for recognizing the relevance of our approach to this important element of genomic data management.”

To find out more about PetaSuite Protect please get in touch via our contact us page.

Solve your genomic data headaches at Bio-IT World

Posted on March 14, 2019
Image of Boston waterfront at night

Bio-IT World Conference & Expo at the Seaport World Trade Center in Boston promises to be the biggest and best yet. Over 3,400 life sciences, clinical, healthcare, and IT professionals from over 40 countries are expected to attend from April 16th to 18th.

Join us on booth #317 to talk about solving the challenges you face with genomic data. We’d be delighted to discuss how the latest developments in our data management technology can help you spend less on genomic data storage. And that means you could spend more on your business goals or research objectives.

If you’re attending Bio-IT World Conference & Expo in Boston, drop by booth #317 or book a meeting in advance of the show to make sure we fit into your schedule. We look forward to seeing you there.

If you’d like to know more about the event or register to attend, visit the website.

Photo by Lance Anderson on Unsplash

PetaGene is hiring

Posted on February 21, 2019

Would you like to join a funded, award-winning and growing Cambridge start-up working in the increasingly vital field of genomic data?   

We are looking for developers and a business support administrator.  For the developer roles you’ll need to be proficient in C/C++ and it would help if you’re comfortable working with algorithms.

For the business support administrator position, we’re looking for someone with experience in a business support role. You will need to be well-organised, persistent and accurate to help make our sales operation as effective as possible.

We’ll be at the Cambridge Network Recruitment Evening on 26th February, so come along to talk about how you could fit into our team.

Full job descriptions are on our careers page, with information on how to apply.  We'd love to hear from you.


Photo by Tim Mossholder on Unsplash

HIMSS Annual Conference and Exhibition

Posted on January 18, 2019
Logo for HIMSS conference and exhibition

HIMSS 2019

PetaGene will be attending the HIMSS show in Orlando from 12th to 14th February. It’s a great opportunity to meet and tell us about the challenges you face when storing and working with NGS genomic data.

Come to our presentation titled ‘Scaling Genomics Workloads for Precision Medicine’ on 12th February at 3:30pm. It's happening on the NetApp booth #2779.

Come along to learn more about how our dramatic compression ratios, combined with the right storage architecture, can help you achieve optimum efficiency with genomic data and enable collaboration through the cloud.

Complete the form on this page to schedule a meeting with us at HIMSS 2019.

To find out more about the event, and register to attend, visit the event's website.

Why Do Community Driven Genomic Data Standards Matter?

Posted on January 15, 2019
Image of a rowing crew

Genomics is a community driven data science, with existing data standards. The ability to exchange data and share results relies on a small number of common file formats, and on the software tools that read, process and generate data according to these conventions. Many of the common data formats are represented in flat text files; some pre-date the internet, but they are the glue of genomics because the community supports them as de facto standards.

PetaGene Honors and Respects Community Driven Genomics Standards

PetaGene genomic data compression technology reduces the size of genomic data contained in these common file formats, without any loss of information. The compressed data is stored in a unique binary file format, but we add a transparency technology that enables the compressed data to look and behave like the original data files. Through this innovation we always present PetaGene compressed data in compliance with community accepted standards. Compressed BAM and FASTQ files appear as BAM and FASTQ files, and we provide the option to output CRAM files that are readable by any open-source tool that understands CRAM v3. PetaGene compression will never result in yet-another-file-format.

This design choice was made deliberately. By presenting our hyper-compressed data as native BAM, FASTQ, or CRAM, we honor and respect the efforts of the genomics community to maintain stable data exchange media.

Timeline of developments in genomic data formats

Timeline from 1970 to 2018 showing genomic data formats

A Brief History of A C G T

The Needleman-Wunsch algorithm for global sequence alignment was published in 1970, and the Smith-Waterman algorithm for local alignment was published in 1981. Practical implementations of local alignment and search software were developed in the 1980s. FASTP was a program for local protein sequence alignment and similarity searches published by Lipman and Pearson in 1985. Soon after, FASTA (1988) and BLAST (1990) were developed to provide fast and sensitive similarity searches for nucleotide and protein sequences. The text files that the software read as input became known as FASTA files, and the structure of the text data expected by the FASTA programs became the FASTA format. This format still exists and is the format for the human genome reference sequences.

High throughput sequencing came along with the need to track the confidence of each base call in a re-sequencing result. Each sequencing technology had its own “Quality Score” and kept these scores in a separate file. Eventually sequence and quality scores were merged into one file, the FASTQ (circa 2000). There were competing versions of the format (Sanger, Solexa, and Illumina) until some convergence on quality scores occurred around 2009.
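As a rough illustration of what "sequence and quality scores in one file" means, the Python sketch below reads a single four-line FASTQ record and decodes its quality string. The function names are our own, chosen for this example. The offset of 33 is the Sanger convention; older Illumina pipelines used 64. The Solexa variant used a different scoring formula and is not handled here.

```python
def parse_fastq_record(lines):
    """Split one four-line FASTQ record into (identifier, sequence, quality)."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Decode a quality string into integer scores.

    offset=33 is the Sanger convention; offset=64 was used by older
    Illumina pipelines. One character encodes the score of one base.
    """
    return [ord(ch) - offset for ch in quality]


# A made-up record, for illustration only.
record = ["@read1\n", "ACGT\n", "+\n", "II5!\n"]
name, bases, quals = parse_fastq_record(record)
scores = phred_scores(quals)
```

The same quality character means different things under different offsets, which is why the competing variants could not be mixed safely before the community converged.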

The 1000 Genomes Project drove the development of and consensus around data formats for mapped and aligned reads (SAM) and its binary, compressed form (BAM) by Heng Li and others in 2009. The 1000 Genomes Project also produced the Variant Call Format (VCF) in 2009 which captures variant information for a genome or set of genomes. CRAM is the latest addition as a format for compressed genomic data, introduced in 2011.

These formats became open specifications maintained by the 1000 Genomes Project until the Global Alliance for Genomics and Health (GA4GH) took over stewardship in 2016.

Global Alliance for Genomics & Health logo, the standards setting body for genomic data file formats.

Community Standards

FASTA, FASTQ, BAM and VCF formats have persisted and become de facto standards mainly because they are simple and human readable, and because an ecosystem of software tools that process these data has grown over time due to the support of many individuals and groups. There was no process to draft a standard, create an optimal representation, and approve the standard.

The informal process by which these standards have emerged may be their most important value:
  • they are not the product of modern data science but they are universally understood by the community, which is more important;
  • there is a legacy and diversity of tools, algorithms, and pipelines that support these formats; and
  • there is a supportive group of people that will help newcomers and fix bugs when they are identified.

The adoption of new file formats or standards will happen slowly in the genomics user base due to the social and academic dynamics of the community. Due to technical and social inertia, replacing these formats is going to be very difficult for the foreseeable future. Perhaps a more formal standards process, driven by the GA4GH, will provide innovations, but this remains to be demonstrated.

At PetaGene we honor and respect these community standards. PetaGene technology provides extreme compression of genomic data without requiring the adoption of a new data structure or file format. Instead we respect the community standards by providing community compliant interfaces to the compressed data. Introducing yet another file format would provide no benefit to the community and only stifle adoption. This is why our products were engineered to present our compressed data as native FASTQ, BAM, or CRAM files.

How Does PetaGene Genomic Data Compression Support the Standards?

While we compress genomic data beyond what GZIP and CRAM can do, we present the data back as the original BAM or FASTQ files. Users and applications never see the compressed data and never need to interact with the compressed file format. Instead we employ function interposition with the aid of an LD_PRELOAD library that provides dynamic decompression and format translation for all command line tools, applications, and pipelines. In fact, the filesystem representation of the data also keeps the original .bam or .fastq file names.
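The idea of interposition can be sketched with a small Python analogy. This is not PetaGene's implementation and it does not use LD_PRELOAD, which works at the level of C library calls inside each process. Here gzip stands in for the compressed format, and the names `compress_in_place`, `interposed_open` and the `.gzstore` suffix are hypothetical, invented for this example. The point is only that the caller asks for the original file name and receives the original bytes, without knowing how they are stored.

```python
import gzip
import io
import os
import tempfile

# Hypothetical suffix for the stand-in compressed container (not a PetaGene format).
COMPRESSED_SUFFIX = ".gzstore"


def compress_in_place(path):
    """Replace `path` with a gzip-compressed stand-in stored beside it."""
    with open(path, "rb") as src, gzip.open(path + COMPRESSED_SUFFIX, "wb") as dst:
        dst.write(src.read())
    os.remove(path)


def interposed_open(path, mode="rb"):
    """Open `path` as if the original file were still there.

    If the original is missing but a compressed stand-in exists, decompress
    on the fly and return a file-like object holding the original bytes.
    Any other request falls through to the normal open().
    """
    stored = path + COMPRESSED_SUFFIX
    if mode == "rb" and not os.path.exists(path) and os.path.exists(stored):
        with gzip.open(stored, "rb") as f:
            return io.BytesIO(f.read())
    return open(path, mode)


def demo():
    """Round-trip a made-up FASTQ record through the stand-in compressor."""
    record = b"@read1\nACGT\n+\nIIII\n"
    with tempfile.TemporaryDirectory() as tmp:
        fastq = os.path.join(tmp, "sample.fastq")
        with open(fastq, "wb") as f:
            f.write(record)
        compress_in_place(fastq)
        # The caller still asks for sample.fastq and gets the original bytes.
        with interposed_open(fastq) as f:
            return f.read() == record
```

A preloaded library achieves the same effect for unmodified tools by wrapping the file access functions they already call, so no tool needs to adopt a wrapper like `interposed_open` explicitly.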

Our compression software also has the option to output CRAM files that comply with the CRAM 3.0 specification. As such, any CRAM aware tool or application can read the file without the aid of our decompression library (PetaLink). We are completely interoperable with the community standards and there is no lock-in with our compression technology.

Thus PetaGene compression technology supports the existing standards by providing users with perfectly GA4GH compliant FASTQ, BAM, or CRAM data.

What Are The Benefits of This Approach?

Supporting the community standards makes our technology immediately interoperable with all bioinformatics tools. There are no barriers to adoption and we fit right into the ecosystem of tools and technologies for processing, storing, and retrieving genomic data.

We also eliminate a major integration problem that any new format would create: staying compatible with existing tools without modifying or adding code to existing workflows. By presenting data back to a tool as data it already knows, we eliminate the integration work: it just works straight out of the box. This is essentially zero-code integration.

The last reason is that BAM and FASTQ are stable, widely used formats. There is no need to propose a new format to fit our needs and expect the rest of the community to bend to our will. Doing so would not advance bioinformatics or our business.

Open Access

Our compression technology is not open source; it is open access. The software requires a commercial license, but the basic read-back library (PetaLink) is “open access” such that it is always free and always available via The paid license is required for compression but not decompression. For most applications the files are compressed once to achieve storage savings and then decompressed many times. PetaLink remains free to use after the compression license has expired or been used up.

The cloud edition of PetaLink has many additional features and requires a license.

Sustainable Commercial Support

PetaGene provides business value by making genomic data smaller and faster. Smaller data files translate into reduced storage costs and more budget for primary research activities. Faster data movement reduces processing time which accelerates discovery or provides a clinical result sooner. In our business model, we earn revenues when clients save time and money.

We charge only for compression, and license fees are based on compression savings. Clients recover these fees within only a few months of storage savings. Afterwards, clients keep 100% of the savings, month after month. Within one year, clients save an average of 50% in storage costs. Over 5 years, these savings amount to more than 10x our original fee to compress the data.
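To make the arithmetic concrete, here is a short Python sketch with purely hypothetical inputs: a monthly storage bill of 10,000 (any currency), a 75% storage reduction, and a one-off compression fee of 30,000. These figures are illustrative assumptions, not PetaGene pricing, and the function names are our own.

```python
import math


def payback_months(monthly_cost, reduction, fee):
    """Whole months of storage savings needed to cover the compression fee."""
    monthly_saving = monthly_cost * reduction
    return math.ceil(fee / monthly_saving)


def net_saving_fraction(monthly_cost, reduction, fee, months):
    """Net saving (after the fee) as a fraction of the uncompressed storage bill."""
    gross_saving = monthly_cost * reduction * months
    return (gross_saving - fee) / (monthly_cost * months)


def savings_multiple(monthly_cost, reduction, fee, months):
    """Gross storage savings over the period, expressed as a multiple of the fee."""
    return monthly_cost * reduction * months / fee


# Hypothetical inputs, for illustration only.
MONTHLY_COST = 10_000
REDUCTION = 0.75
FEE = 30_000

months_to_recover = payback_months(MONTHLY_COST, REDUCTION, FEE)
first_year = net_saving_fraction(MONTHLY_COST, REDUCTION, FEE, 12)
five_year_multiple = savings_multiple(MONTHLY_COST, REDUCTION, FEE, 60)
```

With these assumed inputs the fee is recovered in 4 months, the net saving in the first year is 50% of the uncompressed storage bill, and the five-year savings are 15 times the fee. Different storage prices, compression ratios or fees would shift all three figures.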

We use these revenues to provide prompt and responsive support to users, fix defects, and continue to improve the product in a sustainable manner.

PetaGene also provides a fully supported, commercial implementation option for CRAM genomics data compression, should you require CRAM. Our CRAM implementation has some additional features: our reference-free compression, storage of CRAM compliant files, and transparent read-back of CRAM files to BAM with universal support of tools that don’t support CRAM. Our CRAM files can be read and processed by any tools that support CRAM v3. We will support your integration and operations, and provide technical assistance should you ever encounter problems with CRAM.

PetaSuite is a fully supported, commercial option for genomic data compression and a commercially supported implementation of CRAM. We provide clients with full warranty and support while using our software or using CRAM.


  • Extreme compression that saves money.
  • Transparent read-back that eliminates integration and speeds up data transfers.
  • Commercial support for PetaGene and CRAM compression workflows.
  • Auditable and verifiable data integrity for lossless compression.
  • Indemnity Insurance for data loss.


PetaGene technology is 100% compliant with community standards and GA4GH. We can provide you the best compression technology and support even if you choose to store open-source CRAM files.