PetaSuite losslessly compresses BAM and FASTQ files
for storage on-premises or in the cloud.


PetaGene’s compression software addresses the challenges caused by growing volumes of genomics data. It achieves up to a 10x reduction in both storage costs and data transfer times compared to BAM and gzipped FASTQ files, which is a 96% reduction compared to raw FASTQ files. It integrates transparently with existing storage infrastructure and bioinformatics pipelines. PetaSuite is a set of scalable, complementary software tools that significantly reduce the size and cost of NGS data for storage and transfer.

PetaSuite Cloud Edition

PetaSuite Cloud Edition (CE) does everything that standard PetaSuite does, with the additional innovation of enabling a user’s software tools and pipelines to seamlessly integrate with a wide variety of cloud platforms without modification. AWS, Azure, GCP, private cloud and hybrid cloud are all supported transparently.


Lossless Compression

Files compressed with our robust, high-performance FASTQ.gz and BAM compression decompress to exactly match the original file content. There is full validation and MD5 matching, meaning that not only is the internal content of FASTQ.gz and BAM files preserved, but the gzip wrappers also match exactly, allowing simpler archiving procedures to be used.
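The MD5 matching described above can be illustrated independently of PetaSuite. The following is a minimal sketch using only the Python standard library, not PetaSuite's own validation; the function names and file names are hypothetical. It compares the MD5 digest of an original file with that of a file restored from compression.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(original, restored):
    """True when both files have the same MD5 digest (hypothetical helper)."""
    return md5_of_file(original) == md5_of_file(restored)


# Demonstration with stand-in files; these are not real sequencing data.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "sample.fastq.gz"
    restored = Path(workdir) / "sample.restored.fastq.gz"
    payload = b"@read1\nACGT\n+\nIIII\n"
    original.write_bytes(payload)
    restored.write_bytes(payload)
    print(is_byte_identical(original, restored))  # True
```

Because the digest is computed over the whole file, a match covers the gzip wrapper as well as the decompressed content.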


PetaLink is a powerful virtual file access system. It enables migration of BAM and FASTQ.gz data to more efficient compression formats. For example, after the PetaSuite binary has been used to losslessly compress a BAM file, validate that all data in the BAM has been preserved, and remove the original BAM file, PetaLink makes available a high performance virtual BAM file view of the compressed file, with the filename of the original file, in the same location. This virtual file can then be used just like the original BAM file by Linux toolchains, pipelines and genome browsers transparently.

The Cloud Edition of PetaLink also allows files stored remotely in the cloud to be accessed as if they are local, without downloading them first!

BayesCal Quality Score Refinement

BayesCal uses a Bayesian approach to calculate a more complete posterior estimation of sequencer error. Genotyping accuracy is preserved across the ROC curve and shows a net increase. Improved compression is a side effect: compression ratios increase by a further 30%–70% compared with lossless compression alone.

Table showing size of files created using FASTQ.gz, BAM, CRAM and PetaGene compression

PetaGene lossless compression ratios, compared with CRAM

Source data (human 30x WGS)   Pipeline        PetaGene ratio (size reduction)   CRAM (latest)
FASTQ.gz, HiSeq X             -               3.0 (67%)                         Not applicable
FASTQ.gz, NovaSeq             -               4.3 (77%)                         Not applicable
BAM, HiSeq X                  BWA-mem only    2.2 (55%)                         1.9
BAM, HiSeq X                  GATK            5.2 (81%)                         1.5
BAM, NovaSeq                  Isaac only      2.8 (64%)                         2.3
BAM, NovaSeq                  BWA-mem only    3.2 (69%)                         2.4
BAM, NovaSeq                  GATK            10.9 (91%)                        1.5

Note: using PetaGene’s optional BayesCal quality score refinement increases the compression ratio by a further 30%–70%.
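The percentages in the table follow from the compression ratios: a ratio of r leaves 1/r of the original size, so the saving is (1 - 1/r) x 100%. The sketch below reproduces that arithmetic. The function names are hypothetical, and treating the "further 30%–70%" as a multiplier of 1.3 to 1.7 on the lossless ratio is an assumption about how the note should be read.

```python
def size_reduction_percent(ratio):
    """Percentage of the original size saved at a given compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return (1.0 - 1.0 / ratio) * 100.0


def refined_ratio(lossless_ratio, uplift):
    """Ratio after a relative uplift, e.g. 0.30 for 30% (assumed interpretation)."""
    return lossless_ratio * (1.0 + uplift)


# PetaGene lossless ratios taken from the table above.
petagene_ratios = {
    "FASTQ.gz, HiSeq X": 3.0,
    "FASTQ.gz, NovaSeq": 4.3,
    "BAM, HiSeq X, BWA-mem only": 2.2,
    "BAM, HiSeq X, GATK": 5.2,
    "BAM, NovaSeq, Isaac only": 2.8,
    "BAM, NovaSeq, BWA-mem only": 3.2,
    "BAM, NovaSeq, GATK": 10.9,
}

for source, ratio in petagene_ratios.items():
    print(f"{source}: {ratio}x -> {size_reduction_percent(ratio):.0f}% smaller")
```

By the same formula, a 10x ratio corresponds to a 90% saving, and a 96% saving corresponds to a 25x ratio.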

Save up to 90% of storage costs and transfer times
without compromising data quality

Free Evaluation Licence Trial

Please complete the form below to request your free trial of PetaSuite or PetaSuite Cloud Edition.