Michael Hultner, our SVP Strategy and General Manager, US Operations, recently attended a Special Interest Group (SIG) event organised by FrontLine Genomics. SIGs bring together senior-level research, clinical and business professionals from across the genomics community to discuss relevant issues and work towards solutions to common problems.
In the session on data security, privacy and consent, the topics of dataset sharing and security provided opportunities to explain how compression technology can help. In this blog post, Michael shares his insights on how access, safeguarding and cloud storage security relate to the compression of genomic data.
Accessing data is difficult for researchers and can take a long time.
Lack of easy access to a dataset, or information about it, is a significant reason why research projects are time-consuming. The size of the files is a major factor in this. Genomic files can present challenges which regular data storage systems are not set up to solve. It is possible to store data using compression formats which take into account the specific nature of genomic data. This makes life easier for researchers by decreasing transfer and access times. It is also possible to speed up analysis thanks to lower I/O demands.
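To make the effect on transfer times concrete, here is a rough back-of-the-envelope sketch in Python. The file size, compression ratio and link speed are illustrative assumptions rather than measurements, and the function names are hypothetical. Real ratios depend on the data and on the compression tool used.

```python
def transfer_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Idealised time to move size_gb gigabytes over a bandwidth_mbps link.

    Uses decimal units (1 GB = 8,000 megabits) and ignores protocol
    overhead, latency and contention, so real transfers will be slower.
    """
    size_megabits = size_gb * 8 * 1000
    return size_megabits / bandwidth_mbps


def compressed_size_gb(size_gb: float, ratio: float) -> float:
    """Size after compression, for a ratio of original size to compressed size."""
    return size_gb / ratio


# Hypothetical inputs, chosen only for illustration.
original_gb = 100.0      # assumed size of one genomic file
ratio = 4.0              # assumed compression ratio (not a measured figure)
bandwidth_mbps = 1000.0  # assumed 1 Gbit/s link

before = transfer_seconds(original_gb, bandwidth_mbps)
after = transfer_seconds(compressed_size_gb(original_gb, ratio), bandwidth_mbps)

print(f"Uncompressed: {before:.0f} s, compressed: {after:.0f} s")
```

Under these assumptions the transfer time falls in direct proportion to the compression ratio. The same reasoning applies to reading the file from disk, which is where the lower I/O demands come from.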
Data is best protected using standard safeguards.
While compressing data doesn't by itself make the files any more or less secure, the benefits of compression can enable better security or make adopting best practice simpler. A common approach for data stored on-premises is to require researchers to travel to where the data is held in order to access it. This means that the organisation holding the data cannot enjoy the benefits of cloud storage. It also places demands on individual researchers and their institutions, whether academic or commercial, that might not be practical. Compressing genomic data using appropriate tools gives the flexibility to enable data sharing and collaboration without exposing the data to avoidable security risks.
There are still many misconceptions about the security of the cloud.
Security worries are the reason why some research institutions store their data on their own hard drives, which are then transported to individual laboratories. In the age of GDPR and protected health information, the thought of hard drives containing genomic datasets being carried around by individual researchers is probably enough to give data stewards sleepless nights. Despite developments in hard drive technology, it's also an impractical approach for today's genomic datasets. A better approach is to use established data storage solutions, whether in the cloud or on-premises. This allows appropriate access and sharing protocols to be set up, along with suitable backup and restore options should the worst happen. Here, compression reduces the cost of these established storage solutions. And if the right kind of compression is used, there is no need to change existing pipelines or bioinformatics systems.