The evolution of genomics in recent decades has seen the volume of sequencing rise dramatically as a result of lower costs. Massive growth in the quantities of data created by sequencing has greatly increased analytical challenges, and placed ever-increasing demands on compute and storage infrastructure. Researchers have leveraged high-performance computing environments and cluster computing to meet demands, but today even the fastest compute environments are constrained by the lagging performance of underlying storage.
Published By: Vertica
Published Date: Oct 30, 2009
Independent research firm Knowledge Integrity Inc. examine two high performance computing technologies that are transitioning into the mainstream: high performance massively parallel analytical database management systems (ADBMS) and distributed parallel programming paradigms, such as MapReduce, (Hadoop, Pig, and HDFS, etc.). By providing an overview of both concepts and looking at how the two approaches can be used together, they conclude that combining a high performance batch programming and execution model with an high performance analytical database provides significant business benefits for a number of different types of applications.