University of California-Berkeley researchers employ the latest tools built on Apache Spark to accelerate DNA sequencing in pursuit of precision medicine. Adding Pure FlashBlade™ from Pure Storage® has significantly reduced the time needed to sequence data-intensive DNA samples and analyze results.
Pure Storage, a pioneer in block-based flash arrays, has developed a technology called FlashBlade, designed specifically for file and object storage environments. With FlashBlade, IT teams now have a simple-to-manage shared storage solution that delivers the performance and capacity needed to bring Spark deployments on premises.
To help gain a deeper understanding of the storage challenges related to Spark and how FlashBlade addresses them, Brian Gold of Pure Storage sat down with veteran technology journalist Al Perlman of TechTarget for a far-reaching discussion on Spark trends and opportunities.
In the new age of big data, applications are leveraging large farms of powerful servers and extremely fast networks to access petabytes of data served for everything from data analytics to scientific discovery to movie rendering. These new applications demand fast and efficient storage, which legacy solutions are no longer capable of providing.
The verification workload comprises hundreds of millions of small files, a very high metadata load, and extremely high-performance read, write, and delete requirements.
The Pure Storage FlashBlade product's innovative design provides high IOPS and throughput, low latency, and fast deletes, yielding an average 25% faster wall-clock completion time.
The evolution of genomics in recent decades has seen the volume of sequencing rise dramatically as a result of lower costs. Massive growth in the quantities of data created by sequencing has greatly increased analytical challenges, and placed ever-increasing demands on compute and storage infrastructure. Researchers have leveraged high-performance computing environments and cluster computing to meet demands, but today even the fastest compute environments are constrained by the lagging performance of underlying storage.
FlashBlade fabric modules implement a unified network that connects all blades to each other and to the data center network. With full connectivity, all blades can serve as client connection endpoints, as authorities that process client requests, and as storage managers that transfer data to and from flash and NVRAM.
Pure Storage has significant expertise creating scalable, enterprise-class, flash-optimized storage platforms, and with FlashBlade, Pure Storage has crafted a turnkey, purpose-built platform that is well suited to cost-effectively handling the performance and capacity requirements of genomics workflows. Pure Storage has differentiated itself from more established enterprise storage providers by delivering an industry-leading customer experience, as shown by its extremely high Net Promoter Score (NPS), which indicates that it knows how to meet customer requirements and is committed to doing so. Whether genomics practitioners plan an on-premises deployment or a cloud-based deployment for their genomics workflows, they should consider the performance, cost, and patient care advantages of the Pure Storage FlashBlade when choosing a platform, particularly if they plan to retain data for a long time and use it frequently.
Man AHL is a pioneer in the field of systematic quantitative investing. Its entire business is based on creating and executing computer models to make investment decisions. The firm has adopted the Pure FlashBlade™ solution from Pure Storage to deliver the massive storage throughput and scalability required to meet its most demanding simulation applications.
This document describes the technical reasons for and benefits of an end-to-end training system, and explains why the Pure Storage® FlashBlade™ product is an essential platform. It also shows performance benchmarks based on a system that combines the NVIDIA® DGX-1™, a multi-GPU server purpose-built for deep learning applications, and Pure FlashBlade, a scale-out, high-performance, dynamic data hub for the entire AI data pipeline.
Advances in deep neural networks have ignited a new wave of algorithms and tools for data scientists to tap into their data with artificial intelligence (AI). With improved algorithms, larger data sets, and frameworks such as TensorFlow, data scientists are tackling new use cases like autonomous driving and natural language processing. Read this technical white paper to learn the reasons for and benefits of an end-to-end training system. The paper also shows performance benchmarks based on a system that combines the NVIDIA® DGX-1™, a multi-GPU server purpose-built for deep learning applications, and FlashBlade, a scale-out, high-performance, dynamic data hub for the entire AI data pipeline.
With the growth of unstructured data and the challenges of modern workloads such as Apache Spark™, IT teams have seen a clear need during the past few years for a new type of all-flash storage solution, one that has been designed specifically for users requiring high levels of performance in file- and object-based environments. FlashBlade™ addresses performance challenges in Spark environments by delivering the consistent performance of all-flash storage with no caching or tiering, as well as fast metadata operations and instant metadata queries.
Veritas' NetBackup software has long been a favorite for data protection in the enterprise, and is now fully integrated with the market-leading all-flash data storage platform: Pure Storage. NetBackup leverages the FlashArray API for fast and simple snapshot management, and protection copies can be stored on FlashBlade for rapid restores and consolidation of file and object storage tiers. This webinar features architecture overviews as well as two live demos of these integration points.