Building the Fastest DNASeq Pipeline at Scale

Posted Leave a commentPosted in Benchmark, DNA, DNASeq, Engineering Blog, GATK, Genomics, Platform, Unified Analytics Platform

In June, we announced the Unified Analytics Platform for Genomics with a simple goal: accelerate discovery with a collaborative platform for interactive genomic data processing, analytics and AI at massive scale.  In this post, we’ll go into more detail about one component of the platform: a scalable DNASeq pipeline that is concordant with GATK4 at […]

A Guide to Apache Spark Use Cases, Streaming, and Research Talks at Spark + AI Summit Europe

Posted Leave a commentPosted in Apache Spark, Company Blog, Data Science, Events, Genomics, Machine Learning, PySpark, Spark + AI Summit Europe, Structured Streaming, Unified Analytics Platform

For much of Apache Spark’s history, its capacity to process data at scale and capability to unify disparate workloads has led Spark developers to tackle new use cases. Through innovation and extension of its ecosystem, developers combine data and AI to develop new applications. So it befits developers to come to this summit not just […]