Accurately Building Genomic Cohorts at Scale with Delta Lake and Spark SQL

Posted in Apache Spark, Delta, Delta Lake, Ecosystem, Engineering Blog, Genomics, HLS, Joint Genotyping, SparkSQL

This is the second post in our “Genomic Analysis at Scale” series. In our first post, we explored a simple problem: how to provide real-time aggregates when sequencing large volumes of genomes. We solved it with Delta Lake and a streaming pipeline built on Spark SQL. In this blog, we focus on the more advanced […]

Efficient Upserts into Data Lakes with Databricks Delta

Posted in Announcements, Change Data Capture, Company Blog, Databricks Delta, Delta, Merge, Product

Simplify building big data pipelines for change data capture (CDC) and GDPR use cases. Databricks Delta, the next-generation unified analytics engine built on top of Apache Spark™, now supports the MERGE command, which allows you to efficiently upsert and delete records in your data lakes. MERGE dramatically simplifies how a number of common data pipelines […]
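To make the upsert-and-delete behavior concrete, here is a toy, in-memory sketch of the semantics a MERGE statement expresses. It is plain Python, not Delta Lake's API: the table is a dict keyed by a hypothetical `id`, and each change row is a hypothetical `(key, new_value, is_delete)` tuple. It roughly mirrors a statement such as `MERGE INTO target USING updates ON target.id = updates.id WHEN MATCHED AND updates.deleted THEN DELETE WHEN MATCHED THEN UPDATE SET ... WHEN NOT MATCHED AND NOT updates.deleted THEN INSERT ...`.

```python
from typing import Dict, List, Tuple

# Hypothetical change record for illustration: (key, new_value, is_delete)
Change = Tuple[int, str, bool]


def merge_upsert(target: Dict[int, str], changes: List[Change]) -> Dict[int, str]:
    """Toy MERGE: apply CDC-style change rows to a table keyed by id.

    Simplification: duplicate keys in the source are rejected outright
    (Delta's MERGE fails when several source rows match one target row).
    """
    keys = [key for key, _, _ in changes]
    if len(keys) != len(set(keys)):
        raise ValueError("multiple source rows share the same key")

    result = dict(target)  # leave the input table untouched
    for key, value, is_delete in changes:
        matched = key in result
        if matched and is_delete:
            del result[key]        # WHEN MATCHED AND deleted THEN DELETE
        elif matched:
            result[key] = value    # WHEN MATCHED THEN UPDATE
        elif not is_delete:
            result[key] = value    # WHEN NOT MATCHED AND NOT deleted THEN INSERT
    return result


table = {1: "a", 2: "b", 3: "c"}
changes = [(2, "B", False), (3, "", True), (4, "d", False)]
print(merge_upsert(table, changes))  # {1: 'a', 2: 'B', 4: 'd'}
```

The point of a single MERGE is that the update, delete, and insert branches are decided per row in one pass over the change set, instead of running separate delete and insert jobs over the data lake.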

Introducing Delta Time Travel for Large Scale Data Lakes

Posted in Announcements, Company Blog, Databricks Delta, Delta

Data versioning for reproducing experiments, rolling back, and auditing data. We are thrilled to introduce time travel capabilities in Databricks Delta, the next-gen unified analytics engine built on top of Apache Spark, for all of our users. With this new feature, Delta automatically versions the big data that you store in your data lake, and […]
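Delta records each change to a table as a numbered commit in its transaction log, which is what makes queries such as `SELECT * FROM events VERSION AS OF 1` or `spark.read.format("delta").option("versionAsOf", 1).load(path)` possible (the table name `events` and `path` are placeholders). The sketch below is a deliberately simplified toy model of that idea, not Delta's implementation: each commit only adds or removes hypothetical file names, and reading "as of" a version replays the commits up to it. Checkpoints, schema metadata, and concurrency control are ignored.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Toy commit: (files_added, files_removed). Not Delta's actual log format.
Commit = Tuple[Set[str], Set[str]]


class ToyVersionedTable:
    """Append-only commit log; every commit produces a new table version."""

    def __init__(self) -> None:
        self.log: List[Commit] = []

    def commit(self, added: Iterable[str] = (), removed: Iterable[str] = ()) -> int:
        self.log.append((set(added), set(removed)))
        return len(self.log) - 1  # version number of this commit

    def snapshot(self, version: Optional[int] = None) -> Set[str]:
        """Replay commits 0..version to get the files visible at that version."""
        if version is None:
            version = len(self.log) - 1  # latest version
        if not 0 <= version < len(self.log):
            raise ValueError(f"version {version} does not exist")
        files: Set[str] = set()
        for added, removed in self.log[: version + 1]:
            files -= removed
            files |= added
        return files


t = ToyVersionedTable()
t.commit(added={"part-0.parquet"})                                # version 0
t.commit(added={"part-1.parquet"})                                # version 1
t.commit(added={"part-2.parquet"}, removed={"part-0.parquet"})    # version 2
print(sorted(t.snapshot(1)))  # ['part-0.parquet', 'part-1.parquet']
print(sorted(t.snapshot()))   # ['part-1.parquet', 'part-2.parquet']
```

Because old commits are never rewritten, an earlier version stays readable after later writes, which is what supports rollback and auditing.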

Announcing Databricks Runtime 4.2!

Posted in Announcements, Apache Spark, Company Blog, Customers, Databricks, Delta, Engineering Blog, Platform, Product, Runtime, Streaming

We’re excited to announce Databricks Runtime 4.2, powered by Apache Spark™. Version 4.2 includes updated Spark internals, new features, and major performance upgrades to Databricks Delta, as well as general quality improvements to the platform. We are moving quickly toward the Databricks Delta general availability (GA) release and we recommend you upgrade to Databricks Runtime […]