Apache Spark Tutorials at 2019 Spark + AI Summit

Posted in Announcements, Apache Spark, Company Blog, Delta Lake, Events, MLflow, Spark SQL, Structured Streaming

You might have heard the famous saying, “Why software is eating the world.” But if software is eating the world, you may ask, where does software come from? Naturally, developers! Some software developers argue that “developers are eating the world.” A research report by Stripe indicates that “developers have the ability to raise global […]

AWS Data Lake Delta Transformation Using AWS Glue

Posted in AWS Data Lake, AWS Glue Catalog, Company Blog, Data Bricks Runtime, Data Lake, Delta Lake

In this blog post, we will explore how to reliably and efficiently transform your AWS Data Lake into a Delta Lake using the AWS Glue Data Catalog service. The AWS Glue Data Catalog is a serverless, Apache Hive-compatible metastore that allows you to easily share table metadata across AWS services, applications, or AWS accounts. […]

Diving Into Delta Lake: Unpacking The Transaction Log

Posted in Apache Spark, Company Blog, Customers, Data Engineering, Delta Lake, Ecosystem, Education, Engineering Blog, Internals, Optimistic Concurrency Control, Platform, Product, Transaction Log

The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this article, we’ll explore what the transaction log is, how it works at the file level, and how it offers […]

Productionizing Machine Learning with Delta Lake

Posted in AI, Apache Spark, Company Blog, Data Engineering, Delta Lake, Ecosystem, Education, Engineering Blog, Machine Learning, Platform

Try out this notebook series in Databricks – part 1 (Delta Lake), part 2 (Delta Lake + ML). For many data scientists, the process of building and tuning machine learning models is only a small portion of the work they do every day. The vast majority of their time is spent doing the less-than-glamorous (but […]

Announcing the Delta Lake 0.3.0 Release

Posted in Commit History, Data Engineering, Delete, Delta Lake, Engineering Blog, Merge, Platform, Update, Vacuum

We are excited to announce the release of Delta Lake 0.3.0, which introduces new programmatic APIs for manipulating and managing data in Delta tables. The key features in this release are: Scala/Java APIs for DML commands – You can now modify data in Delta tables using programmatic APIs for Delete (#44), Update (#43) and Merge […]

Migrating Transactional Data to a Delta Lake using AWS DMS

Posted in Company Blog, Delta Lake, Partners

Try this notebook in Databricks. Note: We also recommend you read “Efficient Upserts into Data Lakes with Databricks Delta,” which explains how to use the MERGE command for efficient upserts and deletes. Challenges with moving data from databases to data lakes: Large enterprises are moving transactional data from scattered data marts in heterogeneous locations to a […]

Accurately Building Genomic Cohorts at Scale with Delta Lake and Spark SQL

Posted in Apache Spark, Delta, Delta Lake, Ecosystem, Engineering Blog, Genomics, HLS, Joint Genotyping, SparkSQL

This is the second post in our “Genomic Analysis at Scale” series. In our first post, we explored a simple problem: how to provide real-time aggregates when sequencing large volumes of genomes. We solved this problem by using Delta Lake and a streaming pipeline built using Spark SQL. In this blog, we focus on the more advanced […]

Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark: On-Demand Webinar and FAQ Now Available!

Posted in ACID Transactions, Apache Spark, Company Blog, Delta Lake, Education, Engineering Blog, Financial Services, Product, Streaming, Structured Streaming, Time Travel, Unified Batch and Streaming Sync

On June 13th, we hosted a live webinar — Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark — with Junta Nakai, Industry Leader – Financial Services at Databricks, John O’Dwyer, Solution Architect at Databricks, and Denny Lee, Technical Product Marketing Manager at Databricks. This is the first webinar in a series of financial […]

Spark + AI Summit 2019 Product Announcements and Recap. Watch the keynote recordings today!

Posted in Announcements, Apache Spark, Company Blog, Delta Lake, Events, Koalas, MLflow, Product, Spark + AI Summit

Spark + AI Summit 2019, the world’s largest data and machine learning conference for the Apache Spark™ Community, brought nearly 5000 data scientists, engineers, and business leaders to San Francisco’s Moscone Center to find out what’s coming next. Watch the keynote recordings today and learn more about the latest product announcements for Apache Spark, MLflow, […]