Simplifying Genomics Pipelines at Scale with Databricks Delta

Posted in Apache Spark, data pipeline, Engineering Blog, Genomics, HLS, Machine Learning, Streaming

Try this notebook in Databricks. This post is the first in our “Genomics Analysis at Scale” series, in which we demonstrate how the Databricks UAP4Genomics enables customers to analyze population-scale genomic data. Starting from the output of our genomics pipeline, this series provides a tutorial on using Databricks to run sample […]

Managed MLflow on Databricks now in public preview

Posted in Announcements, Company Blog, Data Science, Ecosystem, Engineering Blog, Machine Learning, Managed MLflow, MLflow, Platform, Product

Try this tutorial in Databricks. Building production machine learning applications is challenging because there is no standard way to record experiments, ensure reproducible runs, or manage and deploy models. To address these challenges, last June we introduced MLflow, an open source platform for managing the ML lifecycle that works with any machine learning library and […]

Speedy Scala Builds with Bazel at Databricks

Posted in Apache Spark, Bazel, Company Blog, Databricks, Engineering Blog, SBT, Scala

Databricks migrated from the standard Scala Build Tool (SBT) to Bazel to build, test, and deploy our Scala code. Through improvements in our build infrastructure, Scala compilation workflows that previously took minutes to tens of minutes now complete in seconds. This post walks you through the improvements we made to achieve […]
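In Bazel, a Scala module is described by targets in a BUILD file rather than an SBT project definition. A minimal sketch, assuming the open source rules_scala ruleset (the target and dependency names below are hypothetical, not Databricks' actual build configuration):

```starlark
# BUILD: declares a Scala library and its tests as separate Bazel targets,
# so Bazel can cache and parallelize compilation per target.
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")

scala_library(
    name = "pipeline",
    srcs = glob(["src/main/scala/**/*.scala"]),
    deps = ["//common:utils"],  # hypothetical internal dependency
)

scala_test(
    name = "pipeline_test",
    srcs = glob(["src/test/scala/**/*.scala"]),
    deps = [":pipeline"],
)
```

Because each target's inputs are declared explicitly, Bazel only recompiles targets whose sources or dependencies actually changed, which is where the seconds-not-minutes speedup comes from.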

How to Work with Avro, Kafka, and Schema Registry in Databricks

Posted in Apache Avro, Apache Kafka, Apache Spark, Company Blog, DBR 5.2, Ecosystem, Engineering Blog, Product, Streaming, Structured Streaming

In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can use it to build streaming data pipelines with the from_avro and to_avro functions. Apache Kafka and Apache Avro are commonly used together to build scalable, near-real-time data pipelines. In this blog post, […]
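The from_avro/to_avro pattern can be sketched as follows. This is a minimal example, assuming Spark 2.4+ with the built-in Avro data source; the broker address, topic name, and record schema are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession
// In Spark 2.4 these live in org.apache.spark.sql.avro._;
// in Spark 3.x they moved to org.apache.spark.sql.avro.functions._
import org.apache.spark.sql.avro.{from_avro, to_avro}
import org.apache.spark.sql.functions.{col, struct}

val spark = SparkSession.builder.appName("avro-kafka-sketch").getOrCreate()

// Avro schema as a JSON string; in practice this would come from a
// schema file or a Schema Registry lookup.
val jsonSchema =
  """{"type":"record","name":"User","fields":[
    |{"name":"name","type":"string"},{"name":"age","type":"int"}]}""".stripMargin

// Decode Avro-encoded Kafka message values into typed columns.
val decoded = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "users")
  .load()
  .select(from_avro(col("value"), jsonSchema).as("user"))
  .select("user.name", "user.age")

// Re-encode rows as Avro binary before writing back to Kafka.
val encoded = decoded
  .select(to_avro(struct(col("name"), col("age"))).as("value"))
```

from_avro turns the binary Kafka `value` column into a struct you can query with ordinary DataFrame operations, and to_avro is its inverse for producing back to a topic.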

Announcing Databricks Runtime 5.2

Posted in Announcements, Engineering Blog, Product

We are excited to announce the release of Databricks Runtime 5.2, which introduces several new features, including: Delta Time Travel, Fast Parquet Import, and Databricks Advisor. Let’s unpack each of these features in more detail. Delta Time Travel: Time Travel, released as an Experimental feature, adds the ability to query a snapshot of a table […]
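Querying a table snapshot with Delta Time Travel can be sketched as below; the table path and timestamp are hypothetical placeholders, and this assumes an existing `spark` session with a Delta table at that path:

```scala
// Read the table as it existed at a specific version number...
val asOfVersion = spark.read
  .format("delta")
  .option("versionAsOf", 0)            // version 0 = the table's first commit
  .load("/delta/events")

// ...or as it existed at a point in time.
val asOfTime = spark.read
  .format("delta")
  .option("timestampAsOf", "2019-01-01")
  .load("/delta/events")
```

Each write to a Delta table produces a new table version, so either a version number or a timestamp can pin a read to a historical snapshot.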

Accelerating Machine Learning on Databricks: On-Demand Webinar and FAQ Now Available!

Posted in Company Blog, Data Science, Databricks Runtime, Deep Learning, Ecosystem, Engineering Blog, Horovod, HorovodRunner, Keras, Machine Learning, MLflow, Platform, Product, TensorFlow

Try this notebook in Databricks. On January 15th, we hosted a live webinar, Accelerating Machine Learning on Databricks, with Adam Conway, VP of Product Management, Machine Learning, at Databricks and Hossein Falaki, Software Development Engineer and Data Scientist at Databricks. In this webinar, we covered some of the latest innovations brought into the Databricks Unified Analytics Platform […]

Databricks Runtime 5.2 ML Features Multi-GPU Workflow, Pregel API, and Performant GraphFrames

Posted in Apache Spark, Databricks Runtime 5.2 ML, Deep Learning, Engineering Blog, GraphFrames, HorovodRunner, Machine Learning, Platform, PyTorch, TensorFlow

We are excited to announce the release of Databricks Runtime 5.2 for Machine Learning. This release includes several new features and performance improvements to help developers easily use machine learning on the Databricks Unified Analytics Platform. Continuing our efforts to make it easier for developers to build deep learning applications, this release includes the following features […]

5 Reasons to Become an Apache Spark Expert

Posted in Apache Spark, Company Blog, Education, Engineering Blog, Spark, Spark + AI Summit, training

Apache Spark™ has fast become the most popular unified analytics engine for big data and machine learning. It was originally developed at UC Berkeley in 2009 by the team who later founded Databricks. Since its release, Apache Spark has seen rapid adoption. Today’s most cutting-edge companies, such as Apple, Netflix, Facebook, and Uber, have deployed Spark […]

Kicking Off 2019 with an MLflow User Survey

Posted in Apache Spark, Ecosystem, Engineering Blog, Machine Learning, MLflow

It’s been six months since we launched MLflow, an open source platform to manage the machine learning (ML) lifecycle, and the project has been moving quickly since then. MLflow fills a role that hasn’t been served well in the open source community so far: managing the development lifecycle for ML, including tracking experiments and metrics, […]

Introducing Databricks Library Utilities for Notebooks

Posted in Announcements, Engineering Blog

Databricks has introduced a new feature, Library Utilities for Notebooks, as part of Runtime version 5.1. It allows you to install and manage Python dependencies from within a notebook. This provides several important benefits: install libraries when and where they’re needed, from within a notebook. This eliminates the need to globally install libraries on a […]