Managed MLflow on Databricks now in public preview

Posted in Announcements, Company Blog, Data Science, Ecosystem, Engineering Blog, Machine Learning, Managed MLflow, MLflow, Platform, Product

Building production machine learning applications is challenging because there is no standard way to record experiments, ensure reproducible runs, and manage and deploy models. To address these challenges, last June we introduced MLflow, an open source platform to manage the ML lifecycle that works with any machine learning library and […]

How to Work with Avro, Kafka, and Schema Registry in Databricks

Posted in Apache Avro, Apache Kafka, Apache Spark, Company Blog, DBR 5.2, Ecosystem, Engineering Blog, Product, Streaming, Structured Streaming

In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can use it to build streaming data pipelines with the from_avro and to_avro functions. Apache Kafka and Apache Avro are commonly used together to build scalable, near-real-time data pipelines. In this blog post, […]

Accelerating Machine Learning on Databricks: On-Demand Webinar and FAQ Now Available!

Posted in Company Blog, Data Science, Databricks Runtime, Deep Learning, Ecosystem, Engineering Blog, Horovod, HorovodRunner, Keras, Machine Learning, MLflow, Platform, Product, TensorFlow

On January 15th, we hosted a live webinar, "Accelerating Machine Learning on Databricks," with Adam Conway (VP of Product Management, Machine Learning, at Databricks) and Hossein Falaki (Software Development Engineer and Data Scientist at Databricks). In this webinar, we covered some of the latest innovations brought into the Databricks Unified Analytics Platform […]

Kicking Off 2019 with an MLflow User Survey

Posted in Apache Spark, Ecosystem, Engineering Blog, Machine Learning, MLflow

It’s been six months since we launched MLflow, an open source platform to manage the machine learning (ML) lifecycle, and the project has been moving quickly since then. MLflow fills a role that hasn’t been served well in the open source community so far: managing the development lifecycle for ML, including tracking experiments and metrics, […]

MLflow v0.8.1 Features Faster Experiment UI and Enhanced Python Model

Posted in Apache Spark, Data Science, Ecosystem, Engineering Blog, Machine Learning, Machine Learning Life Cycle, MLflow, Model Management, Platform, Spark UDF

MLflow v0.8.1 was released this week. It introduces several UI enhancements, including faster load times for thousands of runs and improved responsiveness when navigating runs with many metrics and parameters. Additionally, it expands support for evaluating Python models as Apache Spark UDFs and automatically captures model dependencies as Conda environments. […]

Introducing Built-in Image Data Source in Apache Spark 2.4

Posted in Apache Spark, Data Source, Databricks Runtime, DataFrames, Deep Learning Pipelines, Ecosystem, Engineering Blog, Machine Learning

With recent advances in deep learning frameworks for image classification and object detection, the demand for standard image processing in Apache Spark has never been greater. Image handling and preprocessing have their own challenges: for example, images come in different formats (e.g., JPEG, PNG), sizes, and color schemes, and there is no […]

Apache Avro as a Built-in Data Source in Apache Spark 2.4

Posted in Apache Avro, Apache Spark, Apache Spark 2.4, Data Source, Ecosystem, Engineering Blog, Spark SQL, Streaming, Structured Streaming

Apache Avro is a popular data serialization format. It is widely used in the Apache Spark and Apache Hadoop ecosystem, especially for Kafka-based data pipelines. Starting with the Apache Spark 2.4 release, Spark provides built-in support for reading and writing Avro data. The new built-in spark-avro module is originally from Databricks’ […]

Introducing Databricks Runtime 5.0 for Machine Learning

Posted in Announcements, Company Blog, Databricks Runtime 5.0 ML, Deep Learning, Ecosystem, Engineering Blog, Machine Learning, Platform

Six months ago we introduced the Databricks Runtime for Machine Learning with the goal of making machine learning performant and easy on the Databricks Unified Analytics Platform. The Databricks Runtime for ML comes pre-packaged with many ML frameworks and enables distributed training and inference. Today we are excited to release the second iteration including Conda […]

Applying your Convolutional Neural Network: On-Demand Webinar and FAQ Now Available!

Posted in Deep Learning, Ecosystem, Engineering Blog, Keras, Machine Learning, Neural Networks, Platform, TensorFlow

On October 25th, we hosted a live webinar, "Applying your Convolutional Neural Network," with Denny Lee, Technical Product Marketing Manager at Databricks. This is the third webinar in a free deep learning fundamentals series from Databricks. In this webinar, we dived deeper into Convolutional Neural Networks (CNNs), a particular type of neural […]

Introducing Apache Spark 2.4

Posted in Apache Spark, Apache Spark 2.4, Databricks Runtime 5.0, Ecosystem, Engineering Blog, Machine Learning, Pandas UDF, Platform, SparkSQL, Streaming, Structured Streaming, Unified Analytics Platform

We are excited to announce the availability of Apache Spark 2.4 on Databricks as part of Databricks Runtime 5.0. We want to thank the Apache Spark community for all their valuable contributions to the Spark 2.4 release. Continuing with the objective of making Spark faster, easier, and smarter, Spark 2.4 extends its scheduler to […]