A Guide to MLflow Talks at Spark + AI Summit 2019

Posted in Company Blog, Events, Machine Learning, MLflow, Open Source, Product, Spark + AI Summit

In less than a year, MLflow has reached almost 500K monthly downloads, and gathered over 80 code contributors and 40 contributing organizations, confirming the need for an open source approach to help standardize the machine learning lifecycle across tools, teams, and processes. We are thrilled to host some of our key contributors and customers next […]

Managing the Complete Machine Learning Lifecycle: On-Demand Webinar now available!

Posted in Company Blog, Data Science, Ecosystem, Education, Machine Learning, Managed MLflow, MLflow, Model Management, Open Source, Product, Webinar

On March 7th, our team hosted a live webinar—Managing the Complete Machine Learning Lifecycle—with Andy Konwinski, Co-Founder and VP of Product at Databricks. In this webinar, we walked you through how MLflow, an open source framework for the complete machine learning lifecycle, helps solve challenges around experiment tracking, reproducible projects, and model deployment. Specifically, […]

Databricks Runtime 5.3 ML Now Generally Available

Posted in Announcements, Company Blog, Data Science, Databricks Runtime 5.3 ML, Deep Learning, Ecosystem, Engineering Blog, Machine Learning, Product

We are excited to announce the general availability (GA) of Databricks Runtime for Machine Learning, as part of the release of Databricks Runtime 5.3 ML. Built on top of Databricks Runtime, Databricks Runtime ML is the optimized runtime for developing ML/DL applications in Databricks. It offers native integration with popular ML/DL frameworks, such as scikit-learn, […]

Improving Data Management and Analytics in the Federal Government

Posted in Announcements, Apache Spark, Company Blog, Engineering Blog, Federal Government, government, Platform, Product, Public Sector, Unified analytics, Warehouse

From Static Data Warehouse to Scalable Insights and AI On-Demand: Government agencies today are dealing with a wider variety of data at a much larger scale. From satellite imagery to sensor data to citizen records, petabytes of semi-structured and unstructured data are collected each day. Unfortunately, traditional data warehouses are failing to provide government agencies […]

Efficient Upserts into Data Lakes with Databricks Delta

Posted in Announcements, Change Data Capture, Company Blog, Databricks Delta, Delta, Merge, Product

Simplify building big data pipelines for change data capture (CDC) and GDPR use cases. Databricks Delta, the next-generation unified analytics engine built on top of Apache Spark™, now supports the MERGE command, which allows you to efficiently upsert and delete records in your data lakes. MERGE dramatically simplifies how a number of common data pipelines […]
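For a flavor of the feature the post describes, here is a minimal sketch of a MERGE-based upsert run from PySpark. The `customers` table, the `/mnt/cdc/customers/` path, and the `deleted` flag are hypothetical stand-ins for your own Delta table and change feed; only the general MERGE shape is taken from the post.

```python
# Minimal sketch of a Delta MERGE upsert from PySpark.
# `customers`, the CDC path, and the `deleted` column are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Register incoming change-data-capture records as a temporary view.
spark.read.format("json").load("/mnt/cdc/customers/").createOrReplaceTempView("updates")

spark.sql("""
  MERGE INTO customers AS t
  USING updates AS s
  ON t.customer_id = s.customer_id
  WHEN MATCHED AND s.deleted = true THEN DELETE   -- GDPR-style delete
  WHEN MATCHED THEN UPDATE SET *                  -- update existing rows
  WHEN NOT MATCHED THEN INSERT *                  -- insert new rows
""")
```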

Simple Steps to Distributed Deep Learning: On-Demand Webinar and FAQ Now Available!

Posted in Deep Learning, Horovod, HorovodRunner, Keras, Machine Learning, Platform, Product, PyTorch, TensorFlow

Try this notebook in Databricks. On February 12th, we hosted a live webinar—Simple Steps to Distributed Deep Learning on Databricks—with Yifan Cao, Senior Product Manager, Machine Learning, and Bago Amirbekian, Machine Learning Software Engineer at Databricks. In this webinar, we covered some of the latest innovations brought into the Databricks Unified Analytics Platform for Machine […]
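As a taste of the HorovodRunner API behind the webinar's title, here is a minimal sketch of launching a distributed training job. HorovodRunner ships with Databricks Runtime ML (the `sparkdl` import is not part of stock PySpark), and the training function body below is a placeholder rather than a real model.

```python
# Minimal sketch of distributed training with HorovodRunner (Databricks Runtime ML only).
# The training function is a placeholder; swap in your own Keras/PyTorch code.
from sparkdl import HorovodRunner

def train(learning_rate=0.001):
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # Build and compile a model here, scaling the learning rate by hvd.size()
    # and wrapping the optimizer with hvd.DistributedOptimizer(...).
    ...

# np=2 requests two worker processes; set np to the number of GPUs/workers available.
hr = HorovodRunner(np=2)
hr.run(train, learning_rate=0.001)
```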

Managed MLflow on Databricks Now in Public Preview

Posted in Announcements, Company Blog, Data Science, Ecosystem, Engineering Blog, Machine Learning, Managed MLflow, MLflow, Platform, Product

Try this tutorial in Databricks. Building production machine learning applications is challenging because there is no standard way to record experiments, ensure reproducible runs, and manage and deploy models. To address these challenges, last June we introduced MLflow, an open source platform to manage the ML lifecycle that works with any machine learning library and […]
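To make the experiment-tracking piece concrete, here is a minimal sketch of logging a run with the open source MLflow API; the scikit-learn model, parameter, and metric are illustrative. On Databricks the tracking server is preconfigured; elsewhere, point MLFLOW_TRACKING_URI at your own server.

```python
# Minimal sketch of MLflow experiment tracking; model, parameter, and metric
# values are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    mlflow.log_param("C", 0.5)                               # record a hyperparameter
    model = LogisticRegression(C=0.5, max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))   # record a metric
    mlflow.sklearn.log_model(model, "model")                 # store the model artifact
```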

New Databricks Delta Features Simplify Data Pipelines

Posted in Announcements, Company Blog, Data Engineering, Data Lakes, Databricks Delta, Product, Time Travel, Unified Analytics Engine

Continued Innovation and Expanded Availability for the Next-gen Unified Analytics Engine: Databricks Delta, the next-generation unified analytics engine built on top of Apache Spark and aimed at helping data engineers build robust production data pipelines at scale, is continuing to make strides. Already a powerful approach to building data pipelines, new capabilities and performance […]

How to Work with Avro, Kafka, and Schema Registry in Databricks

Posted in Apache Avro, Apache Kafka, Apache Spark, Company Blog, DBR 5.2, Ecosystem, Engineering Blog, Product, Streaming, Structured Streaming

In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can use it to build streaming data pipelines with the from_avro and to_avro functions. Apache Kafka and Apache Avro are commonly used to build a scalable and near-real-time data pipeline. In this blog post, […]
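As a sketch of the pattern, the snippet below decodes Avro-encoded Kafka messages with from_avro in a streaming query. It assumes Spark 3.0+, where the function is exposed to Python in pyspark.sql.avro.functions (in Spark 2.4 it was Scala/Java only), and that the Kafka and spark-avro packages are on the classpath; the topic, brokers, and schema are illustrative.

```python
# Minimal sketch: decode Avro values from a Kafka stream with from_avro.
# Topic name, brokers, and schema are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.avro.functions import from_avro
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Avro schema of the message value, as a JSON string.
value_schema = """
{
  "type": "record",
  "name": "Click",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"}
  ]
}
"""

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clicks")
    .load()
    .select(from_avro(col("value"), value_schema).alias("event"))
    .select("event.*")
)
```

Note that messages framed by Confluent Schema Registry carry extra header bytes and need the Schema Registry integration the post discusses, rather than the plain schema-string form shown here.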

Near-Real-Time Hardware Failure Rate Estimation with Bayesian Reasoning

Posted in Education, Machine Learning, Product

Try this notebook in Databricks. You might be using Bayesian techniques in your data science without knowing it! And if you’re not, they could enhance the power of your analysis. This blog follows the introduction to Bayesian reasoning on Data Science Central and will demonstrate how these ideas can improve a real-world use case: […]
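For a sense of the approach, here is a minimal sketch of a Bayesian failure-rate update using a Gamma-Poisson conjugate model; the post's actual model and the numbers below are illustrative and may differ from what the notebook does.

```python
# Minimal sketch of Bayesian failure-rate estimation with a Gamma-Poisson
# conjugate model. Prior and observation counts are illustrative.
from scipy import stats

# Prior belief about failures per device-hour: Gamma(shape=alpha, rate=beta).
alpha, beta = 2.0, 10_000.0

# Streamed observations: failures seen and device-hours of exposure in a window.
failures, device_hours = 3, 25_000.0

# Conjugate update: posterior is Gamma(alpha + failures, beta + device_hours).
post_alpha, post_beta = alpha + failures, beta + device_hours

posterior = stats.gamma(a=post_alpha, scale=1.0 / post_beta)
print(f"posterior mean rate: {posterior.mean():.2e} failures/device-hour")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

Because the update is just addition of counts and exposure, it can be applied incrementally as new telemetry arrives, which is what makes the near-real-time framing in the post natural.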