Efficient Databricks Deployment Automation with Terraform

Posted in CI/CD, cloud automation, Company Blog, Customers, Ecosystem, Education, Engineering Blog, Platform

Managing cloud infrastructure and provisioning resources can be a headache that DevOps engineers are all too familiar with. Even the most capable cloud admins can get bogged down with managing a bewildering number of interconnected cloud resources – including data streams, storage, compute power, and analytics tools. Take, for example, the following scenario: a customer […]

Announcing General Availability of Managed MLflow on Databricks

Posted in Announcements, Company Blog, Ecosystem, Engineering Blog, Machine Learning, Managed MLflow, MLflow, Platform, Product

MLflow is an open source platform to help manage the complete machine learning lifecycle. With MLflow, data scientists can track and share experiments locally or in the cloud, package and share models across frameworks, and deploy models virtually anywhere. Today at the Spark + AI Summit, we announced the General […]

Koalas: Easy Transition from pandas to Apache Spark

Posted in Announcements, Apache Spark, Company Blog, Data Science, Ecosystem, Education, Engineering Blog, Machine Learning, Open Source, Pandas, python

Today at Spark + AI Summit, we announced Koalas, a new open source project that augments PySpark’s DataFrame API to make it compatible with pandas. Python data science has exploded over the past few years and pandas has emerged as the lynchpin of the ecosystem. When data scientists get their hands on a data set, […]

Managing the Complete Machine Learning Lifecycle: On-Demand Webinar now available!

Posted in Company Blog, Data Science, Ecosystem, Education, Machine Learning, Managed MLflow, MLflow, Model Management, Open Source, Product, Webinar

On March 7th, our team hosted a live webinar, Managing the Complete Machine Learning Lifecycle, with Andy Konwinski, Co-Founder and VP of Product at Databricks. In this webinar, we walked you through how MLflow, an open source framework for the complete machine learning lifecycle, helps solve challenges around experiment tracking, reproducible projects, and model deployment. Specifically, […]

Databricks Runtime 5.3 ML Now Generally Available

Posted in Announcements, Company Blog, Data Science, Databricks Runtime 5.3 ML, Deep Learning, Ecosystem, Engineering Blog, Machine Learning, Product

We are excited to announce the general availability (GA) of Databricks Runtime for Machine Learning, as part of the release of Databricks Runtime 5.3 ML. Built on top of Databricks Runtime, Databricks Runtime ML is the optimized runtime for developing ML/DL applications in Databricks. It offers native integration with popular ML/DL frameworks, such as scikit-learn, […]

Introducing Brickchain: Planet-scale Unified Analytics

Posted in Apache Spark, Ecosystem, Engineering Blog, Machine Learning, Unified Analytics Engine

Today we are excited to announce Brickchain, the next-generation technology for zettabyte-scale analytics that harnesses all the compute power on the planet. Brickchain is the most scalable, secure, and collaborative data technology ever invented. As you may know, Databricks was founded by the original creators of Apache Spark, a unified analytics engine that uses […]

Managed MLflow on Databricks now in public preview

Posted in Announcements, Company Blog, Data Science, Ecosystem, Engineering Blog, Machine Learning, Managed MLflow, MLflow, Platform, Product

Building production machine learning applications is challenging because there is no standard way to record experiments, ensure reproducible runs, and manage and deploy models. To address these challenges, last June we introduced MLflow, an open source platform to manage the ML lifecycle that works with any machine learning library and […]

How to Work with Avro, Kafka, and Schema Registry in Databricks

Posted in Apache Avro, Apache Kafka, Apache Spark, Company Blog, DBR 5.2, Ecosystem, Engineering Blog, Product, Streaming, Structured Streaming

In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can use it to build streaming data pipelines with the from_avro and to_avro functions. Apache Kafka and Apache Avro are commonly used to build a scalable and near-real-time data pipeline. In this blog post, […]

Accelerating Machine Learning on Databricks: On-Demand Webinar and FAQ Now Available!

Posted in Company Blog, Data Science, Databricks Runtime, Deep Learning, Ecosystem, Engineering Blog, Horovod, HorovodRunner, Keras, Machine Learning, MLflow, Platform, Product, TensorFlow

On January 15th, we hosted a live webinar, Accelerating Machine Learning on Databricks, with Adam Conway, VP of Product Management, Machine Learning, at Databricks and Hossein Falaki, Software Development Engineer and Data Scientist at Databricks. In this webinar, we covered some of the latest innovations brought into the Databricks Unified Analytics Platform […]