Productionizing Machine Learning: From Deployment to Drift Detection

Posted in AI, Company Blog, Data and ML Industry Use Case, Data Science and Machine Learning, Machine Learning, Machine Learning Life Cycle, MLflow, Model Drift, Product, Tutorials

Try this notebook to reproduce the steps outlined below and watch our on-demand webinar to learn more. In much of the literature and in many blogs, a machine learning workflow starts with data prep and ends with deploying a model to production. But in reality, that’s just the beginning of the lifecycle of a machine learning model. As they say, […]
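The core of drift detection is comparing the distribution of live inputs against the training-time distribution. As an illustration of the general idea (this is not the linked notebook's code, and the threshold below is a made-up example), a two-sample Kolmogorov–Smirnov statistic can flag a drifting feature:

```python
# Minimal drift-detection sketch (illustrative only). Compares a "live" feature
# sample against the training-time sample using the two-sample
# Kolmogorov-Smirnov statistic: the maximum vertical distance between the two
# empirical CDFs.
import random


def ks_statistic(sample_a, sample_b):
    """Max vertical distance between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    i = j = 0
    d = 0.0
    for v in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(1000)]       # training sample
live_ok = [random.gauss(0.0, 1.0) for _ in range(1000)]     # same distribution
live_drifted = [random.gauss(1.5, 1.0) for _ in range(1000)]  # mean has shifted

DRIFT_THRESHOLD = 0.1  # hypothetical alerting threshold for this sketch
print(ks_statistic(train, live_ok) > DRIFT_THRESHOLD)       # expect no alert
print(ks_statistic(train, live_drifted) > DRIFT_THRESHOLD)  # expect an alert
```

In production one would run such a check per feature on a schedule and alert (or trigger retraining) when the statistic crosses a tuned threshold.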

Guest Blog: Using Databricks, MLflow, and Amazon SageMaker at Brandless to Bring Recommendation Systems to Production

Posted in Company Blog, Customer Stories, Customers, Data Science and Machine Learning, Lifecycle, Machine Learning, MLflow, Operationalization, Partners, Product, Recommendation System

This is a guest blog from Adam Barnhard, Head of Data at Brandless, Inc., and Bing Liang, Data Scientist at Brandless, Inc. Launched in July 2017, Brandless makes hundreds of high-quality items, curated for every member of your family and room of your home, and all sold at more accessible price points than similar products on the market. We […]

Diving Into Delta Lake: Unpacking The Transaction Log

Posted in Apache Spark, Company Blog, Customers, Data Engineering, Delta Lake, Ecosystem, Education, Engineering Blog, Internals, Optimistic Concurrency Control, Platform, Product, Transaction Log

The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this article, we’ll explore what the transaction log is, how it works at the file level, and how it offers […]
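At the file level, Delta Lake records each commit as a numbered JSON file of actions under the table's `_delta_log/` directory, and replaying those actions in order reconstructs the table's current state. A simplified sketch (real commit files carry more fields than shown here, and these file names and sizes are fabricated for illustration):

```python
# Simplified sketch of Delta Lake's transaction log layout. Each commit is a
# JSON-lines file named by version (e.g. _delta_log/00000000000000000000.json)
# containing actions such as "add" (data file added to the table) and "remove"
# (data file logically deleted).
import json
import os
import tempfile

table = tempfile.mkdtemp()
log_dir = os.path.join(table, "_delta_log")
os.makedirs(log_dir)

# Fabricated commit 0: two data files added.
commit0 = [
    {"add": {"path": "part-0000.parquet", "size": 1024, "dataChange": True}},
    {"add": {"path": "part-0001.parquet", "size": 2048, "dataChange": True}},
]
# Fabricated commit 1: one file removed (e.g. after a DELETE).
commit1 = [{"remove": {"path": "part-0001.parquet", "dataChange": True}}]

for version, actions in enumerate([commit0, commit1]):
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        f.write("\n".join(json.dumps(a) for a in actions))

# Replaying the log in version order yields the current set of live data files;
# replaying only a prefix of versions is the essence of time travel.
live_files = set()
for name in sorted(os.listdir(log_dir)):
    with open(os.path.join(log_dir, name)) as f:
        for line in f:
            action = json.loads(line)
            if "add" in action:
                live_files.add(action["add"]["path"])
            elif "remove" in action:
                live_files.discard(action["remove"]["path"])

print(live_files)  # only part-0000.parquet survives both commits
```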

AutoML on Databricks: Augmenting Data Science from Data Prep to Operationalization

Posted in Announcements, AutoML, Company Blog, Data Science, Data Science and Machine Learning, Databricks Labs, Engineering Blog, Hyperopt, Hyperparameter Tuning, Machine Learning, MLflow, Model Search, Product

Thousands of data science jobs are going unfilled today as global demand for the talent greatly outstrips supply. Every day, businesses pay the price of the data scientist shortage in missed opportunities and slow innovation. For organizations to realize the full potential of machine learning, data teams have to build hundreds of predictive models a […]
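The model search and hyperparameter tuning that AutoML automates (via tools like Hyperopt, tagged above) boils down to sampling a search space and keeping the best-scoring configuration. A generic random-search sketch of that idea in pure Python (this is not Hyperopt's API, and the objective below is a made-up stand-in for a model's validation loss):

```python
# Generic random-search sketch to illustrate automated hyperparameter tuning.
# Not Hyperopt's API -- just the underlying loop: sample parameters, score
# them, and keep the best configuration seen so far.
import random

random.seed(42)


def objective(params):
    """Stand-in for a model's validation loss; lower is better.
    The (made-up) optimum here is lr=0.1, depth=5."""
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 5) ** 2 / 100


# Each entry maps a hyperparameter name to a sampler over its range.
search_space = {
    "lr": lambda: random.uniform(0.001, 1.0),
    "depth": lambda: random.randint(1, 10),
}

best_params, best_loss = None, float("inf")
for _ in range(200):  # number of trials
    params = {name: sample() for name, sample in search_space.items()}
    loss = objective(params)
    if loss < best_loss:
        best_params, best_loss = params, loss

print(best_params, best_loss)
```

Tools like Hyperopt improve on this loop by modeling the objective (e.g. with tree-structured Parzen estimators) so that later trials concentrate in promising regions of the space, and by distributing trials across a cluster.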

Network performance regressions from TCP SACK vulnerability fixes

Posted in Company Blog, CVE-2019-11477, CVE-2019-11478, CVE-2019-11479, Network performance, Product, Security, TCP SACK vulnerability

On June 17, three vulnerabilities in Linux’s networking stack were published. The most severe one could allow remote attackers to impact the system’s availability. We believe in offering the most secure image available to our customers, so we quickly applied a kernel patch to address the issues. Since the kernel patch was applied, we have […]
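For systems that cannot take a kernel patch immediately, the public advisories for these CVEs also documented workarounds. A sketch of the commonly cited ones (verify against your distribution's advisory before applying; both have side effects):

```shell
# Workarounds published alongside the kernel patches (sketch only).
# Disabling SACK mitigates the SACK-based issues, at some cost to
# throughput on lossy network paths:
sysctl -w net.ipv4.tcp_sack=0

# Alternatively, drop inbound connections advertising a pathologically
# low MSS, which the published exploits rely on:
iptables -A INPUT -p tcp --tcp-flags SYN SYN -m tcpmss --mss 1:500 -j DROP
```

Both are stopgaps; the durable fix is the patched kernel, which is why the runtime image was updated promptly.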

Announcing Databricks Runtime 5.5 with Conda (Beta)

Posted in Company Blog, Conda, Databricks Runtime, Ecosystem, Engineering Blog, Machine Learning, Platform, Product

Databricks is pleased to announce the release of Databricks Runtime 5.5 with Conda (Beta). We introduced Databricks Runtime 5.4 with Conda (Beta) with the goal of making Python library and environment management easy. This release includes several important improvements and bug fixes, as noted in the latest release notes [Azure|AWS]. We recommend all users […]

Announcing Databricks Runtime 5.4

Posted in Announcements, Company Blog, Databricks Connect, Library Utilities, Product, Runtime, Runtime 5.4

Databricks is pleased to announce the release of Databricks Runtime 5.4. This release includes Apache Spark 2.4.3 along with several important improvements and bug fixes. We recommend all users upgrade to take advantage of this new runtime release. This blog post gives a brief overview of some of the new high-value features that […]

Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark: On-Demand Webinar and FAQ Now Available!

Posted in ACID Transactions, Apache Spark, Company Blog, Delta Lake, Education, Engineering Blog, Financial Services, Product, Streaming, Structured Streaming, Time Travel, Unified Batch and Streaming Sync

On June 13th, we hosted a live webinar — Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark — with Junta Nakai, Industry Leader – Financial Services at Databricks, John O’Dwyer, Solution Architect at Databricks, and Denny Lee, Technical Product Marketing Manager at Databricks. This is the first webinar in a series of financial […]

Databricks Connect: Bringing the capabilities of hosted Apache Spark™ to applications and microservices

Posted in Announcements, CoLab, Company Blog, Connect, Databricks Connect, Eclipse, Intellij, jupyter, Platform, Product, PyCharm, RStudio, Zeppelin

In this blog post, we introduce Databricks Connect, a new library that allows you to leverage native Apache Spark APIs from any notebook, IDE, or custom application. Over the last several years, many custom application connectors have been written for Apache Spark. This includes tools like spark-submit, REST job servers, notebook gateways, and so […]
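Getting started amounts to swapping the local PySpark package for the Databricks Connect client and pointing it at a cluster. A sketch of the setup for the client generation contemporary with this post (check the Databricks documentation for the exact version series matching your cluster's runtime):

```shell
# Databricks Connect setup sketch. The client replaces the local pyspark
# package, so remove any existing installation first:
pip uninstall -y pyspark
pip install -U "databricks-connect==5.*"  # pick the series matching your runtime

# Prompts interactively for the workspace URL, access token, cluster ID, etc.
databricks-connect configure

# Sanity-check that the local client can reach the cluster.
databricks-connect test
```

After that, any local tool that creates a `SparkSession` (an IDE, Jupyter, a custom service) transparently executes its Spark jobs on the remote cluster.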

Announcing the MLflow 1.0 Release

Posted in Announcements, Company Blog, Data Science, Ecosystem, Engineering Blog, Lifecycle, Machine Learning, MLflow, Model Management, Product

MLflow is an open source platform to help manage the complete machine learning lifecycle. With MLflow, data scientists can track and share experiments locally (on a laptop) or remotely (in the cloud), package and share models across frameworks, and deploy models virtually anywhere. Today we are excited to announce the release of MLflow 1.0. Since […]