Databricks Connect: Bringing the capabilities of hosted Apache Spark™ to applications and microservices

Posted in Announcements, CoLab, Company Blog, Connect, Databricks Connect, Eclipse, Intellij, jupyter, Platform, Product, PyCharm, RStudio, Zeppelin

In this blog post, we introduce Databricks Connect, a new library that allows you to use native Apache Spark APIs from any notebook, IDE, or custom application. Over the last several years, many custom application connectors have been written for Apache Spark. These include tools like spark-submit, REST job servers, notebook gateways, and so […]

Enhanced Hyperparameter Tuning and Optimized AWS Storage with Databricks Runtime 5.4 ML

Posted in Announcements, AutoML, Company Blog, Data Science, Databricks Runtime 5.4 ML, Deep Learning, Ecosystem, Engineering Blog, Hyperopt, Hyperparameter Tuning, Machine Learning, MLflow, MLlib, Platform, Product

We are excited to announce the release of Databricks Runtime 5.4 ML (Azure | AWS). This release includes two Public Preview features that improve data science productivity, optimized storage on AWS for developing distributed applications, and a number of Python library upgrades. To get started, simply select Databricks Runtime 5.4 ML from the […]

Efficient Databricks Deployment Automation with Terraform

Posted in CI/CD, cloud automation, Company Blog, Customers, Ecosystem, Education, Engineering Blog, Platform

Managing cloud infrastructure and provisioning resources can be a headache that DevOps engineers are all too familiar with. Even the most capable cloud admins can get bogged down with managing a bewildering number of interconnected cloud resources – including data streams, storage, compute power, and analytics tools. Take, for example, the following scenario: a customer […]

Detecting Financial Fraud at Scale with Decision Trees and MLflow on Databricks

Posted in Apache Spark, Company Blog, Decision tree, Education, Engineering Blog, financial, Financial Markets, Financial Services, Fraud, Fraud Detection, Machine Learning, Platform

Detecting fraudulent patterns at scale is a challenge, no matter the use case. The massive amounts of data to sift through, the complexity of the constantly evolving techniques, and the very small number of actual examples of fraudulent behavior are comparable to finding a needle in a haystack while not […]

Understanding Dynamic Time Warping

Posted in Apache Spark, Company Blog, Dynamic Time Warping, Education, Engineering Blog, Machine Learning, Platform

This blog is part 1 of our two-part series, Using Dynamic Time Warping and MLflow to Detect Sales Trends. For part 2, see Using Dynamic Time Warping and MLflow to Detect Sales Trends. The phrase “dynamic time warping,” at first read, might evoke images of Marty McFly […]
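
The excerpt above only names the technique. As a rough illustration, here is a minimal pure-Python sketch of the textbook dynamic-programming formulation of dynamic time warping, assuming absolute difference as the point-wise cost. The function name `dtw_distance` is hypothetical and is not taken from the linked notebook, which may use a different library, cost function, or window constraint.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical helper: classic O(n*m) DTW distance (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # step in a only (b[j-1] is reused)
                cost[i][j - 1],      # step in b only (a[i-1] is reused)
                cost[i - 1][j - 1],  # step in both series
            )
    return cost[n][m]


# Same shape, but the second series lingers one extra step at the start
print(dtw_distance([0, 1, 2, 3, 2, 1, 0], [0, 0, 1, 2, 3, 2, 1, 0]))  # 0.0
```

Because the warping path may reuse points, the two series in the usage line align perfectly despite having different lengths, whereas a point-by-point comparison would require equal lengths and would penalize the shift.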

Using Dynamic Time Warping and MLflow to Detect Sales Trends

Posted in Apache Spark, Company Blog, Dynamic Time Warping, Education, Engineering Blog, Machine Learning, MLflow, Platform

This blog is part 2 of our two-part series, Using Dynamic Time Warping and MLflow to Detect Sales Trends. The phrase “dynamic time warping,” at first read, might evoke images of Marty McFly driving his DeLorean at 88 MPH in the Back to the Future series. Alas, dynamic time warping does […]

Introducing MLflow Run Sidebar in Databricks Notebooks

Posted in Announcements, Company Blog, Engineering Blog, Machine Learning, Managed MLflow, MLflow, Platform, Sidebar

At Spark + AI Summit 2019, we announced the general availability (GA) of Managed MLflow on Databricks, which takes the latest and greatest of open source MLflow and makes it easily accessible to all Databricks users. In that blog post, we promised to build features that bridge Databricks and MLflow concepts to create a seamless integration […]

Announcing General Availability of Managed MLflow on Databricks

Posted in Announcements, Company Blog, Ecosystem, Engineering Blog, Machine Learning, Managed MLflow, MLflow, Platform, Product

MLflow is an open source platform to help manage the complete machine learning lifecycle. With MLflow, data scientists can track and share experiments locally or in the cloud, package and share models across frameworks, and deploy models virtually anywhere. Today at the Spark + AI Summit, we announced the General […]

Improving Data Management and Analytics in the Federal Government

Posted in Announcements, Apache Spark, Company Blog, Engineering Blog, Federal Government, government, Platform, Product, Public Sector, Unified analytics, Warehouse

From Static Data Warehouse to Scalable Insights and AI On-Demand

Government agencies today are dealing with a wider variety of data at a much larger scale. From satellite imagery to sensor data to citizen records, petabytes of semi- and unstructured data are collected each day. Unfortunately, traditional data warehouses are failing to provide government agencies […]

MLflow v0.9.0 Features SQL Backend, Projects in Docker, and Customization in Python Models

Posted in Apache Spark, docker, Engineering Blog, Machine Learning, Machine Learning Life Cycle, MLflow, Platform, python, SQLAlchemy

MLflow v0.9.0 was released this week. It introduces a set of new features and community contributions, including a SQL store for the tracking server, support for running MLflow projects in Docker containers, and simple customization in Python models. Additionally, this release adds a plugin scheme to customize the MLflow backend stores for tracking and artifacts. Now available on PyPI […]