Announcing Databricks Runtime 5.2

Posted in Announcements, Engineering Blog, Product

We are excited to announce the release of Databricks Runtime 5.2, which introduces several new features, including Delta Time Travel, Fast Parquet Import, and Databricks Advisor. Let’s unpack each of these features in more detail. Delta Time Travel: Time Travel, released as an Experimental feature, adds the ability to query a snapshot of a table […]

Accelerating Machine Learning on Databricks: On-Demand Webinar and FAQ Now Available!

Posted in Company Blog, Data Science, Databricks Runtime, Deep Learning, Ecosystem, Engineering Blog, Horovod, HorovodRunner, Keras, Machine Learning, MLflow, Platform, Product, TensorFlow

Try this notebook in Databricks. On January 15th, we hosted a live webinar, “Accelerating Machine Learning on Databricks,” with Adam Conway, VP of Product Management, Machine Learning, at Databricks and Hossein Falaki, Software Development Engineer and Data Scientist at Databricks. In this webinar, we covered some of the latest innovations brought into the Databricks Unified Analytics Platform […]

Apparate: Managing Libraries in Databricks with CI/CD

Posted in Apache Spark, apparate, CI/CD, continuous delivery, continuous integration, Continuous Processing, Customers, Education, Partners, Product

This is a guest blog from Hanna Torrence, Data Scientist at ShopRunner. Introduction: As leveraging data becomes a more vital component of organizations’ tech stacks, it becomes increasingly important for data teams to make use of software engineering best practices. The Databricks platform provides excellent tools for exploratory Apache Spark workflows in notebooks as well as […]

CIO Survey: Top 3 Challenges Adopting AI and How to Overcome Them

Posted in AI, Announcements, CIO, Company Blog, data, Events, ML, Product, Survey, Unified analytics

We recently hosted the webinar “CIO Survey: Enterprise Challenges to AI and How to Overcome Them,” featuring Jen Garofalo, Research Director at IDG, the parent company of CIO.com, and Pat McDonough, VP of Customer Success at Databricks. This webinar covered key findings from a recent CIO.com survey of 200 executives on the […]

Announcing Databricks Runtime 5.0

Posted in Announcements, Apache Spark, Company Blog, Product

We’re excited to announce the general availability of Databricks Runtime 5.0, which includes Spark 2.4. This release offers substantial performance increases within key areas of the platform. Benchmarking workloads have shown a 16% improvement in total execution time, and Databricks Delta benefits from substantial improvements to metadata caching, improving query latency by […]

Simplifying Change Data Capture with Databricks Delta

Posted in Apache Spark, CDC, Change Data Capture, Company Blog, Databricks Delta, Education, Engineering Blog, Product

A common use case that we run into at Databricks is customers looking to perform change data capture (CDC) from one or many sources into a set of Databricks Delta tables. These sources may be on-premises or in the cloud, operational transactional stores, or data warehouses. The common glue that binds them all is […]

Simplify Market Basket Analysis using FP-growth on Databricks

Posted in Apache Spark, Company Blog, Education, Engineering Blog, FP-Growth, Machine Learning, Market Basket Analysis, Product

Try this notebook in Databricks. When providing recommendations to shoppers on what to purchase, you are often looking for items that are frequently purchased together (e.g. peanut butter and jelly). A key technique to uncover associations between different items is known as market basket analysis. In your recommendation engine toolbox, the association rules generated by […]

Identify Suspicious Behavior in Video with Databricks Runtime for Machine Learning

Posted in Apache Spark, Company Blog, Deep Learning, Deep Learning Pipelines, Education, Engineering Blog, Machine Learning, OpenCV, Platform, Product, TensorFlow, Video Analytics

Try this notebook series in Databricks. With the exponential growth of cameras and visual recordings, it is becoming increasingly important to operationalize and automate the process of video identification and categorization. Applications ranging from identifying the correct cat video to visually categorizing objects are becoming more prevalent. With millions of users around the world generating […]

MLflow On-Demand Webinar and FAQ Now Available!

Posted in Data Science, Deep Learning, Ecosystem, Engineering Blog, Machine Learning, MLflow, Model Management, Platform, Product, Unified Analytics Platform

On August 30th, our team hosted a live webinar, “Introducing MLflow: Infrastructure for a complete Machine Learning lifecycle,” with Matei Zaharia, Co-Founder and Chief Technologist at Databricks. In this webinar, we walked you through MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library […]

Introducing Cluster-scoped init scripts

Posted in Cluster Management, DBFS, Engineering Blog, Platform, Product, Summer Internship, Unified Analytics Platform

Introduction: This summer, I worked at Databricks as a software engineering intern on the Clusters team. As part of my internship project, I designed and implemented Cluster-scoped init scripts, improving scalability and ease of use. In this blog, I will discuss various benefits of Cluster-scoped init scripts, followed by my internship experience at Databricks, and […]