Introducing Built-in Image Data Source in Apache Spark 2.4

Posted in Apache Spark, Data Source, Databricks Runtime, DataFrames, Deep Learning Pipelines, Ecosystem, Engineering Blog, Machine Learning

With recent advances in deep learning frameworks for image classification and object detection, the demand for standard image processing in Apache Spark has never been greater. Image handling and preprocessing pose specific challenges: for example, images come in different formats (e.g., JPEG, PNG), sizes, and color schemes, and there is no […]
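The excerpt is cut off before it shows the data source itself. As a hedged illustration: in Spark 2.4, `spark.read.format("image").load(path)` decodes image files into a struct column with the fields `origin`, `height`, `width`, `nChannels`, `mode`, and `data`, where `data` holds the pixel bytes in OpenCV-compatible order (row-wise, BGR in most cases). The pure-Python sketch below assumes an 8-bit color image with no row padding and shows how that single layout lets one function read a pixel whatever the original file format was. `ImageRow` and `pixel_rgb` are hypothetical names; `ImageRow` is a plain stand-in, not a Spark type.

```python
from typing import NamedTuple, Tuple


class ImageRow(NamedTuple):
    """Hypothetical stand-in mirroring the fields of Spark 2.4's image struct column."""
    origin: str      # source URI of the image file
    height: int      # number of pixel rows
    width: int       # number of pixel columns
    nChannels: int   # e.g. 3 for BGR, 4 for BGRA
    mode: int        # OpenCV type code, e.g. 16 for CV_8UC3
    data: bytes      # pixel bytes, row-wise, channels in BGR(A) order


def pixel_rgb(img: ImageRow, row: int, col: int) -> Tuple[int, int, int]:
    """Return (R, G, B) at (row, col), assuming 8-bit channels and no row padding."""
    if img.nChannels < 3:
        raise ValueError("expected a color image with at least 3 channels")
    if not (0 <= row < img.height and 0 <= col < img.width):
        raise IndexError("pixel out of bounds")
    # Row-major layout: each pixel occupies nChannels consecutive bytes.
    offset = (row * img.width + col) * img.nChannels
    b, g, r = img.data[offset], img.data[offset + 1], img.data[offset + 2]
    # Reorder from the stored BGR to the more familiar RGB.
    return (r, g, b)


if __name__ == "__main__":
    # A hypothetical 2x2, 3-channel image whose bytes are simply 0..11.
    img = ImageRow("file:/tmp/example.png", 2, 2, 3, 16, bytes(range(12)))
    print(pixel_rgb(img, 0, 0))  # (2, 1, 0)
    print(pixel_rgb(img, 1, 1))  # (11, 10, 9)
```

Because `pixel_rgb` only reads fields by attribute name, it should also accept the nested `image` Row returned from a PySpark DataFrame, though that has not been verified here.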

Scalable End-to-End Deep Learning using TensorFlow™ and Databricks: On-Demand Webinar and FAQ Now Available!

Posted in Apache Spark, Data Science, Databricks Runtime, Deep Learning, Ecosystem, Engineering Blog, Horovod, Machine Learning, Platform, Product, python, TensorFlow

On July 9th, our team hosted a live webinar, Scalable End-to-End Deep Learning using TensorFlow™ and Databricks, with Brooke Wenig, Data Science Solutions Consultant at Databricks, and Sid Murching, Software Engineer at Databricks. In this webinar, we walked you through how to use TensorFlow™ and Horovod (an open-source library from Uber that simplifies distributed model training) on […]

Announcing Databricks Runtime for Machine Learning

Posted in Announcements, Company Blog, Databricks Runtime, Deep Learning, Engineering Blog, GPU, Machine Learning, Product, TensorFlow

Databricks is thrilled to announce the Databricks Runtime for Machine Learning, including ready-to-use machine learning frameworks, simplified distributed training, and GPU support. Register for our upcoming webinar to learn more. Today more than ever, data scientists and machine learning practitioners have the opportunity to transform their business by implementing sophisticated models for recommendation engines, ads targeting, […]

Accelerating Innovation With Unified Analytics

Posted in Announcements, Company Blog, Databricks, Databricks Runtime, Machine Learning, Open Source, Platform, Product, Unified Analytics Platform

The AI Dilemma: Artificial Intelligence (AI) has massive potential to drive disruptive innovation at most enterprises on the planet. However, most enterprises are struggling to succeed with AI. Why is that? Simply put, AI and data are siloed in different systems and different organizations. Enterprise data is siloed across hundreds of systems such as data […]

Announcing Databricks Runtime 4.1

Posted in Company Blog, Databricks Runtime, Engineering Blog, Product

We recently shipped Databricks Runtime 4.1, powered by Apache Spark™. Version 4.1 brings faster reads and writes for sources such as S3 and Parquet, improved caching, and many quality and feature improvements to the Databricks Delta preview, focused on faster query execution and adaptive schema and type validation. […]

5 Reasons to Attend Spark + AI Summit

Posted in Apache Spark, Company Blog, Databricks Runtime, Deep Learning, Events, Machine Learning, Spark + AI Summit

Spark + AI Summit will be held in San Francisco on June 4-6, 2018. Check out the full agenda and get your ticket before it sells out! Register today with the discount code 5Reasons and get 15% off. Convergence of Knowledge: For any Apache Spark enthusiast, these summits are the convergence of Spark knowledge. Used […]

Introducing Stream-Stream Joins in Apache Spark 2.3

Posted in Apache Spark, Databricks Runtime, Engineering Blog, Streaming, Structured Streaming

Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner joins and some types of outer joins) between a streaming and a static DataFrame/Dataset. With the release of Apache Spark 2.3.0, available in Databricks Runtime 4.0 as part of the Databricks Unified Analytics Platform, we now support stream-stream joins. In this […]
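The key difference from a stream-static join is that neither input is complete at any point, so a row arriving on one stream may match a row that has not yet arrived on the other. A common way to handle this, and the idea behind Spark buffering past input from both sides as streaming state, is a symmetric hash join. The pure-Python toy below is a conceptual sketch of that technique for an inner equi-join, not Spark's implementation; `SymmetricHashJoin`, `on_left`, and `on_right` are hypothetical names, and the impression/click data is made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple


class SymmetricHashJoin:
    """Toy inner equi-join over two unbounded inputs.

    Every row arriving on either side is buffered as state and probed
    against the rows already buffered on the other side, so a match is
    emitted no matter which side's row arrives first.
    """

    def __init__(self) -> None:
        self.left_state: Dict[Hashable, List[Any]] = defaultdict(list)
        self.right_state: Dict[Hashable, List[Any]] = defaultdict(list)

    def on_left(self, key: Hashable, row: Any) -> List[Tuple[Any, Any]]:
        # Remember this row for future right-side arrivals, then probe the right state.
        self.left_state[key].append(row)
        return [(row, other) for other in self.right_state.get(key, [])]

    def on_right(self, key: Hashable, row: Any) -> List[Tuple[Any, Any]]:
        # Remember this row for future left-side arrivals, then probe the left state.
        self.right_state[key].append(row)
        return [(other, row) for other in self.left_state.get(key, [])]


if __name__ == "__main__":
    join = SymmetricHashJoin()
    print(join.on_left("ad1", "impression-1"))  # [] (no click yet)
    print(join.on_right("ad1", "click-1"))      # [('impression-1', 'click-1')]
    print(join.on_left("ad1", "impression-2"))  # [('impression-2', 'click-1')]
```

This toy keeps every buffered row forever. In Spark, watermarks and event-time constraints on the join are what allow old state to be dropped, and this sketch omits them.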

Introducing Apache Spark 2.3

Posted in Apache Spark, Databricks Runtime, Engineering Blog, Machine Learning, Streaming

Today we are happy to announce the availability of Apache Spark 2.3.0 on Databricks as part of Databricks Runtime 4.0. We want to thank the Apache Spark community for all their valuable contributions to the Spark 2.3 release. Continuing with the objective of making Spark faster, easier, and smarter, Spark 2.3 marks a major milestone […]