Applying your Convolutional Neural Network: On-Demand Webinar and FAQ Now Available!

Posted in Deep Learning, Ecosystem, Engineering Blog, Keras, Machine Learning, Neural Networks, Platform, TensorFlow

On October 25th, we hosted a live webinar, Applying your Convolutional Neural Network, with Denny Lee, Technical Product Marketing Manager at Databricks. This is the third webinar in a free deep learning fundamentals series from Databricks. In this webinar, we dived deeper into Convolutional Neural Networks (CNNs), a particular type of neural […]

Open Sourcing Databricks Integration Tools at Edmunds

Posted in Apache Spark, Company Blog, Customers, Engineering Blog, Platform

This is a guest post from Shaun Elliott, Data Engineering Tech Lead, and Sam Shuster, Staff Engineer, at Edmunds. What is Databricks and How is it Useful for Edmunds? Databricks is a cloud-based, fully managed, big data and analytics processing platform that leverages Apache Spark™ and the JVM. The big selling point of the Databricks Unified […]

Introducing Apache Spark 2.4

Posted in Apache Spark, Apache Spark 2.4, Databricks Runtime 5.0, Ecosystem, Engineering Blog, Machine Learning, Pandas UDF, Platform, SparkSQL, Streaming, Structured Streaming, Unified Analytics Platform

We are excited to announce the availability of Apache Spark 2.4 on Databricks as part of the Databricks Runtime 5.0. We want to thank the Apache Spark community for all their valuable contributions to the Spark 2.4 release. Continuing with the objectives to make Spark faster, easier, and smarter, Spark 2.4 extends its scheduler to […]

SQL Pivot: Converting Rows to Columns

Posted in Apache Spark, DataFrames, Engineering Blog, Spark SQL, sql, Unified Analytics Platform

Pivot was first introduced in Apache Spark 1.6 as a new DataFrame feature that allows users to rotate a table-valued expression by turning the unique values from one column into individual columns. The upcoming Apache Spark 2.4 release extends this powerful functionality of pivoting data to our SQL users as […]

Democratizing Cloud Infrastructure with Terraform and Jenkins

Posted in Ecosystem, Engineering Blog, Infrastructure, Monitoring, Platform, Provisioning, Unified Analytics Platform

This blog post is part of our series of internal engineering blogs on the Databricks platform, infrastructure management, integration, tooling, monitoring, and provisioning. This summer at Databricks I designed and implemented a service for coordinating and deploying cloud provider infrastructure resources that significantly improved the velocity of operations on our self-managed cloud platform. The service […]

Simplifying Change Data Capture with Databricks Delta

Posted in Apache Spark, CDC, Change Data Capture, Company Blog, Databricks Delta, Education, Engineering Blog, Product

A common use case that we run into at Databricks is customers looking to perform change data capture (CDC) from one or many sources into a set of Databricks Delta tables. These sources may be on-premises or in the cloud, operational transactional stores, or data warehouses. The common glue that binds them all is […]

Training your Neural Network: On-Demand Webinar and FAQ Now Available!

Posted in Deep Learning, Ecosystem, Engineering Blog, Keras, Machine Learning, Neural Networks, Platform, TensorFlow

On October 9th, we hosted a live webinar, Training your Neural Network, on Data Science Central with Denny Lee, Technical Product Marketing Manager at Databricks. This is the second webinar in a free deep learning fundamentals series from Databricks. In this webinar, we covered the principles for training your neural network including […]

Writing a faster Jsonnet compiler

Posted in Engineering Blog

Databricks uses the Jsonnet configuration language for managing the configuration of its various services. We recently implemented our own Sjsonnet compiler for the language, in order to avoid Jsonnet’s performance pitfalls and greatly speed up development workflows at Databricks. What is Jsonnet? We have previously written up why we use Jsonnet here, but in short: […]

MLflow v0.7.0 Features New R API by RStudio

Posted in Announcements, Apache Spark, Company Blog, Deep Learning, Ecosystem, Education, Engineering Blog, GPyOpt, Hyperopt, Java, Keras, Machine Learning, MLflow, multistep workflow, Partners, python, R, RStudio

Today, we’re excited to announce MLflow v0.7.0, released with new features, including a new MLflow R client API contributed by RStudio. A testament to MLflow’s design goal of an open platform with adoption in the community, RStudio’s contribution extends the MLflow platform to a larger R community of data scientists who use RStudio and R […]

What’s New for Apache Spark on Kubernetes in the Upcoming Apache Spark 2.4 Release

Posted in Apache Spark, Ecosystem, Engineering Blog, Kubernetes

This is a community blog from Yinan Li, a software engineer at Google working on the Kubernetes Engine team. He is part of the group of companies that have contributed to Kubernetes support in the upcoming Apache Spark 2.4. Since the Kubernetes cluster scheduler backend was initially introduced in Apache Spark 2.3, the community has […]