Speedy Scala Builds with Bazel at Databricks

Posted in Apache Spark, Bazel, Company Blog, Databricks, Engineering Blog, SBT, Scala

Databricks migrated from the standard Scala Build Tool (SBT) to Bazel to build, test, and deploy our Scala code. Through improvements in our build infrastructure, Scala compilation workflows that previously took minutes to tens of minutes now complete in seconds. This post will walk you through the improvements we made to achieve […]
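As a rough illustration of what such a migration looks like (this is a hypothetical sketch using the open-source rules_scala rules, not Databricks' actual configuration; the target names and paths are made up), a Scala library and its tests might be declared in a Bazel BUILD file like this:

```python
# BUILD.bazel -- hypothetical Scala library and test targets using rules_scala
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_library", "scala_test")

scala_library(
    name = "jobs",  # build with: bazel build //jobs
    srcs = glob(["src/main/scala/**/*.scala"]),
    deps = ["//common:utils"],  # fine-grained deps let Bazel cache and rebuild incrementally
)

scala_test(
    name = "jobs_test",  # run with: bazel test //jobs:jobs_test
    srcs = glob(["src/test/scala/**/*.scala"]),
    deps = [":jobs"],
)
```

Because Bazel caches and parallelizes at the target level, splitting a codebase into many small targets like these is what makes incremental rebuilds fast: only targets whose sources or dependencies changed are recompiled.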

A Guide to Developer, Deep Dive, and Continuous Streaming Applications Talks at Spark + AI Summit

Posted in Apache Spark, Company Blog, Databricks, Databricks Delta, Education, Events, Spark + AI Summit, Spark SQL, Spark Training, Structured Streaming

In January 2013 when Stephen O’Grady, an analyst at RedMonk, published “The New Kingmakers: How Developers Conquered the World,” the book’s central argument (then and still now) universally resonated with an emerging open-source community. He convincingly charts developers’ movement “out of the shadows and into the light as new influencers on society’s [technical landscape].” Using […]

100x Faster Bridge between Apache Spark and R with User-Defined Functions on Databricks

Posted in Apache Spark, Databricks, Engineering Blog, Machine Learning, R, SparkR, Unified Analytics Platform

The SparkR User-Defined Function (UDF) API opens up opportunities for big data workloads running on Apache Spark to embrace R's rich package ecosystem. Some of our customers who have R experts on board use the SparkR UDF API to blend R's sophisticated packages into their ETL pipelines, applying transformations that go beyond Spark's built-in functions on the […]

Announcing Databricks Runtime 4.2!

Posted in Announcements, Apache Spark, Company Blog, Customers, Databricks, Delta, Engineering Blog, Platform, Product, Runtime, Streaming

We’re excited to announce Databricks Runtime 4.2, powered by Apache Spark™.  Version 4.2 includes updated Spark internals, new features, and major performance upgrades to Databricks Delta, as well as general quality improvements to the platform.  We are moving quickly toward the Databricks Delta general availability (GA) release and we recommend you upgrade to Databricks Runtime […]

Sharing R Notebooks using RMarkdown

Posted in Announcements, Company Blog, Databricks, Engineering Blog, Machine Learning, Partners, Product, RStudio

At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now with this RMarkdown notebook (Rmd | HTML) or visit us at www.databricks.com/rstudio. The Databricks Unified Analytics Platform now supports RStudio Server (press release). Users often ask if they can move notebooks between RStudio and Databricks workspace using RMarkdown […]

Announcing RStudio and Databricks Integration

Posted in Announcements, Company Blog, Databricks, Engineering Blog, Machine Learning, Partners, Product, RStudio

At Databricks, we are thrilled to announce the integration of RStudio with the Databricks Unified Analytics Platform. You can try it out now with this notebook (Rmd | HTML) or visit us at www.databricks.com/rstudio. For R practitioners looking at scaling out R-based advanced analytics to big data, Databricks provides a Unified Analytics Platform that […]

Accelerating Innovation With Unified Analytics

Posted in Announcements, Company Blog, Databricks, Databricks Runtime, Machine Learning, Open Source, Platform, Product, Unified Analytics Platform

The AI Dilemma: Artificial Intelligence (AI) has massive potential to drive disruptive innovations affecting most enterprises on the planet. However, most enterprises are struggling to succeed with AI. Why is that? Simply put, AI and data are siloed in different systems and different organizations. Enterprise data is siloed across hundreds of systems such as data […]

Introducing Databricks Optimized Auto-Scaling

Posted in Announcements, Auto-scaling, Company Blog, Data Engineering, Databricks, Engineering Blog, Product

Databricks is thrilled to announce our new optimized auto-scaling feature. The new Apache Spark™-aware resource manager leverages Spark shuffle and executor statistics to resize a cluster intelligently, improving resource utilization. When we tested long-running big data workloads, we observed cloud cost savings of up to 30%. What’s the problem with current state-of-the-art auto-scaling approaches? Today, […]

Women in Big Data and Apache Spark: Bay Area Apache Spark Meetup Summary

Posted in Apache Spark, Company Blog, Data Visualization, Databricks, DevOps, Events, WiBD

In collaboration with the local chapter of the Women in Big Data Meetup, and as part of the Databricks diversity team's continuing effort to feature more women in the big data space as speakers sharing their subject matter expertise, we hosted our second meetup with diverse, highly accomplished women from their respective technical fields as speakers […]

Introducing Click: The Command Line Interactive Controller for Kubernetes

Posted in Databricks, Engineering Blog, Infrastructure, Kubernetes, Open Source, Platform, Unified Analytics Platform

Click is an open-source tool that lets you quickly and easily run commands against Kubernetes resources without constant copy/pasting, and that integrates easily into your existing command-line workflows. At Databricks, we use Kubernetes a lot. We deploy our services (of which there are many) in unique namespaces, across multiple clouds, in multiple […]