Koalas: Easy Transition from pandas to Apache Spark

Posted in Announcements, Apache Spark, Company Blog, Data Science, Ecosystem, Education, Engineering Blog, Machine Learning, Open Source, Pandas, python

Today at Spark + AI Summit, we announced Koalas, a new open source project that augments PySpark’s DataFrame API to make it compatible with pandas. Python data science has exploded over the past few years and pandas has emerged as the lynchpin of the ecosystem. When data scientists get their hands on a data set, […]

A Guide to MLflow Talks at Spark + AI Summit 2019

Posted in Company Blog, Events, Machine Learning, MLflow, Open Source, Product, Spark + AI Summit

In less than a year, MLflow has reached almost 500K monthly downloads, and gathered over 80 code contributors and 40 contributing organizations, confirming the need for an open source approach to help standardize the machine learning lifecycle across tools, teams, and processes. We are thrilled to host some of our key contributors and customers next […]

Managing the Complete Machine Learning Lifecycle: On-Demand Webinar now available!

Posted in Company Blog, Data Science, Ecosystem, Education, Machine Learning, Managed MLflow, MLflow, Model Management, Open Source, Product, Webinar

On March 7th, our team hosted a live webinar, Managing the Complete Machine Learning Lifecycle, with Andy Konwinski, Co-Founder and VP of Product at Databricks. In this webinar, we walked you through how MLflow, an open source framework for the complete machine learning lifecycle, helps solve challenges around experiment tracking, reproducible projects, and model deployment. Specifically, […]

Introducing MLflow: an Open Source Machine Learning Platform

Posted in Announcements, Company Blog, Ecosystem, Engineering Blog, Machine Learning, MLflow, Open Source, Product

Everyone who has tried to do machine learning development knows that it is complex. Beyond the usual concerns of software development, machine learning (ML) development comes with multiple new challenges. At Databricks, we work with hundreds of companies using ML, and we have repeatedly heard the same concerns: There are myriad tools. Hundreds […]

Accelerating Innovation With Unified Analytics

Posted in Announcements, Company Blog, Databricks, Databricks Runtime, Machine Learning, Open Source, Platform, Product, Unified Analytics Platform

The AI Dilemma: Artificial Intelligence (AI) has massive potential to drive disruptive innovations affecting most enterprises on the planet. However, most enterprises are struggling to succeed with AI. Why is that? Simply put, AI and data are siloed in different systems and different organizations. Enterprise data is siloed across hundreds of systems such as data […]

Introducing Click: The Command Line Interactive Controller for Kubernetes

Posted in Databricks, Engineering Blog, Infrastructure, Kubernetes, Open Source, Platform, Unified Analytics Platform

Click is an open-source tool that lets you quickly and easily run commands against Kubernetes resources without copying and pasting all the time, and that integrates easily into your existing command-line workflows. At Databricks, we use Kubernetes a lot. We deploy our services (of which there are many) in unique namespaces, across multiple clouds, in multiple […]