Spark + AI Summit 2019 Product Announcements and Recap. Watch the keynote recordings today!

Posted in Announcements, Apache Spark, Company Blog, Delta Lake, Events, Koalas, MLflow, Product, Spark + AI Summit

Spark + AI Summit 2019, the world’s largest data and machine learning conference for the Apache Spark™ community, brought nearly 5,000 data scientists, engineers, and business leaders to San Francisco’s Moscone Center to find out what’s coming next. Watch the keynote recordings today and learn more about the latest product announcements for Apache Spark, MLflow, […]

Detecting Financial Fraud at Scale with Decision Trees and MLflow on Databricks

Posted in Apache Spark, Company Blog, Decision tree, Education, Engineering Blog, financial, Financial Markets, Financial Services, Fraud, Fraud Detection, Machine Learning, Platform

Try this notebook in Databricks. Detecting fraudulent patterns at scale is a challenge, no matter the use case. The massive amounts of data to sift through, the complexity of the constantly evolving techniques, and the very small number of actual examples of fraudulent behavior are comparable to finding a needle in a haystack while not […]

Understanding Dynamic Time Warping

Posted in Apache Spark, Company Blog, Dynamic Time Warping, Education, Engineering Blog, Machine Learning, Platform

Try this notebook in Databricks. This blog is part 1 of our two-part series, Using Dynamic Time Warping and MLflow to Detect Sales Trends. For part 2, see Using Dynamic Time Warping and MLflow to Detect Sales Trends. The phrase “dynamic time warping,” at first read, might evoke images of Marty McFly […]

Using Dynamic Time Warping and MLflow to Detect Sales Trends

Posted in Apache Spark, Company Blog, Dynamic Time Warping, Education, Engineering Blog, Machine Learning, MLflow, Platform

Try this notebook series in Databricks. This blog is part 2 of our two-part series, Using Dynamic Time Warping and MLflow to Detect Sales Trends. The phrase “dynamic time warping,” at first read, might evoke images of Marty McFly driving his DeLorean at 88 MPH in the Back to the Future series. Alas, dynamic time warping does […]

Koalas: Easy Transition from pandas to Apache Spark

Posted in Announcements, Apache Spark, Company Blog, Data Science, Ecosystem, Education, Engineering Blog, Machine Learning, Open Source, Pandas, python

Today at Spark + AI Summit, we announced Koalas, a new open source project that augments PySpark’s DataFrame API to make it compatible with pandas. Python data science has exploded over the past few years and pandas has emerged as the lynchpin of the ecosystem. When data scientists get their hands on a data set, […]

A Guide to Healthcare and Life Sciences Talks at Spark + AI Summit 2019

Posted in Apache Spark, bioinformatics, Company Blog, Events, Genomics, healthcare, life sciences, Spark + AI Summit, Summit

Data and AI are ushering in a new era of precision medicine. The scale of the cloud, combined with advancements in machine learning, is enabling healthcare and life sciences organizations to use their mountains of data (such as electronic health records, genomics, real-world evidence, claims, and more) to drive innovation across the entire ecosystem, from accelerating drug […]

A Guide to Financial Services Talks at Spark + AI Summit 2019

Posted in Announcements, Apache Spark, Company Blog, Events, Financial Services, Spark + AI Summit, Summit

The financial services industry is transforming rapidly. Customers today demand more personalized experiences, better returns on their investments, and improved protection against fraud. As a result, every bank, insurance company, and institutional investor is turning to big data and AI to meet these demands and outmaneuver the competition. Spark + AI Summit is the premier […]

Introducing Brickchain: Planet-scale Unified Analytics

Posted in Apache Spark, Ecosystem, Engineering Blog, Machine Learning, Unified Analytics Engine

Today we are excited to announce Brickchain, the next-generation technology for zettabyte-scale analytics that harnesses all the compute power on the planet. Brickchain is the most scalable, secure, and collaborative data technology ever invented. As you may know, Databricks was founded by the original creators of Apache Spark, a unified analytics engine that uses […]

Improving Data Management and Analytics in the Federal Government

Posted in Announcements, Apache Spark, Company Blog, Engineering Blog, Federal Government, government, Platform, Product, Public Sector, Unified analytics, Warehouse

From Static Data Warehouse to Scalable Insights and AI On-Demand. Government agencies today are dealing with a wider variety of data at a much larger scale. From satellite imagery to sensor data to citizen records, petabytes of semi-structured and unstructured data are collected each day. Unfortunately, traditional data warehouses are failing to provide government agencies […]