A Guide to Training Sessions at Spark + AI Summit, Europe

Posted in Apache Spark, Company Blog, Data and ML Industry Use Case, Data and ML Research, Data Engineering, Data Science, Data Science and Machine Learning, data-science, Delta Lake, Education, Events, Keras, MLflow, Productionizing Machine Learning, PyTorch, Spark + AI Summit, Spark SQL, Structured Streaming, TensorFlow, training

Education and the pursuit of knowledge are lifelong journeys: they are never complete. There is always something new to learn, a new professional certification to add to your credentials, or a knowledge gap to fill. Training at Spark + AI Summit, Europe is not only about becoming an Apache Spark expert. Nor is it only about being […]

Diving Into Delta Lake: Schema Enforcement & Evolution

Posted in Apache Spark, Company Blog, Data Engineering, Delta Lake, Developer, Ecosystem, Education, Engineering Blog, Schema Enforcement, Schema Evolution

Try this notebook series in Databricks. Think back to when you were in high school, so fresh and full of ideas. Since then, both the world and the way you see things have undoubtedly changed in many ways as you have gained new experiences. Data, like our experiences, is always evolving and accumulating. To […]

Engineering population-scale Genome-Wide Association Studies with Apache Spark, Delta Lake, and MLflow

Posted in AI, Apache Spark, Company Blog, Customers, Data and ML Industry Use Case, Data Engineering, Data Science and Machine Learning, Delta Lake, Education, Engineering Blog, genome sequencing, GWAS, Managed MLflow, MLflow

Try this notebook series in Databricks. The advent of genome-wide association studies (GWAS) in the late 2000s enabled scientists to begin to understand the causes of complex diseases such as diabetes and Crohn’s disease at their most fundamental level. However, academic bioinformatics tools to perform GWAS have not kept pace with the growth of genomic […]

Using AutoML Toolkit to Automate Loan Default Predictions

Posted in AI, AutoML, Company Blog, Data Science and Machine Learning, Developer, Education, Engineering Blog, Machine Learning, MLflow, XGBoost

Download the following notebooks and try the AutoML Toolkit today: Evaluating Risk for Loan Approvals using XGBoost (0.90) | Using AutoML Toolkit to Simplify Loan Risk Analysis XGBoost Model Optimization. In a previous blog and notebook, Loan Risk Analysis with XGBoost, we explored the different stages of how to build a Machine Learning model to improve […]

Diving Into Delta Lake: Unpacking The Transaction Log

Posted in Apache Spark, Company Blog, Customers, Data Engineering, Delta Lake, Ecosystem, Education, Engineering Blog, Internals, Optimistic Concurrency Control, Platform, Product, Transaction Log

The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this article, we’ll explore what the transaction log is, how it works at the file level, and how it offers […]
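To make the "file level" idea a little more concrete, here is a minimal sketch, not Delta Lake's actual implementation. It assumes the log is a directory of zero-padded, newline-delimited JSON commit files whose `add` and `remove` actions each carry a `path`, and it deliberately ignores Parquet checkpoints, metadata, and protocol actions (so it assumes every JSON commit is still present). `replay_log` and `read_commits` are hypothetical helpers: replaying commits in version order yields the set of live data files, and stopping at an earlier version mimics time travel.

```python
import json
from pathlib import Path
from typing import Iterable, List, Optional, Set, Tuple


def replay_log(commits: Iterable[Tuple[int, str]], as_of: Optional[int] = None) -> Set[str]:
    """Hypothetical helper: replay (version, commit_text) pairs in version order
    and return the data files that are live at `as_of` (or at the latest version)."""
    live: Set[str] = set()
    for version, text in sorted(commits):
        # Stop early to reconstruct an older snapshot (a toy form of time travel).
        if as_of is not None and version > as_of:
            break
        # Each non-empty line of a commit file is assumed to hold one JSON action.
        for line in text.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other action types (metaData, protocol, commitInfo, ...) are ignored here.
    return live


def read_commits(log_dir: str) -> List[Tuple[int, str]]:
    """Hypothetical helper: load JSON commit files such as 00000000000000000000.json,
    using the zero-padded file name as the version number."""
    return [(int(p.stem), p.read_text()) for p in Path(log_dir).glob("*.json")]


if __name__ == "__main__":
    commits = [
        (0, '{"add": {"path": "a.parquet"}}\n{"add": {"path": "b.parquet"}}'),
        (1, '{"remove": {"path": "a.parquet"}}\n{"add": {"path": "c.parquet"}}'),
    ]
    print(sorted(replay_log(commits)))           # latest snapshot
    print(sorted(replay_log(commits, as_of=0)))  # snapshot at version 0
```

Keeping the replay logic separate from file I/O makes it easy to test against in-memory commits; a real reader would also have to start from the most recent checkpoint rather than from version 0.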

Productionizing Machine Learning with Delta Lake

Posted in AI, Apache Spark, Company Blog, Data Engineering, Delta Lake, Ecosystem, Education, Engineering Blog, Machine Learning, Platform

Try out this notebook series in Databricks: part 1 (Delta Lake), part 2 (Delta Lake + ML). For many data scientists, the process of building and tuning machine learning models is only a small portion of the work they do every day. The vast majority of their time is spent doing the less-than-glamorous (but […]

Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark: On-Demand Webinar and FAQ Now Available!

Posted in ACID Transactions, Apache Spark, Company Blog, Delta Lake, Education, Engineering Blog, Financial Services, Product, Streaming, Structured Streaming, Time Travel, Unified Batch and Streaming Sync

On June 13th, we hosted a live webinar — Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark — with Junta Nakai, Industry Leader – Financial Services at Databricks, John O’Dwyer, Solution Architect at Databricks, and Denny Lee, Technical Product Marketing Manager at Databricks. This is the first webinar in a series of financial […]

Detecting Bias with SHAP

Posted in Apache Spark, Bias, Deep Learning, Education, Engineering Blog, Machine Learning, MLflow, SHAP, Stack Overflow

Stack Overflow’s annual developer survey concluded earlier this year, and Stack Overflow has graciously published the anonymized 2019 results for analysis. They offer a rich view into the experience of software developers around the world: What’s their favorite editor? How many years of experience do they have? Tabs or spaces? And, crucially, what is their salary? Software engineers’ salaries are good, and sometimes […]

New videos from Databricks Academy: Introduction to Natural Language Processing—Latent Semantic Analysis

Posted in Announcements, Company Blog, Education

Databricks’ commitment to education is at the center of the work we do. Through Instructor-Led Training, Certification, and Self-Paced Training, Databricks Academy provides strong pathways for users to learn Apache Spark™ and Databricks and take their knowledge to the next level. Our latest offering is a series of short videos introducing the Natural Language Processing […]

Efficient Databricks Deployment Automation with Terraform

Posted in CI/CD, cloud automation, Company Blog, Customers, Ecosystem, Education, Engineering Blog, Platform

Managing cloud infrastructure and provisioning resources can be a headache that DevOps engineers are all too familiar with. Even the most capable cloud admins can get bogged down with managing a bewildering number of interconnected cloud resources – including data streams, storage, compute power, and analytics tools. Take, for example, the following scenario: a customer […]