Databricks Engineering Interns & Impact in Summer 2018

Posted in Announcements, Company Blog, Education, Intern

Thanks to our awesome interns! This summer, our Engineering interns at Databricks did amazing work. Our interns, working on teams from Developer Tools to Machine Learning, built features and improvements that are already impacting our customers and the Apache Spark and AI communities. Spending a summer at Databricks: Databricks Engineering internships are a mix of […]

MLflow v0.7.0 Features New R API by RStudio

Posted in Announcements, Apache Spark, Company Blog, Deep Learning, Ecosystem, Education, Engineering Blog, GPyOpt, Hyperopt, Java, Keras, Machine Learning, MLflow, multistep workflow, Partners, python, R, RStudio

Today, we’re excited to announce MLflow v0.7.0, released with new features, including a new MLflow R client API contributed by RStudio. A testament to MLflow’s design goal of an open platform with adoption in the community, RStudio’s contribution extends the MLflow platform to a larger R community of data scientists who use RStudio and R […]

Simplify Market Basket Analysis using FP-growth on Databricks

Posted in Apache Spark, Company Blog, Education, Engineering Blog, FP-Growth, Machine Learning, Market Basket Analysis, Product

Try this notebook in Databricks. When providing recommendations to shoppers on what to purchase, you are often looking for items that are frequently purchased together (e.g. peanut butter and jelly). A key technique to uncover associations between different items is known as market basket analysis. In your recommendation engine toolbox, the association rules generated by […]

Identify Suspicious Behavior in Video with Databricks Runtime for Machine Learning

Posted in Apache Spark, Company Blog, Deep Learning, Deep Learning Pipelines, Education, Engineering Blog, Machine Learning, OpenCV, Platform, Product, TensorFlow, Video Analytics

Try this notebook series in Databricks. With the exponential growth of cameras and visual recordings, it is becoming increasingly important to operationalize and automate the process of video identification and categorization. Applications ranging from identifying the correct cat video to visually categorizing objects are becoming more prevalent. With millions of users around the world generating […]

Introducing Flint: A time-series library for Apache Spark

Posted in Apache Spark, Company Blog, Customers, Education, Engineering Blog, Flint, python, Spark SQL, Time Series

This is a joint guest community blog by Li Jin at Two Sigma and Kevin Rasmussen at Databricks; they share how to use Flint with Apache Spark. Introduction: The volume of data that data scientists face these days increases relentlessly, and we now find that a traditional, single-machine solution is no longer adequate to the demands […]

Examining The Positive And Negative Impacts Of AI On Education

Posted in AI, Artificial Intelligence, Education, learning, Machine Learning, SmartData Collective Exclusive

As investments in machine learning and AI continue to push the boundaries of what a machine is capable of, applications of artificial intelligence are beginning to creep into sectors where they were previously possible only in the realm of fiction. To some, the idea of a machine helping humans learn in a procedurally generated […]

How Big Data And Education Can Work Together To Help Students Thrive

Posted in Big Data, Business Intelligence, Decision Management, Education, Workforce Data

Between 2015 and 2017, consumers and enterprises created more data than the sum of all data created prior to that time, and technology experts forecast the sum of all data will expand from 130 to 40,000 exabytes in the decade and a half preceding the year 2020. By then, researchers estimate that society will […]

Building a Real-Time Attribution Pipeline with Databricks Delta

Posted in Adhoc Analysis, Advertising Analytics, Apache Spark, bi, Company Blog, Databricks Delta, Ecosystem, Education, Engineering Blog, Kinesis, Machine Learning, Platform, Product, Spark Streaming, Streaming, Structured Streaming, Tableau

Try this notebook in Databricks. In digital advertising, one of the most important things to be able to deliver to clients is information about how their advertising spend drove results. The more quickly we can provide this, the better. To tie conversions or engagements to the impressions served in an advertising campaign, companies must perform […]

Loan Risk Analysis with XGBoost and Databricks Runtime for Machine Learning

Posted in Apache Spark, Company Blog, data pipeline, Data Visualization, Ecosystem, Education, Engineering Blog, financial, Machine Learning, MLlib, Platform, Product, XGBoost

Try this notebook series in Databricks. For companies that make money from interest on loans held by their customers, it’s always about increasing the bottom line. Being able to assess the risk of loan applications can save a lender the cost of holding too many risky assets. It is the data scientist’s job to […]

How Artificial Intelligence Is Helping Children And The Elderly

Posted in aging population, AI, Artificial Intelligence, childcare, Education, Paro, Poppy, Robear

Although most people don’t realize it, AI is already helping humanity to meet some growing challenges. It’s being used to help us care for the elderly and take care of our children, and that’s probably just the beginning. These days, there’s plenty of talk about how artificial intelligence (AI) is going to impact the business […]