100x Faster Bridge between Apache Spark and R with User-Defined Functions on Databricks

Posted in Apache Spark, Databricks, Engineering Blog, Machine Learning, R, SparkR, Unified Analytics Platform

The SparkR User-Defined Function (UDF) API lets big data workloads running on Apache Spark draw on R's rich package ecosystem. Some of our customers who have R experts on board use the SparkR UDF API to blend R's sophisticated packages into their ETL pipelines, applying transformations that go beyond Spark's built-in functions on the […]

rquery: Practical Big Data Transforms for R-Spark Users

Posted in Apache Spark, Big Data, Data Science, Engineering Blog, Machine Learning, R, Spark SQL, SparkR

This is a guest community blog from Nina Zumel and John Mount, data scientists and consultants at Win-Vector. They share how to use rquery with Apache Spark on Databricks. In this blog, we will introduce rquery, a query tool that allows R users to implement powerful data transformations […]