Introducing Built-in Image Data Source in Apache Spark 2.4

Posted in Apache Spark, Data Source, Databricks Runtime, DataFrames, Deep Learning Pipelines, Ecosystem, Engineering Blog, Machine Learning

Introduction: With recent advances in deep learning frameworks for image classification and object detection, the demand for standard image processing in Apache Spark has never been greater. Image handling and preprocessing have their own challenges: for example, images come in different formats (e.g., JPEG, PNG), sizes, and color schemes, and there is no […]

SQL Pivot: Converting Rows to Columns

Posted in Apache Spark, DataFrames, Engineering Blog, Spark SQL, sql, Unified Analytics Platform

Pivot was first introduced in Apache Spark 1.6 as a DataFrame feature that lets users rotate a table-valued expression by turning the unique values from one column into individual columns. The upcoming Apache Spark 2.4 release extends this pivoting functionality to our SQL users as […]
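
To make the idea of rotating rows into columns concrete, here is a minimal sketch in plain Python. It is not Spark's implementation or API; the `pivot` helper, the `sales` rows, and the column names are all hypothetical, chosen only to illustrate how the unique values of one column become individual columns, with a sum as the assumed aggregate.

```python
from collections import defaultdict


def pivot(rows, group_key, pivot_key, value_key):
    """Illustrative pivot (hypothetical helper, not a Spark API).

    Groups rows by group_key, turns each unique pivot_key value into its own
    column, and sums value_key within each (group, pivot value) cell.
    Cells with no matching rows are left as None.
    """
    # Each distinct value of the pivot column becomes an output column.
    columns = sorted({row[pivot_key] for row in rows})
    totals = defaultdict(lambda: dict.fromkeys(columns))
    for row in rows:
        cells = totals[row[group_key]]
        current = cells[row[pivot_key]]
        cells[row[pivot_key]] = (
            row[value_key] if current is None else current + row[value_key]
        )
    # One output row per group, sorted by the group key.
    return [{group_key: group, **cells} for group, cells in sorted(totals.items())]


# Made-up sample data: one row per sale.
sales = [
    {"year": 2017, "quarter": "Q1", "amount": 10},
    {"year": 2017, "quarter": "Q2", "amount": 20},
    {"year": 2018, "quarter": "Q1", "amount": 30},
    {"year": 2017, "quarter": "Q1", "amount": 5},
]

result = pivot(sales, "year", "quarter", "amount")
for row in result:
    print(row)
```

Each output row corresponds to one year, and the quarter values that were spread across several input rows now appear as the columns Q1 and Q2.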