Apparate: Managing Libraries in Databricks with CI/CD

Posted in Apache Spark, apparate, CI/CD, continuous delivery, continuous integration, Continuous Processing, Customers, Education, Partners, Product

This is a guest blog from Hanna Torrence, Data Scientist at ShopRunner. As leveraging data becomes a more vital component of organizations’ tech stacks, it becomes increasingly important for data teams to make use of software engineering best practices. The Databricks platform provides excellent tools for exploratory Apache Spark workflows in notebooks as well as […]

Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3

Posted in Apache Spark, Apache Spark 2.0, Continuous Processing, Databricks Runtime 4.0, Streaming, Structured Streaming

Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons. First, it made the developer’s experience with the APIs simpler: the APIs did not have to account for micro-batches. Second, it allowed developers to treat a stream as an infinite table to which […]