As organizations move to the cloud, the Modern Data Warehouse (MDW) architecture offers a new level of performance and scalability. A modern data warehouse lets you bring together data at any scale and get insights through analytical dashboards, operational reports, or advanced analytics. But what does an MDW look like? The following diagram from our partner Microsoft shows the MDW architecture pattern that we see adopted by more and more of our customers. This architecture becomes all the more compelling in light of the new Azure SQL Data Warehouse price-performance benchmarks just released by GigaOM.
Source: Azure Modern Data Warehouse
As the diagram shows, the MDW enables you to converge relational and non-relational, or structured and unstructured, data into a single hub. You can use Azure Data Factory to automate the movement and transformation of data from over 70 data sources, then load it into Azure Data Lake Storage, which serves as a highly scalable and cost-effective data lake. Azure Databricks then provides the processing capability for data preparation, such as transformation and cleansing. Finally, the cleansed data is loaded into Azure SQL Data Warehouse, where it is combined with your existing data and made readily available for analysis through visual tools like Power BI.
Let’s focus on performance and scalability, which are critical to building a successful MDW. Databricks has always been focused on high performance and scalability; it’s part of the original vision for the platform. We’ve published several resources that showcase our world-class benchmarks, performance, and capabilities, including the webinar Performance Benchmarking Big Data Platforms in the Cloud (with Databricks Chief Architect & Co-Founder Reynold Xin), Simplify Advertising Analytics Click Prediction with Databricks Unified Analytics Platform, and Introducing Databricks Optimized Autoscaling on Apache Spark.
The latest benchmark for Azure SQL Data Warehouse was just released, showing a new level of price-performance that further reinforces why the combination of Azure Databricks and Azure SQL Data Warehouse is a powerful solution for customers modernizing their data warehouses in the cloud. By the way, if you want to learn more about Azure SQL Data Warehouse (and try it for free), see here.
There is real engineering behind the integration of Azure Databricks and Azure SQL Data Warehouse to ensure they work together seamlessly. You can access Azure SQL DW from Azure Databricks through the specialized Azure SQL Data Warehouse connector, which lets you transfer large volumes of data efficiently between an Azure Databricks cluster and a SQL DW instance. We also recently announced that Azure Databricks users can stream data directly into Azure SQL Data Warehouse using Structured Streaming. This lets customers visualize and report on near real-time data in SQL DW, backed by streaming pipelines built with Structured Streaming, resulting in faster decision making across the enterprise. Our tutorial spells out the requirements and steps to set up this optimized connection between Azure Databricks and Azure SQL Data Warehouse.
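To make the connector a little more concrete, here is a minimal Python sketch of what a batch write and a streaming write from an Azure Databricks notebook might look like. Treat it as an illustration under stated assumptions rather than the definitive setup: the format name and option names reflect our understanding of the connector documentation, the helper functions (build_sqldw_options, write_batch, write_stream) and the table name cleansed_events are hypothetical, and the angle-bracket values are placeholders you would replace with your own server, database, and storage details.

```python
# Sketch only. Option names follow the Azure SQL Data Warehouse connector docs
# as we understand them; all values below are placeholders, not real endpoints.

SQLDW_FORMAT = "com.databricks.spark.sqldw"


def build_sqldw_options(jdbc_url, temp_dir, table):
    """Return the connector options shared by batch and streaming writes."""
    return {
        "url": jdbc_url,          # JDBC connection string for the SQL DW instance
        "tempDir": temp_dir,      # Blob storage location used to stage the data
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,         # target table in SQL DW (hypothetical name)
    }


def write_batch(df, options):
    """Bulk-load a Spark DataFrame into SQL DW, appending to the target table."""
    df.write.format(SQLDW_FORMAT).options(**options).mode("append").save()


def write_stream(stream_df, options, checkpoint_location):
    """Start a Structured Streaming query that continuously writes to SQL DW."""
    return (
        stream_df.writeStream.format(SQLDW_FORMAT)
        .options(**options)
        .option("checkpointLocation", checkpoint_location)
        .start()
    )


options = build_sqldw_options(
    jdbc_url="jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>",
    temp_dir="wasbs://<container>@<account>.blob.core.windows.net/tmp",
    table="cleansed_events",
)
```

In a notebook you would pass an existing DataFrame to write_batch(df, options), or a streaming DataFrame plus a checkpoint path to write_stream. The tutorial mentioned above remains the reference for the exact prerequisites, such as storage credentials and database permissions.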
In summary, we see our customers moving to the cloud and modernizing their data warehouse environments to operate with performance at scale. There is a real focus on supporting this trend with an end-to-end platform that provides the high-performance, scalable engines needed to implement the Modern Data Warehouse architecture.
As part of this solution, Azure Databricks is well integrated with the broader Azure data services ecosystem, enabling customers to build end-to-end solutions on a single platform. This includes fine-grained security and control through Azure Active Directory integration, as well as the industry-leading SLAs and enterprise-grade support provided by Azure.
Get started today!
We are excited for you to try Azure Databricks and Azure SQL Data Warehouse to modernize your data warehouse!