EC2 versus EMR : bigdata

Posted in BigData

For all big data gurus everywhere, from hedge funds (quant finance) to biotech (drug discovery) to social media (Twitter), to discuss the latest trends, topics, career opportunities and tricks of the trade! Rules: No advertising, don’t blatantly link to your own product(s). Posts must be relevant to big data technologies or discussions. Related subreddits: r/datascience r/bigdatajobs r/machinelearning […]

Azure Databricks, industry-leading analytics platform powered by Apache Spark™

Posted in Company Blog, Partners

The confluence of cloud, data, and AI is driving unprecedented change. The ability to utilize data and turn it into breakthrough insights is foundational to innovation today. Our goal is to empower organizations to unleash the power of data and reimagine possibilities that will improve our world. To enable this journey, we are excited to […]

The 5 A's of AI

Posted in Artificial Intelligence

Summary: Fundamentally, there are five ways to work with Artificial Intelligence. The 5 A’s of AI are the five ways for us to choose how we want to work with AI: are we guiding the AI, or do we trust the AI enough to let it guide us? Artificial Intelligence (AI) will have a […]

Introducing Low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3

Posted in Apache Spark, Apache Spark 2.0, Continuous Processing, Databricks Runtime 4.0, Streaming, Structured Streaming

Structured Streaming in Apache Spark 2.0 decoupled micro-batch processing from its high-level APIs for a couple of reasons. First, it made the developer’s experience with the APIs simpler: the APIs did not have to account for micro-batches. Second, it allowed developers to treat a stream as an infinite table to which […]

Selected Sessions to Watch for at Spark + AI Summit 2018

Posted in AI, Apache Spark, Company Blog, Deep Learning, Events, Machine Learning, Spark Summit + AI 2018

Early last month, we announced our agenda for Spark + AI Summit 2018, with over 180 selected talks across 11 tracks and training courses. For this summit, we have added four new tracks to expand its scope to include Deep Learning Frameworks, AI, Productionizing Machine Learning, Hardware in the Cloud, and Python and Advanced Analytics. […]

Moving from hype to deployment

Posted in Data Science

Although cognitive computing, which is often referred to as AI or Artificial Intelligence, is not a new concept, the hype surrounding it and the level of interest in it are definitely new. The combination of hype surrounding robot overlords, vendor marketing and concerns regarding job losses has fueled the hype into where […]

Introducing Stream-Stream Joins in Apache Spark 2.3

Posted in Apache Spark, Databricks Runtime, Engineering Blog, Streaming, Structured Streaming

Since we introduced Structured Streaming in Apache Spark 2.0, it has supported joins (inner joins and some types of outer joins) between a streaming and a static DataFrame/Dataset. With the release of Apache Spark 2.3.0, now available in Databricks Runtime 4.0 as part of Databricks Unified Analytics Platform, we now support stream-stream joins. In this […]

Embedded Self-Service Enhances the Growing Culture of Analytics

Posted in Big Data Week 2018

Not long ago, effective data analytics were seen as a form of differentiation. Today they are table stakes, as companies become increasingly data-driven to help grow their businesses and remain competitive. As business intelligence in the enterprise shifts from the purview of data scientists and IT-led reporting into the hands of non-technical business users, agile […]