A Guide to Apache Spark Use Cases, Streaming, and Research Talks at Spark + AI Summit Europe

Posted in Apache Spark, Company Blog, Data Science, Events, Genomics, Machine Learning, PySpark, Spark + AI Summit Europe, Structured Streaming, Unified Analytics Platform

For much of Apache Spark’s history, its capacity to process data at scale and its capability to unify disparate workloads have led Spark developers to tackle new use cases. Through innovation and extension of its ecosystem, developers combine data and AI to develop new applications. So it befits developers to come to this summit not just […]

New Features in MLflow v0.5.0 Release

Posted in Apache Spark, Data Science, Engineering Blog, Machine Learning, MLflow, Model Management, Platform, PySpark

Today, we’re excited to announce MLflow v0.5.0, which we released last week with some new features. MLflow 0.5.0 is already available on PyPI and the docs are updated. If you run pip install mlflow as described in the MLflow quickstart guide, you will get the latest release. In this post, we’ll describe new features and fixes […]

A Guide to Developer, Apache Spark Use Cases, and Deep Dives Talks at Spark + AI Summit

Posted in Apache Spark, Company Blog, Events, Kubernetes, Machine Learning, PySpark, Spark + AI Summit, Structured Streaming

Apache Spark is tackling new frontiers by unifying new workloads. This enables developers to combine data and AI to develop intelligent applications. Developers come to this summit not just to hear about innovations from contributors. They also come to share their use cases and experiences and to absorb knowledge. @matei_zaharia just announced support of #kubernetes in […]

A Guide to AI, Machine Learning, and Data Science Talks at Spark + AI Summit

Posted in Apache Spark, Company Blog, Data Science, Deep Learning, Deep Learning Pipelines, Events, Machine Learning, PySpark

By any measure today, whether in digital media, technical conferences and citations, or searches on Google Trends, the frequency of terms like artificial intelligence, machine learning, deep learning, or data science is on the rise. A special report in The Economist makes the case that Artificial Intelligence (AI) and Machine Learning (ML), and its purported tectonic […]

Benchmarking Apache Spark on a Single Node Machine

Posted in Apache Spark, Apache Spark 2.3, Ecosystem, Engineering Blog, Pandas, PySpark, python

Apache Spark has become the de facto unified analytics engine for big data processing in a distributed environment. Yet we are seeing more users choose to run Spark on a single machine, often their laptop, to process small to large data sets, rather than opting for a large Spark cluster. This choice is primarily because of the […]