Koalas: Easy Transition from pandas to Apache Spark

Posted in Announcements, Apache Spark, Company Blog, Data Science, Ecosystem, Education, Engineering Blog, Machine Learning, Open Source, Pandas, python

Today at Spark + AI Summit, we announced Koalas, a new open source project that augments PySpark’s DataFrame API to make it compatible with pandas. Python data science has exploded over the past few years and pandas has emerged as the lynchpin of the ecosystem. When data scientists get their hands on a data set, […]

MLflow v0.9.0 Features SQL Backend, Projects in Docker, and Customization in Python Models

Posted in Apache Spark, docker, Engineering Blog, Machine Learning, Machine Learning Life Cycle, MLflow, Platform, python, SQLAlchemy

MLflow v0.9.0 was released this week. It introduces a set of new features and community contributions, including a SQL store for the tracking server, support for running MLflow projects in Docker containers, and simple customization of Python models. Additionally, this release adds a plugin scheme for customizing MLflow's backend stores for tracking and artifacts. Now available on PyPI […]

A Guide to Data Science, Python, and Advanced Analytics Talks at Spark + AI Summit 2019

Posted in Advanced Analytics, Company Blog, Data Science, Education, python, Spark + AI Summit

With a tsunami of data, the scale of available computing resources, and the rapid development of easy-to-learn open source machine learning frameworks, data science and machine learning concepts are much easier to learn and implement today than they were a decade ago. As a result, across all industries, practitioners are using cutting-edge ML algorithms to solve tough […]

An Introduction To Hands-On Text Analytics In Python

Posted in Analytics, Big Data, NLP, python, SmartData Collective Exclusive, Text Analytics, tokenisation

Python is a high-level, object-oriented programming language. Here is a quick, hands-on tutorial on how to use it for text analytics. Python enables four kinds of text analytics: text matching, text classification, topic modelling, and summarization. Let’s begin by understanding some of the NLP features of Python, how it is set up and how to read the […]
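The excerpt names text matching as one of these tasks but cuts off before showing any code. As a hedged illustration only (not the tutorial's own method), here is a minimal standard-library sketch of tokenisation followed by text matching. The regex tokenizer and the use of `difflib.SequenceMatcher` as the similarity measure are our own assumptions, and `tokenize` and `match_score` are hypothetical helper names.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Hypothetical helper: lower-case the text and split it into word tokens
    made of letters, digits, and apostrophes (so "hands-on" becomes two tokens)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_score(a: str, b: str) -> float:
    """Hypothetical helper: similarity ratio in [0, 1] between the token
    sequences of two strings, computed by difflib's SequenceMatcher."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


if __name__ == "__main__":
    query = "Python text analytics tutorial"
    docs = [
        "A hands-on tutorial on text analytics in Python",
        "Introducing Flint: a time-series library for Apache Spark",
    ]
    # Rank candidate documents by how closely their tokens match the query
    for doc in sorted(docs, key=lambda d: match_score(query, d), reverse=True):
        print(f"{match_score(query, doc):.2f}  {doc}")
```

`SequenceMatcher` rewards matching tokens in the same order, so it is a reasonable quick baseline; a real text-matching pipeline would more likely use TF-IDF vectors or embeddings with cosine similarity.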

7 Lessons To Teach You All You Need To Know About Machine Learning

Posted in Education, Machine Learning, Programming, python, SmartData Collective Exclusive, Software

Before discussing the ways in which you can learn all you need to know about machine learning, we would like to discuss what the subject matter actually is. Machine learning is essentially teaching a computer how to make decisions with the help of relevant data. It is very important for the computer to be able […]

MLflow v0.7.0 Features New R API by RStudio

Posted in Announcements, Apache Spark, Company Blog, Deep Learning, Ecosystem, Education, Engineering Blog, GPyOpt, Hyperopt, Java, Keras, Machine Learning, MLflow, multistep workflow, Partners, python, R, RStudio

Today, we’re excited to announce MLflow v0.7.0, released with new features, including a new MLflow R client API contributed by RStudio. As a testament to MLflow’s design goal of being an open platform adopted by the community, RStudio’s contribution extends the MLflow platform to a larger R community of data scientists who use RStudio and R […]

Introducing Flint: A time-series library for Apache Spark

Posted in Apache Spark, Company Blog, Customers, Education, Engineering Blog, Flint, python, Spark SQL, Time Series

This is a joint guest community blog post by Li Jin at Two Sigma and Kevin Rasmussen at Databricks, in which they share how to use Flint with Apache Spark. The volume of data that data scientists face these days increases relentlessly, and we now find that a traditional, single-machine solution is no longer adequate to the demands […]

How to Use MLflow to Experiment a Keras Network Model: Binary Classification for Movie Reviews

Posted in Apache Spark, Data Science, Engineering Blog, Machine Learning, MLflow, Model Management, Platform, python, Unified Analytics Platform

In the last blog post, we demonstrated the ease with which you can get started with MLflow, an open-source platform to manage the machine learning lifecycle. In particular, we illustrated a simple Keras/TensorFlow model using MLflow and PyCharm. This time we explore a binary classification Keras network model. Using MLflow’s Tracking APIs, we will track metrics such as accuracy […]

Data Science Professional Certificate – Cognitive Class

Posted in AI, Coursera, Data Science, Data Visualization, Databases, Machine Learning, Partnerships, Professional Certificate, python, Relational databases

Today IBM and Coursera launched an online Data Science Professional Certificate to address the shortage of skills in data-related professions. This certificate is designed for those interested in a career in data science or AI, and equips people to become job-ready through hands-on, practical learning. In this post we look […]

Introducing mlflow-apps: A Repository of Sample Applications for MLflow

Posted in Apache Spark, Data Science, Engineering Blog, Machine Learning, MLflow, Platform, python, TensorFlow, Unified Analytics Platform

This summer, I was a software engineering intern at Databricks on the Machine Learning (ML) Platform team. As part of my intern project, I built a set of MLflow apps that demonstrate MLflow’s capabilities and offer the community examples to learn from. In this blog, I’ll discuss this library of pluggable ML applications, all […]