An Introduction To Hands-On Text Analytics In Python

Posted in Analytics, Big Data, NLP, python, SmartData Collective Exclusive, Text Analytics, tokenisation

Python is a high-level, object-oriented programming language. Here is a quick, hands-on tutorial on how to use its text analytics functions. Python enables four kinds of analytics: text matching, text classification, topic modelling, and summarization. Let’s begin by understanding some of the NLP features of Python, how it is set up and how to read the […]
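
As a rough illustration of the first item, text matching, here is a minimal sketch that uses only the Python standard library. It is not taken from the tutorial itself; the helper names tokenise, similarity and best_match are hypothetical, and the tutorial may well rely on a dedicated NLP library instead.

```python
import re
from difflib import SequenceMatcher


def tokenise(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(a, b):
    """Return a ratio between 0 and 1 comparing the token sequences of a and b."""
    return SequenceMatcher(None, tokenise(a), tokenise(b)).ratio()


def best_match(query, candidates):
    """Return the candidate whose tokens are most similar to the query."""
    return max(candidates, key=lambda c: similarity(query, c))


if __name__ == "__main__":
    titles = ["Text Analytics in Python", "Spark meetup summary"]
    print(best_match("text analytics in python", titles))
```

Comparing token lists rather than raw strings makes the match insensitive to case and punctuation, which is usually what a first pass at text matching needs.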

7 Lessons To Teach You All You Need To Know About Machine Learning

Posted in Education, Machine Learning, Programming, python, SmartData Collective Exclusive, Software

Before discussing the ways in which you can learn all you need to know about machine learning, we would like to discuss what the subject matter actually is. Machine learning is essentially teaching a computer how to make decisions with the help of relevant data. It is very important for the computer to be able […]

MLflow v0.7.0 Features New R API by RStudio

Posted in Announcements, Apache Spark, Company Blog, Deep Learning, Ecosystem, Education, Engineering Blog, GPyOpt, Hyperopt, Java, Keras, Machine Learning, MLflow, multistep workflow, Partners, python, R, RStudio

Today, we’re excited to announce MLflow v0.7.0, released with new features, including a new MLflow R client API contributed by RStudio. A testament to MLflow’s design goal of an open platform with adoption in the community, RStudio’s contribution extends the MLflow platform to a larger R community of data scientists who use RStudio and R […]

Introducing Flint: A time-series library for Apache Spark

Posted in Apache Spark, Company Blog, Customers, Education, Engineering Blog, Flint, python, Spark SQL, Time Series

This is a joint guest community blog by Li Jin at Two Sigma and Kevin Rasmussen at Databricks; they share how to use Flint with Apache Spark. Introduction: The volume of data that data scientists face these days increases relentlessly, and we now find that a traditional, single-machine solution is no longer adequate to the demands […]

How to Use MLflow to Experiment a Keras Network Model: Binary Classification for Movie Reviews

Posted in Apache Spark, Data Science, Engineering Blog, Machine Learning, MLflow, Model Management, Platform, python, Unified Analytics Platform

In the last blog post, we demonstrated the ease with which you can get started with MLflow, an open-source platform to manage the machine learning lifecycle. In particular, we illustrated a simple Keras/TensorFlow model using MLflow and PyCharm. This time we explore a binary classification Keras network model. Using MLflow’s Tracking APIs, we will track metrics: accuracy […]

Data Science Professional Certificate – Cognitive Class

Posted in AI, Coursera, Data Science, Data Visualization, Databases, Machine Learning, Partnerships, Professional Certificate, python, Relational databases

Today IBM and Coursera launched an online Data Science Professional Certificate to address the shortage of skills in data-related professions. This certificate is designed for those interested in a career in Data Science or AI, and equips people to become job-ready through hands-on, practical learning. IBM Data Science Professional Certificate: In this post we look […]

Introducing mlflow-apps: A Repository of Sample Applications for MLflow

Posted in Apache Spark, Data Science, Engineering Blog, Machine Learning, MLflow, Platform, python, TensorFlow, Unified Analytics Platform

Introduction: This summer, I was a software engineering intern at Databricks on the Machine Learning (ML) Platform team. As part of my intern project, I built a set of MLflow apps that demonstrate MLflow’s capabilities and offer the community examples to learn from. In this blog, I’ll discuss this library of pluggable ML applications, all […]

Why Choosing Python For Data Science Is An Important Move

Posted in Big Data, business analytics, Business Intelligence, data analysis, Data Science, datasets, Deep Learning, PixieDust, python

In this article, we discuss why to choose Python for data science. We’ll introduce PixieDust, an open source library that focuses on three simple goals: democratize data science by lowering the barrier to entry for non-data scientists; increase collaboration between developers and data scientists; make it easier to operationalize data science […]

Bay Area Apache Spark Meetup Summary @ Databricks HQ

Posted in Apache Spark, Company Blog, Deep Learning, Events, Machine Learning, MLflow, Model Management, python, TensorFlow

On July 19, we held our monthly Bay Area Spark Meetup (BASM) at Databricks HQ in San Francisco. At the Spark + AI Summit in June, we announced two open-source projects: Project Hydrogen and MLflow. Partly to continue sharing the progress of these open-source projects with the community and partly to encourage community contributions, two […]

Scalable End-to-End Deep Learning using TensorFlow™ and Databricks: On-Demand Webinar and FAQ Now Available!

Posted in Apache Spark, Data Science, Databricks Runtime, Deep Learning, Ecosystem, Engineering Blog, Horovod, Machine Learning, Platform, Product, python, TensorFlow

On July 9th, our team hosted a live webinar, Scalable End-to-End Deep Learning using TensorFlow™ and Databricks, with Brooke Wenig, Data Science Solutions Consultant at Databricks, and Sid Murching, Software Engineer at Databricks. In this webinar, we walked you through how to use TensorFlow™ and Horovod (an open-source library from Uber to simplify distributed model training) on […]