How to Work with Avro, Kafka, and Schema Registry in Databricks

Posted in Apache Avro, Apache Kafka, Apache Spark, Company Blog, DBR 5.2, Ecosystem, Engineering Blog, Product, Streaming, Structured Streaming

In the previous blog post, we introduced the new built-in Apache Avro data source in Apache Spark and explained how you can use it to build streaming data pipelines with the from_avro and to_avro functions. Apache Kafka and Apache Avro are commonly used together to build scalable, near-real-time data pipelines. In this blog post, […]
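The excerpt does not show what a Schema Registry message looks like on the wire. The following is background, not code from the post. Producers that use Confluent Schema Registry's Avro serializer typically prefix each Kafka message value with a zero "magic" byte and a 4-byte big-endian schema ID, followed by the Avro-encoded body. The minimal sketch below uses hypothetical helper names (`build_confluent_frame`, `split_confluent_frame`) to show the 5-byte header that a registry-aware reader strips before treating the rest as plain Avro binary.

```python
import struct
from typing import NamedTuple

MAGIC_BYTE = 0           # first byte of the Confluent wire format (assumed framing)
HEADER = struct.Struct(">BI")  # 1 unsigned byte + 4-byte big-endian unsigned schema ID


class FramedMessage(NamedTuple):
    schema_id: int   # ID to look up the writer's schema in the registry
    payload: bytes   # Avro binary-encoded record, still to be decoded


def build_confluent_frame(schema_id: int, payload: bytes) -> bytes:
    """Hypothetical helper: prepend the 5-byte header to an Avro-encoded body."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def split_confluent_frame(value: bytes) -> FramedMessage:
    """Hypothetical helper: separate the schema ID from the Avro payload."""
    if len(value) < HEADER.size:
        raise ValueError("message too short for Confluent framing")
    magic, schema_id = HEADER.unpack(value[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return FramedMessage(schema_id, value[HEADER.size:])


if __name__ == "__main__":
    # b"\x04hi" is the Avro binary encoding of the string "hi" (zigzag length 2).
    frame = build_confluent_frame(42, b"\x04hi")
    print(split_confluent_frame(frame))
```

Carrying the schema ID outside the Avro body is what lets a consumer fetch the exact writer schema from the registry for each message, rather than hard-coding one schema.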

Analytical and Problem-Solving Skill Set Necessary to Start Working with Big Data

Posted in Analytics, Big Data, career, careers, data, data analytics, data scientists, data skills, Jobs, Predictive Analytics, skills, SmartData Collective Exclusive

A data analyst is a professional whose work involves collecting, cleaning, visualizing, and transforming or modeling raw data into information that marketers, developers, and even accountants can use. The workflow of a data analyst is defined by the needs of a company or organization, but the final deliverable is always the […]

Why It’s So Hard to Fight Instinct vs. Data in BI Decision-Making

Posted in bi, BI decision making, Big Data, business decisions, Business Intelligence, SmartData Collective Exclusive

In the days before robust business intelligence (BI) platforms existed, professionals usually had to rely on experience and gut instinct when making decisions. Now, data analysis tools can give solid conclusions gleaned from months or years of compiled information. Even so, the people who dig through data still have difficulty trusting what the data says […]

5 Innovative Ways To Reduce Instagram Data Usage

Posted in Big Data, data analytics, data usage, facebook, Instagram, SmartData Collective Exclusive, Social Data, Social media, twitter

Instagram is among the apps that internet users accuse of consuming excessive mobile data. Loading the images, videos, and stories that are posted constantly consumes more data than you may think. Videos start playing without warning, unlike on Facebook, where you have to click a video to view it. Instagram videos […]

How Big Data Helped Russia Become A Leader In Car Sharing

Posted in Big Data, car sharing, carsharing, data, data analytics, ride sharing, ridesharing, Russia, SmartData Collective Exclusive

Big data is playing a more important role in Russia than ever before. A 2017 report published on ScienceDirect illustrated this point. One of the ways big data is being applied is in one of the fastest-growing new industries: car sharing. It still might be a bit difficult for the average driver to […]

Near-Real-Time Hardware Failure Rate Estimation with Bayesian Reasoning

Posted in Education, Machine Learning, Product

Try this notebook in Databricks. You might be using Bayesian techniques in your data science without knowing it! And if you're not, they could enhance the power of your analysis. This post follows the introduction to Bayesian reasoning on Data Science Central and demonstrates how these ideas can improve a real-world use case: […]
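The excerpt stops before the method itself. The sketch below shows one standard Bayesian approach to this kind of problem, and it is not necessarily the model the post uses. It assumes that failures arrive as a Poisson process and places a Gamma prior on the failure rate, a conjugate choice picked here for illustration. The class and function names, the prior values, and the "failures per device-day" units are all hypothetical. Each new batch of observations updates the belief in closed form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about a failure rate.

    Hypothetical units: failures per device-day.
    """
    shape: float  # behaves like a count of failures already observed
    rate: float   # behaves like device-days of exposure already observed

    def mean(self) -> float:
        # Expected failure rate under this belief.
        return self.shape / self.rate

    def variance(self) -> float:
        # Uncertainty of the belief; shrinks as exposure accumulates.
        return self.shape / self.rate ** 2


def update(belief: GammaBelief, failures: int, exposure: float) -> GammaBelief:
    """Conjugate Gamma-Poisson update for one batch of observations."""
    if failures < 0 or exposure <= 0:
        raise ValueError("need failures >= 0 and exposure > 0")
    # Posterior is Gamma(shape + failures, rate + exposure).
    return GammaBelief(belief.shape + failures, belief.rate + exposure)


if __name__ == "__main__":
    prior = GammaBelief(shape=2.0, rate=1000.0)  # hypothetical prior belief
    # Two hypothetical micro-batches arriving from a stream.
    posterior = update(update(prior, 2, 400.0), 3, 600.0)
    print(posterior, posterior.mean())
```

Applying the update once per micro-batch gives the same posterior as one update over all the data at once. That property keeps this kind of estimate cheap to maintain in near-real time.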

Customers Share Their Machine Learning Success Stories in the Data Mastery Tour

Posted in Company Blog, Events

The most important factor in successful machine learning is having the right data at scale, but organizations struggle to get the right datasets combined into the right format for their projects. At Databricks, we've seen customers achieve success by bringing data and ML together, and we've partnered with Snowflake to share their stories across […]