Koalas: Easy Transition from pandas to Apache Spark

Posted Leave a commentPosted in Announcements, Apache Spark, Company Blog, Data Science, Ecosystem, Education, Engineering Blog, Machine Learning, Open Source, Pandas, python

Today at Spark + AI Summit, we announced Koalas, a new open source project that augments PySpark’s DataFrame API to make it compatible with pandas. Python data science has exploded over the past few years and pandas has emerged as the lynchpin of the ecosystem. When data scientists get their hands on a data set, […]

Benchmarking Apache Spark on a Single Node Machine

Posted Leave a commentPosted in Apache Spark, Apache Spark 2.3, Ecosystem, Engineering Blog, Pandas, PySpark, python

Apache Spark has become the de facto unified analytics engine for big data processing in a distributed environment. Yet we are seeing more users choosing to run Spark on a single machine, often their laptops, to process small to large data sets, than electing a large Spark cluster. This choice is primarily because of the […]