Benchmarking Apache Spark on a Single Node Machine

Posted in Apache Spark, Apache Spark 2.3, Ecosystem, Engineering Blog, Pandas, PySpark, Python

Apache Spark has become the de facto unified analytics engine for big data processing in a distributed environment. Yet we are seeing more users choose to run Spark on a single machine, often a laptop, to process small to large data sets rather than opt for a large Spark cluster. This choice is primarily because of the […]