Spark is a “lightning-fast cluster computing” framework for Big Data that provides a general data processing platform engine and lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop.
This course is for data science enthusiast learners who will use PySpark, a Python package for Spark programming and its powerful, higher-level libraries such as SparkSQL, MLlib (for machine learning), etc.
At the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis.
What you will learn
You will learn the following topics in this course