Big Data Analytics with PySpark

December 29, 2023

For whom is this course?

Spark is a “lightning-fast cluster computing” framework for Big Data. It provides a general-purpose data processing engine that can run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop MapReduce.

This course is for data science enthusiasts who want to use PySpark, the Python API for Spark, along with its powerful higher-level libraries such as Spark SQL and MLlib (for machine learning).

At the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis.

What will you learn?

You will learn the following topics in this course:

PySpark Installation
Introduction to Big Data analysis with Spark
Programming with PySpark RDDs
PySpark SQL & DataFrames
Machine Learning with PySpark MLlib

Prerequisites

Python
Deep Learning Basics
SQL
Pandas (DataFrames)

Syllabus

Introduction to Big Data analysis with Spark
Programming with PySpark RDDs
PySpark SQL & DataFrames
Machine Learning with PySpark MLlib

Instructors

QASIM HASSAN

LinkedIn Profile
