Big Data with Spark

Master big data processing with our dynamic Spark course. Delve into RDDs, DataFrames, Python scripting, cluster computing, machine learning with MLlib, streaming analytics, and graph processing. Enroll now for hands-on expertise in cutting-edge data technologies.

Face-to-Face Apr 28, 2025 - Apr 29, 2025
beginner
MYR 3500

Training Provider Pricing

Material Fees: MYR 400

Pax:

MYR 4800

Features

2 days (9:00 AM - 5:00 PM)
14 modules
15 intakes
English

Subsidies

HRDC Claimable

What you'll learn

  • Troubleshoot common issues encountered while running Spark on a cluster.
  • Implement machine learning algorithms using Spark MLlib.
  • Gain proficiency in working with Resilient Distributed Datasets (RDDs) and DataFrames.
  • Configure and manage Spark on cloud-based clusters like Amazon EMR (Elastic MapReduce).
  • Perform real-time data processing using Spark Streaming and understand graph analytics with GraphX.
  • Learn how to install Apache Spark and its necessary dependencies.
  • Develop skills in Python relevant to Spark applications.
  • Understand the differences between unstructured, semi-structured, and structured data.

Why should you attend?

This Big Data with Spark course covers large-scale data processing with Apache Spark as the core framework. Participants begin with fundamental terminology and concepts, including the nature of unstructured, semi-structured, and structured data, along with an introduction to the Apache Spark ecosystem and its relationship with MapReduce. Installing Spark and its dependencies sets the stage for the hands-on work.

Learners then explore Spark's primary abstractions, RDDs (Resilient Distributed Datasets) and DataFrames, and learn how to create, manipulate, and optimize them. The course also covers key operations such as filtering RDDs and flatMap transformations. A dedicated module introduces Python, emphasizing its relevance to Spark and providing hands-on experience with tools like Jupyter Notebook.

Advanced topics include configuring and running Spark on clusters such as Amazon EMR (Elastic MapReduce) and Cloudera CDH, troubleshooting techniques, and dependency management. Practical exercises simulate real-world scenarios: movie recommendation scripts using Spark MLlib, logistic regression, natural language processing (NLP), K-means clustering, streaming data processing with DStream transformations, GraphX programming for graph-based analysis, and more.
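
To give a feel for the filtering and flatMap operations mentioned in the course description, here is a minimal sketch in plain Python rather than Spark itself. The sample lines and the `flat_map` helper are hypothetical, made up for this illustration; in PySpark the corresponding calls would be `RDD.filter` and `RDD.flatMap`.

```python
from itertools import chain

# Hypothetical sample data standing in for the lines of a text file.
lines = [
    "spark makes big data simple",
    "",
    "rdds and dataframes",
]


def flat_map(func, items):
    """Apply func to each item and flatten the resulting iterables into one list."""
    return list(chain.from_iterable(func(item) for item in items))


# filter: keep only the items that satisfy a predicate (here, non-empty lines).
non_empty = [line for line in lines if line != ""]

# map: exactly one output per input, so splitting gives a list of lists.
mapped = [line.split() for line in non_empty]

# flatMap: zero or more outputs per input, flattened into a single list of words.
words = flat_map(str.split, non_empty)

print(mapped)
print(words)
```

The difference to notice is the shape of the result: mapping keeps one entry per line, while flat-mapping yields one entry per word, which is why word-count style exercises typically use it.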

Course Syllabus

Unstructured, semi-structured, and structured data
Apache Spark ecosystem and MapReduce
Installing Spark and Its Dependencies
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
Lunch (1 hour)
Short Break (15 mins)
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
End of Day 1
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
Lunch (1 hour)
Short Break (15 mins)
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
End of Day 2

Minimum Qualification

graduate

Target Audience

entry level
engineers

Methodologies

lecture
slides
case studies
labs
group discussion
