BDW453

Big Data with Spark

Master big data processing with our dynamic Spark course. Delve into RDDs, DataFrames, Python scripting, cluster computing, machine learning with MLlib, streaming analytics, and graph processing. Enroll now for hands-on expertise in cutting-edge data technologies.

Available in:
Malaysia

Face-to-Face, Jul 14-15, 2025, 9:00 AM - 5:00 PM, Tarun Sukhani

Last Modified: July 12th 2025, 6:00 AM MYT

updated

beginner

We price match

Public Pricing

MYR 3500

Corporate Pricing

Training Fees: MYR 6500/day

Total Fees: MYR 13000 ++

Training Provider Pricing

Training Fees: MYR 4800

Material Fees: MYR 400

Total Fees: MYR 5200

Features

2 days

14 modules

17 intakes

English

Subsidies

What you'll learn

Troubleshoot common issues encountered while running Spark on a cluster.
Implement machine learning algorithms using Spark MLlib.
Gain proficiency in working with Resilient Distributed Datasets (RDDs) and DataFrames.
Configure and manage Spark on cloud-based clusters like Amazon EMR (Elastic MapReduce).
Perform real-time data processing using Spark Streaming and understand graph analytics with GraphX.
Learn how to install Apache Spark and its necessary dependencies.
Develop skills in Python relevant to Spark applications.
Understand the differences between unstructured, semi-structured, and structured data.

Why should you attend?

This Big Data with Spark course takes a deep dive into large-scale data processing with Apache Spark as the core framework. Participants begin with fundamental terminology and concepts, including the nature of unstructured, semi-structured, and structured data, followed by an introduction to the Apache Spark ecosystem and its relationship with MapReduce. Installing Spark and its dependencies sets the stage for practical work.

As learners progress, they explore Spark's primary abstractions, RDDs (Resilient Distributed Datasets) and DataFrames, and learn how to create, manipulate, and optimize them. The course also covers key operations such as filtering RDDs and flatMap transformations. A dedicated module introduces Python, emphasizing its relevance in the Spark context and providing hands-on experience with tools like Jupyter Notebook.

Advanced topics include configuring and running Spark on clusters such as Amazon EMR (Elastic MapReduce) and Cloudera CDH, troubleshooting techniques, and dependency management. Practical exercises simulate real-world scenarios: movie recommendation scripts using Spark MLlib, logistic regression implementations, natural language processing (NLP), K-means clustering, streaming data processing with DStream transformations, and GraphX programming for graph-based analysis.

Course Syllabus

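Day 1

Fundamental Terminology and Concepts: unstructured, semi-structured, and structured data; Apache Spark ecosystem and MapReduce; installing Spark and its dependencies

Short Break (15 mins)

RDD and DataFrames

Short Break (15 mins)

Python overview

Recap and Q&A (15 mins)

Lunch (1 hour)

Running Spark on a cluster

Short Break (15 mins)

Running Spark on a cluster

Short Break (15 mins)

Hands-on Exercises

Short Break (15 mins)

AWS EC2 PySpark

Recap and Q&A (15 mins)

End of Day 1

Day 2

Spark MLlib

Short Break (15 mins)

Spark MLlib

Short Break (15 mins)

Logistic Regression

Recap and Q&A (15 mins)

Lunch (1 hour)

NLP and K-Means

Short Break (15 mins)

Spark Streaming and GraphX

Short Break (15 mins)

Spark Streaming and GraphX

Short Break (15 mins)

Hands-on Exercises

Recap and Q&A (15 mins)

End of Day 2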

Ratings and Reviews

( 4 ratings )

75%

25%

( 0 reviews )

Instructor

Tarun Sukhani Founder & CTO Teaching

Tarun Sukhani is a professional trainer and consultant with nearly 30 years of experience in the IT and business sectors, having worked internationally across AMER, APAC, and EMEA. His expertise spans Agile methodologies (Scrum, SAFe, Kanban), enterprise architecture frameworks like TOGAF and COBIT, IT service management standards including ITIL and ISO27001, and cybersecurity certifications like CISO and CISSP. His proficiency extends to project management frameworks such as PRINCE2 and PMP, Big Data analytics using Hadoop and Spark, data science with Python or R, and data visualization tools like Tableau.

Tarun has held numerous senior development and executive roles, including CIO/CTO positions in which he managed large-scale IT operations for multinational corporations such as Dell, AMD, and Experian. His leadership has helped improve business operations across HR, Finance, Operations, Sales, Risk Management, Engineering/Manufacturing, and Accounting. He has also contributed to regional conglomerates like Indra in the Asia Pacific region.

Tarun has facilitated training workshops throughout Asia Pacific, in countries including Malaysia, Indonesia, the Philippines, Thailand, and Singapore. His sessions range from project management to strategic leadership and soft skills development. He also specializes in advanced technical subjects such as software architecture design patterns for reactive microservices architectures on cloud platforms.

He graduated summa cum laude with an MSc in Information Systems and an MBA in Finance and Operations Management from Loyola University Chicago, and holds multiple Bachelor's degrees in Biology, Math & Computer Science, and Business Administration. He also holds certifications as an Agile/Scrum trainer, Java/.NET programmer, Machine Learning specialist, InfoSec expert, and Business Intelligence professional, complemented by advanced studies in AI and Blockchain at MIT and Stanford.

His client portfolio includes Western Digital/Sandisk (machine learning product management projects), Singtel and CIMB (agile product development initiatives), and Tenaga Nasional and JPJ (agile project management engagements), among others. Tarun also speaks at international conferences, where he shares insights on digital transformation strategies.

18 Students

231 Courses

English, Malay, Spanish

( 13 ratings )

100%

( 13 reviews )

Instructor

Mohammad Mehdi Lotfinejad Certified Data Science Trainer and Data Engineer Teaching

Mohammad Mehdi Lotfinejad is a Chief Data Officer with expertise in data science and engineering and over ten years of experience developing data processing pipelines for enterprise insights. He is certified as an HRDF trainer and has more than 15 years of experience as a computer science instructor, both in academia and as a professional data science/engineering trainer in industry. He holds a PhD in Computer Science from Universiti Malaya, Malaysia, and certifications in Business Analytics from Harvard Business School. Mohammad specializes in Apache Spark, MySQL, PostgreSQL, MongoDB, Snowflake, Redshift, Apache Airflow, API and Microservices, and Amazon Web Services.

As Chief Data and Knowledge Officer at Magna.ai since February 2024, Mohammad leads the development of graph databases and data warehouses to support AI-driven law case analysis services. He architects API and microservice solutions to improve system interoperability and scalability while ensuring data security and compliance with legal standards. Before this role, he joined AXIATA Digital Advertising (ADA) as a Senior Data Engineer in March 2020, collaborating on the design of automated data pipelines using Amazon Redshift and Snowflake for storing telco data and implementing BI dashboards with Google BigQuery. From June 2018 to February 2020, Mohammad was the Lead Senior Data Scientist Professional Trainer at The Center of Applied Data Science in Kuala Lumpur, where he led teams of data scientists and engineers designing professional training programs for clients like CIMB, PETRONAS, SHELL, and TNB.

His earlier roles include leading big data engineering teams at RAHA in Iran, where he developed large-scale analytics pipelines using Hadoop ecosystem tools like Hive and Spark. In academia, Mohammad served as a faculty member at Payame Noor University from September 2014 to June 2018, where he supervised graduate research projects and contributed to curriculum development. He also held leadership positions at Islamic Azad University, where he managed departments.

His technical proficiencies span RDBMS like MySQL and PostgreSQL, programming languages such as Python and C++, and web design using HTML/CSS/Bootstrap, alongside project management skills including Scrum Master certification. His published works include books on Object-Oriented Programming and Project Management Fundamentals, along with numerous journal articles on topics ranging from solar radiation prediction models to machine learning algorithms for intrusion detection systems.

5 Students

61 Courses

( 3 ratings )

100%

( 3 reviews )

Minimum Qualification

graduate

Target Audience

entry level

engineers

Methodologies

lecture

slides

case studies

labs

group discussion

Instructor Reviews

Tarun Sukhani Founder & CTO

Michael Wong Shen Kai

3 years ago

He was indeed very skilled, knowledgeable and passionate in the data science realm. I was impressed with his business know-how (how the world economy works and how all things can be explained with data, with/without bias) and technical skills in converting data into insights. I will not hesitate to recommend Tarun for any data science related training as I would like to attend more classes myself to learn from the best of the best.

Anak Agung

3 years ago

I attended one of Tarun's Data Science courses in Jakarta (CDSS). He was a professional trainer & very knowledgeable in Data Science. In his course, Tarun gave many practical examples & valuable information regarding how to conduct Data Science & its related components (e.g. Software & Deployment Architecture). In addition to those lessons, he also gave very useful insights on building a career as a Data Scientist.

Pei Cher Chai

3 years ago

Attended "Blockchain Training: An Overview for Business Professionals" conducted by Dr. Tarun. The reference materials are very comprehensive and an excellent means of conveying information. I was very impressed with how this technology works and is adapted into business.

LJ Ong

3 years ago

He shared his professional insights on data science with a sense of humor that cleared up so many of my questions about the content and real-world applications. The information, tools, and resources given are very useful.

Aamer S

3 years ago

His knowledge of multiple subjects exceeds far greater than that of any IT or non-IT person I have met or interacted with in a long time. The breadth and depth of the subject matter he has acquired is exemplary.

Jovyn Kim

3 years ago

Training with Tarun has been awesome. He’s super knowledgeable, funny, empathetic and a great educator in general. As someone who didn’t come from a computer science background, his teachings didn’t make me feel stupid or impossible to eventually arrive at being a competent developer. I could understand him as he communicates well & has helped me see the big picture of the computer science field beyond the scope of syntaxes. If you similarly did not come from a CS background and hope to transition into the world of programming but struggle to learn on your own, understand all the foreign & abstract concepts and connect the dots, I think the right person to guide you on your journey would make a big difference. Having someone who’s deep in the field with many years of experience narrow and communicate the relevant areas to focus would also close a big gap from having to struggle and figure out a lot of things on your own. Being able to maintain your interest during your learning journey is important too, thus finding that someone is important. All in all, I would wholeheartedly recommend Tarun and the backend course I took.

Srikanth K

3 years ago

Tarun is a results-driven & inspirational technology leader with a clear vision, direction, and broad-based technology expertise. He is passionate, intuitive, engaged, pragmatic, systematic, agile. His experiences span from small start-ups to complex, global companies, from being technical lead to technical strategist to being the leader of larger group of architecture and engineering teams. Much of his experiences are in the area of Java, Scala, Machine Learning, Neural Networks, Cloud Computing, Data Science and what not. I am truly amazed to experience his breadth & depth of technological expertise and pleasure to be part of his team.

Zulfikri Y

3 years ago

Tarun is very passionate on the domains and gave numerous insights to support critical business decisions and develop data products to transform daily encounters and processes. He was a professional trainer & very knowledgeable in Data Science. His material is presented through a sequence of brief lectures, interactive demonstrations, great hands-on exercises, and discussions.

Marti Sigi

5 years ago

We’ve collaborated many times on courses for accountants. He spoke at quite a number of events in our company on various topics regarding accountants' needs. The collaboration was very smooth and his sessions definitely made a huge impact on our success. Mr Tarun is a great professional!

Pravena K

3 years ago

Mr. Tarun is a driven, hardworking, and knowledgeable entrepreneur in his field. A broad-minded trainer who embraces change and inspires people to do better every day. Mr. Tarun sets a good example by being enthusiastic and dedicated, and he inspires and motivates others. I am delighted to be working for such personnel.

Mohammad Mehdi Lotfinejad Certified Data Science Trainer and Data Engineer

Michael Ogheneme

1 year ago

Mehdi and I worked on several projects with companies such as Petronas, Shell and CIMB Regional etc. I must say Mehdi's training was highly appreciated by our clients as he was able to exhibit in full display his vast knowledge as a Data professional. I would highly recommend him to anyone looking for a top tier training expert.

Amin Jula

1 year ago

Not only knowledgeable but also hands-on with what he knows. Friendly and quick at building networks.

Kennedy Okonkwo

1 year ago

I had the pleasure of working with Mehdi together on some high-level initiatives such as the Petronas data scientist program and Shell's project to become a data-driven organization. During these projects, Mehdi received numerous accolades for his ability to share his knowledge and mentor up-and-coming data scientists. Based on our shared experiences, I have no hesitation in recommending Mehdi for any project or position he may be considered for.

