Big Data Mastery: Data Pipelines, Processing, and Messaging

Master the art of building robust data pipelines with our Big Data Mastery course. Dive deep into Hadoop architecture, Spark optimization, and Kafka messaging under expert guidance. This comprehensive training program equips you with essential skills for designing efficient data workflows tailored to industry demands.

Face-to-Face Oct 1-2, 2025 9:00 AM - 5:00 PM Tarun Sukhani
intermediate
We price match

Public Pricing

MYR 3500

Corporate Pricing

Training Fees: MYR 6500/day
Total Fees: MYR 13000 ++

Training Provider Pricing

Training Fees: MYR 5600
Material Fees: MYR 600
Total Fees: MYR 6200

Features

2 days
14 modules
11 intakes
3 enrolled
Full lifetime access
English

Subsidies

HRDC Claimable

What you'll learn

  • Gain expertise in Hive data warehousing for efficient large dataset analysis.
  • Explore Spark core concepts and optimization strategies for enhanced performance.
  • Develop proficiency in MapReduce programming with a focus on design patterns and optimization techniques.
  • Master Kafka core concepts and Streams API for real-time data processing.
  • Understand the fundamentals of Hadoop architecture including HDFS storage and YARN resource management.
  • Design end-to-end pipelines integrating multiple big data technologies.
  • Implement security measures within the Hadoop ecosystem using Kerberos and Apache Ranger.
  • Learn advanced batch processing patterns, such as Lambda and Kappa architectures, to handle late-arriving data effectively.

Why should you attend?

This course offers an in-depth exploration of big data technologies and methodologies, focusing on the creation and optimization of data pipelines, processing frameworks, and messaging systems. Participants will begin by understanding the foundational architecture of Hadoop, including HDFS storage mechanisms and YARN resource management. Through hands-on exercises, learners will set up pseudo-clusters using Docker to gain practical experience.

The course progresses into MapReduce programming, where students will learn about mapper, reducer, and combiner design patterns. Optimization techniques for the shuffle/sort phase are covered to enhance performance. Learners will also delve into Hive data warehousing concepts, comparing HiveQL with ANSI SQL syntax and exploring partitioning strategies to efficiently analyze large datasets.

Spark's core concepts are thoroughly examined, highlighting the tradeoffs between RDDs and DataFrames. The Catalyst optimizer is explored in detail to understand how Spark executes queries efficiently. Students will engage in hands-on ETL pipeline development with complex joins to solidify their understanding. Advanced topics include Spark optimization techniques such as memory management and handling data skew.

The course also covers batch processing patterns like Lambda and Kappa architectures, ensuring participants can handle late-arriving data effectively. Security within the Hadoop ecosystem is addressed through Kerberos authentication and Apache Ranger policy control. Finally, learners will explore Kafka's core concepts and Streams API for real-time data processing, culminating in a comprehensive end-to-end pipeline design project that integrates multiple big data technologies.

Course Syllabus

Day 1 - Hadoop Architecture Fundamentals
Short Break
15 mins
Short Break
15 mins
Recap and Q&A
15 mins
Lunch
1 hour
Short Break
15 mins
Short Break
15 mins
Short Break
15 mins
Recap and Q&A
15 mins
End of Day 1
Day 2 - MapReduce Programming Patterns
Short Break
15 mins
Short Break
15 mins
Recap and Q&A
15 mins
Lunch
1 hour
Short Break
15 mins
Short Break
15 mins
Short Break
15 mins
Recap and Q&A
15 mins
End of Day 2

Instructor

Tarun Sukhani Founder & CTO Teaching

Tarun Sukhani is a distinguished professional trainer and consultant with over 25 years of comprehensive industry experience spanning multinational corporations across the US, Europe, Asia, South America, and the Middle East. His extensive background encompasses senior executive roles including CIO/CTO, director, and board member positions at renowned organizations such as Dell, AMD, and Experian, as well as regional conglomerates like Indra in Asia Pacific. This diverse corporate experience provides him with unique insights into enterprise-level challenges and solutions across multiple business functions including HR, Finance, Operations, Sales, Risk Management, Engineering, and Accounting. As a highly sought-after trainer, Tarun specializes in an impressive array of cutting-edge technologies and methodologies. His expertise spans Agile/Scrum/SAFe frameworks, enterprise architecture (TOGAF/COBIT/ITIL), cybersecurity (CISSP/CEH/CISO), project management (PRINCE2/PMP), Big Data technologies (Hadoop/Spark), Data Science with Python and R, DevOps practices, Machine Learning/AI, cloud computing, blockchain technologies, and modern development frameworks. This comprehensive skill set enables him to deliver training across the entire technology spectrum, from foundational concepts to advanced implementations. His training delivery extends throughout the Asia Pacific region, including Malaysia, Indonesia, Philippines, Thailand, and Singapore, where he has successfully conducted workshops and training programs for both large enterprises and SMEs. Tarun's client portfolio includes industry leaders such as Dell, AMD, Western Digital, Singtel, CIMB, Digi, Tenaga Nasional, and Sime Darby, demonstrating his ability to work with diverse organizational cultures and technical requirements. 
Academically, Tarun holds exceptional credentials including an MSc in Information Systems and an MBA in Finance and Operations Management from Loyola University Chicago, where he graduated summa cum laude with Beta Gamma Sigma and Alpha Sigma Nu honors. His educational foundation is further strengthened by Bachelor's degrees in Biology, Math, Computer Science, and Business Administration, plus advanced programs from MIT and Stanford in AI, Blockchain, and Entrepreneurship. His extensive certifications as an Agile/Scrum trainer, Java/.NET developer, Machine Learning specialist, and InfoSec expert validate his technical proficiency and commitment to continuous learning, making him an ideal trainer for organizations seeking comprehensive technology education and transformation guidance.

53 Students
243 Courses
English, Malay, Spanish
25 Years

Instructor

Mohammad Mehdi Lotfinejad Certified Data Science Trainer and Data Engineer Teaching

Mohammad Mehdi Lotfinejad is an accomplished Chief Data Officer and certified HRDF trainer with over 15 years of experience in computer science instruction and professional data science/engineering training. He combines extensive academic credentials with deep industry expertise, holding a PhD in Computer Science from Universiti Malaya and Harvard Business School certification in Business Analytics. His comprehensive technical background spans Apache Spark, MySQL, PostgreSQL, MongoDB, Snowflake, Redshift, Apache Airflow, API development, microservices, and Amazon Web Services. Currently serving as Chief Data and Knowledge Officer at Magna.ai, a Florida-based lawtech company, Lotfinejad leads the development of AI-driven legal case analysis systems, architecting graph databases, data warehouses, and workflow engines while ensuring compliance with legal standards. His concurrent role as Senior Data Engineer at AXIATA Digital Advertising (ADA) in Malaysia demonstrates his ability to manage complex, multi-regional data operations across Southeast Asian markets, designing automated pipelines using AWS Redshift, Snowflake, and Google BigQuery. His training expertise was honed during his tenure as Lead Senior Data Scientist Professional Trainer at The Center of Applied Data Science, where he designed and delivered comprehensive training programs for major corporations including CIMB, PETRONAS, SHELL, and TNB. He successfully led teams of data scientists and engineers in developing cutting-edge curriculum and migrating legacy systems to modern data management solutions. His academic foundation includes faculty positions at multiple universities where he taught computer architecture, programming languages, software engineering, and data structures while publishing numerous high-impact research papers and books.
Lotfinejad's unique combination of technical leadership, educational expertise, and industry experience makes him exceptionally qualified to deliver sophisticated software training programs. His proven track record of leading cross-functional teams, developing enterprise-level solutions, and translating complex technical concepts into accessible learning materials positions him as an ideal trainer for organizations seeking to advance their technical capabilities in data science, engineering, and modern software development practices.

8 Students
70 Courses
18 Years

Minimum Qualification

undergraduate

Target Audience

students
entry level
engineers

Methodologies

lecture
slides
labs
Q&A

Instructor Reviews

Tarun Sukhani Founder & CTO
Michael Wong Shen Kai
3 years ago

He was indeed very skilled, knowledgeable and passionate in the data science realm. I was impressed with his business know-how (how the world economy works and how all things can be explained with data, with/without bias) and technical skills in converting data into insights. I will not hesitate to recommend Tarun for any data science related training as I would like to attend more classes myself to learn from the best of the best.

Anak Agung
3 years ago

I attended one of Tarun's Data Science courses in Jakarta (CDSS). He was a professional trainer & very knowledgeable in Data Science. In his course, Tarun gave many practical examples & valuable information regarding how to conduct Data Science & its related components (e.g. Software & Deployment Architecture). In addition to those lessons, he also gave very useful insights on building a career as a Data Scientist.

Pei Cher Chai
3 years ago

Attended "Blockchain Training: An Overview for Business Professionals" conducted by Dr. Tarun. The reference materials are very comprehensive and an excellent means of conveying information. I was very impressed with how this technology works and is adapted into business.

LJ Ong
3 years ago

He shared his professional insights on data science with a sense of humor that cleared up so many of my questions about the content and real-world applications. Information, tools, and resources given are very useful

Aamer S
3 years ago

His knowledge of multiple subjects far exceeds that of any IT or non-IT person I have met or interacted with in a long time. The breadth and depth of the subject matter he has acquired is exemplary.

Jovyn Kim
3 years ago

Training with Tarun has been awesome. He’s super knowledgeable, funny, empathetic and a great educator in general. As someone who didn’t come from a computer science background, I never felt stupid in his classes, or that becoming a competent developer was impossible. I could understand him as he communicates well & has helped me see the big picture of the computer science field beyond the scope of syntax. If you similarly did not come from a CS background and hope to transition into the world of programming but struggle to learn on your own, understand all the foreign & abstract concepts and connect the dots, I think the right person to guide you on your journey would make a big difference. Having someone who’s deep in the field with many years of experience narrow down and communicate the relevant areas to focus on would also close a big gap, compared with having to struggle and figure out a lot of things on your own. Being able to maintain your interest during your learning journey is important too, thus finding that someone is important. All in all, I would wholeheartedly recommend Tarun and the backend course I took.

Srikanth K
3 years ago

Tarun is a results-driven & inspirational technology leader with a clear vision, direction, and broad-based technology expertise. He is passionate, intuitive, engaged, pragmatic, systematic, agile. His experience spans from small start-ups to complex, global companies, from being technical lead to technical strategist to being the leader of a larger group of architecture and engineering teams. Much of his experience is in the areas of Java, Scala, Machine Learning, Neural Networks, Cloud Computing, Data Science and what not. I am truly amazed by his breadth & depth of technological expertise, and it is a pleasure to be part of his team.

Zulfikri Y
3 years ago

Tarun is very passionate about his domains and gave numerous insights to support critical business decisions and develop data products to transform daily encounters and processes. He was a professional trainer & very knowledgeable in Data Science. His material is presented through a sequence of brief lectures, interactive demonstrations, great hands-on exercises, and discussions.

Marti Sigi
5 years ago

We’ve collaborated many times on courses for accountants. He has spoken at quite a number of events at our company on various topics relevant to accountants' needs. The collaboration was very smooth and his sessions definitely made a huge impact on our success. Mr Tarun is a great professional!

Pravena K
3 years ago

Mr. Tarun is a driven, hardworking, and knowledgeable entrepreneur in his field. A broad-minded trainer who embraces change and inspires people to do better every day. Mr. Tarun sets a good example by being enthusiastic and dedicated, and he inspires and motivates others. I am delighted to be working for such a person.

Mohammad Mehdi Lotfinejad Certified Data Science Trainer and Data Engineer
Michael Ogheneme
1 year ago

Mehdi and I worked on several projects with companies such as Petronas, Shell and CIMB Regional, etc. I must say Mehdi's training was highly appreciated by our clients as he was able to put his vast knowledge as a data professional on full display. I would highly recommend him to anyone looking for a top-tier training expert.

Amin Jula
1 year ago

Not only knowledgeable but also hands-on with what he knows. Friendly, and builds networks quickly.

Kennedy Okonkwo
1 year ago

I had the pleasure of working with Mehdi on some high-level initiatives such as the Petronas data scientist program and Shell's project to become a data-driven organization. During these projects, Mehdi received numerous accolades for his ability to share his knowledge and mentor up-and-coming data scientists. Based on our shared experiences, I have no hesitation in recommending Mehdi for any project or position he may be considered for.
