Data Engineering - SQL, Spark & Pipeline Automation

Elevate your career with this Data Engineering course, which covers SQL, BI tools, Spark processing, NoSQL databases, pipeline automation, Big Data fundamentals, and MapReduce techniques, reinforced through practical implementation.

updated
beginner
Data Engineering
We price match

Public Pricing

MYR 1750

Corporate Pricing

Training Fees: MYR 6500/day
Total Fees: MYR 6500 ++

Training Provider Pricing

Training Fees: MYR 2400/day
Material Fees: MYR 400
Total Fees: MYR 2800 ++

Certification

Certified Beginner Data Engineer
Veritas
Validity: 2 years

Features

1 day
8 modules
1 enrolled
Full lifetime access
English
Technical: 25 pax

Target Audience

students
engineers

Methodologies

lecture
slides
labs

Subsidies

HRDC Claimable

What you'll learn

  • Develop expertise in the MapReduce programming model for large-scale distributed computing.
  • Automate complex data pipelines using Apache Airflow for efficient data management.
  • Comprehend the principles of Big Data, including handling various types of datasets.
  • Gain proficiency in SQL using PostgreSQL and understand relational database design.
  • Understand NoSQL database creation with Apache Cassandra and contrast it with SQL models.
  • Learn Business Intelligence concepts and implement a Data Warehouse using Pentaho on AWS.
  • Acquire hands-on experience in Hive Query Language for interacting with big data stores.
  • Master big data processing using SparkSQL, DataFrames, and Datasets, including MLlib and GraphX.

Why should you attend?

Dive into the world of data engineering with a course designed to equip you with the essential skills required in this evolving field. Begin by mastering SQL and PostgreSQL, gaining fluency in SQL commands and learning how to create relational data models and apply normalization. Progress to Business Intelligence (BI) and Data Warehousing using Pentaho, learning the fundamentals of data warehousing and data integration and how to implement them on AWS, including building multi-dimensional cubes.

Advance further with SparkSQL, DataFrames, and Datasets to handle big data processing. You'll explore the capabilities of SparkSQL, learn to manipulate data using DataFrames and RDDs, and delve into Spark's MLlib for machine learning applications. Gain insights into Data Lakes and acquire data wrangling skills for more efficient data processing. The course also covers NoSQL database creation with Apache Cassandra, providing a solid foundation in data modelling. You will compare SQL and NoSQL data models and learn about dimensional schemas such as Star and Snowflake.

Automating data pipelines is another critical area: you'll use tools such as Apache Airflow to create robust pipelines that ensure data quality and track lineage. Finally, grasp Big Data fundamentals by understanding four V's of Big Data (veracity, variability, visualization, and value) and working with different types of data. Explore Hive and HBase along with Hive Query Language before concluding with MapReduce, where you'll learn about partitioning, mappers, and reducers for efficient big data processing.
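
As a rough illustration of the mapper, partitioner, and reducer roles mentioned above, here is a minimal single-process sketch in plain Python. It is not taken from the course material and does not use Hadoop; the function names (`map_words`, `partition`, `reduce_counts`, `run_word_count`) and the CRC32-based partitioner are assumptions chosen for illustration only.

```python
import zlib
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def partition(key, num_reducers):
    """Pick the reducer for a key.

    CRC32 is used (an illustrative choice) because it is deterministic,
    so the same key always lands on the same reducer.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_counts(key, values):
    """Reduce phase: combine all values collected for one key."""
    return key, sum(values)


def run_word_count(lines, num_reducers=2):
    # Shuffle: route each mapped pair to its reducer's bucket, grouped by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:
        for key, value in map_words(line):
            buckets[partition(key, num_reducers)][key].append(value)

    # Each bucket could be reduced on a separate worker; here we loop over them.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            word, total = reduce_counts(key, values)
            result[word] = total
    return result


if __name__ == "__main__":
    sample = ["big data big pipelines", "data pipelines move data"]
    print(run_word_count(sample))
```

Because every occurrence of a key is routed to the same bucket, the final counts do not depend on how many reducers are used; only the distribution of work changes.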

Course Syllabus

Day 1 - Data Engineering Foundations
Module 1
Module 2
Short Break (15 mins)
Module 3
Module 4
Recap and Q&A (15 mins)
Lunch (1 hour)
Module 5
Module 6
Short Break (15 mins)
Module 7
Module 8
Recap and Q&A (15 mins)
End of Day 1

Instructor

Mohammad Mehdi Lotfinejad, Chief Data Officer & Data Science Trainer
Mohammad Mehdi Lotfinejad is a distinguished Chief Data Officer and certified HRDF trainer with over 15 years of experience in computer science instruction and professional data science training. Currently serving as Chief Data and Knowledge Officer at Magna.ai, a Florida-based lawtech company, he leads the development of sophisticated graph databases, data warehouses, and API solutions that power AI-driven legal case analysis systems. His expertise spans the entire data ecosystem, from architecture design to workflow automation and team leadership.

With a robust background encompassing more than a decade of hands-on experience in data engineering and data science, Mohammad has successfully implemented enterprise-scale data processing pipelines across multiple industries and geographies. His professional journey includes senior roles at Axiata Digital Advertising (ADA) in Malaysia, where he designed and deployed data pipelines using AWS Redshift, Snowflake, Apache Spark, and Apache Airflow, and at The Center of Applied Data Science, where he led teams delivering training solutions to major corporations including CIMB, PETRONAS, SHELL, and TNB.

Mohammad's technical proficiency is comprehensive and current, encompassing cloud platforms (AWS, Google Cloud), data warehousing solutions (Redshift, Snowflake), big data technologies (Apache Spark, Hadoop, Hive), workflow orchestration (Apache Airflow), and multiple database systems (MySQL, PostgreSQL, MongoDB). He holds a Harvard Business School certification in Business Analytics and multiple AWS certifications, including specialized credentials in Big Data, Data Warehousing, and Practical Data Science with Amazon SageMaker.

As an educator, Mohammad brings exceptional depth to his training delivery. His academic career includes faculty positions at Payame Noor University and Islamic Azad University, where he served in leadership roles including Chancellor and Department Head. He has authored three technical books and published numerous peer-reviewed papers in international journals. His teaching repertoire covers data engineering, data science, machine learning, software development, and computer architecture, delivered through engaging, hands-on methodologies that bridge theoretical concepts with practical industry applications.

Mohammad's unique combination of executive leadership, technical expertise, and proven training capabilities makes him an invaluable resource for organizations seeking to develop their data science and engineering capabilities. His ability to translate complex technical concepts into actionable learning outcomes, coupled with his extensive real-world implementation experience, ensures that training participants gain immediately applicable skills that drive business value.
8 Students
102 Courses
18 Years

Instructor

Mohammed Al-Obaydee, AI & Data Science Expert
Dr. Mohammed Al-Obaydee is a distinguished academic and industry professional with extensive expertise in artificial intelligence, machine learning, deep learning, and data science. As an Adjunct Associate Professor at Taylor's University Malaysia and a certified Train the Trainer professional, he brings a unique blend of academic rigor and practical industry experience to his teaching and consulting engagements.

With a Ph.D. in Information Science from Universiti Kebangsaan Malaysia (UKM) and over two decades of experience spanning academia and industry, Dr. Al-Obaydee has established himself as a leading authority in advanced computing technologies. His research excellence is evidenced by multiple best paper awards, including recognition at IEEE ACSAT2014 and ITNG 2023 conferences, along with an impressive publication record featuring over 149 citations and an h-index of 6. His research focuses on medical image analysis, computer vision, machine learning applications, and deep learning techniques.

As a founder of INNOEYETIVE and AIC4ALL, Dr. Al-Obaydee demonstrates entrepreneurial acumen in leveraging AI and immersive technologies for digital transformation. His industry experience includes successful training and consultation projects with major corporations such as HSBC Bank, OCBC Bank, DELL Malaysia, Amway Malaysia, and EtiQa, where he has delivered specialized programs in data science, analytics, and applied AI engineering. His ability to bridge the gap between theoretical concepts and practical applications makes him an exceptional trainer for professionals seeking to upskill in emerging technologies.

Dr. Al-Obaydee's teaching portfolio encompasses a comprehensive range of subjects including Machine Learning, Deep Learning, Data Analytics, Data Engineering, Parallel Computing, and Advanced Data Structures. He has supervised numerous master's students and research projects at both Taylor's University and the International Islamic University Malaysia (IIUM), focusing on cutting-edge topics such as deep reinforcement learning, medical image classification, fraud detection, and natural language processing. His international speaking engagements and extensive training experience across multiple organizations demonstrate his exceptional communication skills and ability to make complex technical concepts accessible to diverse audiences.
113 Courses
English
21 Years

Course Reviews

"Overall, I am now confident in my knowledge of Data Engineering. The only criticism is that there is a lot of material to take in, so taking running notes is a good idea (helps revise for assignments as well)"

"This course was well-designed and educational. I came in knowing very little about data engineering and now have a lot better idea of what it is all about."

"A good lesson for beginners to learn the fundamentals of data engineering. People should consider becoming a data engineer, scientist, or analyst."

"Excellent preparation for becoming a Data Engineer. It explains what Data Engineers do and what they should know."

Instructor Reviews

Mohammad Mehdi Lotfinejad

Chief Data Officer & Data Science Trainer

"Mehdi and I worked on several projects with companies such as Petronas, Shell and CIMB Regional, etc. I must say Mehdi's training was highly appreciated by our clients as he was able to exhibit in full display his vast knowledge as a Data professional. I would highly recommend him to anyone looking for a top tier training expert."

"Not only knowledgeable but also having hands dirty on what he knows. Friendly and building networks quickly."

"I had the pleasure of working with Mehdi together on some high-level initiatives such as the Petronas data scientist program and Shell's project to become a data-driven organization. During these projects, Mehdi received numerous accolades for his ability to share his knowledge and mentor up-and-coming data scientists. Based on our shared experiences, I have no hesitation in recommending Mehdi for any project or position he may be considered for."
