Big Data for Beginners

Learn Big Data analytics in a comprehensive course designed for aspiring professionals. Explore the Hadoop ecosystem and Scala programming, build your skills with hands-on Spark and PySpark training, and work with NoSQL databases including MongoDB and Cassandra while using AWS services for data handling.

Face-to-Face Apr 28, 2025 - Apr 29, 2025
beginner
MYR 3500

Training Provider Pricing

Material Fees: MYR 400

Pax:

MYR 4800

Features

2 days (9:00 AM - 5:00 PM)
14 modules
21 intakes
English

Subsidies

HRDC Claimable

What you'll learn

  • Operate Apache Kafka for real-time event streaming
  • Learn about Hadoop ecosystem components including HDFS
  • Acquire proficiency in Scala for Big Data tasks
  • Utilize AWS services effectively for Big Data solutions
  • Grasp the essentials of Big Data processing using Hadoop and Spark
  • Develop an understanding of NoSQL databases such as MongoDB & Cassandra
  • Identify use cases where Big Data technologies apply
  • Understand fundamental terminology and concepts of Big Data

Why should you attend?

Embark on a journey through the world of Big Data and learn to work with massive datasets. Begin with the core terminology and concepts that underpin Big Data analytics, including its definition, history, and the 5 V's. Explore data forms from unstructured to structured, then look at real-world use cases that highlight Big Data's role in data science and data processing.

Gain practical insight into the Hadoop ecosystem: its architecture, file system (HDFS), administration, and essential components. Move on to Scala programming for Big Data operations, covering object-oriented principles, case classes, collections, and idiomatic usage. Then take a comprehensive look at the Hadoop and Spark frameworks, including distributed storage, ETL processes, MapReduce, Hive, and HBase. Cover Spark's core functionality: RDDs, Spark SQL, MLlib for big data modeling, stream processing with Spark Streaming, and GraphX for graph processing.

Build on this with PySpark, where you'll work with resilient distributed datasets (RDDs), DataFrames, transformations, and data processing techniques. Python for Data Science covers the basics, data structures, programming fundamentals, and NumPy arrays.

Learn Kafka's event streaming platform, including its architecture, cluster management, producers and consumers, and performance tuning. Deepen your knowledge of NoSQL databases with MongoDB's CRUD operations, indexing and aggregation strategies, and Java and Node.js application development. Explore the MapReduce programming model along with Cassandra and other common NoSQL databases such as Riak, Redis, Neo4j, and Elasticsearch for diverse data modeling approaches. Lastly, use Amazon Web Services (AWS) for big data collection, storage, analysis, and visualization, with robust security measures in place.

Course Syllabus

Big Data Definition
History of Big Data
5 V’s
Unstructured, semi-structured, and structured data
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
Lunch (1 hour)
Short Break (15 mins)
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
End of Day 1
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
Lunch (1 hour)
Short Break (15 mins)
Short Break (15 mins)
Short Break (15 mins)
Recap and Q&A (15 mins)
End of Day 2

Instructor

Tarun Sukhani, Founder & CTO (Teaching)

Tarun Sukhani is a professional trainer and consultant with over 25 years of experience in the IT and business sectors, having worked across multiple continents including the US, Europe, Asia, South America, and the Middle East. His expertise spans Agile methodologies (Scrum, SAFe, Kanban), enterprise architecture frameworks like TOGAF and COBIT, IT service management standards including ITIL and ISO 27001, and cybersecurity certifications like CISO and CISSP. His proficiency extends to project management frameworks such as PRINCE2 and PMP, along with Big Data analytics using Hadoop and Spark, data science with Python or R, and data visualization tools like Tableau.

Tarun has held numerous senior development and executive roles, including CIO/CTO positions in which he managed large-scale IT operations for multinational corporations such as Dell, AMD, and Experian. His leadership has helped enhance business operations across functions including HR, Finance, Operations, Sales, Risk Management, Engineering/Manufacturing, and Accounting. He has also contributed to regional conglomerates like Indra in the Asia Pacific region.

Tarun has facilitated training workshops throughout Asia Pacific countries such as Malaysia, Indonesia, the Philippines, Thailand, and Singapore. His training sessions cover a broad spectrum of topics, from project management to strategic leadership and soft skills development. He also specializes in advanced technical subjects such as software architecture design patterns for reactive microservices architectures on cloud platforms.

Tarun graduated summa cum laude with an MSc in Information Systems and an MBA in Finance and Operations Management from Loyola University Chicago. He holds multiple Bachelor's degrees in Biology, Math & Computer Science alongside Business Administration. He is also certified as an Agile/Scrum trainer, Java/.NET programmer, Machine Learning specialist, InfoSec expert, and Business Intelligence professional, and has completed advanced studies in AI & Blockchain at institutions like MIT & Stanford.

His client portfolio includes Western Digital/SanDisk for machine learning product management projects; Singtel & CIMB for agile product development initiatives; and Tenaga Nasional & JPJ for agile project management engagements, among others. Tarun also speaks at international conferences, where he shares insights on digital transformation strategies.

18 Students
212 Courses
English, Malay, Spanish
27 Years

Instructor

Mohammad Mehdi Lotfinejad, Certified Data Science Trainer and Data Engineer (Not Teaching)

Mohammad Mehdi Lotfinejad is a Chief Data Officer with expertise in data science and engineering and over ten years of experience developing data processing pipelines for enterprise insights. He is certified as an HRDF trainer, with more than 15 years of experience as a computer science instructor in academia and as a professional data science/engineering trainer in industry. His academic credentials include a PhD in Computer Science from Universiti Malaya, Malaysia, and he holds certifications from Harvard Business School in Business Analytics. Mohammad specializes in technologies such as Apache Spark, MySQL, PostgreSQL, MongoDB, Snowflake, Redshift, Apache Airflow, APIs and microservices, and Amazon Web Services.

Serving as the Chief Data and Knowledge Officer at Magna.ai since February 2024, Mohammad leads the development of graph databases and data warehouses to support AI-driven law case analysis services. He architects API and microservice solutions to enhance system interoperability and scalability while ensuring data security and compliance with legal standards. Before this role, he worked as a Senior Data Engineer at AXIATA Digital Advertising (ADA) from March 2020, where he collaborated on designing automated data pipelines using AWS Redshift and Snowflake for storing telco data and implemented BI dashboards on Google BigQuery.

From June 2018 to February 2020, Mohammad was the Lead Senior Data Scientist Professional Trainer at The Center of Applied Data Science in Kuala Lumpur. He led teams of data scientists and engineers to design professional training programs for prominent clients like CIMB, PETRONAS, SHELL, and TNB. His earlier roles include leading big data engineering teams at RAHA in Iran, where he developed large-scale analytics pipelines using Hadoop ecosystem tools like Hive and Spark.

In academia, Mohammad served as a faculty member at Payame Noor University from September 2014 to June 2018, where he supervised graduate research projects and contributed to curriculum development. He also held leadership positions at Islamic Azad University, where he managed departments.

His technical proficiencies span RDBMSs like MySQL and PostgreSQL, programming languages such as Python and C++, and web design using HTML/CSS/Bootstrap, alongside project management skills including Scrum Master certification. His published works include books on Object-Oriented Programming and Project Management Fundamentals, along with numerous journal articles on topics ranging from solar radiation prediction models to machine learning algorithms for intrusion detection systems.

2 Students
56 Courses
18 Years

Instructor

Hadia Yousaf, Data Scientist (Not Teaching)

Hadia Yousaf is an experienced IT trainer with a background in Statistics, Research, Business Administration, Banking, Finance, and Economics. Currently working as a Data Scientist and Data Science Trainer, Hadia has developed expertise in Python, SQL, and Machine Learning for Business Analytics, Big Data Analytics, and Data Visualisation. She has experience working with large data sets and analytic tools, including Statistical Analysis using Time Series and Panel Data, statistical modeling, and Machine Learning. Hadia is an HRDF Certified Trainer (TTT) and Virtual Learn Caster (VLC), certified in Malaysia.

Her career includes 2 years as a Data Scientist/Data Science Trainer, 8 years of research work as a PhD researcher, 3 years as a Senior Banking Services Officer at Allied Bank Ltd., Pakistan, 1 year as an IT Intern under the Pakistan Government Internship Program, and 1 year of teaching experience as a Course Instructor. Her current role at The Center of Applied Data Science involves creating data pipelines to collect, clean, normalise, and load data for statistical analysis and hypothesis testing.

With an expected Ph.D. in Industrial & Development Economics from Universiti Utara Malaysia by December 2022, Hadia's academic background also includes an MBA in Banking & Finance and a Bachelor's degree in Information Technology. She combines an understanding of statistical and machine learning concepts with high-standard coding practices. She is willing to work remotely or travel abroad and has excellent English language skills.

32 Courses
15 Years

Minimum Qualification

graduate

Target Audience

entry level
engineers

Methodologies

lecture
slides
case studies
labs
group discussion
Q&A
