Big Data for Beginners
Master Big Data analytics with this comprehensive course designed for aspiring professionals. Explore the Hadoop ecosystem and the intricacies of Scala programming, then build your skills with hands-on Spark and PySpark training. Discover the potential of NoSQL databases, including MongoDB and Cassandra, and use AWS services to handle data at scale.

What you'll learn
- Operate Apache Kafka for real-time event streaming
- Learn about Hadoop ecosystem components including HDFS
- Acquire proficiency in Scala for Big Data tasks
- Utilize AWS services effectively for Big Data solutions
- Grasp the essentials of Big Data processing using Hadoop and Spark
- Develop an understanding of NoSQL databases such as MongoDB & Cassandra
- Identify use cases where Big Data technologies apply
- Understand fundamental terminology and concepts of Big Data
Why should you attend?
Embark on a journey through the expansive world of Big Data, where you'll learn to manage its complexities and extract value from massive datasets. Begin with the core terminology and concepts that underpin Big Data analytics, including its definition, its history, and the 5 V's. Explore the range of data types, from unstructured to structured, and study real-world use cases that highlight Big Data's role in data science and data processing.

Gain practical insight into the Hadoop ecosystem: its architecture, the Hadoop Distributed File System (HDFS), administration, and essential components. Then move on to Scala programming for Big Data operations, covering object-oriented principles, case classes, collections, and idiomatic usage.

Advance your skill set with a comprehensive look at the Hadoop and Spark frameworks, covering distributed storage, ETL processes, MapReduce, Hive, HBase, and Spark's core capabilities. Learn Spark's main components: RDDs, Spark SQL, MLlib for big data modeling, Spark Streaming for stream processing, and GraphX for graph processing. Extend this expertise with PySpark, where you'll work with resilient distributed datasets (RDDs), DataFrames, transformations, and data processing techniques. Python enthusiasts can also delve into Python for Data Science to learn the basics, data structures, programming fundamentals, and how to work with NumPy arrays.

Stay ahead in a fast-evolving field by learning Apache Kafka's event streaming platform, including its architecture, cluster management, producers and consumers, and performance tuning.

Deepen your knowledge of NoSQL databases with MongoDB's CRUD operations, indexing and aggregation strategies, and application development in Java and Node.js. Explore the MapReduce programming model in depth, along with Cassandra and other common NoSQL databases such as Riak, Redis, Neo4j, and Elasticsearch, to compare different data modeling approaches. Finally, use Amazon Web Services (AWS) for big data collection, storage, analysis, and visualization while putting robust security measures in place.
Course Syllabus
Day 1
- Short Break (15 mins)
- Short Break (15 mins)
- Recap and Q&A (15 mins)
- Lunch (1 hour)
- Short Break (15 mins)
- Short Break (15 mins)
- Short Break (15 mins)
- Recap and Q&A (15 mins)
- End of Day 1
Day 2
- Short Break (15 mins)
- Short Break (15 mins)
- Recap and Q&A (15 mins)
- Lunch (1 hour)
- Short Break (15 mins)
- Short Break (15 mins)
- Short Break (15 mins)
- Recap and Q&A (15 mins)
- End of Day 2
Instructors
Tarun Sukhani is a professional trainer and consultant with over 25 years of experience in the IT and business sectors, having worked in the US, Europe, Asia, South America, and the Middle East. His expertise spans Agile methodologies (Scrum, SAFe, Kanban), enterprise architecture and IT governance frameworks such as TOGAF and COBIT, IT service management and information security standards such as ITIL and ISO 27001, and cybersecurity certifications such as CISO and CISSP. His proficiency extends to project management frameworks such as PRINCE2 and PMP, as well as Big Data analytics with Hadoop and Spark, data science with Python and R, and data visualization tools such as Tableau. Tarun has held numerous senior development and executive roles, including CIO/CTO positions, managing large-scale IT operations for multinational corporations such as Dell, AMD, and Experian. His leadership has improved business operations across functions including HR, Finance, Operations, Sales, Risk Management, Engineering/Manufacturing, and Accounting. He has also contributed to regional conglomerates such as Indra in the Asia Pacific region. A passionate educator, Tarun has run training workshops across Asia Pacific countries including Malaysia, Indonesia, the Philippines, Thailand, and Singapore. His sessions cover a broad range of topics, from project management to strategic leadership and soft skills, and he also teaches advanced technical subjects such as software architecture design patterns for reactive microservices on cloud platforms. He graduated summa cum laude with an MSc in Information Systems and an MBA in Finance and Operations Management from Loyola University Chicago.
He holds multiple bachelor's degrees, in Biology, Math & Computer Science, and Business Administration. He is also certified as an Agile/Scrum trainer, Java/.NET programmer, Machine Learning specialist, InfoSec expert, and Business Intelligence professional, and has completed advanced studies in AI and Blockchain at institutions such as MIT and Stanford. His clients include Western Digital/SanDisk (machine learning product management projects), Singtel and CIMB (agile product development initiatives), and Tenaga Nasional and JPJ (agile project management engagements), among others. His dynamic approach and passion for developing people make him a sought-after speaker at international conferences, where he shares insights on digital transformation strategies.
Mohammad Mehdi Lotfinejad is a Chief Data Officer with deep expertise in data science and engineering and over ten years of experience developing data processing pipelines for enterprise insights. He is a proven leader with strong communication and presentation skills and an HRDF-certified trainer with more than 15 years of experience teaching computer science, both in academia and as a professional data science and engineering trainer in industry. He holds a PhD in Computer Science from Universiti Malaya, Malaysia, and certifications in Business Analytics from Harvard Business School. Mohammad specializes in technologies such as Apache Spark, MySQL, PostgreSQL, MongoDB, Snowflake, Redshift, Apache Airflow, APIs and microservices, and Amazon Web Services. Since February 2024, he has served as Chief Data and Knowledge Officer at Magna.ai, where he leads the development of graph databases and data warehouses that support AI-driven legal case analysis services. He architects API and microservice solutions that improve system interoperability and scalability while ensuring data security and compliance with legal standards. Before this role, he worked as a Senior Data Engineer at AXIATA Digital Advertising (ADA), starting in March 2020, where he helped design automated data pipelines using AWS Redshift and Snowflake to store telco data and implemented BI dashboards with Google BigQuery. From June 2018 to February 2020, Mohammad was Lead Senior Data Scientist Professional Trainer at The Center of Applied Data Science in Kuala Lumpur, where he led teams of data scientists and engineers in designing professional training programs for clients such as CIMB, PETRONAS, SHELL, and TNB.
His earlier roles include leading big data engineering teams at RAHA in Iran, where he developed large-scale analytics pipelines with Hadoop ecosystem tools such as Hive and Spark. In academia, Mohammad was a faculty member at Payame Noor University from September 2014 to June 2018, where he supervised graduate research projects and contributed to curriculum development. He also held leadership positions at Islamic Azad University, where he managed departments to uphold high academic standards. His technical skills range from relational databases such as MySQL and PostgreSQL to programming languages such as Python and C++; he is also adept at web design with HTML, CSS, and Bootstrap and holds a Scrum Master certification. His published works include books on Object-Oriented Programming and Project Management Fundamentals, along with numerous journal articles on topics ranging from solar radiation prediction models to machine learning algorithms for intrusion detection systems.
Hadia Yousaf is an experienced IT trainer with a background in Statistics, Research, Business Administration, Banking, Finance, and Economics. Currently working as a Data Scientist and Data Science Trainer, she has developed expertise in Python, SQL, machine learning for business analytics, big data analytics, and data visualisation. She has proven experience working with large datasets and analytic tools, including statistical analysis of time series and panel data, statistical modeling, and machine learning. Hadia is an HRDF Certified Trainer (TTT) and Virtual Learn Caster (VLC) in Malaysia. Her career includes 2 years as a Data Scientist/Data Science Trainer, 8 years of research as a PhD researcher, 3 years as a Senior Banking Services Officer at Allied Bank Ltd., Pakistan, 1 year as an IT Intern under the Pakistan Government Internship Program, and 1 year of teaching as a Course Instructor. In her current role at The Center of Applied Data Science, she builds data pipelines to collect, clean, normalise, and load data for statistical analysis and hypothesis testing. Her academic background includes a Ph.D. in Industrial & Development Economics from Universiti Utara Malaysia (expected by December 2022), an MBA in Banking & Finance, and a Bachelor's degree in Information Technology. Her understanding of statistical and machine learning concepts, combined with high coding standards, makes her an accomplished professional in the field. She is willing to work remotely or travel abroad and has excellent English language skills.
© 2025 Abundent Sdn Bhd. All Rights Reserved.