# ![@decodethedataai Avatar](https://lunarcrush.com/gi/w:26/cr:youtube::UCA9NCDBdIvRwo-IyOzQc5zw.png) @decodethedataai DECODE THE DATA

DECODE THE DATA posts on YouTube most often about databricks, share, ai, and [----]. They currently have [-----] followers and [---] posts still getting attention, totaling [---] engagements in the last [--] hours.

### Engagements: [---] [#](/creator/youtube::UCA9NCDBdIvRwo-IyOzQc5zw/interactions)
![Engagements Line Chart](https://lunarcrush.com/gi/w:600/cr:youtube::UCA9NCDBdIvRwo-IyOzQc5zw/c:line/m:interactions.svg)

- [--] Week [---] +460%
- [--] Month [---] -44%
- [--] Months [-----] +95%
- [--] Year [-----] +5,946%

### Mentions: [--] [#](/creator/youtube::UCA9NCDBdIvRwo-IyOzQc5zw/posts_active)
![Mentions Line Chart](https://lunarcrush.com/gi/w:600/cr:youtube::UCA9NCDBdIvRwo-IyOzQc5zw/c:line/m:posts_active.svg)

- [--] Week [--] +250%
- [--] Month [--] +11%
- [--] Months [--] +26%
- [--] Year [--] +914%

### Followers: [-----] [#](/creator/youtube::UCA9NCDBdIvRwo-IyOzQc5zw/followers)
![Followers Line Chart](https://lunarcrush.com/gi/w:600/cr:youtube::UCA9NCDBdIvRwo-IyOzQc5zw/c:line/m:followers.svg)

- [--] Week [-----] +1.90%
- [--] Month [-----] +3.90%
- [--] Months [-----] +67%
- [--] Year [-----] +250%

### CreatorRank: [---] [#](/creator/youtube::UCA9NCDBdIvRwo-IyOzQc5zw/influencer_rank)
![CreatorRank Line Chart](https://lunarcrush.com/gi/w:600/cr:youtube::UCA9NCDBdIvRwo-IyOzQc5zw/c:line/m:influencer_rank.svg)

### Social Influence

**Social category influence**
[technology brands](/list/technology-brands)  [social networks](/list/social-networks) 

**Social topic influence**
[databricks](/topic/databricks), [share](/topic/share), [ai](/topic/ai), [6969](/topic/6969), [llm](/topic/llm), [bigdata](/topic/bigdata), [what is](/topic/what-is), [chain](/topic/chain), [education](/topic/education), [frame](/topic/frame)

**Top accounts mentioned or mentioned by**
[@learnomate](/creator/undefined)

### Top Social Posts
Top posts by engagements in the last [--] hours

"Video [--] - What is the need of Lang Chain #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm pyspark apachespark databricks coding learnpyspark"  
[YouTube Link](https://youtube.com/watch?v=Wf3V8L3F6Ko)  2026-02-14T10:36Z [----] followers, [--] engagements


"Video [--] - A to Z About Data Bricks Notebooks LinkedIn Post about User User Groups and Service Principal - https://www.linkedin.com/posts/ganesh-kudale-50bb14ab_youtube-contentcreator-pyspark-activity-7421228245473329152-VDdTutm_source=share&utm_medium=member_desktop&rcm=ACoAABd64tsBZlYaSR7w7vs9gd-HLFllhpPToqQ #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud"  
[YouTube Link](https://youtube.com/watch?v=aPaYggUauP8)  2026-02-14T09:21Z [----] followers, [--] engagements


"Video [--] - Components of Lang Chain - Part [--] #pyspark #apachespark #databricks Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero/ Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&index=1 Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&index=1 Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&index=1 PySpark Zero to"  
[YouTube Link](https://youtube.com/watch?v=bgITOWsw2WU)  2026-02-15T10:16Z [----] followers, [--] engagements


"Why Python In this video we build a strong intuition around why Python is so important in Data Engineering. Youll understand how and where Python is actually used in real-world data engineering use cases from data ingestion and transformation to building scalable data pipelines. Instead of just theory this video focuses on giving you the right context and mindset needed to learn Python with purpose. #python #dataengineering #coding python Data Engineering Data Pipelines coding python Data Engineering Data Pipelines coding"  
[YouTube Link](https://youtube.com/watch?v=3TCgqNKWLpo)  2025-12-14T16:36Z [----] followers, [---] engagements


"Video [--] - Why DataBricks Video [--] - Why DataBricks Company - A person with [--] Lakh salary can complete the project in [--] months [--] Lakhs - Cost Monolithic Systems - Single system with all parts tightly coupled. If one-part breaks the system will fail. One big system holding full power. Vertical Scaling - is not time efficient is not cost efficient Pros - Simple to develop and handle Easy to debug and test Cons Non efficient scalability Hard to maintain and upgrade non fault tolerant Distributed Systems - Pros - Fault Tolerant Efficient Scaling - Horizontal Scaling Parallel Processing Cons -"  
[YouTube Link](https://youtube.com/watch?v=54Q-8vT6QCM)  2025-11-08T15:25Z [----] followers, [---] engagements


"Session [--] - Grouping Aggregations on Multiple Columns in PySpark Session [--] - Grouping Aggregations on Multiple Columns in PySpark emp_id emp_name emp_salary emp_department [--] Ganesh [-----] [--] [--] Ganesh1 [-----] [--] [--] Ramesh [-----] [--] [--] Pritesh [-----] [--] [--] Priyesh [-----] [--] Dep [--] Dep [--] Dep [--] emp_id emp_name emp_salary emp_department emp_city [--] Ganesh [-----] [--] Pune [--] Ganesh1 [-----] [--] Pune [--] Ramesh [-----] [--] Mumbai [--] Pritesh [-----] [--] Chennai [--] Priyesh [-----] [--] Pune grouping based on emp_city and emp_department Pune and dep [--] Pune and dep [--] Mumbai and dep [--] Chennai and dep [--] [--]. How much every department is"  
[YouTube Link](https://youtube.com/watch?v=Aj9hazv4jck)  2025-02-22T07:45Z [---] followers, [--] engagements
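
The session quoted above groups employees by both city and department. A minimal PySpark sketch of that pattern, using made-up names and salary figures (the post's values are redacted), could look like this:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical employee rows; the salaries in the post are redacted.
emp_df = spark.createDataFrame(
    [(1, "Ganesh", 50000, "Dep1", "Pune"),
     (2, "Ganesh1", 60000, "Dep1", "Pune"),
     (3, "Ramesh", 55000, "Dep2", "Mumbai"),
     (4, "Pritesh", 45000, "Dep2", "Chennai"),
     (5, "Priyesh", 70000, "Dep3", "Pune")],
    ["emp_id", "emp_name", "emp_salary", "emp_department", "emp_city"],
)

# Group on both emp_department and emp_city, then aggregate per group.
spend_df = (
    emp_df
    .groupBy("emp_department", "emp_city")
    .agg(F.sum("emp_salary").alias("total_salary_spend"))
)
spend_df.show()
```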


"Video [--] - What is Data Lakehouse #pyspark Video [--] - What is Data Lakehouse DataBase - OLTP Online Transaction - Banking Databases are made to store transactional data. cust_id trn_id trn_amount trn_type Databases are suitable for day-to-day transactions. Not made to store historical data. Data Bases are more costly. Data Warehouse - OLAP - Analytical Platform - DWH stores structured historical data for analytical purposes. Data in DWH is cleaned and organized. Data in DWH is best for reporting dashboards and Business Intelligence. DWH are less costly as compared to Data bases. Data Lake Data"  
[YouTube Link](https://youtube.com/watch?v=CXMvSQLoz9M)  2025-12-07T16:12Z [----] followers, [---] engagements


"What is Machine Learning What is Machine Learning Village - [---] houses - [----] people Grocery - City Centre - [--] kms Problem Statement - To buy groceries villagers had to go [--] kms from village. Ganesh went to city centre - [--] items - [---] pieces of [--] items - It generally takes [--] days time. On 3rd day he will receive the items. 5-6 days - [---] pieces of each item got exhausted It generally takes [--] days time. On 3rd day he will receive the items. 2-3 days the grocery shop had no item to sell. This pattern ganesh observed for 2-3 weeks. Observation and action taken - He started giving the order"  
[YouTube Link](https://youtube.com/watch?v=FZTMVdvxF3w)  2025-12-14T06:29Z [----] followers, [--] engagements


"Video [--] - What is DataBricks Video [--] - What is Data Bricks Data Bricks - Apache Spark designed and optimised to work on cloud. Official Definition - Data Bricks is Unified platform for building deploying sharing and maintaining enterprise-grade data analytics and AI solutions at scale. Why Unified [--]. Lake House Architecture - It contains the best of Data Lake and DWH. - All type of data in one place. - ACID Enforcement schema evolutions and time travel. [--]. Integrated Workflows - Running workloads within Data Bricks is available. [--]. Multi Language Support - Python SQL Scala and R. - Data"  
[YouTube Link](https://youtube.com/watch?v=IbD4EKzOrfg)  2025-11-23T13:53Z [----] followers, [---] engagements


"#coding #pyspark #apachespark #databricks #education #viralvideo #vlog #viralshorts #newvideo #learn"  
[YouTube Link](https://youtube.com/watch?v=Ie7XylDfjZE)  2025-02-14T15:51Z [---] followers, [--] engagements


"Video [--] - What is Generative AI Video [--] - What is Generative AI Play List Name - Generative AI Using Lang Chain Definition - Gen AI is a type of AI (Artificial Intelligence) that can create a new content on it's own. Text - Stories Emails Articles Images - Painting Photo for Product Audio/Video - Code - Gen AI is AI that can generate something new based on what it has learnt in past. LLM - Large Language Model Huge Model which learns every time from the huge data it understands the data and patterns generates human readable text. Large - Trained on Massive data Language - Focused on text:"  
[YouTube Link](https://youtube.com/watch?v=KkjjRvzmq7s)  2025-12-28T15:21Z [----] followers, [---] engagements


"Video [--] - Creating Azure Data Bricks Service #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"  
[YouTube Link](https://youtube.com/watch?v=Kte9e5tah-8)  2026-01-03T13:41Z [----] followers, [--] engagements


"Challenge [--] #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support"  
[YouTube Link](https://youtube.com/watch?v=LYHRx8GmTos)  2025-11-01T11:42Z [----] followers, [--] engagements


"Session [--] - Running multiple grouping aggregations together Session [--] - Running multiple grouping aggregations together [--]. How much every department is spending on employee salary in each city [--]. Average employee salary based on department in each city. [--]. Maximum employee salary based on department in each city. [--]. Minimum employee salary based on department in each city. emp_results_df = emp_df .groupBy(F.col("emp_dep")F.col("emp_city"))"  
[YouTube Link](https://youtube.com/watch?v=LcQ0m65Fpg0)  2025-02-22T09:30Z [---] followers, [--] engagements
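
The snippet in the post chains several aggregations onto one groupBy but loses its commas in the quoting. A small, self-contained sketch of the same idea, with hypothetical sample rows, might be:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

emp_df = spark.createDataFrame(
    [(1, "Ganesh", 50000, "Dep1", "Pune"),
     (2, "Ramesh", 55000, "Dep1", "Mumbai"),
     (3, "Pritesh", 45000, "Dep2", "Chennai"),
     (4, "Priyesh", 70000, "Dep2", "Pune")],
    ["emp_id", "emp_name", "emp_salary", "emp_dep", "emp_city"],
)

# One groupBy can carry several aggregations at once; note the comma
# between the two grouping columns (it is missing in the quoted snippet).
emp_results_df = (
    emp_df
    .groupBy(F.col("emp_dep"), F.col("emp_city"))
    .agg(
        F.sum("emp_salary").alias("total_salary"),
        F.avg("emp_salary").alias("average_salary"),
        F.max("emp_salary").alias("maximum_salary"),
        F.min("emp_salary").alias("minimum_salary"),
    )
)
emp_results_df.show()
```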


"Internals of Python - Part [--] - Everything is object in Python Ever wondered what Everything is an object in Python really means In this video we break down this core concept by explaining identity type and value of Python objects and how numbers strings lists functions and classes all follow the same object model. This understanding is crucial if you want to: [--]. Write better Python code [--]. Understand Python internals [--]. Grow as a Data Engineer or Python developer"  
[YouTube Link](https://youtube.com/watch?v=TjUdPBaJ3UI)  2025-12-28T18:45Z [----] followers, [---] engagements


"Video [--] - Creating Azure Cloud Account #pyspark Navigate to https://portal.azure.com/ Basics Tab - Resource Group - Logical grouping of resources. Project Environment Pricing Tier [--]. Standard Available - Core Features Apache Spark Engine Notebooks Job Scheduling Delta Lake Support Cluster Management Monitoring Not Available - Security and Governance Role Based Access Control Unity Cayalog Audit Logs Premium - Available - All Above features available in Standard Security and Governance Role Based Access Control Unity Catalog Audit Logs Workspace Type [--]. Hybrid - Can use cluster and storage"  
[YouTube Link](https://youtube.com/watch?v=_GiHfdmPpgw)  2026-01-01T12:06Z [----] followers, [--] engagements


"Internals of Python - Part [--] - How Python code gets executed internally This video explains how Python code is executed internally diving deep into Pythons execution model. We cover key internals such as source code compilation bytecode generation and how the Python interpreter (PVM) executes bytecode step by step. If you found this video helpful please like the video and share it with others who want to understand Python beyond the surface. python pythonforde dataengineering data ai genai llm python pythonforde dataengineering data ai genai llm"  
[YouTube Link](https://youtube.com/watch?v=cYOr5VjukAI)  2025-12-21T09:29Z [----] followers, [---] engagements


"Challenge [--] - Continuation [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. [--] - If [--] elements in the array - same as above If middle name is not there array will have [--] elements middle name will be null first_name position will be [--] last_name position will be [--] age position will be [--] [--]. The delimiter can be comma or pipe. Code should handle that case as well - We will have to replace the delimiter pipe or comma with space from pyspark.sql import functions as F website_data = ("Ganesh Kudale 31")"  
[YouTube Link](https://youtube.com/watch?v=heCo6seFB38)  2025-10-25T00:14Z [----] followers, [--] engagements
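
The challenge described here (optional middle name, comma or pipe delimiter) can be sketched roughly as below; the sample strings and the column handling are illustrative assumptions, not the creator's actual solution:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical website strings: space-, comma- and pipe-delimited,
# with and without a middle name.
raw_df = spark.createDataFrame(
    [("Ganesh Ramdas Kudale 31",),
     ("Akshay,Ramdas,Kudale,28",),
     ("Ojas|Kudale|1.5",)],
    ["raw"],
)

# Normalise every delimiter (comma or pipe) to a space, then split.
parts = F.split(F.regexp_replace("raw", r"[,|]", " "), " ")

parsed_df = (
    raw_df
    .withColumn("parts", parts)
    .withColumn("first_name", F.col("parts")[0])
    # A middle name exists only when the array has 4 elements.
    .withColumn(
        "middle_name",
        F.when(F.size("parts") == 4, F.col("parts")[1]).otherwise(F.lit(None)),
    )
    .withColumn(
        "last_name",
        F.when(F.size("parts") == 4, F.col("parts")[2]).otherwise(F.col("parts")[1]),
    )
    .withColumn(
        "age",
        F.when(F.size("parts") == 4, F.col("parts")[3]).otherwise(F.col("parts")[2]),
    )
    .drop("parts")
)
parsed_df.show(truncate=False)
```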


"Video [--] - What is Lang Chain #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm langchaintutorial pyspark apachespark databricks coding learnpyspark python"  
[YouTube Link](https://youtube.com/watch?v=jmoNS_5Zu0U)  2026-01-10T16:54Z [----] followers, [--] engagements


"Video [--] - Working of Data Lake house #pyspark Video [--] - Working of Data Lake House Data Lake House uses [--] key technologies - [--]. Delta Lake - Optimized storage layer the supports ACID transactions and schema enforcement and evolutions [--]. Unity Catalog - Unified governance solution for data and AI Working Of Data Lakehouse - Data Ingestion - Data from multiple sources is dumped in multiple formats. This data can be Batch data or streaming data. This is first logical layer provides the place for the data to land in raw format. Making it single source of truth for raw data. Raw data can be put"  
[YouTube Link](https://youtube.com/watch?v=qL7cOw8otA0)  2025-12-18T18:10Z [----] followers, [---] engagements


"Session [--] - Windowing Aggregations in PySpark - Rank Session [--] - Windowing Aggregations in PySpark - Rank marks row_number rank [---] [--] [--] [---] [--] [--] [---] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] from pyspark.sql.window import Window rank_window = Window.partitionBy(F.col("subject")) .orderBy(F.desc(F.col("marks"))) marks_df1 = marks_df.withColumn("rank_number"F.rank().over(rank_window)) marks_df2 = marks_df.withColumn("row_number"F.row_number().over(rank_window)) marks_df.createOrReplaceTempView("marks") spark.sql("SELECT student_namesubjectmarksrank() OVER (PARTITION BY"  
[YouTube Link](https://youtube.com/watch?v=uZmxEB3sObw)  2025-02-22T11:30Z [---] followers, [--] engagements
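
A compact sketch of the rank/row_number comparison this session walks through, with invented marks standing in for the redacted values:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

marks_df = spark.createDataFrame(
    [("Ganesh", "English", 99), ("Akshay", "English", 99),
     ("Priyesh", "English", 98), ("Rohit", "History", 88),
     ("Shrikant", "History", 77)],
    ["student_name", "subject", "marks"],
)

# Rank students within each subject, highest marks first.
rank_window = Window.partitionBy(F.col("subject")).orderBy(F.desc("marks"))

ranked_df = (
    marks_df
    .withColumn("row_number", F.row_number().over(rank_window))  # never repeats
    .withColumn("rank", F.rank().over(rank_window))              # repeats on ties, then skips
)
ranked_df.show()
```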


"Challenge - [--] Challenge - [--] [--]. Consider there is data that is getting received from website in string format. That data contains first_name middle_namelast_name and age of the customer. Write PySpark code to separate them into separate columns. [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. website_data = ("Ganesh Ramdas Kudale 31") ("Akshay Ramdas Kudale 28") ("Ojas Ganesh Kudale 1.5") split - it gives array of strings - (columndelimiter) substring_index - gives the results based position we specify"  
[YouTube Link](https://youtube.com/watch?v=vArJdgnnrjk)  2025-10-20T12:05Z [----] followers, [--] engagements


"Video [--] - Introduction to Data Bricks UI Video [--] - Introduction to Data Bricks UI Lakehouse App - For technical users Data Engineers Data Scientists ML Engineers Create and Run the Notebooks Build ETL Pipelines Create and Manage the Jobs Train ML models Work with Delta Table Data Bricks One - For Business Users Analysts Decision Makers View Dashboards - Ask question in Natural Language - Why Switch Apps in same workspace Delta Table - PRN Symptoms Summary Diagnosis Summary SELECT * FROM final_table WHERE PRN = [-----] Give me the summary for patient with PRN [-----] #pyspark #apachespark"  
[YouTube Link](https://youtube.com/watch?v=P1ob1SeMn4A)  2026-02-01T14:12Z [----] followers, [--] engagements


"Video [--] - Data Bricks Architecture #pyspark Video [--] - Data Bricks Architecture Two Parts - [--]. Control Plane [--]. Compute Plane Control Plane - Control Plane manages and controls but is not responsible to process the data. A) Data Bricks UI B) Cluster Manager C) Unity Catalog D) Workspace Storage (Metadata Storage) - Notebook Definitions Code Permissions Compute Plane - Actual data processing Running Notebooks Jobs/Pipeline Execution Processing Spark Workloads Reading and writing the data Types of Clusters in Data Bricks - 1) Classic Compute 2) Serverless Compute Classic Compute - The compute"  
[YouTube Link](https://youtube.com/watch?v=R_SGm8hty3c)  2026-02-07T16:30Z [----] followers, [--] engagements


"Internals of Python - Part [--] - Mutable and Immutable Objects in Python In this session I cover what mutable and immutable objects are why Python differentiates between them and how this impacts memory management performance and bug-free coding"  
[YouTube Link](https://youtube.com/watch?v=e0Lvj5iynAM)  2026-01-03T19:39Z [----] followers, [---] engagements


"Video [--] - Components of Lang Chain - Part [--] Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&pp=sAgC Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&pp=sAgC Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&pp=sAgC PySpark Zero to Superhero Playlist -"  
[YouTube Link](https://youtube.com/watch?v=xRtToJAwGYo)  2026-02-15T07:36Z [----] followers, [--] engagements


"Session [--] - Grouping aggregations in PySpark - Continuation [--] Session [--] - Grouping aggregations in PySpark - Continuation [--] /FileStore/emp_data.csv [--]. Maximum employee salary of each department. [--]. Minimum employee salary of each department. emp_df1 = emp_df .agg(F.max(F.col("emp_salary")).alias("maximum_salary")) emp_df2 = emp_df .groupBy("emp_department") .agg(F.max(F.col("emp_salary")).alias("maximum_salary")) emp_df3 = emp_df .groupBy("emp_department") .agg(F.max(F.col("emp_salary")).alias("maximum_salary")) .orderBy(F.asc(F.col("emp_department"))) emp_df4 = emp_df"  
[YouTube Link](https://youtube.com/watch?v=-MIvxHvJ4zM)  2025-02-08T10:43Z [---] followers, [--] engagements


"Session [--] - Full Outer Join in PySpark - Joining over multiple Columns Session [--] - Full Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"  
[YouTube Link](https://youtube.com/watch?v=-jx1pQ2W20Y)  2025-08-29T18:00Z [----] followers, [--] engagements


"Session [--] - WHEN Otherwise in PySpark - One when Condition Session [--] - WHEN Otherwise in PySpark - One when Condition /FileStore/emp_data.csv [--]. When salary is more than [-----] it is high salary. If it is less than or equal to [-----] then it is low salary. Give me results for each employee with their salary and if their salaries are high or low. #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo"  
[YouTube Link](https://youtube.com/watch?v=0IDkJ1nidDI)  2025-05-21T06:35Z [---] followers, [--] engagements
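
A minimal sketch of the single when/otherwise rule described above; the salary cut-off in the post is redacted, so 50000 below is only a stand-in:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

emp_df = spark.createDataFrame(
    [(1, "Ganesh", 80000), (2, "Akshay", 35000), (3, "Priyesh", 50000)],
    ["emp_id", "emp_name", "emp_salary"],
)

threshold = 50000  # the actual cut-off used in the session is redacted

labelled_df = emp_df.withColumn(
    "salary_band",
    F.when(F.col("emp_salary") > threshold, "High Salary")
     .otherwise("Low Salary"),
)
labelled_df.show()
```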


"Concluding PySpark Series Session [--] - Creating the raw data frame - https://youtu.be/zHhRnOPul7g Session [--] - Defining the Schema in PySpark - https://youtu.be/AKuvX8Kn7l4 Session [--] - Reading the data frame form file stored at storage location - https://youtu.be/nq04n-6JvH8 Session [--] - Different ways of creating the data frame - https://youtu.be/tDbmBhghE7Q Session [--] - Transformations and Action in Apache Spark - https://youtu.be/JZu1EK0isjA Session [--] - Data Frame Read Modes - https://youtu.be/lkj8nEzS4To Session [--] - PySpark withColumn Transformation - https://youtu.be/gBMNsspzNiI Session [--] -"  
[YouTube Link](https://youtube.com/watch?v=202CDHkQ2fw)  2025-10-05T14:40Z [----] followers, [---] engagements


"Session [--] - Left Outer Join in PySpark - Joining over multiple Columns Session [--] - Left Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") joined_df = df1.join(other=another_dataframeon=join_condition how=join_type) from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema ="  
[YouTube Link](https://youtube.com/watch?v=3MQhlQLYs8Q)  2025-07-19T17:17Z [----] followers, [---] engagements


"Session [--] - Full Outer Join in PySpark - Joining over one Column Session [--] - Full Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType("  
[YouTube Link](https://youtube.com/watch?v=42RKAdZEqos)  2025-08-24T12:57Z [----] followers, [--] engagements


"Challenge [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want only one output per department based on employee who has joined earlier in the ties (emp having smaller emp_id). emp_data = (1 "emp1" [----] 1) (2 "emp2" [----] 2) (3 "emp3" [----] 3) (4 "emp4" [----] 4) (5 "emp5" [----] 5) (6 "emp6" [----] 5) (7 "emp7" [----] 4) (8 "emp8" [----] 3) (9 "emp9" [----] 2) (10 "emp10" [----] 1) (11 "emp11" [----] 1) (12 "emp12" [----] 2)"  
[YouTube Link](https://youtube.com/watch?v=46gijhHcu48)  2025-10-05T16:45Z [----] followers, [---] engagements
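
One way to sketch this challenge (4th highest salary per department, ties broken by the smaller emp_id) is row_number over a two-key ordering; the salaries below are placeholders for the redacted figures:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

emp_df = spark.createDataFrame(
    [(1, "emp1", 9000, 1), (2, "emp2", 8000, 1), (3, "emp3", 7000, 1),
     (4, "emp4", 6000, 1), (5, "emp5", 6000, 1), (6, "emp6", 5000, 1)],
    ["emp_id", "emp_name", "emp_salary", "dept_id"],
)

# row_number gives exactly one row per position; the tie on salary is
# broken by the smaller emp_id (the earlier joiner), as the challenge asks.
w = Window.partitionBy("dept_id").orderBy(F.desc("emp_salary"), F.asc("emp_id"))

fourth_highest_df = (
    emp_df
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 4)
    .drop("rn")
)
fourth_highest_df.show()
```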


"#pyspark #apachespark #coding #video #videos #shorts #databricks #education #azuredatabricks"  
[YouTube Link](https://youtube.com/watch?v=4H984LF-X00)  2025-07-01T15:46Z [----] followers, [---] engagements


"Ready to Deep Dive #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"  
[YouTube Link](https://youtube.com/watch?v=5g969nl43W4)  2025-09-21T15:39Z [----] followers, [--] engagements


"Session [--] - Introduction to Grouping aggregations in PySpark Session [--] - Introduction to Grouping aggregations in PySpark /FileStore/emp_data.csv [--]. Total employees in the organisation department wise. emp_df1 = emp_df .agg(F.count(F.col("emp_id")).alias("total_employees")) emp_df2 = emp_df .groupBy("emp_department") .agg(F.count(F.col("emp_id")).alias("total_employees")) emp_df3 = emp_df .groupBy("emp_department") .agg(F.count(F.col("emp_id")).alias("total_employees")) .orderBy(F.col("emp_department")ascending=True) emp_df4 = emp_df .groupBy("emp_department")"  
[YouTube Link](https://youtube.com/watch?v=7KyqQJDDdgk)  2025-02-08T09:00Z [---] followers, [--] engagements


"Session - Set up Databricks community edition for practise Session - Set up Databricks community edition for practise https://www.databricks.com/try-databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo"  
[YouTube Link](https://youtube.com/watch?v=7PogdA26_RM)  2025-02-09T15:55Z [---] followers, [--] engagements


"Session [--] - Working With dates in PySpark - Storage Location Session [--] - Working With dates in PySpark - Reading file from storage location date_file_path = "/FileStore/PySpark_Series/dates/date_file_ddmmyyyy.csv" Name DOB Ganesh 22-11-1995 Akshay 23-01-1996 date_file_path = "/FileStore/PySpark_Series/dates/date_file_ddmmyyyy.csv" sample_df = spark.read .format("csv") .schema(sample_schema) .option("header"True) .load(date_file_path) sample_df2 = spark.read .format("csv") .option("header"True) .schema(sample_schema) .option("dateFormat""dd-MM-yyyy") .load(date_file_path) #pyspark"  
[YouTube Link](https://youtube.com/watch?v=8GMEi8FKRlc)  2025-06-06T16:09Z [---] followers, [--] engagements
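
A runnable sketch of the dateFormat idea this session covers, reusing the file path shown in the post and assuming a two-column Name/DOB schema:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType

spark = SparkSession.builder.getOrCreate()

date_file_path = "/FileStore/PySpark_Series/dates/date_file_ddmmyyyy.csv"

sample_schema = StructType([
    StructField("Name", StringType()),
    StructField("DOB", DateType()),
])

# Without dateFormat, dd-MM-yyyy strings cannot be parsed into DateType;
# the option tells the CSV reader how the source file encodes its dates.
sample_df = (
    spark.read
    .format("csv")
    .option("header", True)
    .option("dateFormat", "dd-MM-yyyy")
    .schema(sample_schema)
    .load(date_file_path)
)
sample_df.show()
```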


"Session [--] - Windowing Aggregations in PySpark - Dense Rank Session [--] - Windowing Aggregations in PySpark - Dense Rank row_number - It assigns the number to rows serially without skipping any number in between. rank - It assigns rank to rows but skips the numbers in between. For tie it will assign same rank but skips the next number based on number of records already assigned a rank. dense_rank - It assigns rank to rows and does not skip anything in between. For tie it will assign same rank. from pyspark.sql.window import Window dense_rank_window = Window.partitionBy(F.col("subject"))"  
[YouTube Link](https://youtube.com/watch?v=9etNkjKkN0g)  2025-03-30T16:38Z [---] followers, [--] engagements
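
The three ranking functions the session contrasts can be put side by side in one small example; the marks are invented so the tie behaviour is visible:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

marks_df = spark.createDataFrame(
    [("Ganesh", "English", 99), ("Akshay", "English", 99),
     ("Priyesh", "English", 98), ("Rohit", "English", 97)],
    ["student_name", "subject", "marks"],
)

w = Window.partitionBy("subject").orderBy(F.desc("marks"))

# On the tie at 99: row_number -> 1, 2; rank -> 1, 1 then jumps to 3;
# dense_rank -> 1, 1 then continues with 2.
compared_df = (
    marks_df
    .withColumn("row_number", F.row_number().over(w))
    .withColumn("rank", F.rank().over(w))
    .withColumn("dense_rank", F.dense_rank().over(w))
)
compared_df.show()
```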


"Session [--] - Simple aggregations in PySpark - Count Average Max Min Session [--] - Simple Aggregations in PySpark - Count Average Max Min [--]. Find count of total employees in the organization. [--]. Find the Average Salary of the employees in the organization. [--]. Find the maximum salary of the employee in the organization. [--]. Find the minimum salary of the employee in the organization. emp_df1 = emp_df.agg(F.count(F.col("emp_id")).alias("total_employees")) emp_df2 = emp_df.count() emp_df3 = emp_df.agg(F.avg(F.col("emp_salary")).alias("average_salary")) emp_df4 ="  
[YouTube Link](https://youtube.com/watch?v=9q9aF5vgjBY)  2025-02-08T08:39Z [---] followers, [--] engagements
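
A minimal sketch of the four simple aggregations, folded into one .agg() call; the employee rows are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

emp_df = spark.createDataFrame(
    [(1, "Ganesh", 50000), (2, "Akshay", 65000), (3, "Priyesh", 42000)],
    ["emp_id", "emp_name", "emp_salary"],
)

# Count, average, max and min can all be computed in a single .agg() call.
summary_df = emp_df.agg(
    F.count("emp_id").alias("total_employees"),
    F.avg("emp_salary").alias("average_salary"),
    F.max("emp_salary").alias("maximum_salary"),
    F.min("emp_salary").alias("minimum_salary"),
)
summary_df.show()
```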


"Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) (9"Person9"6) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"  
[YouTube Link](https://youtube.com/watch?v=A3m0EnbKucg)  2025-08-29T18:00Z [----] followers, [--] engagements


"Session [--] - WHEN Otherwise in PySpark - Multiple when Conditions Session [--] - WHEN Otherwise in PySpark - Multiple when Conditions and Multiple Conditions within when [--]. When salary is less than or equal to [-----] - Very Low Salary when salary is greater than [-----] and less than or equal to [-----] - Low Salary when salary is greater than [-----] and less than or equal to [-----] - Average Salary when salary is greater than [-----] and less than or equal to [------] - High Salary when salary is greater [------] - Very High Salary #pyspark #apachespark #databricks #coding #learnpyspark #python"  
[YouTube Link](https://youtube.com/watch?v=CObkvm1fA0c)  2025-05-24T16:40Z [---] followers, [--] engagements


"#pyspark #coding #databricks #education #viralvideo #apachespark #programming #shorts #viralshorts"  
[YouTube Link](https://youtube.com/watch?v=EjTwQRfjMtI)  2025-03-22T11:43Z [---] followers, [--] engagements


"Defining the Schema for dataframe #pyspark #apachespark #coding #education #pythonprogramming"  
[YouTube Link](https://youtube.com/watch?v=FF-BYlIRgTw)  2025-01-12T05:23Z [---] followers, [--] engagements


"#pyspark #youtubeshorts #viralshorts #viral #video #apachespark #education #databricks #data"  
[YouTube Link](https://youtube.com/watch?v=GFO2Bva3vAg)  2025-06-16T07:27Z [---] followers, [---] engagements


"#pyspark #education #databricks #dataengineering #apachespark #trainer #support #share #subscribe"  
[YouTube Link](https://youtube.com/watch?v=Ghncctolz1A)  2025-09-03T15:54Z [----] followers, [---] engagements


"#pyspark #apachespark #coding #databricks #education #definitions #viralshorts #shorts #python"  
[YouTube Link](https://youtube.com/watch?v=IDo38X4_TwM)  2025-07-08T05:33Z [----] followers, [---] engagements


"Session [--] - Reading parquet file as pyspark data frame Parquet is column based file format. [---] columns and you want to read [--] columns. parquet_df = spark.read.format("parquet").load("/Volumes/demo/default/landing/titanic.parquet") parquet_df1 = spark.read.parquet("/Volumes/demo/default/landing/titanic.parquet") #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python"  
[YouTube Link](https://youtube.com/watch?v=J_Q4y_W41tI)  2025-09-14T16:49Z [----] followers, [---] engagements
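
The two read statements quoted in the post can be made self-contained roughly as follows; the Volumes path is taken from the post, everything else is assumed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

path = "/Volumes/demo/default/landing/titanic.parquet"  # path from the post

# Both spellings are equivalent; Parquet is columnar, so selecting only the
# columns you need lets Spark skip reading the rest of the file.
parquet_df = spark.read.format("parquet").load(path)
parquet_df1 = spark.read.parquet(path)   # shorthand, same result

parquet_df.printSchema()
parquet_df.show(5)
```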


"Session [--] - Inner Join in PySpark - Joining over one Column Session [--] - Inner Join in PySpark - Joining over one Column Problem Statement - Get the department name for all the employees assigned to the departments. If there are employees which are not assigned to any of the departments do not return them in results. Sample Data - emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") Generic Structure for Join - df1 and df2 joined_df ="  
[YouTube Link](https://youtube.com/watch?v=LOSe1n8liUI)  2025-06-18T15:30Z [----] followers, [--] engagements
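
A small sketch of the inner join described in the problem statement, using a subset of the sample employee/department tuples from the post with commas restored:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

emp_df = spark.createDataFrame(
    [(1, "Person1", 1), (2, "Person2", 2), (3, "Person3", 1),
     (5, "Person5", 6)],                       # dept 6 has no match
    ["emp_id", "emp_name", "dept_id"],
)
dept_df = spark.createDataFrame(
    [(1, "IT"), (2, "HR"), (3, "DE")],
    ["department_id", "department_name"],
)

# Inner join keeps only employees whose dept_id matches a department;
# unmatched employees (dept 6 here) are dropped from the result.
joined_df = emp_df.join(
    other=dept_df,
    on=emp_df["dept_id"] == dept_df["department_id"],
    how="inner",
)
joined_df.show()
```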


"#pyspark #python #coding #education #databricks #dataengineering #spark #viral #viralshorts #shorts"  
[YouTube Link](https://youtube.com/watch?v=MozUpGy5jig)  2025-09-04T14:38Z [----] followers, [---] engagements


"Session [--] - Top scorer students in each subject using PySpark window functions Session [--] - Top scorer students in each subject using PySpark window functions marks_data = ("Ganesh""English"99) ("Akshay""English"99) ("Priyesh""English"98) ("Rohit""English"98) ("Shrikant""English"98) ("Mayur""English"97) ("Ganesh""History"77) ("Akshay""History"77) ("Priyesh""History"98) ("Rohit""History"98) ("Shrikant""History"99) ("Mayur""History"99) from pyspark.sql.types import * from pyspark.sql import functions as F marks_schema = StructType( StructField("student_name"StringType())"  
[YouTube Link](https://youtube.com/watch?v=O8kPwovva44)  2025-04-05T16:32Z [---] followers, [--] engagements


"Session [--] - Adding created timestamp and created date to the newly added data in PySpark Session [--] - Adding created_timestamp and created_date to the newly added data in PySpark sample_data = ("Priyesh""IT Engineer" 32) ("Gitesh""Data Engineer" 35) # Session [--] - Adding created_timestamp and created_date to the newly added data in PySpark sample_data = ("Priyesh""IT Engineer" 32) ("Gitesh""Data Engineer" 35) from pyspark.sql.types import * from pyspark.sql import functions as F sample_schema = StructType( StructField("emp_name"StringType()) StructField("emp_department"StringType())"  
[YouTube Link](https://youtube.com/watch?v=ORTggQbihT0)  2025-06-10T11:08Z [---] followers, [--] engagements
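
A short sketch of the audit-column idea from this session, stamping new rows with the current timestamp and date:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

sample_df = spark.createDataFrame(
    [("Priyesh", "IT Engineer", 32), ("Gitesh", "Data Engineer", 35)],
    ["emp_name", "emp_department", "emp_age"],
)

# Stamp every newly arrived row with the load timestamp and load date.
audited_df = (
    sample_df
    .withColumn("created_timestamp", F.current_timestamp())
    .withColumn("created_date", F.current_date())
)
audited_df.show(truncate=False)
```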


"Session [--] - Inner Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("country"StringType()) StructField("dept_id"IntegerType()) )"  
[YouTube Link](https://youtube.com/watch?v=RD6WcoExPwo)  2025-06-29T18:36Z [----] followers, [---] engagements


"Session [--] - PySpark Window Function Lead Data Frame Session [--] - PySpark Window Function - Lead (Data Frame) (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-03" "Cancelled") (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-04" "Shipped") (1 [--] "2024-01-05" "In-Transit") (1 [--] "2024-01-06" "Destination-City") (1 [--] "2024-01-07" "Out For Delivery") (1 [--] "2024-01-08" "Cancelled") paritionBy - customer_id order_id orderBy - date Next from particular column we use lead window function. (1 [--] "2024-01-07" "Out For Delivery") (1 [--] "2024-01-07" "Out For"  
[YouTube Link](https://youtube.com/watch?v=SbwDzM7lqPA)  2025-04-05T20:09Z [---] followers, [--] engagements
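
The lead() pattern this session describes (find orders that went "Out For Delivery" and were then "Cancelled") can be sketched like this; the order rows are invented stand-ins for the redacted ones:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

orders_df = spark.createDataFrame(
    [(1, 101, "2024-01-04", "Shipped"),
     (1, 101, "2024-01-07", "Out For Delivery"),
     (1, 101, "2024-01-08", "Cancelled"),
     (1, 102, "2024-01-01", "Placed"),
     (1, 102, "2024-01-02", "Confirmed")],
    ["customer_id", "order_id", "status_date", "order_status"],
)

# lead() looks at the NEXT row inside each (customer, order) partition.
w = Window.partitionBy("customer_id", "order_id").orderBy(F.asc("status_date"))

flagged_df = (
    orders_df
    .withColumn("next_order_status", F.lead("order_status").over(w))
    .filter(
        (F.col("order_status") == "Out For Delivery")
        & (F.col("next_order_status") == "Cancelled")
    )
)
flagged_df.show(truncate=False)
```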


"Challenge [--] - Find out 4th highest salary per department. Challenge - [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want all those employees to be present in the output in ascending order of their names. [--] - Window aggregations - row_number rank dense_rank [--] - partition based on department and order based on salary desc and order based on emp_name row_number rank dense_rank [--] [----] [--] [--] [--] [--] [----] [--] [--] [--] 1"  
[YouTube Link](https://youtube.com/watch?v=WMD59YUFJJM)  2025-10-05T05:52Z [----] followers, [---] engagements


"Session [--] - PySpark Window Function - LAG Session [--] - PySpark Window Function - LAG (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-03" "Cancelled") (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-04" "Shipped") (1 [--] "2024-01-05" "In-Transit") (1 [--] "2024-01-06" "Destination-City") (1 [--] "2024-01-07" "Out For Delivery") (1 [--] "2024-01-08" "Cancelled") Performance Difference Between Lead and Lag - [--]. Lag processes data by looking at previous rows. As spark works in linear fashion lag can be proved more perfromance efficient. [--]. The catalyst"  
[YouTube Link](https://youtube.com/watch?v=XhDK5aFXjUM)  2025-04-11T16:41Z [---] followers, [--] engagements


"Session [--] - Reading Multi Line JSON file as PySpark Data frame from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType()) StructField("carb"DoubleType()) ) muliline_json_df ="  
[YouTube Link](https://youtube.com/watch?v=ZRRq8V27JXk)  2025-09-14T16:49Z [----] followers, [---] engagements
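
A minimal version of the multi-line JSON read this session covers; the file path below is a hypothetical placeholder, and the schema is left to be inferred for brevity:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

json_path = "/FileStore/bikes_multiline.json"  # hypothetical path

# A multi-line (pretty-printed) JSON file needs the multiLine option;
# without it, Spark expects one JSON object per line.
multiline_json_df = (
    spark.read
    .format("json")
    .option("multiLine", True)
    .load(json_path)
)
multiline_json_df.printSchema()
```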


"Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) )"  
[YouTube Link](https://youtube.com/watch?v=byHPggCj1Nk)  2025-06-21T15:54Z [----] followers, [---] engagements


"Session [--] - Data Frame writer API and data frame writer Modes [--]. Read the dataframe - spark.read.format("").option() .schema.load(file_path) [--]. We process this dataframe [--]. We write the result to some location df.write.format().mode().option("path"writing_path).save() df.write.format().mode().option("path"writing_path) .saveAsTable(table_name) Data Frame read modes - [--]. Permissive [--]. Failfast [--]. DropMalformed While reading the data frame we can specify the file path or folder path. While writing the data frame we can specify folder path. Data Frame writer modes - [--]. Append - It will append"  
[YouTube Link](https://youtube.com/watch?v=ebZ31PKOcrY)  2025-09-14T16:49Z [----] followers, [---] engagements
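
A compact sketch of the writer API shape quoted above, with the writer modes noted in comments; the target folder is a hypothetical placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(1, "Ganesh"), (2, "Akshay")], ["id", "name"])

writing_path = "/tmp/demo_output"  # hypothetical target folder

# mode() picks the writer behaviour when the path already has data:
#   append        - add new files next to the existing ones
#   overwrite     - replace whatever is there
#   errorifexists - fail (the default)
#   ignore        - silently skip the write
(
    df.write
      .format("parquet")
      .mode("overwrite")
      .option("path", writing_path)
      .save()
)
```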


"Session [--] - Reading Single Line JSON file as PySpark Data frame Session [--] - Reading Single Line JSON file as PySpark Dataframe from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType())"  
[YouTube Link](https://youtube.com/watch?v=fgPvPV-cBpk)  2025-09-13T18:31Z [----] followers, [---] engagements


"Session [--] - CASE WHEN in PySpark - Multiple when Conditions Session [--] - CASE WHEN in PySpark - Multiple when Conditions and Multiple Conditions within when /FileStore/emp_data.csv [--]. When salary is less than or equal to [-----] - Very Low Salary when salary is greater than [-----] and less than or equal to [-----] - Low Salary when salary is greater than [-----] and less than or equal to [-----] - Average Salary when salary is greater than [-----] and less than or equal to [------] - High Salary when salary is greater [------] - Very High Salary #pyspark #apachespark #databricks #coding #learnpyspark"  
[YouTube Link](https://youtube.com/watch?v=gYC42BI67DM)  2025-05-18T18:24Z [---] followers, [--] engagements


"Session [--] - Joins in PySpark - Theory Session [--] - Joins in PySpark - Theory [--]. Inner Join - Inner Join will join each matching records from left data frame to each matching record from right data frame based on join column. emp_id emp_name manager_id [--] Ganesh [--] [--] Akshay [--] department_id department_name [--] "DE" [--]. Left Join - Left Outer Join - It will join each matching records from left data frame to each matching record from right data frame based on join column. Also if there is no match to any of the records from left data frame it will list down all those records padded with NULL values"  
[YouTube Link](https://youtube.com/watch?v=gfZQcj7c0YY)  2025-06-14T14:20Z [---] followers, [--] engagements


"Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dep_id"IntegerType())"  
[YouTube Link](https://youtube.com/watch?v=gmbs5d5L2o8)  2025-07-10T12:57Z [----] followers, [--] engagements


"Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS")"  
[YouTube Link](https://youtube.com/watch?v=iSGp79n6wkA)  2025-07-31T17:57Z [----] followers, [---] engagements


"Session [--] - Left Anti Join in PySpark Session [--] - Left Anti Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. What is left Anti join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will not give those records as output. Left anti join will give the columns of only left dataset as output. Customers - Orders (Left Semi) Give me all"  
[YouTube Link](https://youtube.com/watch?v=mKtJHU-XSig)  2025-09-13T14:25Z [----] followers, [--] engagements
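
The customers/orders example used to explain left anti join can be sketched in a few lines; the rows are invented:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

customers_df = spark.createDataFrame(
    [(1, "Ganesh"), (2, "Akshay"), (3, "Priyesh")],
    ["customer_id", "customer_name"],
)
orders_df = spark.createDataFrame(
    [(101, 1), (102, 1), (103, 3)],
    ["order_id", "customer_id"],
)

# left_anti keeps customers with NO matching order, and returns only
# the left-hand (customer) columns.
no_order_df = customers_df.join(orders_df, on="customer_id", how="left_anti")
no_order_df.show()
```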


"Session [--] - Grouping aggregations in PySpark - Continuation Session [--] - Grouping aggregations in PySpark - Continuation /FileStore/emp_data.csv [--]. Average salary of employees per department. [--]. Total amount every department is spending on employee salary. [--]. Maximum employee salary of each department. [--]. Minimum employee salary of each department. emp_df1 = emp_df .agg(F.avg(F.col("emp_salary")).alias("average_salary")) emp_df2 = emp_df .groupBy("emp_department") .agg(F.avg(F.col("emp_salary")).alias("average_salary")) emp_df3 = emp_df .groupBy("emp_department")"  
[YouTube Link](https://youtube.com/watch?v=n9d9g4V7-pc)  2025-02-08T09:30Z [---] followers, [--] engagements


"Session [--] - Left Outer Join in PySpark - Joining over one Column Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") # Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from"  
[YouTube Link](https://youtube.com/watch?v=ncurxxaqn7Y)  2025-07-05T13:00Z [----] followers, [---] engagements


"Session [--] - PySpark Window Function Lead Spark SQL Session [--] - PySpark Window Function - Lead (Spark SQL) orders_df1.createOrReplaceTempView("orders") spark.sql("""WITH T1 AS (SELECT *LEAD(order_status) OVER (PARTITION BY customer_idorder_id ORDER BY status_date ASC) AS next_order_status FROM orders) SELECT customer_idorder_id FROM T1 WHERE order_status = 'Out For Delivery' AND next_order_status = 'Cancelled'""").display() #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo"  
[YouTube Link](https://youtube.com/watch?v=oGYvoHSfIUU)  2025-04-11T15:34Z [---] followers, [--] engagements


"Session [--] - Working With dates in PySpark - Python List Session [--] - Working With dates in PySpark - Python List [--]. Give all the records where date is greater than 31-12-1994. sample_data1 = ("Ganesh""1995-11-22") ("Akshay""1997-09-21") sample_data2 = ("Priyesh""23-11-1996") ("Gitesh""12-01-1991") The date we need to read it as string first and then convert to date datatype. Apache Spark by default gives all the dates in yyyy-MM-dd format. from pyspark.sql.types import * from pyspark.sql import functions as F sample_data1 = ("Ganesh""1995-11-22") ("Akshay""1997-09-21") sample_schema ="  
[YouTube Link](https://youtube.com/watch?v=pXDrrgKxzIQ)  2025-05-29T11:56Z [---] followers, [--] engagements


"Mock Interview with Satyam Meena This is End to End Data Engineering Project Satyam has spent 1-2 years on this project. If you are the one who wants to understand how Data Engineering projects work in real life then this video is for you Connect with me: Instagram:"  
[YouTube Link](https://youtube.com/watch?v=ppAXlXBRRB4)  2025-07-31T17:57Z [----] followers, [---] engagements


"Session [--] - CASE WHEN in PySpark - One when Condition Session [--] - CASE WHEN in PySpark - One when Condition /FileStore/emp_data.csv [--]. When salary is more than [-----] it is high salary. If it is less than or equal to [-----] then it is low salary. Give me results for each employee with their salary and if their salaries are high or low. from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"LongType()) StructField("emp_name"StringType()) StructField("emp_salary"LongType()) StructField("emp_department"IntegerType()) ) emp_df ="  
[YouTube Link](https://youtube.com/watch?v=rIhBMviU7Ag)  2025-04-27T08:31Z [---] followers, [--] engagements


"Session [--] - Right Outer Join in PySpark - Joining over one Column Session [--] - Right Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) ) dept_schema = StructType( StructField("department_id"IntegerType())"  
[YouTube Link](https://youtube.com/watch?v=rvn8i6Jezgg)  2025-07-25T15:34Z [----] followers, [---] engagements


"Session [--] - Remove duplicates using PySpark window functions Session [--] - Remove duplicates using PySpark window functions When row_number rank and dense_rank gives same results and your perpose is solved by using any of these always go with row_number. Rank and Dense_Rank needs little extra processing as they have to decide between ties. duplicate_marks_data = ("Ganesh""English"99) ("Akshay""English"99) ("Priyesh""English"98) ("Rohit""English"98) ("Shrikant""English"98) ("Mayur""English"97) ("Akshay""English"99) ("Akshay""English"99) ("Ganesh""History"77) ("Akshay""History"77)"  
[YouTube Link](https://youtube.com/watch?v=t3u7tVBhbog)  2025-03-30T17:46Z [---] followers, [--] engagements
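
A small sketch of the de-duplication pattern this session describes, numbering the copies with row_number and keeping the first; the duplicate rows mirror the post's sample names:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

duplicate_marks_df = spark.createDataFrame(
    [("Ganesh", "English", 99), ("Akshay", "English", 99),
     ("Akshay", "English", 99), ("Akshay", "English", 99),
     ("Priyesh", "English", 98)],
    ["student_name", "subject", "marks"],
)

# Number the copies inside each (student, subject, marks) group and keep
# only the first; row_number is the cheapest of the ranking functions here
# because no tie-breaking logic is needed.
w = Window.partitionBy("student_name", "subject", "marks").orderBy("marks")

deduped_df = (
    duplicate_marks_df
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)
deduped_df.show()
```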


"Session [--] - Left Semi Join in PySpark Session [--] - Left Semi Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. On the other hand inner join will give columns of both left and right dataset. customers - orders Give me all customer details only who have placed order so far. I don't want customers in the output who have not yet placed any order. emp_data_single_column = (1"Person1"1)"  
[YouTube Link](https://youtube.com/watch?v=wEAIlOX4mw8)  2025-09-06T05:49Z [----] followers, [--] engagements
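
And the left semi counterpart from this session, on the same hypothetical customers/orders shape used in the anti join sketch above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

customers_df = spark.createDataFrame(
    [(1, "Ganesh"), (2, "Akshay"), (3, "Priyesh")],
    ["customer_id", "customer_name"],
)
orders_df = spark.createDataFrame(
    [(101, 1), (103, 3)],
    ["order_id", "customer_id"],
)

# left_semi keeps customers that HAVE at least one order, once each, and,
# unlike an inner join, returns only the customer columns.
with_order_df = customers_df.join(orders_df, on="customer_id", how="left_semi")
with_order_df.show()
```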


"Session [--] - Right Outer Join in PySpark - Joining over multiple Columns Session [--] - Right Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3)"  
[YouTube Link](https://youtube.com/watch?v=wrUDh2J1cw4)  2025-08-21T02:39Z [----] followers, [---] engagements


"#pyspark #apachespark #programming #coding #learnpyspark #dataframes #dataengineering #databricks"  
[YouTube Link](https://youtube.com/watch?v=z3Rpq-yBrzA)  2025-02-05T16:58Z [---] followers, [--] engagements


"Video [--] - Components of Lang Chain - Part [--] #pyspark #apachespark #databricks Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero/ Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&index=1 Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&index=1 Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&index=1 PySpark Zero to"  
[YouTube Link](https://youtube.com/watch?v=bgITOWsw2WU)  2026-02-15T10:16Z [----] followers, [--] engagements


"Video [--] - Components of Lang Chain - Part [--] Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&pp=sAgC Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&pp=sAgC Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&pp=sAgC PySpark Zero to Superhero Playlist -"  
[YouTube Link](https://youtube.com/watch?v=xRtToJAwGYo)  2026-02-15T07:36Z [----] followers, [--] engagements


"Video [--] - Data Bricks Architecture #pyspark Video [--] - Data Bricks Architecture Two Parts - [--]. Control Plane [--]. Compute Plane Control Plane - Control Plane manages and controls but is not responsible to process the data. A) Data Bricks UI B) Cluster Manager C) Unity Catalog D) Workspace Storage (Metadata Storage) - Notebook Definitions Code Permissions Compute Plane - Actual data processing Running Notebooks Jobs/Pipeline Execution Processing Spark Workloads Reading and writing the data Types of Clusters in Data Bricks - 1) Classic Compute 2) Serverless Compute Classic Compute - The compute"  
[YouTube Link](https://youtube.com/watch?v=R_SGm8hty3c)  2026-02-07T16:30Z [----] followers, [--] engagements


"Video [--] - What is the need of Lang Chain #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm pyspark apachespark databricks coding learnpyspark"  
[YouTube Link](https://youtube.com/watch?v=Wf3V8L3F6Ko)  2026-02-14T10:36Z [----] followers, [--] engagements


"Video [--] - A to Z About Data Bricks Notebooks LinkedIn Post about User User Groups and Service Principal - https://www.linkedin.com/posts/ganesh-kudale-50bb14ab_youtube-contentcreator-pyspark-activity-7421228245473329152-VDdTutm_source=share&utm_medium=member_desktop&rcm=ACoAABd64tsBZlYaSR7w7vs9gd-HLFllhpPToqQ #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud"  
[YouTube Link](https://youtube.com/watch?v=aPaYggUauP8)  2026-02-14T09:21Z [----] followers, [--] engagements


"Podcast with @learnomate 🎙 From Mechanical Engineer to Data Engineer: A Career Transition Story with @learnomate Switching careers from Mechanical Engineering to Data Engineering is challengingbut absolutely possible 🚀 In this podcast I've shared real-life career transition story of myself as mechanical engineer who successfully moved into the Data Engineering domain. PLEASE NOTE THIS IS NOT ANY HYPE. THIS IS WHAT I'M NOT SPEAKING ANYTHING WHICH I HAVE NOTE DONE. Here is the link for Learnomate Technologies Video - https://youtu.be/uyOIkC6VFecsi=mmJeiD4IF0mxW5Ek Note - I'm not promoting any"  
[YouTube Link](https://youtube.com/watch?v=Qq9qdgYcpjQ)  2026-02-07T13:04Z [----] followers, [---] engagements


"Video [--] - Introduction to Data Bricks UI Video [--] - Introduction to Data Bricks UI Lakehouse App - For technical users Data Engineers Data Scientists ML Engineers Create and Run the Notebooks Build ETL Pipelines Create and Manage the Jobs Train ML models Work with Delta Table Data Bricks One - For Business Users Analysts Decision Makers View Dashboards - Ask question in Natural Language - Why Switch Apps in same workspace Delta Table - PRN Symptoms Summary Diagnosis Summary SELECT * FROM final_table WHERE PRN = [-----] Give me the summary for patient with PRN [-----] #pyspark #apachespark"  
[YouTube Link](https://youtube.com/watch?v=P1ob1SeMn4A)  2026-02-01T14:12Z [----] followers, [--] engagements


"Video [--] - What is Lang Chain #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm langchaintutorial pyspark apachespark databricks coding learnpyspark python"  
[YouTube Link](https://youtube.com/watch?v=jmoNS_5Zu0U)  2026-01-10T16:54Z [----] followers, [--] engagements


"Video [--] - Creating Azure Data Bricks Service #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"  
[YouTube Link](https://youtube.com/watch?v=Kte9e5tah-8)  2026-01-03T13:41Z [----] followers, [--] engagements


"Internals of Python - Part [--] - Mutable and Immutable Objects in Python In this session I cover what mutable and immutable objects are why Python differentiates between them and how this impacts memory management performance and bug-free coding"  
[YouTube Link](https://youtube.com/watch?v=e0Lvj5iynAM)  2026-01-03T19:39Z [----] followers, [---] engagements


"Video [--] - Creating Azure Cloud Account #pyspark Navigate to https://portal.azure.com/ Basics Tab - Resource Group - Logical grouping of resources. Project Environment Pricing Tier [--]. Standard Available - Core Features Apache Spark Engine Notebooks Job Scheduling Delta Lake Support Cluster Management Monitoring Not Available - Security and Governance Role Based Access Control Unity Cayalog Audit Logs Premium - Available - All Above features available in Standard Security and Governance Role Based Access Control Unity Catalog Audit Logs Workspace Type [--]. Hybrid - Can use cluster and storage"  
[YouTube Link](https://youtube.com/watch?v=_GiHfdmPpgw)  2026-01-01T12:06Z [----] followers, [--] engagements


"Video [--] - What is Generative AI Video [--] - What is Generative AI Play List Name - Generative AI Using Lang Chain Definition - Gen AI is a type of AI (Artificial Intelligence) that can create a new content on it's own. Text - Stories Emails Articles Images - Painting Photo for Product Audio/Video - Code - Gen AI is AI that can generate something new based on what it has learnt in past. LLM - Large Language Model Huge Model which learns every time from the huge data it understands the data and patterns generates human readable text. Large - Trained on Massive data Language - Focused on text:"  
[YouTube Link](https://youtube.com/watch?v=KkjjRvzmq7s)  2025-12-28T15:21Z [----] followers, [---] engagements


"Internals of Python - Part [--] - Everything is object in Python Ever wondered what Everything is an object in Python really means In this video we break down this core concept by explaining identity type and value of Python objects and how numbers strings lists functions and classes all follow the same object model. This understanding is crucial if you want to: [--]. Write better Python code [--]. Understand Python internals [--]. Grow as a Data Engineer or Python developer"  
[YouTube Link](https://youtube.com/watch?v=TjUdPBaJ3UI)  2025-12-28T18:45Z [----] followers, [---] engagements
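
A quick sketch (not from the video) of the "everything is an object" idea: numbers, strings, lists, functions and classes all expose an identity, a type and a value.

```python
# Every Python value carries an identity, a type and a value (illustrative sketch).
items = [42, "spark", [1, 2], len, type]   # numbers, strings, lists, functions, classes
for obj in items:
    print(id(obj), type(obj).__name__, repr(obj))

# Even a user-defined function is an object you can pass around and inspect.
def square(x):
    return x * x

print(isinstance(square, object))   # True
print(square.__name__)              # 'square'
```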


"Video [--] - Working of Data Lake house #pyspark Video [--] - Working of Data Lake House Data Lake House uses [--] key technologies - [--]. Delta Lake - Optimized storage layer the supports ACID transactions and schema enforcement and evolutions [--]. Unity Catalog - Unified governance solution for data and AI Working Of Data Lakehouse - Data Ingestion - Data from multiple sources is dumped in multiple formats. This data can be Batch data or streaming data. This is first logical layer provides the place for the data to land in raw format. Making it single source of truth for raw data. Raw data can be put"  
[YouTube Link](https://youtube.com/watch?v=qL7cOw8otA0)  2025-12-18T18:10Z [----] followers, [---] engagements
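
A hedged sketch of the Delta Lake layer described above, assuming a Spark session with Delta Lake available (for example a Databricks cluster); the table path and sample rows are made up.

```python
# Landing raw data as a Delta table (sketch; assumes Delta Lake is configured,
# e.g. on Databricks). Path and rows are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = spark.createDataFrame(
    [(1, "sensor-a", 21.5), (2, "sensor-b", 19.8)],
    ["reading_id", "source", "value"],
)

# Delta gives ACID writes and schema enforcement on top of the storage layer.
raw.write.format("delta").mode("append").save("/tmp/lakehouse/raw_readings")

# Readers always see a consistent snapshot of the table.
spark.read.format("delta").load("/tmp/lakehouse/raw_readings").show()
```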


"Internals of Python - Part [--] - How Python code gets executed internally This video explains how Python code is executed internally diving deep into Pythons execution model. We cover key internals such as source code compilation bytecode generation and how the Python interpreter (PVM) executes bytecode step by step. If you found this video helpful please like the video and share it with others who want to understand Python beyond the surface. python pythonforde dataengineering data ai genai llm python pythonforde dataengineering data ai genai llm"  
[YouTube Link](https://youtube.com/watch?v=cYOr5VjukAI)  2025-12-21T09:29Z [----] followers, [---] engagements
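
A small illustration (not from the video) of the compile-to-bytecode step, using the standard-library `dis` module to show the instructions the Python virtual machine executes.

```python
# Peek at the bytecode CPython compiles before the PVM executes it (illustrative).
import dis

def add(a, b):
    return a + b

dis.dis(add)   # prints instructions such as LOAD_FAST, BINARY_ADD / BINARY_OP, RETURN_VALUE
```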


"Video [--] - What is Data Lakehouse #pyspark Video [--] - What is Data Lakehouse DataBase - OLTP Online Transaction - Banking Databases are made to store transactional data. cust_id trn_id trn_amount trn_type Databases are suitable for day-to-day transactions. Not made to store historical data. Data Bases are more costly. Data Warehouse - OLAP - Analytical Platform - DWH stores structured historical data for analytical purposes. Data in DWH is cleaned and organized. Data in DWH is best for reporting dashboards and Business Intelligence. DWH are less costly as compared to Data bases. Data Lake Data"  
[YouTube Link](https://youtube.com/watch?v=CXMvSQLoz9M)  2025-12-07T16:12Z [----] followers, [---] engagements


"Why Python In this video we build a strong intuition around why Python is so important in Data Engineering. Youll understand how and where Python is actually used in real-world data engineering use cases from data ingestion and transformation to building scalable data pipelines. Instead of just theory this video focuses on giving you the right context and mindset needed to learn Python with purpose. #python #dataengineering #coding python Data Engineering Data Pipelines coding python Data Engineering Data Pipelines coding"  
[YouTube Link](https://youtube.com/watch?v=3TCgqNKWLpo)  2025-12-14T16:36Z [----] followers, [---] engagements


"What is Machine Learning What is Machine Learning Village - [---] houses - [----] people Grocery - City Centre - [--] kms Problem Statement - To buy groceries villagers had to go [--] kms from village. Ganesh went to city centre - [--] items - [---] pieces of [--] items - It generally takes [--] days time. On 3rd day he will receive the items. 5-6 days - [---] pieces of each item got exhausted It generally takes [--] days time. On 3rd day he will receive the items. 2-3 days the grocery shop had no item to sell. This pattern ganesh observed for 2-3 weeks. Observation and action taken - He started giving the order"  
[YouTube Link](https://youtube.com/watch?v=FZTMVdvxF3w)  2025-12-14T06:29Z [----] followers, [--] engagements


"Video [--] - What is DataBricks Video [--] - What is Data Bricks Data Bricks - Apache Spark designed and optimised to work on cloud. Official Definition - Data Bricks is Unified platform for building deploying sharing and maintaining enterprise-grade data analytics and AI solutions at scale. Why Unified [--]. Lake House Architecture - It contains the best of Data Lake and DWH. - All type of data in one place. - ACID Enforcement schema evolutions and time travel. [--]. Integrated Workflows - Running workloads within Data Bricks is available. [--]. Multi Language Support - Python SQL Scala and R. - Data"  
[YouTube Link](https://youtube.com/watch?v=IbD4EKzOrfg)  2025-11-23T13:53Z [----] followers, [---] engagements


"Video [--] - Why DataBricks Video [--] - Why DataBricks Company - A person with [--] Lakh salary can complete the project in [--] months [--] Lakhs - Cost Monolithic Systems - Single system with all parts tightly coupled. If one-part breaks the system will fail. One big system holding full power. Vertical Scaling - is not time efficient is not cost efficient Pros - Simple to develop and handle Easy to debug and test Cons Non efficient scalability Hard to maintain and upgrade non fault tolerant Distributed Systems - Pros - Fault Tolerant Efficient Scaling - Horizontal Scaling Parallel Processing Cons -"  
[YouTube Link](https://youtube.com/watch?v=54Q-8vT6QCM)  2025-11-08T15:25Z [----] followers, [---] engagements


"Challenge [--] #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support"  
[YouTube Link](https://youtube.com/watch?v=LYHRx8GmTos)  2025-11-01T11:42Z [----] followers, [--] engagements


"Challenge [--] - Continuation [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. [--] - If [--] elements in the array - same as above If middle name is not there array will have [--] elements middle name will be null first_name position will be [--] last_name position will be [--] age position will be [--] [--]. The delimiter can be comma or pipe. Code should handle that case as well - We will have to replace the delimiter pipe or comma with space from pyspark.sql import functions as F website_data = ("Ganesh Kudale 31")"  
[YouTube Link](https://youtube.com/watch?v=heCo6seFB38)  2025-10-25T00:14Z [----] followers, [--] engagements


"Challenge - [--] Challenge - [--] [--]. Consider there is data that is getting received from website in string format. That data contains first_name middle_namelast_name and age of the customer. Write PySpark code to separate them into separate columns. [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. website_data = ("Ganesh Ramdas Kudale 31") ("Akshay Ramdas Kudale 28") ("Ojas Ganesh Kudale 1.5") split - it gives array of strings - (columndelimiter) substring_index - gives the results based position we specify"  
[YouTube Link](https://youtube.com/watch?v=vArJdgnnrjk)  2025-10-20T12:05Z [----] followers, [--] engagements
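
One possible solution sketch for this challenge (not necessarily the video's exact code), assuming comma- or pipe-delimited input rows: normalise the delimiter, split, then branch on whether the middle name is present.

```python
# Sketch for the name-splitting challenge; the sample rows below are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

website_data = [("Ganesh,Ramdas,Kudale,31",),
                ("Akshay|Ramdas|Kudale|28",),
                ("Ojas,Ganesh,Kudale,1.5",),
                ("Ganesh,Kudale,31",)]              # middle name missing
df = spark.createDataFrame(website_data, ["raw"])

# Normalise pipe to comma, then split into an array of parts.
parts = F.split(F.regexp_replace("raw", r"\|", ","), ",")
result = (df.withColumn("parts", parts)
            .withColumn("first_name", F.col("parts")[0])
            .withColumn("middle_name",
                        F.when(F.size("parts") == 4, F.col("parts")[1]))  # null when absent
            .withColumn("last_name",
                        F.when(F.size("parts") == 4, F.col("parts")[2])
                         .otherwise(F.col("parts")[1]))
            .withColumn("age",
                        F.when(F.size("parts") == 4, F.col("parts")[3])
                         .otherwise(F.col("parts")[2]))
            .drop("parts"))
result.show(truncate=False)
```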


"Challenge [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want only one output per department based on employee who has joined earlier in the ties (emp having smaller emp_id). emp_data = (1 "emp1" [----] 1) (2 "emp2" [----] 2) (3 "emp3" [----] 3) (4 "emp4" [----] 4) (5 "emp5" [----] 5) (6 "emp6" [----] 5) (7 "emp7" [----] 4) (8 "emp8" [----] 3) (9 "emp9" [----] 2) (10 "emp10" [----] 1) (11 "emp11" [----] 1) (12 "emp12" [----] 2)"  
[YouTube Link](https://youtube.com/watch?v=46gijhHcu48)  2025-10-05T16:45Z [----] followers, [---] engagements
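
A hedged sketch of one reading of this challenge: rank salaries per department in descending order, break ties with the smaller emp_id, and keep the 4th row. The sample rows below are made up.

```python
# 4th highest salary per department, one row per department, earliest emp_id wins ties.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
emp = spark.createDataFrame(
    [(1, "emp1", 9000, 1), (2, "emp2", 8000, 1), (3, "emp3", 7000, 1),
     (4, "emp4", 6000, 1), (5, "emp5", 6000, 1), (6, "emp6", 5000, 1)],
    ["emp_id", "emp_name", "emp_salary", "dept_id"],
)

w = Window.partitionBy("dept_id").orderBy(F.desc("emp_salary"), F.asc("emp_id"))
fourth = (emp.withColumn("rn", F.row_number().over(w))
             .filter(F.col("rn") == 4)
             .drop("rn"))
fourth.show()   # emp4 wins the 6000 tie because of the smaller emp_id
```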


"Concluding PySpark Series Session [--] - Creating the raw data frame - https://youtu.be/zHhRnOPul7g Session [--] - Defining the Schema in PySpark - https://youtu.be/AKuvX8Kn7l4 Session [--] - Reading the data frame form file stored at storage location - https://youtu.be/nq04n-6JvH8 Session [--] - Different ways of creating the data frame - https://youtu.be/tDbmBhghE7Q Session [--] - Transformations and Action in Apache Spark - https://youtu.be/JZu1EK0isjA Session [--] - Data Frame Read Modes - https://youtu.be/lkj8nEzS4To Session [--] - PySpark withColumn Transformation - https://youtu.be/gBMNsspzNiI Session [--] -"  
[YouTube Link](https://youtube.com/watch?v=202CDHkQ2fw)  2025-10-05T14:40Z [----] followers, [---] engagements


"Challenge [--] - Find out 4th highest salary per department. Challenge - [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want all those employees to be present in the output in ascending order of their names. [--] - Window aggregations - row_number rank dense_rank [--] - partition based on department and order based on salary desc and order based on emp_name row_number rank dense_rank [--] [----] [--] [--] [--] [--] [----] [--] [--] [--] 1"  
[YouTube Link](https://youtube.com/watch?v=WMD59YUFJJM)  2025-10-05T05:52Z [----] followers, [---] engagements
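
A sketch of the "return all tied employees" variant (sample rows are made up): dense_rank over the salaries per department, keep rank 4, then order the ties by name.

```python
# 4th highest salary per department, keeping every employee tied at that salary.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
emp = spark.createDataFrame(
    [(1, "empA", 9000, 1), (2, "empB", 8000, 1), (3, "empC", 7000, 1),
     (4, "empE", 6000, 1), (5, "empD", 6000, 1), (6, "empF", 5000, 1)],
    ["emp_id", "emp_name", "emp_salary", "dept_id"],
)

w = Window.partitionBy("dept_id").orderBy(F.desc("emp_salary"))
result = (emp.withColumn("dr", F.dense_rank().over(w))
             .filter(F.col("dr") == 4)
             .orderBy("dept_id", "emp_name")
             .drop("dr"))
result.show()   # both empD and empE appear: they share the 4th-highest salary
```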


"Session [--] - Data Frame writer API and data frame writer Modes [--]. Read the dataframe - spark.read.format("").option() .schema.load(file_path) [--]. We process this dataframe [--]. We write the result to some location df.write.format().mode().option("path"writing_path).save() df.write.format().mode().option("path"writing_path) .saveAsTable(table_name) Data Frame read modes - [--]. Permissive [--]. Failfast [--]. DropMalformed While reading the data frame we can specify the file path or folder path. While writing the data frame we can specify folder path. Data Frame writer modes - [--]. Append - It will append"  
[YouTube Link](https://youtube.com/watch?v=ebZ31PKOcrY)  2025-09-14T16:49Z [----] followers, [---] engagements
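
A minimal sketch of the writer API and its save modes described above; the output path and table name are hypothetical.

```python
# DataFrame writer API with an explicit save mode (paths/table names are hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "Ganesh"), (2, "Akshay")], ["id", "name"])

# append    - add to whatever already exists at the target
# overwrite - replace the existing output
# ignore    - do nothing if the target already exists
# error / errorifexists (default) - fail if the target already exists
(df.write
   .format("parquet")
   .mode("overwrite")
   .option("path", "/tmp/demo/emp_output")
   .save())

# Alternatively register the result as a table instead of a bare path:
# df.write.format("parquet").mode("append").saveAsTable("emp_output_table")
```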


"Ready to Deep Dive #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"  
[YouTube Link](https://youtube.com/watch?v=5g969nl43W4)  2025-09-21T15:39Z [----] followers, [--] engagements


"Session [--] - Reading parquet file as pyspark data frame Parquet is column based file format. [---] columns and you want to read [--] columns. parquet_df = spark.read.format("parquet").load("/Volumes/demo/default/landing/titanic.parquet") parquet_df1 = spark.read.parquet("/Volumes/demo/default/landing/titanic.parquet") #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python"  
[YouTube Link](https://youtube.com/watch?v=J_Q4y_W41tI)  2025-09-14T16:49Z [----] followers, [---] engagements
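
A short sketch of the column-pruning point: because Parquet is columnar, selecting only the columns you need avoids scanning the rest. The column names below are assumptions about the Titanic file, not taken from the video.

```python
# Column pruning with a Parquet source (column names "Name"/"Survived" are assumptions).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

passengers = (spark.read
                   .parquet("/Volumes/demo/default/landing/titanic.parquet")
                   .select("Name", "Survived"))   # only these columns are read from disk
passengers.show(5)
```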


"Session [--] - Reading Multi Line JSON file as PySpark Data frame from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType()) StructField("carb"DoubleType()) ) muliline_json_df ="  
[YouTube Link](https://youtube.com/watch?v=ZRRq8V27JXk)  2025-09-14T16:49Z [----] followers, [---] engagements
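
A compact sketch of reading a multi-line (pretty-printed) JSON file; the schema is abbreviated and the file path is an assumption. Without `multiLine`, Spark expects one JSON object per line.

```python
# Reading a multi-line JSON file with an explicit schema (path and schema abbreviated/assumed).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

bikes_schema = StructType([
    StructField("model", StringType()),
    StructField("mpg", DoubleType()),
    StructField("cyl", DoubleType()),
])

multiline_json_df = (spark.read
                          .format("json")
                          .schema(bikes_schema)
                          .option("multiLine", True)   # whole file may be one pretty-printed document
                          .load("/Volumes/demo/default/landing/bikes_multiline.json"))
multiline_json_df.show()
```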


"Session [--] - Reading Single Line JSON file as PySpark Data frame Session [--] - Reading Single Line JSON file as PySpark Dataframe from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType())"  
[YouTube Link](https://youtube.com/watch?v=fgPvPV-cBpk)  2025-09-13T18:31Z [----] followers, [---] engagements


"Session [--] - Left Anti Join in PySpark Session [--] - Left Anti Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. What is left Anti join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will not give those records as output. Left anti join will give the columns of only left dataset as output. Customers - Orders (Left Semi) Give me all"  
[YouTube Link](https://youtube.com/watch?v=mKtJHU-XSig)  2025-09-13T14:25Z [----] followers, [--] engagements


"Session [--] - Left Semi Join in PySpark Session [--] - Left Semi Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. On the other hand inner join will give columns of both left and right dataset. customers - orders Give me all customer details only who have placed order so far. I don't want customers in the output who have not yet placed any order. emp_data_single_column = (1"Person1"1)"  
[YouTube Link](https://youtube.com/watch?v=wEAIlOX4mw8)  2025-09-06T05:49Z [----] followers, [--] engagements
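
A small sketch (made-up data) contrasting the two joins described in the posts above: left semi keeps customers that have matching orders, left anti keeps the ones that do not, and both return only the left-hand columns.

```python
# Left semi vs. left anti join on a tiny customers/orders example (illustrative data).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
customers = spark.createDataFrame(
    [(1, "Person1"), (2, "Person2"), (3, "Person3")], ["cust_id", "name"])
orders = spark.createDataFrame(
    [(101, 1), (102, 1), (103, 3)], ["order_id", "cust_id"])

with_orders    = customers.join(orders, on="cust_id", how="left_semi")
without_orders = customers.join(orders, on="cust_id", how="left_anti")

with_orders.show()     # Person1, Person3 - only columns from customers
without_orders.show()  # Person2 - never placed an order
```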


"#pyspark #python #coding #education #databricks #dataengineering #spark #viral #viralshorts #shorts"  
[YouTube Link](https://youtube.com/watch?v=MozUpGy5jig)  2025-09-04T14:38Z [----] followers, [---] engagements


"#pyspark #education #databricks #dataengineering #apachespark #trainer #support #share #subscribe"  
[YouTube Link](https://youtube.com/watch?v=Ghncctolz1A)  2025-09-03T15:54Z [----] followers, [---] engagements


"Session [--] - Full Outer Join in PySpark - Joining over multiple Columns Session [--] - Full Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"  
[YouTube Link](https://youtube.com/watch?v=-jx1pQ2W20Y)  2025-08-29T18:00Z [----] followers, [--] engagements


"Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) (9"Person9"6) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"  
[YouTube Link](https://youtube.com/watch?v=A3m0EnbKucg)  2025-08-29T18:00Z [----] followers, [--] engagements
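
A sketch of the NULL-key behaviour these join posts walk through (made-up rows): a plain equality condition never matches NULL to NULL, so null-keyed rows survive a full outer join unmatched; `eqNullSafe` changes that.

```python
# Full outer join with NULLs in the joining columns (illustrative data).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
emp = spark.createDataFrame(
    [(1, "Person1", 1), (2, "Person2", None), (3, "Person3", 4)],
    ["emp_id", "emp_name", "dep_id"])
dept = spark.createDataFrame(
    [(1, "IT"), (4, "BE"), (None, "TRS")],
    ["department_id", "department_name"])

joined = emp.join(dept, emp["dep_id"] == dept["department_id"], "full_outer")
joined.show()   # Person2 and TRS each appear with NULLs on the other side

# If NULL keys should match each other, use a null-safe comparison instead:
null_safe = emp.join(dept, emp["dep_id"].eqNullSafe(dept["department_id"]), "full_outer")
null_safe.show()
```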


"Session [--] - Full Outer Join in PySpark - Joining over one Column Session [--] - Full Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType("  
[YouTube Link](https://youtube.com/watch?v=42RKAdZEqos)  2025-08-24T12:57Z [----] followers, [--] engagements


"Session [--] - Right Outer Join in PySpark - Joining over multiple Columns Session [--] - Right Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3)"  
[YouTube Link](https://youtube.com/watch?v=wrUDh2J1cw4)  2025-08-21T02:39Z [----] followers, [---] engagements


"Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS")"  
[YouTube Link](https://youtube.com/watch?v=iSGp79n6wkA)  2025-07-31T17:57Z [----] followers, [---] engagements


"Mock Interview with Satyam Meena This is End to End Data Engineering Project Satyam has spent 1-2 years on this project. If you are the one who wants to understand how Data Engineering projects work in real life then this video is for you Connect with me: Instagram:"  
[YouTube Link](https://youtube.com/watch?v=ppAXlXBRRB4)  2025-07-31T17:57Z [----] followers, [---] engagements


"Session [--] - Right Outer Join in PySpark - Joining over one Column Session [--] - Right Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) ) dept_schema = StructType( StructField("department_id"IntegerType())"  
[YouTube Link](https://youtube.com/watch?v=rvn8i6Jezgg)  2025-07-25T15:34Z [----] followers, [---] engagements


"Session [--] - Left Outer Join in PySpark - Joining over multiple Columns Session [--] - Left Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") joined_df = df1.join(other=another_dataframeon=join_condition how=join_type) from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema ="  
[YouTube Link](https://youtube.com/watch?v=3MQhlQLYs8Q)  2025-07-19T17:17Z [----] followers, [---] engagements


"Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dep_id"IntegerType())"  
[YouTube Link](https://youtube.com/watch?v=gmbs5d5L2o8)  2025-07-10T12:57Z [----] followers, [--] engagements


"#pyspark #apachespark #coding #databricks #education #definitions #viralshorts #shorts #python"  
[YouTube Link](https://youtube.com/watch?v=IDo38X4_TwM)  2025-07-08T05:33Z [----] followers, [---] engagements


"Session [--] - Left Outer Join in PySpark - Joining over one Column Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") # Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from"  
[YouTube Link](https://youtube.com/watch?v=ncurxxaqn7Y)  2025-07-05T13:00Z [----] followers, [---] engagements


"#pyspark #apachespark #coding #video #videos #shorts #databricks #education #azuredatabricks"  
[YouTube Link](https://youtube.com/watch?v=4H984LF-X00)  2025-07-01T15:46Z [----] followers, [---] engagements


"Session [--] - Inner Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("country"StringType()) StructField("dept_id"IntegerType()) )"  
[YouTube Link](https://youtube.com/watch?v=RD6WcoExPwo)  2025-06-29T18:36Z [----] followers, [---] engagements
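
A minimal sketch (made-up rows) of an inner join over two columns: both the department id and the country must match for a row to appear in the output.

```python
# Inner join over multiple columns (illustrative data).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
emp = spark.createDataFrame(
    [(1, "Person1", "IN", 1), (2, "Person2", "US", 2), (3, "Person3", "UK", 4)],
    ["emp_id", "emp_name", "country", "dept_id"])
dept = spark.createDataFrame(
    [(1, "IT", "IN"), (2, "HR", "US"), (4, "BE", "UK")],
    ["department_id", "department_name", "country"])

cond = (emp["dept_id"] == dept["department_id"]) & (emp["country"] == dept["country"])
joined = emp.join(dept, on=cond, how="inner")
joined.select(emp["emp_name"], dept["department_name"], emp["country"]).show()
```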


"Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) )"  
[YouTube Link](https://youtube.com/watch?v=byHPggCj1Nk)  2025-06-21T15:54Z [----] followers, [---] engagements


"Session [--] - Inner Join in PySpark - Joining over one Column Session [--] - Inner Join in PySpark - Joining over one Column Problem Statement - Get the department name for all the employees assigned to the departments. If there are employees which are not assigned to any of the departments do not return them in results. Sample Data - emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") Generic Structure for Join - df1 and df2 joined_df ="  
[YouTube Link](https://youtube.com/watch?v=LOSe1n8liUI)  2025-06-18T15:30Z [----] followers, [--] engagements

"Video [--] - What is the need of Lang Chain #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm pyspark apachespark databricks coding learnpyspark"
YouTube Link 2026-02-14T10:36Z [----] followers, [--] engagements

"Video [--] - A to Z About Data Bricks Notebooks LinkedIn Post about User User Groups and Service Principal - https://www.linkedin.com/posts/ganesh-kudale-50bb14ab_youtube-contentcreator-pyspark-activity-7421228245473329152-VDdTutm_source=share&utm_medium=member_desktop&rcm=ACoAABd64tsBZlYaSR7w7vs9gd-HLFllhpPToqQ #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud"
YouTube Link 2026-02-14T09:21Z [----] followers, [--] engagements

"Video [--] - Components of Lang Chain - Part [--] #pyspark #apachespark #databricks Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero/ Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&index=1 Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&index=1 Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&index=1 PySpark Zero to"
YouTube Link 2026-02-15T10:16Z [----] followers, [--] engagements

"Why Python In this video we build a strong intuition around why Python is so important in Data Engineering. Youll understand how and where Python is actually used in real-world data engineering use cases from data ingestion and transformation to building scalable data pipelines. Instead of just theory this video focuses on giving you the right context and mindset needed to learn Python with purpose. #python #dataengineering #coding python Data Engineering Data Pipelines coding python Data Engineering Data Pipelines coding"
YouTube Link 2025-12-14T16:36Z [----] followers, [---] engagements

"Video [--] - Why DataBricks Video [--] - Why DataBricks Company - A person with [--] Lakh salary can complete the project in [--] months [--] Lakhs - Cost Monolithic Systems - Single system with all parts tightly coupled. If one-part breaks the system will fail. One big system holding full power. Vertical Scaling - is not time efficient is not cost efficient Pros - Simple to develop and handle Easy to debug and test Cons Non efficient scalability Hard to maintain and upgrade non fault tolerant Distributed Systems - Pros - Fault Tolerant Efficient Scaling - Horizontal Scaling Parallel Processing Cons -"
YouTube Link 2025-11-08T15:25Z [----] followers, [---] engagements

"Session [--] - Grouping Aggregations on Multiple Columns in PySpark Session [--] - Grouping Aggregations on Multiple Columns in PySpark emp_id emp_name emp_salary emp_department [--] Ganesh [-----] [--] [--] Ganesh1 [-----] [--] [--] Ramesh [-----] [--] [--] Pritesh [-----] [--] [--] Priyesh [-----] [--] Dep [--] Dep [--] Dep [--] emp_id emp_name emp_salary emp_department emp_city [--] Ganesh [-----] [--] Pune [--] Ganesh1 [-----] [--] Pune [--] Ramesh [-----] [--] Mumbai [--] Pritesh [-----] [--] Chennai [--] Priyesh [-----] [--] Pune grouping based on emp_city and emp_department Pune and dep [--] Pune and dep [--] Mumbai and dep [--] Chennai and dep [--] [--]. How much every department is"
YouTube Link 2025-02-22T07:45Z [---] followers, [--] engagements

"Video [--] - What is Data Lakehouse #pyspark Video [--] - What is Data Lakehouse DataBase - OLTP Online Transaction - Banking Databases are made to store transactional data. cust_id trn_id trn_amount trn_type Databases are suitable for day-to-day transactions. Not made to store historical data. Data Bases are more costly. Data Warehouse - OLAP - Analytical Platform - DWH stores structured historical data for analytical purposes. Data in DWH is cleaned and organized. Data in DWH is best for reporting dashboards and Business Intelligence. DWH are less costly as compared to Data bases. Data Lake Data"
YouTube Link 2025-12-07T16:12Z [----] followers, [---] engagements

"What is Machine Learning What is Machine Learning Village - [---] houses - [----] people Grocery - City Centre - [--] kms Problem Statement - To buy groceries villagers had to go [--] kms from village. Ganesh went to city centre - [--] items - [---] pieces of [--] items - It generally takes [--] days time. On 3rd day he will receive the items. 5-6 days - [---] pieces of each item got exhausted It generally takes [--] days time. On 3rd day he will receive the items. 2-3 days the grocery shop had no item to sell. This pattern ganesh observed for 2-3 weeks. Observation and action taken - He started giving the order"
YouTube Link 2025-12-14T06:29Z [----] followers, [--] engagements

"Video [--] - What is DataBricks Video [--] - What is Data Bricks Data Bricks - Apache Spark designed and optimised to work on cloud. Official Definition - Data Bricks is Unified platform for building deploying sharing and maintaining enterprise-grade data analytics and AI solutions at scale. Why Unified [--]. Lake House Architecture - It contains the best of Data Lake and DWH. - All type of data in one place. - ACID Enforcement schema evolutions and time travel. [--]. Integrated Workflows - Running workloads within Data Bricks is available. [--]. Multi Language Support - Python SQL Scala and R. - Data"
YouTube Link 2025-11-23T13:53Z [----] followers, [---] engagements

"#coding #pyspark #apachespark #databricks #education #viralvideo #vlog #viralshorts #newvideo #learn"
YouTube Link 2025-02-14T15:51Z [---] followers, [--] engagements

"Video [--] - What is Generative AI Video [--] - What is Generative AI Play List Name - Generative AI Using Lang Chain Definition - Gen AI is a type of AI (Artificial Intelligence) that can create a new content on it's own. Text - Stories Emails Articles Images - Painting Photo for Product Audio/Video - Code - Gen AI is AI that can generate something new based on what it has learnt in past. LLM - Large Language Model Huge Model which learns every time from the huge data it understands the data and patterns generates human readable text. Large - Trained on Massive data Language - Focused on text:"
YouTube Link 2025-12-28T15:21Z [----] followers, [---] engagements

"Video [--] - Creating Azure Data Bricks Service #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"
YouTube Link 2026-01-03T13:41Z [----] followers, [--] engagements

"Challenge [--] #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support"
YouTube Link 2025-11-01T11:42Z [----] followers, [--] engagements

"Session [--] - Running multiple grouping aggregations together Session [--] - Running multiple grouping aggregations together [--]. How much every department is spending on employee salary in each city [--]. Average employee salary based on department in each city. [--]. Maximum employee salary based on department in each city. [--]. Minimum employee salary based on department in each city. emp_results_df = emp_df .groupBy(F.col("emp_dep")F.col("emp_city"))"
YouTube Link 2025-02-22T09:30Z [---] followers, [--] engagements
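
A compact sketch (made-up rows) of running the several aggregations listed above in a single groupBy/agg call.

```python
# Multiple grouping aggregations in one pass: total, average, max and min salary
# per department and city (illustrative data).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
emp_df = spark.createDataFrame(
    [(1, "Ganesh", 50000, "Dep1", "Pune"),
     (2, "Ramesh", 40000, "Dep1", "Mumbai"),
     (3, "Pritesh", 60000, "Dep2", "Pune")],
    ["emp_id", "emp_name", "emp_salary", "emp_dep", "emp_city"])

emp_results_df = (emp_df
    .groupBy(F.col("emp_dep"), F.col("emp_city"))
    .agg(F.sum("emp_salary").alias("total_salary_spend"),
         F.avg("emp_salary").alias("average_salary"),
         F.max("emp_salary").alias("maximum_salary"),
         F.min("emp_salary").alias("minimum_salary")))
emp_results_df.show()
```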

"Internals of Python - Part [--] - Everything is object in Python Ever wondered what Everything is an object in Python really means In this video we break down this core concept by explaining identity type and value of Python objects and how numbers strings lists functions and classes all follow the same object model. This understanding is crucial if you want to: [--]. Write better Python code [--]. Understand Python internals [--]. Grow as a Data Engineer or Python developer"
YouTube Link 2025-12-28T18:45Z [----] followers, [---] engagements

"Video [--] - Creating Azure Cloud Account #pyspark Navigate to https://portal.azure.com/ Basics Tab - Resource Group - Logical grouping of resources. Project Environment Pricing Tier [--]. Standard Available - Core Features Apache Spark Engine Notebooks Job Scheduling Delta Lake Support Cluster Management Monitoring Not Available - Security and Governance Role Based Access Control Unity Cayalog Audit Logs Premium - Available - All Above features available in Standard Security and Governance Role Based Access Control Unity Catalog Audit Logs Workspace Type [--]. Hybrid - Can use cluster and storage"
YouTube Link 2026-01-01T12:06Z [----] followers, [--] engagements

"Internals of Python - Part [--] - How Python code gets executed internally This video explains how Python code is executed internally diving deep into Pythons execution model. We cover key internals such as source code compilation bytecode generation and how the Python interpreter (PVM) executes bytecode step by step. If you found this video helpful please like the video and share it with others who want to understand Python beyond the surface. python pythonforde dataengineering data ai genai llm python pythonforde dataengineering data ai genai llm"
YouTube Link 2025-12-21T09:29Z [----] followers, [---] engagements

"Challenge [--] - Continuation [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. [--] - If [--] elements in the array - same as above If middle name is not there array will have [--] elements middle name will be null first_name position will be [--] last_name position will be [--] age position will be [--] [--]. The delimiter can be comma or pipe. Code should handle that case as well - We will have to replace the delimiter pipe or comma with space from pyspark.sql import functions as F website_data = ("Ganesh Kudale 31")"
YouTube Link 2025-10-25T00:14Z [----] followers, [--] engagements

"Video [--] - What is Lang Chain #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm langchaintutorial pyspark apachespark databricks coding learnpyspark python"
YouTube Link 2026-01-10T16:54Z [----] followers, [--] engagements

"Video [--] - Working of Data Lake house #pyspark Video [--] - Working of Data Lake House Data Lake House uses [--] key technologies - [--]. Delta Lake - Optimized storage layer the supports ACID transactions and schema enforcement and evolutions [--]. Unity Catalog - Unified governance solution for data and AI Working Of Data Lakehouse - Data Ingestion - Data from multiple sources is dumped in multiple formats. This data can be Batch data or streaming data. This is first logical layer provides the place for the data to land in raw format. Making it single source of truth for raw data. Raw data can be put"
YouTube Link 2025-12-18T18:10Z [----] followers, [---] engagements

"Session [--] - Windowing Aggregations in PySpark - Rank Session [--] - Windowing Aggregations in PySpark - Rank marks row_number rank [---] [--] [--] [---] [--] [--] [---] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] [--] from pyspark.sql.window import Window rank_window = Window.partitionBy(F.col("subject")) .orderBy(F.desc(F.col("marks"))) marks_df1 = marks_df.withColumn("rank_number"F.rank().over(rank_window)) marks_df2 = marks_df.withColumn("row_number"F.row_number().over(rank_window)) marks_df.createOrReplaceTempView("marks") spark.sql("SELECT student_namesubjectmarksrank() OVER (PARTITION BY"
YouTube Link 2025-02-22T11:30Z [---] followers, [--] engagements

"Challenge - [--] Challenge - [--] [--]. Consider there is data that is getting received from website in string format. That data contains first_name middle_namelast_name and age of the customer. Write PySpark code to separate them into separate columns. [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. website_data = ("Ganesh Ramdas Kudale 31") ("Akshay Ramdas Kudale 28") ("Ojas Ganesh Kudale 1.5") split - it gives array of strings - (columndelimiter) substring_index - gives the results based position we specify"
YouTube Link 2025-10-20T12:05Z [----] followers, [--] engagements

"Video [--] - Introduction to Data Bricks UI Video [--] - Introduction to Data Bricks UI Lakehouse App - For technical users Data Engineers Data Scientists ML Engineers Create and Run the Notebooks Build ETL Pipelines Create and Manage the Jobs Train ML models Work with Delta Table Data Bricks One - For Business Users Analysts Decision Makers View Dashboards - Ask question in Natural Language - Why Switch Apps in same workspace Delta Table - PRN Symptoms Summary Diagnosis Summary SELECT * FROM final_table WHERE PRN = [-----] Give me the summary for patient with PRN [-----] #pyspark #apachespark"
YouTube Link 2026-02-01T14:12Z [----] followers, [--] engagements

"Video [--] - Data Bricks Architecture #pyspark Video [--] - Data Bricks Architecture Two Parts - [--]. Control Plane [--]. Compute Plane Control Plane - Control Plane manages and controls but is not responsible to process the data. A) Data Bricks UI B) Cluster Manager C) Unity Catalog D) Workspace Storage (Metadata Storage) - Notebook Definitions Code Permissions Compute Plane - Actual data processing Running Notebooks Jobs/Pipeline Execution Processing Spark Workloads Reading and writing the data Types of Clusters in Data Bricks - 1) Classic Compute 2) Serverless Compute Classic Compute - The compute"
YouTube Link 2026-02-07T16:30Z [----] followers, [--] engagements

"Internals of Python - Part [--] - Mutable and Immutable Objects in Python In this session I cover what mutable and immutable objects are why Python differentiates between them and how this impacts memory management performance and bug-free coding"
YouTube Link 2026-01-03T19:39Z [----] followers, [---] engagements

"Video [--] - Components of Lang Chain - Part [--] Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&pp=sAgC Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&pp=sAgC Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&pp=sAgC PySpark Zero to Superhero Playlist -"
YouTube Link 2026-02-15T07:36Z [----] followers, [--] engagements

"Session [--] - Grouping aggregations in PySpark - Continuation [--] Session [--] - Grouping aggregations in PySpark - Continuation [--] /FileStore/emp_data.csv [--]. Maximum employee salary of each department. [--]. Minimum employee salary of each department. emp_df1 = emp_df .agg(F.max(F.col("emp_salary")).alias("maximum_salary")) emp_df2 = emp_df .groupBy("emp_department") .agg(F.max(F.col("emp_salary")).alias("maximum_salary")) emp_df3 = emp_df .groupBy("emp_department") .agg(F.max(F.col("emp_salary")).alias("maximum_salary")) .orderBy(F.asc(F.col("emp_department"))) emp_df4 = emp_df"
YouTube Link 2025-02-08T10:43Z [---] followers, [--] engagements

"Session [--] - Full Outer Join in PySpark - Joining over multiple Columns Session [--] - Full Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"
YouTube Link 2025-08-29T18:00Z [----] followers, [--] engagements

"Session [--] - WHEN Otherwise in PySpark - One when Condition Session [--] - WHEN Otherwise in PySpark - One when Condition /FileStore/emp_data.csv [--]. When salary is more than [-----] it is high salary. If it is less than or equal to [-----] then it is low salary. Give me results for each employee with their salary and if their salaries are high or low. #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo"
YouTube Link 2025-05-21T06:35Z [---] followers, [--] engagements
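
A hedged sketch of the when/otherwise pattern described above; the salary threshold is redacted in the post, so the 50000 below is only a placeholder. Additional `.when()` calls can be chained for the multi-band variant covered in a later session.

```python
# Labelling salaries as high/low with when/otherwise (threshold is a placeholder assumption).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
emp_df = spark.createDataFrame(
    [("Ganesh", 80000), ("Akshay", 30000)], ["emp_name", "emp_salary"])

labelled = emp_df.withColumn(
    "salary_band",
    F.when(F.col("emp_salary") > 50000, "high salary")
     .otherwise("low salary"))
# More bands can be added by chaining extra .when(...) calls before .otherwise(...).
labelled.show()
```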

"Concluding PySpark Series Session [--] - Creating the raw data frame - https://youtu.be/zHhRnOPul7g Session [--] - Defining the Schema in PySpark - https://youtu.be/AKuvX8Kn7l4 Session [--] - Reading the data frame form file stored at storage location - https://youtu.be/nq04n-6JvH8 Session [--] - Different ways of creating the data frame - https://youtu.be/tDbmBhghE7Q Session [--] - Transformations and Action in Apache Spark - https://youtu.be/JZu1EK0isjA Session [--] - Data Frame Read Modes - https://youtu.be/lkj8nEzS4To Session [--] - PySpark withColumn Transformation - https://youtu.be/gBMNsspzNiI Session [--] -"
YouTube Link 2025-10-05T14:40Z [----] followers, [---] engagements

"Session [--] - Left Outer Join in PySpark - Joining over multiple Columns Session [--] - Left Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") joined_df = df1.join(other=another_dataframeon=join_condition how=join_type) from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema ="
YouTube Link 2025-07-19T17:17Z [----] followers, [---] engagements

"Session [--] - Full Outer Join in PySpark - Joining over one Column Session [--] - Full Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType("
YouTube Link 2025-08-24T12:57Z [----] followers, [--] engagements

"Challenge [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want only one output per department based on employee who has joined earlier in the ties (emp having smaller emp_id). emp_data = (1 "emp1" [----] 1) (2 "emp2" [----] 2) (3 "emp3" [----] 3) (4 "emp4" [----] 4) (5 "emp5" [----] 5) (6 "emp6" [----] 5) (7 "emp7" [----] 4) (8 "emp8" [----] 3) (9 "emp9" [----] 2) (10 "emp10" [----] 1) (11 "emp11" [----] 1) (12 "emp12" [----] 2)"
YouTube Link 2025-10-05T16:45Z [----] followers, [---] engagements

"#pyspark #apachespark #coding #video #videos #shorts #databricks #education #azuredatabricks"
YouTube Link 2025-07-01T15:46Z [----] followers, [---] engagements

"Ready to Deep Dive #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"
YouTube Link 2025-09-21T15:39Z [----] followers, [--] engagements

"Session [--] - Introduction to Grouping aggregations in PySpark Session [--] - Introduction to Grouping aggregations in PySpark /FileStore/emp_data.csv [--]. Total employees in the organisation department wise. emp_df1 = emp_df .agg(F.count(F.col("emp_id")).alias("total_employees")) emp_df2 = emp_df .groupBy("emp_department") .agg(F.count(F.col("emp_id")).alias("total_employees")) emp_df3 = emp_df .groupBy("emp_department") .agg(F.count(F.col("emp_id")).alias("total_employees")) .orderBy(F.col("emp_department")ascending=True) emp_df4 = emp_df .groupBy("emp_department")"
YouTube Link 2025-02-08T09:00Z [---] followers, [--] engagements

"Session - Set up Databricks community edition for practise Session - Set up Databricks community edition for practise https://www.databricks.com/try-databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo"
YouTube Link 2025-02-09T15:55Z [---] followers, [--] engagements

"Session [--] - Working With dates in PySpark - Storage Location Session [--] - Working With dates in PySpark - Reading file from storage location date_file_path = "/FileStore/PySpark_Series/dates/date_file_ddmmyyyy.csv" Name DOB Ganesh 22-11-1995 Akshay 23-01-1996 date_file_path = "/FileStore/PySpark_Series/dates/date_file_ddmmyyyy.csv" sample_df = spark.read .format("csv") .schema(sample_schema) .option("header"True) .load(date_file_path) sample_df2 = spark.read .format("csv") .option("header"True) .schema(sample_schema) .option("dateFormat""dd-MM-yyyy") .load(date_file_path) #pyspark"
YouTube Link 2025-06-06T16:09Z [---] followers, [--] engagements
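
A sketch of the dateFormat option from this session; the CSV path is the one quoted above, and the schema and column names are assumptions.

```python
# Parsing dd-MM-yyyy dates while reading a CSV (schema/columns assumed from the post).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType

spark = SparkSession.builder.getOrCreate()

sample_schema = StructType([
    StructField("Name", StringType()),
    StructField("DOB", DateType()),
])

dates_df = (spark.read
                 .format("csv")
                 .option("header", True)
                 .option("dateFormat", "dd-MM-yyyy")   # without this, DOB would fail to parse as a date
                 .schema(sample_schema)
                 .load("/FileStore/PySpark_Series/dates/date_file_ddmmyyyy.csv"))
dates_df.printSchema()
dates_df.show()
```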

"Session [--] - Windowing Aggregations in PySpark - Dense Rank Session [--] - Windowing Aggregations in PySpark - Dense Rank row_number - It assigns the number to rows serially without skipping any number in between. rank - It assigns rank to rows but skips the numbers in between. For tie it will assign same rank but skips the next number based on number of records already assigned a rank. dense_rank - It assigns rank to rows and does not skip anything in between. For tie it will assign same rank. from pyspark.sql.window import Window dense_rank_window = Window.partitionBy(F.col("subject"))"
YouTube Link 2025-03-30T16:38Z [---] followers, [--] engagements
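
A small sketch (made-up marks) contrasting row_number, rank and dense_rank as defined in the two windowing posts above.

```python
# row_number vs. rank vs. dense_rank on tied marks (illustrative data).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
marks_df = spark.createDataFrame(
    [("Ganesh", "English", 99), ("Akshay", "English", 99),
     ("Priyesh", "English", 98), ("Rohit", "English", 97)],
    ["student_name", "subject", "marks"])

w = Window.partitionBy("subject").orderBy(F.desc("marks"))
(marks_df
    .withColumn("row_number", F.row_number().over(w))   # 1, 2, 3, 4
    .withColumn("rank",       F.rank().over(w))          # 1, 1, 3, 4  (skips 2 after the tie)
    .withColumn("dense_rank", F.dense_rank().over(w))    # 1, 1, 2, 3  (no gap)
    .show())
```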

"Session [--] - Simple aggregations in PySpark - Count Average Max Min Session [--] - Simple Aggregations in PySpark - Count Average Max Min [--]. Find count of total employees in the organization. [--]. Find the Average Salary of the employees in the organization. [--]. Find the maximum salary of the employee in the organization. [--]. Find the minimum salary of the employee in the organization. emp_df1 = emp_df.agg(F.count(F.col("emp_id")).alias("total_employees")) emp_df2 = emp_df.count() emp_df3 = emp_df.agg(F.avg(F.col("emp_salary")).alias("average_salary")) emp_df4 ="
YouTube Link 2025-02-08T08:39Z [---] followers, [--] engagements

"Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) (9"Person9"6) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"
YouTube Link 2025-08-29T18:00Z [----] followers, [--] engagements

"Session [--] - WHEN Otherwise in PySpark - Multiple when Conditions Session [--] - WHEN Otherwise in PySpark - Multiple when Conditions and Multiple Conditions within when [--]. When salary is less than or equal to [-----] - Very Low Salary when salary is greater than [-----] and less than or equal to [-----] - Low Salary when salary is greater than [-----] and less than or equal to [-----] - Average Salary when salary is greater than [-----] and less than or equal to [------] - High Salary when salary is greater [------] - Very High Salary #pyspark #apachespark #databricks #coding #learnpyspark #python"
YouTube Link 2025-05-24T16:40Z [---] followers, [--] engagements

"#pyspark #coding #databricks #education #viralvideo #apachespark #programming #shorts #viralshorts"
YouTube Link 2025-03-22T11:43Z [---] followers, [--] engagements

"Defining the Schema for dataframe #pyspark #apachespark #coding #education #pythonprogramming"
YouTube Link 2025-01-12T05:23Z [---] followers, [--] engagements

"#pyspark #youtubeshorts #viralshorts #viral #video #apachespark #education #databricks #data"
YouTube Link 2025-06-16T07:27Z [---] followers, [---] engagements

"#pyspark #education #databricks #dataengineering #apachespark #trainer #support #share #subscribe"
YouTube Link 2025-09-03T15:54Z [----] followers, [---] engagements

"#pyspark #apachespark #coding #databricks #education #definitions #viralshorts #shorts #python"
YouTube Link 2025-07-08T05:33Z [----] followers, [---] engagements

"Session [--] - Reading parquet file as pyspark data frame Parquet is column based file format. [---] columns and you want to read [--] columns. parquet_df = spark.read.format("parquet").load("/Volumes/demo/default/landing/titanic.parquet") parquet_df1 = spark.read.parquet("/Volumes/demo/default/landing/titanic.parquet") #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python"
YouTube Link 2025-09-14T16:49Z [----] followers, [---] engagements

"Session [--] - Inner Join in PySpark - Joining over one Column Session [--] - Inner Join in PySpark - Joining over one Column Problem Statement - Get the department name for all the employees assigned to the departments. If there are employees which are not assigned to any of the departments do not return them in results. Sample Data - emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") Generic Structure for Join - df1 and df2 joined_df ="
YouTube Link 2025-06-18T15:30Z [----] followers, [--] engagements

"#pyspark #python #coding #education #databricks #dataengineering #spark #viral #viralshorts #shorts"
YouTube Link 2025-09-04T14:38Z [----] followers, [---] engagements

"Session [--] - Top scorer students in each subject using PySpark window functions Session [--] - Top scorer students in each subject using PySpark window functions marks_data = ("Ganesh""English"99) ("Akshay""English"99) ("Priyesh""English"98) ("Rohit""English"98) ("Shrikant""English"98) ("Mayur""English"97) ("Ganesh""History"77) ("Akshay""History"77) ("Priyesh""History"98) ("Rohit""History"98) ("Shrikant""History"99) ("Mayur""History"99) from pyspark.sql.types import * from pyspark.sql import functions as F marks_schema = StructType( StructField("student_name"StringType())"
YouTube Link 2025-04-05T16:32Z [---] followers, [--] engagements

"Session [--] - Adding created timestamp and created date to the newly added data in PySpark Session [--] - Adding created_timestamp and created_date to the newly added data in PySpark sample_data = ("Priyesh""IT Engineer" 32) ("Gitesh""Data Engineer" 35) # Session [--] - Adding created_timestamp and created_date to the newly added data in PySpark sample_data = ("Priyesh""IT Engineer" 32) ("Gitesh""Data Engineer" 35) from pyspark.sql.types import * from pyspark.sql import functions as F sample_schema = StructType( StructField("emp_name"StringType()) StructField("emp_department"StringType())"
YouTube Link 2025-06-10T11:08Z [---] followers, [--] engagements
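
A tiny sketch (made-up rows) of stamping newly added data with created_timestamp and created_date, mirroring the snippet quoted above.

```python
# Adding created_timestamp / created_date columns to new rows (illustrative data).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
sample_df = spark.createDataFrame(
    [("Priyesh", "IT Engineer", 32), ("Gitesh", "Data Engineer", 35)],
    ["emp_name", "emp_department", "emp_age"])

stamped = (sample_df
    .withColumn("created_timestamp", F.current_timestamp())
    .withColumn("created_date", F.current_date()))
stamped.show(truncate=False)
```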

"Session [--] - Inner Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("country"StringType()) StructField("dept_id"IntegerType()) )"
YouTube Link 2025-06-29T18:36Z [----] followers, [---] engagements

"Session [--] - PySpark Window Function Lead Data Frame Session [--] - PySpark Window Function - Lead (Data Frame) (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-03" "Cancelled") (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-04" "Shipped") (1 [--] "2024-01-05" "In-Transit") (1 [--] "2024-01-06" "Destination-City") (1 [--] "2024-01-07" "Out For Delivery") (1 [--] "2024-01-08" "Cancelled") paritionBy - customer_id order_id orderBy - date Next from particular column we use lead window function. (1 [--] "2024-01-07" "Out For Delivery") (1 [--] "2024-01-07" "Out For"
YouTube Link 2025-04-05T20:09Z [---] followers, [--] engagements
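
A minimal sketch of the `lead` use case described above (flagging orders cancelled right after going out for delivery), assuming a SparkSession named `spark`; the `order_id` values are illustrative because they are masked in the post.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

orders_data = [(1, 101, "2024-01-07", "Out For Delivery"),
               (1, 101, "2024-01-08", "Cancelled"),
               (1, 102, "2024-01-07", "Out For Delivery"),
               (1, 102, "2024-01-08", "Delivered")]
orders_df = spark.createDataFrame(
    orders_data, ["customer_id", "order_id", "status_date", "order_status"])

# lead() looks one row ahead within each order's status history.
status_window = Window.partitionBy("customer_id", "order_id").orderBy("status_date")

flagged_df = (orders_df
              .withColumn("next_order_status", F.lead("order_status").over(status_window))
              .filter((F.col("order_status") == "Out For Delivery") &
                      (F.col("next_order_status") == "Cancelled")))
flagged_df.select("customer_id", "order_id").show()
```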

"Challenge [--] - Find out 4th highest salary per department. Challenge - [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want all those employees to be present in the output in ascending order of their names. [--] - Window aggregations - row_number rank dense_rank [--] - partition based on department and order based on salary desc and order based on emp_name row_number rank dense_rank [--] [----] [--] [--] [--] [--] [----] [--] [--] [--] 1"
YouTube Link 2025-10-05T05:52Z [----] followers, [---] engagements
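
A minimal sketch of this challenge using `dense_rank`, assuming a SparkSession named `spark`; the salaries are illustrative since the real figures are masked. Because `dense_rank` gives tied salaries the same rank, all employees on the 4th-highest salary survive the filter.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

emp_df = spark.createDataFrame(
    [(1, "empA", 9000, 1), (2, "empB", 8000, 1), (3, "empC", 7000, 1),
     (4, "empD", 6000, 1), (5, "empE", 6000, 1), (6, "empF", 5000, 1)],
    ["emp_id", "emp_name", "emp_salary", "dept_id"])

# Rank salaries within each department, highest first.
salary_window = Window.partitionBy("dept_id").orderBy(F.col("emp_salary").desc())

fourth_highest_df = (emp_df
                     .withColumn("salary_rank", F.dense_rank().over(salary_window))
                     .filter(F.col("salary_rank") == 4)   # both empD and empE are kept
                     .orderBy("emp_name"))
fourth_highest_df.show()
```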

"Session [--] - PySpark Window Function - LAG Session [--] - PySpark Window Function - LAG (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-03" "Cancelled") (1 [--] "2024-01-01" "Placed") (1 [--] "2024-01-02" "Confirmed") (1 [--] "2024-01-04" "Shipped") (1 [--] "2024-01-05" "In-Transit") (1 [--] "2024-01-06" "Destination-City") (1 [--] "2024-01-07" "Out For Delivery") (1 [--] "2024-01-08" "Cancelled") Performance Difference Between Lead and Lag - [--]. Lag processes data by looking at previous rows. As spark works in linear fashion lag can be proved more perfromance efficient. [--]. The catalyst"
YouTube Link 2025-04-11T16:41Z [---] followers, [--] engagements

"Session [--] - Reading Multi Line JSON file as PySpark Data frame from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType()) StructField("carb"DoubleType()) ) muliline_json_df ="
YouTube Link 2025-09-14T16:49Z [----] followers, [---] engagements
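
A minimal sketch of reading a multi-line JSON file with an explicit schema, assuming a SparkSession named `spark`; the path is illustrative and the schema is shortened from the one in the post.

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Shortened version of the bikes schema from the post.
bikes_schema = StructType([
    StructField("model", StringType()),
    StructField("mpg", DoubleType()),
    StructField("cyl", DoubleType()),
])

# multiLine=True tells Spark that a single JSON record may span several lines.
multiline_json_df = (spark.read
                     .format("json")
                     .option("multiLine", True)
                     .schema(bikes_schema)
                     .load("/Volumes/demo/default/landing/bikes.json"))  # illustrative path
multiline_json_df.show()
```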

"Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) )"
YouTube Link 2025-06-21T15:54Z [----] followers, [---] engagements

"Session [--] - Data Frame writer API and data frame writer Modes [--]. Read the dataframe - spark.read.format("").option() .schema.load(file_path) [--]. We process this dataframe [--]. We write the result to some location df.write.format().mode().option("path"writing_path).save() df.write.format().mode().option("path"writing_path) .saveAsTable(table_name) Data Frame read modes - [--]. Permissive [--]. Failfast [--]. DropMalformed While reading the data frame we can specify the file path or folder path. While writing the data frame we can specify folder path. Data Frame writer modes - [--]. Append - It will append"
YouTube Link 2025-09-14T16:49Z [----] followers, [---] engagements
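
A minimal sketch of the read-process-write flow and writer modes outlined above, assuming a SparkSession named `spark`; the file paths and table name are illustrative.

```python
# 1. Read
raw_df = (spark.read
          .format("csv")
          .option("header", True)
          .load("/Volumes/demo/default/landing/emp_data.csv"))  # illustrative path

# 2. Process (any transformation)
processed_df = raw_df.dropDuplicates()

# 3. Write to a folder path; mode can be "append", "overwrite", "ignore" or "errorifexists"
(processed_df.write
 .format("parquet")
 .mode("append")
 .option("path", "/Volumes/demo/default/curated/emp_data")  # illustrative folder path
 .save())

# Or register the result as a table instead of writing to a bare path
(processed_df.write
 .format("parquet")
 .mode("overwrite")
 .saveAsTable("demo.default.emp_data"))  # illustrative table name
```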

"Session [--] - Reading Single Line JSON file as PySpark Data frame Session [--] - Reading Single Line JSON file as PySpark Dataframe from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType())"
YouTube Link 2025-09-13T18:31Z [----] followers, [---] engagements

"Session [--] - CASE WHEN in PySpark - Multiple when Conditions Session [--] - CASE WHEN in PySpark - Multiple when Conditions and Multiple Conditions within when /FileStore/emp_data.csv [--]. When salary is less than or equal to [-----] - Very Low Salary when salary is greater than [-----] and less than or equal to [-----] - Low Salary when salary is greater than [-----] and less than or equal to [-----] - Average Salary when salary is greater than [-----] and less than or equal to [------] - High Salary when salary is greater [------] - Very High Salary #pyspark #apachespark #databricks #coding #learnpyspark"
YouTube Link 2025-05-18T18:24Z [---] followers, [--] engagements
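
A minimal sketch of chained `when` conditions, assuming a SparkSession named `spark`; the salary thresholds are illustrative because the actual values are masked in the post.

```python
from pyspark.sql import functions as F

emp_df = spark.createDataFrame(
    [(1, "emp1", 15000), (2, "emp2", 45000), (3, "emp3", 90000)],
    ["emp_id", "emp_name", "emp_salary"])

# Chained when() calls behave like SQL CASE WHEN; the first matching branch wins.
labelled_df = emp_df.withColumn(
    "salary_band",
    F.when(F.col("emp_salary") <= 20000, "Very Low Salary")
     .when(F.col("emp_salary") <= 50000, "Low Salary")
     .when(F.col("emp_salary") <= 80000, "Average Salary")
     .otherwise("High Salary"))
labelled_df.show()
```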

"Session [--] - Joins in PySpark - Theory Session [--] - Joins in PySpark - Theory [--]. Inner Join - Inner Join will join each matching records from left data frame to each matching record from right data frame based on join column. emp_id emp_name manager_id [--] Ganesh [--] [--] Akshay [--] department_id department_name [--] "DE" [--]. Left Join - Left Outer Join - It will join each matching records from left data frame to each matching record from right data frame based on join column. Also if there is no match to any of the records from left data frame it will list down all those records padded with NULL values"
YouTube Link 2025-06-14T14:20Z [---] followers, [--] engagements

"Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dep_id"IntegerType())"
YouTube Link 2025-07-10T12:57Z [----] followers, [--] engagements

"Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS")"
YouTube Link 2025-07-31T17:57Z [----] followers, [---] engagements

"Session [--] - Left Anti Join in PySpark Session [--] - Left Anti Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. What is left Anti join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will not give those records as output. Left anti join will give the columns of only left dataset as output. Customers - Orders (Left Semi) Give me all"
YouTube Link 2025-09-13T14:25Z [----] followers, [--] engagements
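
A minimal sketch contrasting left semi and left anti joins with the customers/orders example from the post, assuming a SparkSession named `spark`; the sample rows are illustrative.

```python
customers_df = spark.createDataFrame(
    [(1, "Cust1"), (2, "Cust2"), (3, "Cust3")], ["customer_id", "customer_name"])
orders_df = spark.createDataFrame(
    [(101, 1), (102, 1), (103, 3)], ["order_id", "customer_id"])

# Left semi: customers that HAVE at least one order (only customer columns returned).
with_orders_df = customers_df.join(orders_df, on="customer_id", how="left_semi")

# Left anti: customers that have NO orders at all.
without_orders_df = customers_df.join(orders_df, on="customer_id", how="left_anti")

with_orders_df.show()
without_orders_df.show()
```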

"Session [--] - Grouping aggregations in PySpark - Continuation Session [--] - Grouping aggregations in PySpark - Continuation /FileStore/emp_data.csv [--]. Average salary of employees per department. [--]. Total amount every department is spending on employee salary. [--]. Maximum employee salary of each department. [--]. Minimum employee salary of each department. emp_df1 = emp_df .agg(F.avg(F.col("emp_salary")).alias("average_salary")) emp_df2 = emp_df .groupBy("emp_department") .agg(F.avg(F.col("emp_salary")).alias("average_salary")) emp_df3 = emp_df .groupBy("emp_department")"
YouTube Link 2025-02-08T09:30Z [---] followers, [--] engagements
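
A minimal sketch of the grouping aggregations listed above, assuming a SparkSession named `spark`; the sample rows are illustrative.

```python
from pyspark.sql import functions as F

emp_df = spark.createDataFrame(
    [(1, "emp1", 40000, "IT"), (2, "emp2", 60000, "IT"), (3, "emp3", 50000, "HR")],
    ["emp_id", "emp_name", "emp_salary", "emp_department"])

# Average, total, max and min salary per department in a single pass.
summary_df = (emp_df
              .groupBy("emp_department")
              .agg(F.avg("emp_salary").alias("average_salary"),
                   F.sum("emp_salary").alias("total_salary"),
                   F.max("emp_salary").alias("max_salary"),
                   F.min("emp_salary").alias("min_salary")))
summary_df.show()
```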

"Session [--] - Left Outer Join in PySpark - Joining over one Column Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") # Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from"
YouTube Link 2025-07-05T13:00Z [----] followers, [---] engagements

"Session [--] - PySpark Window Function Lead Spark SQL Session [--] - PySpark Window Function - Lead (Spark SQL) orders_df1.createOrReplaceTempView("orders") spark.sql("""WITH T1 AS (SELECT *LEAD(order_status) OVER (PARTITION BY customer_idorder_id ORDER BY status_date ASC) AS next_order_status FROM orders) SELECT customer_idorder_id FROM T1 WHERE order_status = 'Out For Delivery' AND next_order_status = 'Cancelled'""").display() #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo"
YouTube Link 2025-04-11T15:34Z [---] followers, [--] engagements

"Session [--] - Working With dates in PySpark - Python List Session [--] - Working With dates in PySpark - Python List [--]. Give all the records where date is greater than 31-12-1994. sample_data1 = ("Ganesh""1995-11-22") ("Akshay""1997-09-21") sample_data2 = ("Priyesh""23-11-1996") ("Gitesh""12-01-1991") The date we need to read it as string first and then convert to date datatype. Apache Spark by default gives all the dates in yyyy-MM-dd format. from pyspark.sql.types import * from pyspark.sql import functions as F sample_data1 = ("Ganesh""1995-11-22") ("Akshay""1997-09-21") sample_schema ="
YouTube Link 2025-05-29T11:56Z [---] followers, [--] engagements
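
A minimal sketch of parsing a non-default date format and filtering on it, assuming a SparkSession named `spark`; the sample rows follow the second list in the post.

```python
from pyspark.sql import functions as F

# Dates arrive as strings; this set uses dd-MM-yyyy instead of the default yyyy-MM-dd.
sample_df = spark.createDataFrame(
    [("Priyesh", "23-11-1996"), ("Gitesh", "12-01-1991")], ["name", "dob_str"])

dated_df = sample_df.withColumn("dob", F.to_date(F.col("dob_str"), "dd-MM-yyyy"))

# Once parsed, dates can be filtered with ordinary comparisons.
dated_df.filter(F.col("dob") > F.lit("1994-12-31")).show()
```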

"Mock Interview with Satyam Meena This is End to End Data Engineering Project Satyam has spent 1-2 years on this project. If you are the one who wants to understand how Data Engineering projects work in real life then this video is for you Connect with me: Instagram:"
YouTube Link 2025-07-31T17:57Z [----] followers, [---] engagements

"Session [--] - CASE WHEN in PySpark - One when Condition Session [--] - CASE WHEN in PySpark - One when Condition /FileStore/emp_data.csv [--]. When salary is more than [-----] it is high salary. If it is less than or equal to [-----] then it is low salary. Give me results for each employee with their salary and if their salaries are high or low. from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"LongType()) StructField("emp_name"StringType()) StructField("emp_salary"LongType()) StructField("emp_department"IntegerType()) ) emp_df ="
YouTube Link 2025-04-27T08:31Z [---] followers, [--] engagements

"Session [--] - Right Outer Join in PySpark - Joining over one Column Session [--] - Right Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) ) dept_schema = StructType( StructField("department_id"IntegerType())"
YouTube Link 2025-07-25T15:34Z [----] followers, [---] engagements

"Session [--] - Remove duplicates using PySpark window functions Session [--] - Remove duplicates using PySpark window functions When row_number rank and dense_rank gives same results and your perpose is solved by using any of these always go with row_number. Rank and Dense_Rank needs little extra processing as they have to decide between ties. duplicate_marks_data = ("Ganesh""English"99) ("Akshay""English"99) ("Priyesh""English"98) ("Rohit""English"98) ("Shrikant""English"98) ("Mayur""English"97) ("Akshay""English"99) ("Akshay""English"99) ("Ganesh""History"77) ("Akshay""History"77)"
YouTube Link 2025-03-30T17:46Z [---] followers, [--] engagements
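
A minimal sketch of the `row_number` de-duplication pattern described above, assuming a SparkSession named `spark`.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

duplicate_marks_df = spark.createDataFrame(
    [("Ganesh", "English", 99), ("Akshay", "English", 99),
     ("Akshay", "English", 99), ("Akshay", "English", 99)],
    ["student_name", "subject", "marks"])

# row_number gives each duplicate row a distinct number; keeping row 1 drops the rest.
dedup_window = Window.partitionBy("student_name", "subject", "marks").orderBy("student_name")

deduped_df = (duplicate_marks_df
              .withColumn("rn", F.row_number().over(dedup_window))
              .filter(F.col("rn") == 1)
              .drop("rn"))
deduped_df.show()
```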

"Session [--] - Left Semi Join in PySpark Session [--] - Left Semi Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. On the other hand inner join will give columns of both left and right dataset. customers - orders Give me all customer details only who have placed order so far. I don't want customers in the output who have not yet placed any order. emp_data_single_column = (1"Person1"1)"
YouTube Link 2025-09-06T05:49Z [----] followers, [--] engagements

"Session [--] - Right Outer Join in PySpark - Joining over multiple Columns Session [--] - Right Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3)"
YouTube Link 2025-08-21T02:39Z [----] followers, [---] engagements

"#pyspark #apachespark #programming #coding #learnpyspark #dataframes #dataengineering #databricks"
YouTube Link 2025-02-05T16:58Z [---] followers, [--] engagements

"Video [--] - Components of Lang Chain - Part [--] #pyspark #apachespark #databricks Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero/ Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&index=1 Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&index=1 Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&index=1 PySpark Zero to"
YouTube Link 2026-02-15T10:16Z [----] followers, [--] engagements

"Video [--] - Components of Lang Chain - Part [--] Important Links - PySpark Zero to Superhero Udemy Course - https://www.udemy.com/course/pyspark-zero-to-superhero Gen AI using Lang chain Playlist - https://www.youtube.com/watchv=jmoNS_5Zu0U&list=PLYr5szPccHHbqY4m9xxIM93IDMsf5ciJq&pp=sAgC Python for Data Engineers Playlist - https://www.youtube.com/watchv=e0Lvj5iynAM&list=PLYr5szPccHHZNF93_B0PyxOQ6G6lttkF5&pp=sAgC Data Bricks Zero to Superhero Playlist - https://www.youtube.com/watchv=R_SGm8hty3c&list=PLYr5szPccHHZl2aerhLAegZWsiiXCqtc0&pp=sAgC PySpark Zero to Superhero Playlist -"
YouTube Link 2026-02-15T07:36Z [----] followers, [--] engagements

"Video [--] - Data Bricks Architecture #pyspark Video [--] - Data Bricks Architecture Two Parts - [--]. Control Plane [--]. Compute Plane Control Plane - Control Plane manages and controls but is not responsible to process the data. A) Data Bricks UI B) Cluster Manager C) Unity Catalog D) Workspace Storage (Metadata Storage) - Notebook Definitions Code Permissions Compute Plane - Actual data processing Running Notebooks Jobs/Pipeline Execution Processing Spark Workloads Reading and writing the data Types of Clusters in Data Bricks - 1) Classic Compute 2) Serverless Compute Classic Compute - The compute"
YouTube Link 2026-02-07T16:30Z [----] followers, [--] engagements

"Video [--] - What is the need of Lang Chain #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm pyspark apachespark databricks coding learnpyspark"
YouTube Link 2026-02-14T10:36Z [----] followers, [--] engagements

"Video [--] - A to Z About Data Bricks Notebooks LinkedIn Post about User User Groups and Service Principal - https://www.linkedin.com/posts/ganesh-kudale-50bb14ab_youtube-contentcreator-pyspark-activity-7421228245473329152-VDdTutm_source=share&utm_medium=member_desktop&rcm=ACoAABd64tsBZlYaSR7w7vs9gd-HLFllhpPToqQ #pyspark #apachespark #databricks #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud"
YouTube Link 2026-02-14T09:21Z [----] followers, [--] engagements

"Podcast with @learnomate 🎙 From Mechanical Engineer to Data Engineer: A Career Transition Story with @learnomate Switching careers from Mechanical Engineering to Data Engineering is challengingbut absolutely possible 🚀 In this podcast I've shared real-life career transition story of myself as mechanical engineer who successfully moved into the Data Engineering domain. PLEASE NOTE THIS IS NOT ANY HYPE. THIS IS WHAT I'M NOT SPEAKING ANYTHING WHICH I HAVE NOTE DONE. Here is the link for Learnomate Technologies Video - https://youtu.be/uyOIkC6VFecsi=mmJeiD4IF0mxW5Ek Note - I'm not promoting any"
YouTube Link 2026-02-07T13:04Z [----] followers, [---] engagements

"Video [--] - Introduction to Data Bricks UI Video [--] - Introduction to Data Bricks UI Lakehouse App - For technical users Data Engineers Data Scientists ML Engineers Create and Run the Notebooks Build ETL Pipelines Create and Manage the Jobs Train ML models Work with Delta Table Data Bricks One - For Business Users Analysts Decision Makers View Dashboards - Ask question in Natural Language - Why Switch Apps in same workspace Delta Table - PRN Symptoms Summary Diagnosis Summary SELECT * FROM final_table WHERE PRN = [-----] Give me the summary for patient with PRN [-----] #pyspark #apachespark"
YouTube Link 2026-02-01T14:12Z [----] followers, [--] engagements

"Video [--] - What is Lang Chain #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks langchain genai llm langchaintutorial pyspark apachespark databricks coding learnpyspark python"
YouTube Link 2026-01-10T16:54Z [----] followers, [--] engagements

"Video [--] - Creating Azure Data Bricks Service #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata #azuredatabricks #generativeai #genai #langchain #ai #llm #azurecloud #azuredatabrickswithpyspark #azuredatabricks #azure pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support azuredatabricks pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"
YouTube Link 2026-01-03T13:41Z [----] followers, [--] engagements

"Internals of Python - Part [--] - Mutable and Immutable Objects in Python In this session I cover what mutable and immutable objects are why Python differentiates between them and how this impacts memory management performance and bug-free coding"
YouTube Link 2026-01-03T19:39Z [----] followers, [---] engagements
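
A small plain-Python illustration of the mutable vs. immutable behaviour the video covers; `id()` exposes whether an operation created a new object or changed the existing one.

```python
# Immutable objects (int, str, tuple) get a new object on "modification" ...
x = 10
print(id(x))
x += 1
print(id(x))          # different identity: a new int object was created

# ... while mutable objects (list, dict, set) are changed in place.
items = [1, 2, 3]
print(id(items))
items.append(4)
print(id(items))      # same identity: the same list object was mutated
```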

"Video [--] - Creating Azure Cloud Account #pyspark Navigate to https://portal.azure.com/ Basics Tab - Resource Group - Logical grouping of resources. Project Environment Pricing Tier [--]. Standard Available - Core Features Apache Spark Engine Notebooks Job Scheduling Delta Lake Support Cluster Management Monitoring Not Available - Security and Governance Role Based Access Control Unity Cayalog Audit Logs Premium - Available - All Above features available in Standard Security and Governance Role Based Access Control Unity Catalog Audit Logs Workspace Type [--]. Hybrid - Can use cluster and storage"
YouTube Link 2026-01-01T12:06Z [----] followers, [--] engagements

"Video [--] - What is Generative AI Video [--] - What is Generative AI Play List Name - Generative AI Using Lang Chain Definition - Gen AI is a type of AI (Artificial Intelligence) that can create a new content on it's own. Text - Stories Emails Articles Images - Painting Photo for Product Audio/Video - Code - Gen AI is AI that can generate something new based on what it has learnt in past. LLM - Large Language Model Huge Model which learns every time from the huge data it understands the data and patterns generates human readable text. Large - Trained on Massive data Language - Focused on text:"
YouTube Link 2025-12-28T15:21Z [----] followers, [---] engagements

"Internals of Python - Part [--] - Everything is object in Python Ever wondered what Everything is an object in Python really means In this video we break down this core concept by explaining identity type and value of Python objects and how numbers strings lists functions and classes all follow the same object model. This understanding is crucial if you want to: [--]. Write better Python code [--]. Understand Python internals [--]. Grow as a Data Engineer or Python developer"
YouTube Link 2025-12-28T18:45Z [----] followers, [---] engagements
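
A small plain-Python illustration of the idea that every value has an identity, a type and a value; the names used here are illustrative.

```python
# Every value in Python is an object with an identity, a type and a value.
def greet():
    return "hello"

for obj in (42, "data", [1, 2], greet, int):
    print(id(obj), type(obj), obj)

# Even functions and classes can be passed around like any other object.
shout = greet
print(shout())
```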

"Video [--] - Working of Data Lake house #pyspark Video [--] - Working of Data Lake House Data Lake House uses [--] key technologies - [--]. Delta Lake - Optimized storage layer the supports ACID transactions and schema enforcement and evolutions [--]. Unity Catalog - Unified governance solution for data and AI Working Of Data Lakehouse - Data Ingestion - Data from multiple sources is dumped in multiple formats. This data can be Batch data or streaming data. This is first logical layer provides the place for the data to land in raw format. Making it single source of truth for raw data. Raw data can be put"
YouTube Link 2025-12-18T18:10Z [----] followers, [---] engagements

"Internals of Python - Part [--] - How Python code gets executed internally This video explains how Python code is executed internally diving deep into Pythons execution model. We cover key internals such as source code compilation bytecode generation and how the Python interpreter (PVM) executes bytecode step by step. If you found this video helpful please like the video and share it with others who want to understand Python beyond the surface. python pythonforde dataengineering data ai genai llm python pythonforde dataengineering data ai genai llm"
YouTube Link 2025-12-21T09:29Z [----] followers, [---] engagements
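
A small illustration of inspecting the bytecode CPython compiles a function into, using the standard-library `dis` module; the function is illustrative.

```python
import dis

def add(a, b):
    return a + b

# dis shows the bytecode the CPython compiler produced; the Python virtual
# machine (PVM) then executes these instructions one by one.
dis.dis(add)
```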

"Video [--] - What is Data Lakehouse #pyspark Video [--] - What is Data Lakehouse DataBase - OLTP Online Transaction - Banking Databases are made to store transactional data. cust_id trn_id trn_amount trn_type Databases are suitable for day-to-day transactions. Not made to store historical data. Data Bases are more costly. Data Warehouse - OLAP - Analytical Platform - DWH stores structured historical data for analytical purposes. Data in DWH is cleaned and organized. Data in DWH is best for reporting dashboards and Business Intelligence. DWH are less costly as compared to Data bases. Data Lake Data"
YouTube Link 2025-12-07T16:12Z [----] followers, [---] engagements

"Why Python In this video we build a strong intuition around why Python is so important in Data Engineering. Youll understand how and where Python is actually used in real-world data engineering use cases from data ingestion and transformation to building scalable data pipelines. Instead of just theory this video focuses on giving you the right context and mindset needed to learn Python with purpose. #python #dataengineering #coding python Data Engineering Data Pipelines coding python Data Engineering Data Pipelines coding"
YouTube Link 2025-12-14T16:36Z [----] followers, [---] engagements

"What is Machine Learning What is Machine Learning Village - [---] houses - [----] people Grocery - City Centre - [--] kms Problem Statement - To buy groceries villagers had to go [--] kms from village. Ganesh went to city centre - [--] items - [---] pieces of [--] items - It generally takes [--] days time. On 3rd day he will receive the items. 5-6 days - [---] pieces of each item got exhausted It generally takes [--] days time. On 3rd day he will receive the items. 2-3 days the grocery shop had no item to sell. This pattern ganesh observed for 2-3 weeks. Observation and action taken - He started giving the order"
YouTube Link 2025-12-14T06:29Z [----] followers, [--] engagements

"Video [--] - What is DataBricks Video [--] - What is Data Bricks Data Bricks - Apache Spark designed and optimised to work on cloud. Official Definition - Data Bricks is Unified platform for building deploying sharing and maintaining enterprise-grade data analytics and AI solutions at scale. Why Unified [--]. Lake House Architecture - It contains the best of Data Lake and DWH. - All type of data in one place. - ACID Enforcement schema evolutions and time travel. [--]. Integrated Workflows - Running workloads within Data Bricks is available. [--]. Multi Language Support - Python SQL Scala and R. - Data"
YouTube Link 2025-11-23T13:53Z [----] followers, [---] engagements

"Video [--] - Why DataBricks Video [--] - Why DataBricks Company - A person with [--] Lakh salary can complete the project in [--] months [--] Lakhs - Cost Monolithic Systems - Single system with all parts tightly coupled. If one-part breaks the system will fail. One big system holding full power. Vertical Scaling - is not time efficient is not cost efficient Pros - Simple to develop and handle Easy to debug and test Cons Non efficient scalability Hard to maintain and upgrade non fault tolerant Distributed Systems - Pros - Fault Tolerant Efficient Scaling - Horizontal Scaling Parallel Processing Cons -"
YouTube Link 2025-11-08T15:25Z [----] followers, [---] engagements

"Challenge [--] #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo #conclude #complete #pyspark #pysparktutorial #bigdata pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pysparktutorial like share subscribe support"
YouTube Link 2025-11-01T11:42Z [----] followers, [--] engagements

"Challenge [--] - Continuation [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. [--] - If [--] elements in the array - same as above If middle name is not there array will have [--] elements middle name will be null first_name position will be [--] last_name position will be [--] age position will be [--] [--]. The delimiter can be comma or pipe. Code should handle that case as well - We will have to replace the delimiter pipe or comma with space from pyspark.sql import functions as F website_data = ("Ganesh Kudale 31")"
YouTube Link 2025-10-25T00:14Z [----] followers, [--] engagements

"Challenge - [--] Challenge - [--] [--]. Consider there is data that is getting received from website in string format. That data contains first_name middle_namelast_name and age of the customer. Write PySpark code to separate them into separate columns. [--]. Middle name is optional. Code should handle that case. [--]. The delimiter can be comma or pipe. Code should handle that case as well. website_data = ("Ganesh Ramdas Kudale 31") ("Akshay Ramdas Kudale 28") ("Ojas Ganesh Kudale 1.5") split - it gives array of strings - (columndelimiter) substring_index - gives the results based position we specify"
YouTube Link 2025-10-20T12:05Z [----] followers, [--] engagements
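
One possible PySpark approach to this challenge, assuming a SparkSession named `spark`; it normalises the delimiter with `regexp_replace`, splits into an array, and treats the middle name as optional. The sample strings are illustrative.

```python
from pyspark.sql import functions as F

website_data = [("Ganesh Ramdas Kudale 31",), ("Akshay|Ramdas|Kudale|28",), ("Ojas Kudale 1.5",)]
raw_df = spark.createDataFrame(website_data, ["raw"])

# Normalise pipe/comma delimiters to spaces, then split into an array of parts.
parsed_df = raw_df.withColumn("parts", F.split(F.regexp_replace("raw", r"[|,]", " "), r"\s+"))

result_df = (parsed_df
             .withColumn("first_name", F.element_at("parts", 1))
             # middle name is present only when the array has 4 elements
             .withColumn("middle_name", F.when(F.size("parts") == 4, F.element_at("parts", 2)))
             .withColumn("last_name", F.element_at("parts", -2))
             .withColumn("age", F.element_at("parts", -1))
             .drop("parts", "raw"))
result_df.show(truncate=False)
```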

"Challenge [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want only one output per department based on employee who has joined earlier in the ties (emp having smaller emp_id). emp_data = (1 "emp1" [----] 1) (2 "emp2" [----] 2) (3 "emp3" [----] 3) (4 "emp4" [----] 4) (5 "emp5" [----] 5) (6 "emp6" [----] 5) (7 "emp7" [----] 4) (8 "emp8" [----] 3) (9 "emp9" [----] 2) (10 "emp10" [----] 1) (11 "emp11" [----] 1) (12 "emp12" [----] 2)"
YouTube Link 2025-10-05T16:45Z [----] followers, [---] engagements

"Concluding PySpark Series Session [--] - Creating the raw data frame - https://youtu.be/zHhRnOPul7g Session [--] - Defining the Schema in PySpark - https://youtu.be/AKuvX8Kn7l4 Session [--] - Reading the data frame form file stored at storage location - https://youtu.be/nq04n-6JvH8 Session [--] - Different ways of creating the data frame - https://youtu.be/tDbmBhghE7Q Session [--] - Transformations and Action in Apache Spark - https://youtu.be/JZu1EK0isjA Session [--] - Data Frame Read Modes - https://youtu.be/lkj8nEzS4To Session [--] - PySpark withColumn Transformation - https://youtu.be/gBMNsspzNiI Session [--] -"
YouTube Link 2025-10-05T14:40Z [----] followers, [---] engagements

"Challenge [--] - Find out 4th highest salary per department. Challenge - [--] Statement - We have dataset having emp_id emp_name emp_salary and dept_id as it's columns. The challenge is to find out 4th highest salary per department. If there is tie i.e. if there are multiple employees with 4th highest salary we want all those employees to be present in the output in ascending order of their names. [--] - Window aggregations - row_number rank dense_rank [--] - partition based on department and order based on salary desc and order based on emp_name row_number rank dense_rank [--] [----] [--] [--] [--] [--] [----] [--] [--] [--] 1"
YouTube Link 2025-10-05T05:52Z [----] followers, [---] engagements

"Session [--] - Data Frame writer API and data frame writer Modes [--]. Read the dataframe - spark.read.format("").option() .schema.load(file_path) [--]. We process this dataframe [--]. We write the result to some location df.write.format().mode().option("path"writing_path).save() df.write.format().mode().option("path"writing_path) .saveAsTable(table_name) Data Frame read modes - [--]. Permissive [--]. Failfast [--]. DropMalformed While reading the data frame we can specify the file path or folder path. While writing the data frame we can specify folder path. Data Frame writer modes - [--]. Append - It will append"
YouTube Link 2025-09-14T16:49Z [----] followers, [---] engagements

"Ready to Deep Dive #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo"
YouTube Link 2025-09-21T15:39Z [----] followers, [--] engagements

"Session [--] - Reading parquet file as pyspark data frame Parquet is column based file format. [---] columns and you want to read [--] columns. parquet_df = spark.read.format("parquet").load("/Volumes/demo/default/landing/titanic.parquet") parquet_df1 = spark.read.parquet("/Volumes/demo/default/landing/titanic.parquet") #pyspark #apachespark #databricks #coding #learnpyspark #python #azuredatabrickswithpyspark #vlog #viralvideo pyspark apachespark databricks coding learnpyspark python azuredatabrickswithpyspark vlog viralvideo pyspark apachespark databricks coding learnpyspark python"
YouTube Link 2025-09-14T16:49Z [----] followers, [---] engagements

"Session [--] - Reading Multi Line JSON file as PySpark Data frame from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType()) StructField("carb"DoubleType()) ) muliline_json_df ="
YouTube Link 2025-09-14T16:49Z [----] followers, [---] engagements

"Session [--] - Reading Single Line JSON file as PySpark Data frame Session [--] - Reading Single Line JSON file as PySpark Dataframe from pyspark.sql.types import * from pyspark.sql import functions as F bikes_schema = StructType( StructField("model"StringType()) StructField("mpg"DoubleType()) StructField("cyl"DoubleType()) StructField("disp"DoubleType()) StructField("hp"DoubleType()) StructField("drat"DoubleType()) StructField("wt"DoubleType()) StructField("qsec"DoubleType()) StructField("vs"DoubleType()) StructField("am"DoubleType()) StructField("gear"DoubleType())"
YouTube Link 2025-09-13T18:31Z [----] followers, [---] engagements

"Session [--] - Left Anti Join in PySpark Session [--] - Left Anti Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. What is left Anti join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will not give those records as output. Left anti join will give the columns of only left dataset as output. Customers - Orders (Left Semi) Give me all"
YouTube Link 2025-09-13T14:25Z [----] followers, [--] engagements

"Session [--] - Left Semi Join in PySpark Session [--] - Left Semi Join in PySpark What is left semi join - It will perform the join based on If there is match of left dataset to right dataset. If there is match it will give those records as output. Left semi join will give the columns of only left dataset as output. On the other hand inner join will give columns of both left and right dataset. customers - orders Give me all customer details only who have placed order so far. I don't want customers in the output who have not yet placed any order. emp_data_single_column = (1"Person1"1)"
YouTube Link 2025-09-06T05:49Z [----] followers, [--] engagements

"#pyspark #python #coding #education #databricks #dataengineering #spark #viral #viralshorts #shorts"
YouTube Link 2025-09-04T14:38Z [----] followers, [---] engagements

"#pyspark #education #databricks #dataengineering #apachespark #trainer #support #share #subscribe"
YouTube Link 2025-09-03T15:54Z [----] followers, [---] engagements

"Session [--] - Full Outer Join in PySpark - Joining over multiple Columns Session [--] - Full Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"
YouTube Link 2025-08-29T18:00Z [----] followers, [--] engagements

"Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Full Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) (9"Person9"6) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType())"
YouTube Link 2025-08-29T18:00Z [----] followers, [--] engagements

"Session [--] - Full Outer Join in PySpark - Joining over one Column Session [--] - Full Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType("
YouTube Link 2025-08-24T12:57Z [----] followers, [--] engagements

"Session [--] - Right Outer Join in PySpark - Joining over multiple Columns Session [--] - Right Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3)"
YouTube Link 2025-08-21T02:39Z [----] followers, [---] engagements

"Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Right Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS")"
YouTube Link 2025-07-31T17:57Z [----] followers, [---] engagements

"Mock Interview with Satyam Meena This is End to End Data Engineering Project Satyam has spent 1-2 years on this project. If you are the one who wants to understand how Data Engineering projects work in real life then this video is for you Connect with me: Instagram:"
YouTube Link 2025-07-31T17:57Z [----] followers, [---] engagements

"Session [--] - Right Outer Join in PySpark - Joining over one Column Session [--] - Right Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) ) dept_schema = StructType( StructField("department_id"IntegerType())"
YouTube Link 2025-07-25T15:34Z [----] followers, [---] engagements

"Session [--] - Left Outer Join in PySpark - Joining over multiple Columns Session [--] - Left Outer Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") joined_df = df1.join(other=another_dataframeon=join_condition how=join_type) from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema ="
YouTube Link 2025-07-19T17:17Z [----] followers, [---] engagements

"Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Left Outer Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dep_id"IntegerType())"
YouTube Link 2025-07-10T12:57Z [----] followers, [--] engagements

"#pyspark #apachespark #coding #databricks #education #definitions #viralshorts #shorts #python"
YouTube Link 2025-07-08T05:33Z [----] followers, [---] engagements

"Session [--] - Left Outer Join in PySpark - Joining over one Column Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") # Session [--] - Left Outer Join in PySpark - Joining over one Column emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") from pyspark.sql.types import * from"
YouTube Link 2025-07-05T13:00Z [----] followers, [---] engagements

"#pyspark #apachespark #coding #video #videos #shorts #databricks #education #azuredatabricks"
YouTube Link 2025-07-01T15:46Z [----] followers, [---] engagements

"Session [--] - Inner Join in PySpark - Joining over multiple Columns emp_data = (1"Person1""IN"1) (2"Person2""IN"2) (3"Person3""IN"1) (4"Person4""IN"1) (5"Person5""IN"6) (6"Person6""SA"4) (7"Person6""UK"2) (8"Person8""IN"3) (4"Person4""UK"1) (5"Person5""IN"6) (6"Person6""US"4) department_data = (1"IT""IN") (2"HR""US") (3"DE""IN") (4"BE""UK") (5"FE""SA") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("country"StringType()) StructField("dept_id"IntegerType()) )"
YouTube Link 2025-06-29T18:36Z [----] followers, [---] engagements

"Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns Session [--] - Inner Join in PySpark - Joining over one Column - NULL values in joining Columns emp_data = (1"Person1"1) (2"Person2"None) (3"Person3"1) (4"Person4"1) (5"Person5"None) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") (None"TRS") from pyspark.sql.types import * from pyspark.sql import functions as F emp_schema = StructType( StructField("emp_id"IntegerType()) StructField("emp_name"StringType()) StructField("dept_id"IntegerType()) )"
YouTube Link 2025-06-21T15:54Z [----] followers, [---] engagements

"Session [--] - Inner Join in PySpark - Joining over one Column Session [--] - Inner Join in PySpark - Joining over one Column Problem Statement - Get the department name for all the employees assigned to the departments. If there are employees which are not assigned to any of the departments do not return them in results. Sample Data - emp_data = (1"Person1"1) (2"Person2"2) (3"Person3"1) (4"Person4"1) (5"Person5"6) (6"Person6"4) (7"Person6"2) (8"Person8"3) department_data = (1"IT") (2"HR") (3"DE") (4"BE") (5"FE") Generic Structure for Join - df1 and df2 joined_df ="
YouTube Link 2025-06-18T15:30Z [----] followers, [--] engagements

Limited data mode. Full metrics available with subscription: lunarcrush.com/pricing
