Insert, Update and Delete in PySpark

Here’s the scenario: we have two tables, Table_A and Table_B, each with a “Name” column and an “Age” column. 📋💡

Table_A:
Name | Age
-----|----
S1   | 20
S2   | 23

Table_B:
Name | Age
-----|----
S1   | 22
S4   | 27

Our mission was to compare the two tables and assign each record an action: Update, Delete, or Insert. 🚀 Treating Table_A as the old version and Table_B as the new one, here’s the solution we came up with:
🎯 For “S1,” we identified an update because the age changed from 20 to 22.
🎯 For “S2,” we detected a delete because it exists in Table_A but not in Table_B.
🎯 For “S4,” we found an insert because it is present in Table_B but not in Table_A.

🧠 This question tests your data engineering and data manipulation skills, so it is worth practicing as part of your interview preparation.
👉 What’s your approach to solving this question? Share your insights in the comments!

Code sample
