COMPLEX SQL QUERIES

Questions on SQL are based on the following two tables, the Employee table and the Employee Incentive table.

Table Name: Employee

EMPLOYEE_ID | FIRST_NAME | LAST_NAME | SALARY | JOINING_DATE | DEPARTMENT
1 | John | Abraham | 1000000 | 01-JAN-13 12.00.00 AM | Banking
2 | Michael | Clarke | 800000 | 01-JAN-13 12.00.00 AM | Insurance
3 | Roy | Thomas | 700000 | 01-FEB-13 12.00.00 AM | Banking
4 | Tom | Jose | 600000 | 01-FEB-13 12.00.00... Continue Reading →
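The post's actual questions sit behind the link, so as a minimal sketch here is the Employee table loaded into an in-memory SQLite database with one representative query (total salary per department); the query itself is an illustrative assumption, not necessarily one of the post's questions, and only the three complete rows from the excerpt are loaded.

```python
import sqlite3

# Recreate the Employee table from the excerpt; the fourth row is
# truncated in the post, so only the first three rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EMPLOYEE_ID  INTEGER,
        FIRST_NAME   TEXT,
        LAST_NAME    TEXT,
        SALARY       INTEGER,
        JOINING_DATE TEXT,
        DEPARTMENT   TEXT
    )
""")
rows = [
    (1, "John",    "Abraham", 1000000, "2013-01-01", "Banking"),
    (2, "Michael", "Clarke",   800000, "2013-01-01", "Insurance"),
    (3, "Roy",     "Thomas",   700000, "2013-02-01", "Banking"),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)", rows)

# A typical question from such sets: total salary per department.
result = conn.execute("""
    SELECT DEPARTMENT, SUM(SALARY)
    FROM Employee
    GROUP BY DEPARTMENT
    ORDER BY DEPARTMENT
""").fetchall()
print(result)  # [('Banking', 1700000), ('Insurance', 800000)]
```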

GCP ZERO TO HERO

Do you have the knowledge and skills to design a mobile gaming analytics platform that collects, stores, and analyzes large amounts of bulk and real-time data? Well, after reading this article, you will. I aim to take you from zero to hero in Google Cloud Platform (GCP) in just one article. I will show you... Continue Reading →

Data Scientist Roadmap

How I would relearn Data Science in 2024 to get a job. Getting Started: ⬇️ - Data Science Intro: DataCamp - Anaconda Setup: Anaconda Documentation Programming: - Python Basics: Real Python - R Basics: R-bloggers - SQL Fundamentals: SQLZoo - Java for Data Science: Udemy - Java Programming and Software Engineering Fundamentals Mathematics:... Continue Reading →

Azure and Databricks Prep

Databricks and PySpark are the most important skills in data engineering. Almost all companies are moving from Hadoop to Apache Spark. I have covered almost everything in my free YouTube playlist; there are 70 videos available for free. 0. Introduction and how to set up an account 1. How to read a CSV file in PySpark 2. How to... Continue Reading →

Partition Scenario with Pyspark

📕 How to create partitions based on year and month? Data partitioning is critical to data processing performance, especially for large volumes of data in Spark. Most traditional databases use the default date format DD-MM-YYYY, but cloud storage (Spark Delta Lake / Databricks tables) uses the YYYY-MM-DD format. So here we will see how to... Continue Reading →
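The post implements this in PySpark; as a minimal plain-Python sketch of the same idea, the snippet below parses DD-MM-YYYY source dates, rewrites them to the ISO YYYY-MM-DD form the lake expects, and derives the year/month partition keys (the field names `txn_date`, `year`, and `month` are illustrative assumptions).

```python
from datetime import datetime

# Source rows arrive with DD-MM-YYYY dates; the lake expects
# YYYY-MM-DD plus year/month partition columns.
rows = [
    {"id": 1, "txn_date": "15-03-2023"},
    {"id": 2, "txn_date": "02-11-2023"},
]

for row in rows:
    d = datetime.strptime(row["txn_date"], "%d-%m-%Y")
    row["txn_date"] = d.strftime("%Y-%m-%d")  # ISO format for the lake
    row["year"] = d.year                      # partition column 1
    row["month"] = d.month                    # partition column 2

print(rows[0])
# {'id': 1, 'txn_date': '2023-03-15', 'year': 2023, 'month': 3}
```

In PySpark the equivalent is deriving the columns with `to_date`, `year`, and `month` from `pyspark.sql.functions`, then writing with `df.write.partitionBy("year", "month")`.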

Incremental Loading with CDC using Pyspark

⏫ Incremental Loading technique with Change Data Capture (CDC): ➡️ Incremental Load with Change Data Capture (CDC) is a strategy in data warehousing and ETL (Extract, Transform, Load) processes where only the changed or newly added data is loaded from source systems to the target system. CDC is particularly useful in scenarios where processing the... Continue Reading →
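As a minimal sketch of the strategy described above, assuming a per-row `updated_at` change timestamp and a stored watermark (both illustrative, since the post's actual implementation is behind the link): only source rows changed since the last load are extracted, then upserted into the target.

```python
# Target table keyed by id, as of the last load.
target = {
    101: {"id": 101, "name": "alice", "updated_at": "2024-01-01"},
    102: {"id": 102, "name": "bob",   "updated_at": "2024-01-01"},
}
# Source system rows captured since then.
source = [
    {"id": 102, "name": "robert",  "updated_at": "2024-01-05"},  # changed
    {"id": 103, "name": "charlie", "updated_at": "2024-01-06"},  # new
]
last_load = "2024-01-01"  # watermark from the previous run

# Extract only the delta: rows changed after the watermark.
delta = [r for r in source if r["updated_at"] > last_load]

# Upsert the delta: insert new keys, overwrite changed ones.
for r in delta:
    target[r["id"]] = r

# Advance the watermark to the newest change seen in this run.
last_load = max(r["updated_at"] for r in delta)

print(sorted(target))       # [101, 102, 103]
print(target[102]["name"])  # robert
```

On Databricks the upsert step would typically be a Delta Lake `MERGE INTO` rather than a dictionary update.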

Dynamic Column handling in file

----------- Spark Interview Questions -----------
📍 Important Note: this scenario is a bit complex, so I would suggest going through it multiple times (the code implementation is in #databricks).
📕 How to handle or read a variable/dynamic number of columns? Sample file:

id,name,location,emaild,phone
1, aman
2,abhi,Delhi
3,john,chennai,sample123@gmail.com,688080

In this scenario we are not getting complete columnar information; the number of columns varies from row to row. PySpark code:

dbutils.fs.put("/dbfs/tmp/dynamic_columns.csv", """id,name,location,emaild,phone
1, aman
2,abhi,Delhi
3,john,chennai,sample123@gmail.com,688080""")

Now let's... Continue Reading →
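The post's full PySpark solution is behind the link; as a sketch of one way to normalize such a file, assuming we simply pad short rows out to the header's width, plain Python's csv module makes the idea concrete (PySpark reading the file with an explicit schema achieves the equivalent).

```python
import csv
import io

# The post's sample file: rows carry a varying number of columns.
raw = """id,name,location,emaild,phone
1,aman
2,abhi,Delhi
3,john,chennai,sample123@gmail.com,688080
"""

# Pad every row to the header's width so each record shares one schema.
reader = csv.reader(io.StringIO(raw))
header = next(reader)
records = [dict(zip(header, row + [None] * (len(header) - len(row))))
           for row in reader]

print(records[0])
# {'id': '1', 'name': 'aman', 'location': None, 'emaild': None, 'phone': None}
```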

Azure Data Engineering by Deepak Goyal

List of all Azure / Data / DevOps / ML interview Q&A. Save & share.
1. Azure Data Factory Interview Q&A: https://lnkd.in/dVzCmzcZ
2. Azure Databricks Scenario-based Interview Q&A: https://lnkd.in/dUCf8qf8
3. Realtime Azure Data Factory Interview Q&A: https://lnkd.in/ex_Vixh
4. Latest Azure DevOps Interview Q&A: https://lnkd.in/g7PdATm
5. Azure Active Directory Interview Q&A: https://lnkd.in/dtWYXTKN
6. Azure Data Lake Interview Q&A: https://lnkd.in/dgr-uGQB
7. Azure App Service Interview Q&A: https://lnkd.in/dP4Afqkb
8. Azure Data Engineer Interview Q&A: https://lnkd.in/dj_m2yeQ
9. ... Continue Reading →

AWS Solution Architect in 2 months – Road Map

8-week journey toward becoming an AWS Solutions Architect Associate. Here's the breakdown: 🚀 Week 1: AWS Fundamentals - Introduction to AWS: discover the basics and the core services that form the backbone of AWS. - AWS Free Tier Account: learn how to set up an account to leverage AWS's free offerings. - AWS Management Console: navigate the user interface to... Continue Reading →

Spotify Cloud Project

Spotify Stream Analytics 🎥 Built a synthetic data pipeline for real-time music insights, stunning dashboards, and actionable decisions. 🌟 Project Overview: addresses limited access to Spotify stream data with a synthetic pipeline. Realistic events stream to Kafka, are processed by Spark, and stored in Delta Lake. Airflow ensures a seamless pipeline, and dbt transforms the data into captivating dashboards. 📌 Key Features: Streamlined Infrastructure: Scripts... Continue Reading →
