Netflix Data Engineering Summit

Netflix recently hosted its Data Engineering Summit, bringing engineers from different teams together to share use cases and best practices. I had the chance to watch the whole series, and it provides valuable insights on various topics, especially designing and running products and services at scale. A big shout-out to the Netflix team 👏 Here is... Continue Reading →

February 12, 2024

Migrating VMs to GCP from On-Prem, VMware, AWS, Azure

Moving your virtual machines (VMs) to the cloud offers numerous benefits, from scalability and cost savings to increased agility and security. But choosing the right path and navigating the complexities can be daunting. This guide simplifies the process, covering migration strategies for various environments (on-premises, VMware,... Continue Reading →

February 11, 2024

Databricks Learning Path

Knowing how to work with Databricks helps a lot in a data engineering job. You can learn Databricks here:
1. Learn Databricks basics here:
https://lnkd.in/gQNKd8HE
https://lnkd.in/gf_-6EEg
2. PySpark with Databricks here: https://lnkd.in/g2iTevyJ
2.1 Azure Databricks with Python here: https://lnkd.in/gyeNtq8n
2.2 Databricks with Scala here: https://lnkd.in/gzMAcm3s
2.3 Databricks with SQL here: https://lnkd.in/gdby9_bj
3. Databricks with Spark here: https://lnkd.in/g-YT-qiF
4. Databricks on AWS: https://lnkd.in/gYcxe8Tn
5. Official guide to learn Databricks here: https://lnkd.in/gt8sQeeH
6. Databricks projects:
https://lnkd.in/gtpa7jhR
https://lnkd.in/gdWUBUN9
follow this... Continue Reading →

February 8, 2024

What are surrogate keys, and how can we handle them during data warehouse migration?

What is a surrogate key? A surrogate key is simply a unique identifier assigned to each row in a dimension table. Simple, isn't it? Yes. But it may raise a few questions: a primary key is also unique and assigned to each row, so how does a surrogate key differ from a table's primary key, what... Continue Reading →
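As a rough illustration of the definition above, the following sketch assigns sequential surrogate keys to dimension rows in plain Python. The `customer_id` natural key, the `customer_sk` column, the sample rows, and the `assign_surrogate_keys` helper are all hypothetical names chosen for this example. It also assumes one surrogate key per natural key value, with no history tracking.

```python
from itertools import count

def assign_surrogate_keys(rows, natural_key, start=1):
    """Return copies of rows with a sequential surrogate key added.

    Assumption: rows sharing a natural key value share one surrogate key.
    """
    next_key = count(start)
    key_map = {}      # natural key value -> surrogate key
    keyed_rows = []
    for row in rows:
        nk = row[natural_key]
        if nk not in key_map:
            key_map[nk] = next(next_key)
        keyed_rows.append({"customer_sk": key_map[nk], **row})
    return keyed_rows

# Hypothetical source rows; "customer_id" is the natural (business) key.
customers = [
    {"customer_id": "C-100", "name": "Asha"},
    {"customer_id": "C-205", "name": "Ravi"},
    {"customer_id": "C-100", "name": "Asha"},
]
dim_customer = assign_surrogate_keys(customers, "customer_id")
for row in dim_customer:
    print(row)
```

The surrogate key here is a meaningless integer generated by the loader, while the natural key comes from the source system. Keeping the `key_map` lookup around is one possible way to preserve key assignments when rows are reloaded.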

January 28, 2024

Data Engineering with Cloud Resources link

Learn about data pipelines here for FREE. A data pipeline consists of several stages that work together to ensure that data is processed efficiently and accurately. It involves:
1. data ingestion
2. data transformation
3. data analysis
4. data visualisation
5. data storage
📌 The complete data pipeline diagram can be found here: https://lnkd.in/gdifVyHY
📌 FREE guide to data pipelines in AWS and Azure cloud: https://lnkd.in/gtq_8rd9
📌 learn more... Continue Reading →

January 27, 2024

500+ Data Engineering Interview questions & Answers

1. What is Hadoop MapReduce?
A.) The Hadoop MapReduce framework is used to process large datasets in parallel across a Hadoop cluster.
2. What are the differences between a relational database and HDFS?
RDBMS and HDFS can be compared across 6 major categories: data types, processing, schema on read vs. write, read/write speed, cost, and best-fit use case.
RDBMS vs. HDFS: 1. ... Continue Reading →
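To make the answer to question 1 more concrete, here is a single-process sketch of the map, shuffle, and reduce phases using a word count. It is plain Python, not the Hadoop API: the function names and the sample input are assumptions for illustration, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input lines standing in for file splits.
counts = word_count(["big data big cluster", "data pipeline"])
print(counts)
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on one key, which is what allows the framework to distribute the work.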

January 25, 2024

Big Data Learning Plan

Step-by-step plan to learn Big Data (all free resources included)
1. Learn SQL Basics - https://lnkd.in/g9NEJMVE
SQL is used in a lot of places: Hive, Spark SQL, and RDBMS queries. Joins and windowing functions are very important.
2. Learn Programming/Python for Data Engineering - https://lnkd.in/gr6fFPdU
Learn Python to the extent required for data engineers.
3. Learn the Fundamentals... Continue Reading →

January 19, 2024

PySpark Scenario ~ Find Average

Write a solution in PySpark to find the average selling price for each product. average_price should be rounded to 2 decimal places.
Solution:
import datetime
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum, round
from pyspark.sql.types import StructType, StructField, IntegerType, DateType
# Initialize Spark session
spark = SparkSession.builder.appName("average_selling_price").getOrCreate()
# Data for Prices and Units Sold
prices_data = [(1, datetime.date(2019, 2, 17), datetime.date(2019,... Continue Reading →
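The excerpt above is cut off before the computation. As a hedged sketch of one common reading of the task, the plain-Python code below (not PySpark) computes a units-weighted average price per product: each sale is matched to the price period containing its purchase date, then total revenue is divided by total units. The sample rows, the weighted-average interpretation, and the `average_selling_price` function are assumptions for illustration, not the post's data or solution.

```python
import datetime
from collections import defaultdict

# Hypothetical sample data (not from the post).
# prices: (product_id, start_date, end_date, price)
prices = [
    (1, datetime.date(2024, 1, 1), datetime.date(2024, 1, 15), 10),
    (1, datetime.date(2024, 1, 16), datetime.date(2024, 1, 31), 20),
    (2, datetime.date(2024, 1, 1), datetime.date(2024, 1, 31), 5),
]
# units_sold: (product_id, purchase_date, units)
units_sold = [
    (1, datetime.date(2024, 1, 10), 30),
    (1, datetime.date(2024, 1, 20), 10),
    (2, datetime.date(2024, 1, 5), 7),
]

def average_selling_price(prices, units_sold):
    """Return {product_id: weighted average price rounded to 2 decimals}."""
    revenue = defaultdict(int)
    units = defaultdict(int)
    for product_id, purchase_date, qty in units_sold:
        for pid, start, end, price in prices:
            # Match the sale to the price period that covers its date.
            if pid == product_id and start <= purchase_date <= end:
                revenue[product_id] += price * qty
                units[product_id] += qty
    return {pid: round(revenue[pid] / units[pid], 2) for pid in units}

average_price = average_selling_price(prices, units_sold)
print(average_price)
```

In PySpark the same logic would typically be a join on product and date range followed by a grouped aggregation. The nested loop here only keeps the sketch dependency-free.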

January 19, 2024

Step by Step approach to Master Big Data (Free Resources)

Step 1 - Learn SQL
📌 Basics - https://lnkd.in/gdnhRk8b
📌 Advanced - https://lnkd.in/g8tyEKbU
📌 Leetcode - https://lnkd.in/gKeSMPmW
Step 2 - Learn Python basics
📌 Python Tutorial: https://lnkd.in/gPBDBhpA
📌 Python for Beginners: https://lnkd.in/gHWyQfQX
Step 3 - Big Data Concepts
📌 Big Data Fundamentals: https://lnkd.in/fWZPWKP
📌 HDFS Architecture: https://lnkd.in/fNP7bf7
📌 MapReduce Fundamentals: https://lnkd.in/g457Wmv
📌 Hive Tutorial for Beginners: https://lnkd.in/gJpDMTfD
📌 Introduction to Apache Spark: https://lnkd.in/gFRpe3-D
📌 Spark Accumulator &... Continue Reading →

January 10, 2024

PySpark Scenarios

Check out these 23 complete PySpark real-time scenario videos covering everything from partitioning data by month and year to handling complex JSON files and implementing multiprocessing in Azure Databricks.
✅ PySpark Scenarios 1: How to create partition by month and year in PySpark https://lnkd.in/dFfxYR_F
✅ PySpark Scenarios 2: How to read variable number of... Continue Reading →

January 9, 2024