Netflix recently hosted its Data Engineering Summit, bringing engineers from different teams together to share use cases and best practices. I had the chance to watch the whole series, and it provides valuable insights on various topics, especially on designing and executing products and services at scale. A big shout-out to the Netflix team. Here is... Continue Reading →
Migrating VMs to GCP from On-Prem, VMware, AWS, Azure
Moving your virtual machines (VMs) to the cloud offers numerous benefits, from scalability and cost savings to increased agility and security. But choosing the right path and navigating the complexities can be daunting. This guide simplifies the process, covering migration strategies for various environments (on-premises, VMware,... Continue Reading →
Databricks Learning Path
Knowing how to work with Databricks helps a lot in data engineering jobs. You can learn Databricks here:
1. Learn Databricks basics: https://lnkd.in/gQNKd8HE https://lnkd.in/gf_-6EEg
2. PySpark with Databricks: https://lnkd.in/g2iTevyJ
2.1 Azure Databricks with Python: https://lnkd.in/gyeNtq8n
2.2 Databricks with Scala: https://lnkd.in/gzMAcm3s
2.3 Databricks with SQL: https://lnkd.in/gdby9_bj
3. Databricks with Spark: https://lnkd.in/g-YT-qiF
4. Databricks on AWS: https://lnkd.in/gYcxe8Tn
5. Official guide to learn Databricks: https://lnkd.in/gt8sQeeH
6. Databricks projects: https://lnkd.in/gtpa7jhR https://lnkd.in/gdWUBUN9
follow this... Continue Reading →
What are surrogate keys and how can we handle them during data warehouse migration?
What is a surrogate key? A surrogate key is simply a unique identifier assigned to each row in a dimension table. Simple, isn't it? Yes. Still, this might raise a few questions: a primary key is also unique and assigned to each row, so how does a surrogate key differ from the primary key of a table, what... Continue Reading →
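As a rough illustration of the definition above, here is a minimal Python sketch that assigns a sequential surrogate key to each row of a small dimension table. The `assign_surrogate_keys` helper, the `customer_sk` column name, the `start` parameter, and the sample rows are all hypothetical and are not taken from the original post.

```python
def assign_surrogate_keys(rows, start=1):
    """Return copies of the rows with a sequential surrogate key added.

    The key carries no business meaning; it only identifies the row.
    `start` is a hypothetical knob for continuing an existing sequence.
    """
    return [{"customer_sk": sk, **row} for sk, row in enumerate(rows, start=start)]


# Hypothetical dimension rows identified in the source system by customer_code
customers = [
    {"customer_code": "C-100", "name": "Asha"},
    {"customer_code": "C-200", "name": "Ben"},
]

dim_customer = assign_surrogate_keys(customers)
for row in dim_customer:
    print(row)
```

The natural key (`customer_code` here) stays in the row, so facts loaded later can still be matched to the right surrogate key.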
Data Engineering with Cloud Resources link
Learn about data pipelines here for free. A data pipeline consists of several stages that work together to ensure that data is processed efficiently and accurately. It involves:
1. data ingestion
2. data transformation
3. data analysis
4. data visualisation
5. data storage
A complete data pipeline diagram can be found here: https://lnkd.in/gdifVyHY
A free guide to data pipelines in AWS and Azure cloud: https://lnkd.in/gtq_8rd9
learn more... Continue Reading →
500+ Data Engineering Interview questions & Answers
1. What is Hadoop MapReduce?
A. The Hadoop MapReduce framework is used to process large datasets in parallel across a Hadoop cluster.
2. What are the differences between a relational database and HDFS?
RDBMS and HDFS can be compared across six major categories: data types, processing, schema on read vs. write, read/write speed, cost, and best-fit use case.
RDBMS HDFS 1. ... Continue Reading →
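To make the first answer above more concrete, here is a small plain-Python imitation of the map, shuffle, and reduce phases using a word count. It runs on one machine and does not use any Hadoop API; the function names and the sample lines are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


# Hypothetical input; on a real cluster each line could be mapped on a different node
lines = ["big data big cluster", "data pipeline"]
mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(word_counts)
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can be spread across many workers, which is the idea behind processing in parallel.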
Big Data Learning Plan
Step-by-step plan to learn Big Data (all free resources included)
1. Learn SQL basics - https://lnkd.in/g9NEJMVE
SQL is used in a lot of places: Hive, Spark SQL, and RDBMS queries. Joins and windowing functions are very important.
2. Learn programming/Python for data engineering - https://lnkd.in/gr6fFPdU
Learn Python to the extent required for data engineers.
3. Learn the Fundamentals... Continue Reading →
Pyspark Scenario ~ Find Average
Write a solution in PySpark to find the average selling price for each product. average_price should be rounded to 2 decimal places.
Solution:
import datetime
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum, round
from pyspark.sql.types import StructType, StructField, IntegerType, DateType
# Initialize Spark session
spark = SparkSession.builder.appName("average_selling_price").getOrCreate()
# Data for Prices and Units Sold
prices_data = [(1, datetime.date(2019, 2, 17), datetime.date(2019,... Continue Reading →
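The excerpt above is cut off before the actual computation, so the following is only a sketch of one plausible reading of the task, written in plain Python so that it runs without a Spark session. It assumes the average selling price is total revenue divided by total units, with each sale priced by the period that contains its purchase date. All sample rows and the `average_selling_price` helper are hypothetical; this is not the post's own solution.

```python
import datetime
from collections import defaultdict

# Hypothetical rows: (product_id, start_date, end_date, price)
prices = [
    (1, datetime.date(2019, 2, 17), datetime.date(2019, 2, 28), 5),
    (1, datetime.date(2019, 3, 1), datetime.date(2019, 3, 22), 20),
    (2, datetime.date(2019, 2, 1), datetime.date(2019, 2, 20), 15),
]
# Hypothetical rows: (product_id, purchase_date, units)
units_sold = [
    (1, datetime.date(2019, 2, 25), 100),
    (1, datetime.date(2019, 3, 1), 15),
    (2, datetime.date(2019, 2, 10), 200),
]


def average_selling_price(prices, units_sold):
    """Assumed definition: sum(price * units) / sum(units) per product."""
    revenue = defaultdict(float)
    units = defaultdict(int)
    for product_id, purchase_date, qty in units_sold:
        for pid, start, end, price in prices:
            # Match each sale to the price period that covers its purchase date
            if pid == product_id and start <= purchase_date <= end:
                revenue[product_id] += price * qty
                units[product_id] += qty
    return {pid: round(revenue[pid] / units[pid], 2) for pid in units}


result = average_selling_price(prices, units_sold)
print(result)
```

In PySpark the same idea would typically be a join between the two DataFrames on the product and the date range, followed by a grouped aggregation, but that part of the original solution is not visible in the excerpt.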
Step by Step approach to Master Big Data (Free Resources)
1. Learn SQL
Basics - https://lnkd.in/gdnhRk8b
Advanced - https://lnkd.in/g8tyEKbU
Leetcode - https://lnkd.in/gKeSMPmW
2. Learn Python basics
Python Tutorial: https://lnkd.in/gPBDBhpA
Python for Beginners: https://lnkd.in/gHWyQfQX
3. Big Data Concepts
Big Data Fundamentals: https://lnkd.in/fWZPWKP
HDFS Architecture: https://lnkd.in/fNP7bf7
MapReduce Fundamentals: https://lnkd.in/g457Wmv
Hive tutorial for Beginners: https://lnkd.in/gJpDMTfD
Introduction to Apache Spark: https://lnkd.in/gFRpe3-D
Spark Accumulator &... Continue Reading →
Pyspark Scenarios
Check out these 23 complete PySpark real-time scenario videos covering everything from partitioning data by month and year to handling complex JSON files and implementing multiprocessing in Azure Databricks.
PySpark Scenarios 1: How to create partition by month and year in PySpark https://lnkd.in/dFfxYR_F
PySpark Scenarios 2: how to read variable number of... Continue Reading →