The withColumn method in PySpark returns a new DataFrame with a column added, or with an existing column of the same name replaced. It takes two arguments: the name of the column and a Column expression for its values. The expression usually transforms an existing column or combines multiple columns. Here is the basic syntax of the withColumn method:... Continue Reading →
100 Latest Azure Interview Questions
BASIC AZURE INTERVIEW QUESTIONS AND ANSWERS 1. What is Azure and how does it work? Azure is a cloud computing platform managed by Microsoft. It offers services and tools for building, deploying, and managing applications and services in the cloud. The Azure services can be accessed through the internet. These include virtual machines, databases, storage,... Continue Reading →
Data Engineering with Cloud Resources link
Learn about data pipelines here for FREE. A data pipeline consists of several stages that work together to ensure that data is processed efficiently and accurately. It involves: 1. data ingestion 2. data transformation 3. data analysis 4. data visualisation 5. data storage. The complete data pipeline diagram can be found here: https://lnkd.in/gdifVyHY FREE guide to data pipelines in AWS and Azure cloud: https://lnkd.in/gtq_8rd9 learn more... Continue Reading →
500+ Data Engineering Interview questions & Answers
1. What is Hadoop MapReduce? A.) The Hadoop MapReduce framework is used to process large datasets in parallel across a Hadoop cluster. 2. What are the differences between a relational database and HDFS? RDBMS and HDFS can be compared across 6 major categories: data types, processing, schema on read vs. write, read/write speed, cost, and best-fit use case. RDBMS HDFS 1. ... Continue Reading →
Big Data Learning Plan
Step by Step Plan to learn Big Data (All Free resources Included) 1. Learn SQL Basics - https://lnkd.in/g9NEJMVE SQL will be used in a lot of places - Hive/Spark SQL/RDBMS queries. Joins & windowing functions are very important. 2. Learn Programming/Python for Data Engineering - https://lnkd.in/gr6fFPdU Learn Python to the extent required for Data Engineers. 3. Learn the Fundamentals... Continue Reading →
Data Scientist Roadmap
How I would relearn Data Science in 2024 to get a job: Getting Started: - Data Science Intro: DataCamp - Anaconda Setup: Anaconda Documentation Programming: - Python Basics: Real Python - R Basics: R-bloggers - SQL Fundamentals: SQLZoo - Java for Data Science: Udemy - Java Programming and Software Engineering Fundamentals Mathematics:... Continue Reading →
Azure and Databricks Prep
Databricks and PySpark are the most important skills in data engineering. Almost all companies are moving from Hadoop to Apache Spark. I have covered almost everything in my free YouTube playlist. There are 70 videos available for free. 0. Introduction: how to set up an account 1. How to read CSV file in PySpark 2. How to... Continue Reading →
Partition Scenario with PySpark
How to create partitions based on year and month? Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Many traditional databases use DD-MM-YYYY as the default date format, but cloud storage (Spark Delta Lake/Databricks tables) uses the YYYY-MM-DD format. So here we will see how to... Continue Reading →
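As a rough illustration of the idea in this excerpt, the sketch below uses plain Python rather than Spark to convert DD-MM-YYYY strings to YYYY-MM-DD and to derive a year/month partition path for each row. The function names, the `order_date` field, the sample rows, and the zero-padded `year=.../month=...` directory naming are all assumptions made for this example, not taken from the original post.

```python
from collections import defaultdict
from datetime import datetime

SOURCE_FORMAT = "%d-%m-%Y"  # DD-MM-YYYY, the source format named in the excerpt


def partition_path(date_text: str) -> str:
    """Build a year/month partition path from a DD-MM-YYYY string (hypothetical helper)."""
    parsed = datetime.strptime(date_text, SOURCE_FORMAT)
    # Zero-padding the month is a choice made for this sketch.
    return f"year={parsed.year}/month={parsed.month:02d}"


def group_by_partition(rows, date_field="order_date"):
    """Group rows by partition path, rewriting the date field as YYYY-MM-DD."""
    groups = defaultdict(list)
    for row in rows:
        parsed = datetime.strptime(row[date_field], SOURCE_FORMAT)
        converted = {**row, date_field: parsed.date().isoformat()}
        groups[partition_path(row[date_field])].append(converted)
    return dict(groups)


# Made-up sample data for illustration only.
sample_rows = [
    {"order_id": 1, "order_date": "15-03-2024"},
    {"order_id": 2, "order_date": "02-03-2024"},
    {"order_id": 3, "order_date": "27-11-2023"},
]

partitions = group_by_partition(sample_rows)
for path in sorted(partitions):
    print(path, [row["order_id"] for row in partitions[path]])
```

In Spark itself, the same grouping would typically be done by deriving year and month columns and writing with `partitionBy`, which lays data out in `column=value` directories.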
Incremental Loading with CDC using PySpark
Incremental Loading technique with Change Data Capture (CDC): Incremental Load with Change Data Capture (CDC) is a strategy in data warehousing and ETL (Extract, Transform, Load) processes where only the changed or newly added data is loaded from source systems to the target system. CDC is particularly useful in scenarios where processing the... Continue Reading →
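The "load only what changed" idea can be sketched in plain Python with a timestamp watermark. This is one simple way to detect changes and is not necessarily the approach the original post takes; log-based CDC tools work differently, and this sketch does not handle deletes. The `incremental_load` function, the `id` and `updated_at` fields, and the sample tables are hypothetical.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_loaded_at):
    """Upsert rows changed after the watermark into target; return (count, new watermark)."""
    # Keep only rows modified since the previous load.
    changed = [row for row in source_rows if row["updated_at"] > last_loaded_at]
    for row in changed:
        # Insert new keys and overwrite existing ones (an upsert).
        target[row["id"]] = dict(row)
    # Advance the watermark only if something was actually loaded.
    new_watermark = max((row["updated_at"] for row in changed), default=last_loaded_at)
    return len(changed), new_watermark


# Made-up target and source tables for illustration only.
target_table = {
    1: {"id": 1, "name": "alpha", "updated_at": datetime(2024, 1, 1)},
    3: {"id": 3, "name": "gamma", "updated_at": datetime(2023, 12, 30)},
}
source_table = [
    {"id": 1, "name": "alpha-v2", "updated_at": datetime(2024, 1, 5)},  # changed row
    {"id": 2, "name": "beta", "updated_at": datetime(2024, 1, 6)},  # new row
    {"id": 3, "name": "gamma", "updated_at": datetime(2023, 12, 30)},  # unchanged, skipped
]

loaded, watermark = incremental_load(source_table, target_table, datetime(2024, 1, 1))
print(loaded, watermark.isoformat())
```

Storing the returned watermark and passing it to the next run is what keeps each load limited to new changes.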
Google Cloud Associate Cloud Engineer (ACE) Resources
I receive 10+ DMs daily asking "How to start my journey in Google Cloud". So I have curated a complete list of resources for the Google Cloud Associate Cloud Engineer (ACE). 1. Basics of Linux commands - https://lnkd.in/dN5BPhTq 2. File system - https://lnkd.in/dkEAA_qU 3. Linux Files Hierarchy Structure - https://lnkd.in/d8hQR5m4 4. Linux Directory Hierarchy Structure - https://lnkd.in/dWMNd6J9 5. Associate Cloud Engineer... Continue Reading →