Azure Data Engineering by Deepak Goyal

List of all Azure / Data / DevOps / ML Interview Q&A. Save & Share.
1. Azure Data Factory Interview Q&A: https://lnkd.in/dVzCmzcZ
2. Azure Databricks Scenario based Interview Q&A: https://lnkd.in/dUCf8qf8
3. Realtime Azure Data Factory Interview Q&A: https://lnkd.in/ex_Vixh
4. Latest Azure DevOps Interview Q&A: https://lnkd.in/g7PdATm
5. Azure Active Directory Interview Q&A: https://lnkd.in/dtWYXTKN
6. Azure Data Lake Interview Q&A: https://lnkd.in/dgr-uGQB
7. Azure App Service Interview Q&A: https://lnkd.in/dP4Afqkb
8. Azure Data Engineer Interview Q&A: https://lnkd.in/dj_m2yeQ
9.... Continue Reading →

Read CSV File by Spark

---------------Spark Interview Questions------------
How to read a CSV file in Spark?

Method 1:
---------------
spark.read.csv("path")

df=spark.read.csv("dbfs:/FileStore/small_zipcode.csv")
df.show()

+---+-------+--------+-------------------+-----+----------+
|_c0|    _c1|     _c2|                _c3|  _c4|       _c5|
+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
|  1|    704|STANDARD|               null|   PR|     30100|
|  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|  5|  76177|STANDARD|               null|   TX|      null|
+---+-------+--------+-------------------+-----+----------+

Method 2:
--------------
df=spark.read.format("csv").option("inferSchema",True).option("header",True).option("sep",",").load("dbfs:/FileStore/small_zipcode.csv")
df.show()

+---+-------+--------+-------------------+-----+----------+
|... Continue Reading →

AWS Certification

FREE AWS certificates by Amazon that you can't miss in 2023
1. Getting Started with Data Analytics on AWS: https://lnkd.in/dwRhRAzM
2. Practical Data Science on the AWS Cloud Specialization: https://lnkd.in/d3-3GZbG
3. Getting Started with AWS Machine Learning: https://lnkd.in/dhAp-Vjh
4. Introduction to Machine Learning on AWS: https://lnkd.in/detfDCWA
5. Hands-on Machine Learning with AWS and NVIDIA: https://lnkd.in/dgGvATq2
6. AWS Fundamentals Specialization: https://lnkd.in/dSV9jhRz
7. Building Modern Python Applications on AWS: https://lnkd.in/dQAinFGy
8. AWS... Continue Reading →

Free Spark Course

Don't pay for an Apache Spark course just because it is in demand. You can learn for free here.
1. Install Spark from here:
https://lnkd.in/gx_Dc8ph
https://lnkd.in/gg6-8xDz
2. Learn Spark basics from here:
https://lnkd.in/g-gCpUyi
https://lnkd.in/gkNhMnTZ
https://lnkd.in/gkbVB6YX
2.1 Learn Spark with Scala from here:
https://lnkd.in/gtrZAmn4
2.2 Learn Spark with Python from here:
https://lnkd.in/gQaeSjbH
3. Learn PySpark from here:
https://lnkd.in/g6kyihyW
4. Work on Spark projects from here:
https://lnkd.in/gE8hsyZx
https://lnkd.in/gwWytS-Q
https://lnkd.in/gR7DR6_5
https://lnkd.in/gzngHhrC
https://lnkd.in/gACn6bK8
5. Finally, list your projects here:
https://github.com/
I highly recommend... Continue Reading →

System Design Blogs

30 Blogs to learn 30 System Design Concepts:
1) Content Delivery Network (CDN): https://lnkd.in/gjJrEJeH
2) Caching: https://lnkd.in/gC9piQbJ
3) Distributed Caching: https://lnkd.in/g7WKydNg
4) Latency vs Throughput: https://lnkd.in/g_amhAtN
5) CAP Theorem: https://lnkd.in/g3hmVamx
6) Load Balancing: https://lnkd.in/gQaa8sXK
7) ACID Transactions: https://lnkd.in/gMe2JqaF
8) SQL vs NoSQL: https://lnkd.in/g3WC_yxn
9) Consistent Hashing: https://lnkd.in/gd3eAQKA
10) Database Index: https://lnkd.in/gCeshYVt
11) Rate Limiting: https://lnkd.in/gWsTDR3m
12) Microservices Architecture: https://lnkd.in/gFXUrz_T
13) Strong vs Eventual Consistency: https://lnkd.in/gJ-uXQXZ
14) REST vs RPC:... Continue Reading →

Data Masking in Pyspark

Hide credit card number:
Accept a 16-digit credit card number from the user and display only the last 4 characters of the card number.
input : 1234567891234567
output : ************4567
We can use PySpark or Python.

Code in PySpark:
---------------------
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring

# Create a SparkSession
spark = SparkSession.builder.appName("HideCreditCard").getOrCreate()

# Sample input credit card number
input_cc_number = "1234567891234567"

# Hide all characters except the last four... Continue Reading →
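The PySpark code in this excerpt is cut off before the masking step. Since the post says plain Python is also an option, here is a minimal plain-Python sketch of the stated requirement. It assumes the input is a string of exactly 16 digits, and `mask_card_number` is a hypothetical helper name, not something taken from the post.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace every character except the last `visible` ones with '*'."""
    # Assumption: valid input is exactly 16 digits, as in the post's example.
    if len(card_number) != 16 or not card_number.isdigit():
        raise ValueError("expected a 16-digit card number")
    hidden = len(card_number) - visible
    # Slice from `hidden` so that visible=0 masks the whole number.
    return "*" * hidden + card_number[hidden:]


print(mask_card_number("1234567891234567"))  # ************4567
```

Rejecting malformed input up front keeps a short or non-numeric value from being partly revealed by accident.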

PySpark: Cleansing Data with Regex

๐Ÿ” Delving into PySpark: Cleansing Data with Regex Magic!โš™๏ธ๐ŸŒŸ Example: Transforming Names with Special Characters ๐Ÿš€Picture yourself in the realm of data, where you've stumbled upon a trove of Indian names. However, these names are shrouded in a layer of noise, with special characters cluttering them. ๐Ÿ”‘ Step 1๏ธโƒฃ: The ChallengeImagine a dataset of Indian... Continue Reading →

AWS DE Questions

This post covers the AWS data engineering interview and highlights the most common concepts you can expect to be asked about in interview processes.
1. Start by providing a concise introduction to your professional projects, emphasizing your role as a data engineer.
2. Share your knowledge of cloud platforms (AWS, GCP, Azure) as it pertains to data engineering.
3. Discuss... Continue Reading →

Insert, Update and Delete in PySpark

Here's the scenario: We had two data tables, Table_A and Table_B, each containing a "Name" and "Age" column.

Table_A:
Name | Age
------------
S1 | 20
S2 | 23

Table_B:
Name | Age
------------
S1 | 22
S4 | 27

Our mission was to determine the differences between these tables and generate an Action (Update, Delete, or Insert), and here's the solution we came up... Continue Reading →
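The post's own solution is cut off, so what follows is only one reading of the scenario, sketched in plain Python rather than PySpark. It assumes Table_A is the current state and Table_B the incoming one, so a changed Age is an Update, a Name found only in Table_B is an Insert, and a Name found only in Table_A is a Delete. `diff_actions` is a hypothetical helper name. In PySpark the same comparison is commonly expressed as a full outer join on Name.

```python
def diff_actions(current: dict, incoming: dict) -> dict:
    """Map each Name to the action needed to turn `current` into `incoming`."""
    actions = {}
    for name in sorted(current.keys() | incoming.keys()):
        if name not in current:
            actions[name] = "Insert"   # only in the incoming table
        elif name not in incoming:
            actions[name] = "Delete"   # only in the current table
        elif current[name] != incoming[name]:
            actions[name] = "Update"   # present in both, Age differs
        # Identical rows need no action and are left out.
    return actions


table_a = {"S1": 20, "S2": 23}  # Name -> Age, values from the post
table_b = {"S1": 22, "S4": 27}

print(diff_actions(table_a, table_b))
# {'S1': 'Update', 'S2': 'Delete', 'S4': 'Insert'}
```

If the tables play the opposite roles, swapping the two arguments flips Insert and Delete while Update stays the same.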

๐Ÿš€๐ŸŒ ๐—›๐—ผ๐˜„ ๐˜๐—ผ ๐—•๐˜‚๐—ถ๐—น๐—ฑ ๐—ฎ๐—ป ๐—˜๐˜ƒ๐—ฒ๐—ป๐˜-๐——๐—ฟ๐—ถ๐˜ƒ๐—ฒ๐—ป ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ฒ๐—ฟ๐—น๐—ฒ๐˜€๐˜€ ๐—˜๐—ง๐—Ÿ ๐—ฃ๐—ถ๐—ฝ๐—ฒ๐—น๐—ถ๐—ป๐—ฒ ๐—ผ๐—ป ๐—”๐—ช๐—ฆ

๐—˜๐—ง๐—Ÿ => ๐—˜๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜ | ๐—ง๐—ฟ๐—ฎ๐—ป๐˜€๐—ณ๐—ผ๐—ฟ๐—บ | ๐—Ÿ๐—ผ๐—ฎ๐—ฑEvent-Driven Serverless ETL Pipelines is a data processing architecture that is used to process large amounts of data in real-time.Here data is processed as soon as it is generated, rather than being stored and processed later.This allows for faster processing times and more efficient use of resources.Here are the... Continue Reading →
