Azure Data Engineering by Deepak Goyal

List of All azure / data / devops /ML Interview Q& ASave & Share.1. ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐——๐—ฎ๐˜๐—ฎ ๐—™๐—ฎ๐—ฐ๐˜๐—ผ๐—ฟ๐˜† ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/dVzCmzcZ2. ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐——๐—ฎ๐˜๐—ฎ๐—ฏ๐—ฟ๐—ถ๐—ฐ๐—ธ๐˜€ ๐—ฆ๐—ฐ๐—ฒ๐—ป๐—ฎ๐—ฟ๐—ถ๐—ผ ๐—ฏ๐—ฎ๐˜€๐—ฒ๐—ฑ ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/dUCf8qf8๐Ÿฏ. ๐—ฅ๐—ฒ๐—ฎ๐—น๐˜๐—ถ๐—บ๐—ฒ ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐——๐—ฎ๐˜๐—ฎ ๐—™๐—ฎ๐—ฐ๐˜๐—ผ๐—ฟ๐˜† ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/ex_Vixh๐Ÿฐ.๐—Ÿ๐—ฎ๐˜๐—ฒ๐˜€๐˜ ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐——๐—ฒ๐˜ƒ๐—ข๐—ฝ๐˜€ ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/g7PdATm๐Ÿฑ. ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐—”๐—ฐ๐˜๐—ถ๐˜ƒ๐—ฒ ๐——๐—ถ๐—ฟ๐—ฒ๐—ฐ๐˜๐—ผ๐—ฟ๐˜† ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/dtWYXTKN๐Ÿฒ. ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐——๐—ฎ๐˜๐—ฎ ๐—Ÿ๐—ฎ๐—ธ๐—ฒ ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/dgr-uGQB๐Ÿณ. ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐—”๐—ฝ๐—ฝ ๐—ฆ๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฐ๐—ฒ ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/dP4Afqkb๐Ÿด. ๐—”๐˜‡๐˜‚๐—ฟ๐—ฒ ๐——๐—ฎ๐˜๐—ฎ ๐—˜๐—ป๐—ด๐—ถ๐—ป๐—ฒ๐—ฒ๐—ฟ ๐—œ๐—ป๐˜๐—ฒ๐—ฟ๐˜ƒ๐—ถ๐—ฒ๐˜„ ๐—ค&๐—”https://lnkd.in/dj_m2yeQ๐Ÿต.... Continue Reading →

Caching in Pyspark

Internals of Caching in PysparkCaching DataFrames in PySpark is a powerful technique to improve query performance. However, there's a subtle difference in how you can cache DataFrames in PySpark.cached_df = orders_df.cache() and orders_df.cache() are two common approaches & they serve different purposes.The choice between these two depends on your specific use case and whether you... Continue Reading →

Google Cloud GCloud Commands Cheat Sheet

Google Cloud Config PURPOSECOMMANDList projectsgcloud config list, gcloud config list projectList projectsgcloud config list, gcloud config list projectShow project infogcloud compute project-info describeSwitch projectgcloud config set project <project-id>Set the active accountgcloud config set account <ACCOUNT>Set default regiongcloud config set compute/region us-westSet default zonegcloud config set compute/zone us-west1-bList configurationsgcloud config configurations listActivate configurationgcloud config configurations activate Google Cloud... Continue Reading →

Read CSV File by Spark

---------------Spark Interview Questions------------๐Ÿ“•How to read a csv file in spark?Method 1: ---------------spark.read.csv("path")df=spark.read.csv("dbfs:/FileStore/small_zipcode.csv")df.show()---+-------+--------+-------------------+-----+----------+|_c0| _c1| _c2| _c3| _c4| _c5|+---+-------+--------+-------------------+-----+----------+| id|zipcode| type| city|state|population|| 1| 704|STANDARD| null| PR| 30100|| 2| 704| null|PASEO COSTA DEL SUR| PR| null|| 3| 709| null| BDA SAN LUIS| PR| 3700|| 4| 76166| UNIQUE| CINGULAR WIRELESS| TX| 84000|| 5| 76177|STANDARD| null| TX| null|+---+-------+--------+-------------------+-----+----------+Method 2 :--------------df=spark.read.format("csv").option("inferSchema",True).option("header",True).option("sep",",").load("dbfs:/FileStore/small_zipcode.csv")df.show()+---+-------+--------+-------------------+-----+----------+|... Continue Reading →

Free Spark Course

Don't pay for Apache Spark Course because it is in demand.You can learn for free here......1. Install spark from here....https://lnkd.in/gx_Dc8phhttps://lnkd.in/gg6-8xDz2. Learn spark Basics from here--https://lnkd.in/g-gCpUyihttps://lnkd.in/gkNhMnTZhttps://lnkd.in/gkbVB6YX2.1 Learn spark with Scala from here:https://lnkd.in/gtrZAmn42.2 Learn spark with python from here:https://lnkd.in/gQaeSjbH3. Learn pyspark from here:https://lnkd.in/g6kyihyW4. Work on Spark projects from here..https://lnkd.in/gE8hsyZxhttps://lnkd.in/gwWytS-Qhttps://lnkd.in/gR7DR6_5https://lnkd.in/gzngHhrChttps://lnkd.in/gACn6bK85. Finally list down your projects Here.....https://github.com/I highly recommend... Continue Reading →

Data Masking in Pyspark

Hide Credit card number:Accept 16 digit credit card number from user and display only last 4 characters of card numberinput :1234567891234567output :************4567We can use Py spark or pythonCode In Pyspark:---------------------from pyspark.sql import SparkSessionfrom pyspark.sql.functions import substring# Create a SparkSessionspark = SparkSession.builder.appName("HideCreditCard").getOrCreate()# Sample input credit card numberinput_cc_number = "1234567891234567"# Hide all characters except the last four... Continue Reading →

Insert, Update and Delete in PySpark

Here's the scenario: We had two data tables, Table_A and Table_B, each containing a "Name" and "Age" column. ๐Ÿ“‹๐Ÿ’กTable_A:Name | Age------------S1 | 20S2 | 23-------------------------Table_B:Name | Age------------S1 | 22S4 | 27Our mission was to determine the differences between these tables and generate a Action between Update, Delete, Insert๐Ÿš€ and here's the solution we came up... Continue Reading →

Spark – BTS

Internal working of Apache Spark (don't forget to save it)๐€๐ฉ๐š๐œ๐ก๐ž ๐’๐ฉ๐š๐ซ๐ค works on the principle of in-memory computation making it 100x faster and a highly performant distributed framework.Here is a detailed explanation on what happens internally when a spark job is executed using the spark-submit command - ๐Ÿ“‹๐’๐ญ๐ž๐ฉ 1 : Client application initiates the execution... Continue Reading →

Create a website or blog at WordPress.com

Up ↑

Design a site like this with WordPress.com
Get started