List of All Azure / Data / DevOps / ML Interview Q&A
1. Azure Data Factory Interview Q&A https://lnkd.in/dVzCmzcZ
2. Azure Databricks Scenario-based Interview Q&A https://lnkd.in/dUCf8qf8
3. Realtime Azure Data Factory Interview Q&A https://lnkd.in/ex_Vixh
4. Latest Azure DevOps Interview Q&A https://lnkd.in/g7PdATm
5. Azure Active Directory Interview Q&A https://lnkd.in/dtWYXTKN
6. Azure Data Lake Interview Q&A https://lnkd.in/dgr-uGQB
7. Azure App Service Interview Q&A https://lnkd.in/dP4Afqkb
8. Azure Data Engineer Interview Q&A https://lnkd.in/dj_m2yeQ
9. ... Continue Reading →
Caching in PySpark
Internals of Caching in PySpark. Caching DataFrames in PySpark is a powerful technique to improve query performance. However, there is a subtle difference in how you can cache DataFrames in PySpark: cached_df = orders_df.cache() and orders_df.cache() are two common approaches, and they serve different purposes. The choice between these two depends on your specific use case and whether you... Continue Reading →
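The distinction the excerpt describes can be sketched in plain Python with a toy stand-in class (hypothetical, not PySpark itself): PySpark's DataFrame.cache() marks the DataFrame for caching and returns that same DataFrame, so assigning the result and calling it in place both refer to one object.

```python
class FakeDataFrame:
    """Toy stand-in mimicking PySpark's DataFrame.cache() behavior:
    it marks the object for caching and returns the same object."""
    def __init__(self):
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self  # PySpark's cache() likewise returns the DataFrame itself

orders_df = FakeDataFrame()

# Style 1: keep a named reference to the cached DataFrame
cached_df = orders_df.cache()

# Style 2: orders_df.cache() alone would have the same caching effect,
# since both names point at the same object:
assert cached_df is orders_df
assert orders_df.is_cached
```

Keeping the named reference (style 1) mainly helps readability in longer pipelines; the caching itself is identical either way.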
Interview Questions on Apache Spark and PySpark for Data Engineers
SET OF 82 QUESTIONS
1. How is Apache Spark different from MapReduce?
Apache Spark: processes data in batches as well as in real-time. MapReduce: processes data in batches only.
Apache Spark: runs almost 100 times faster than Hadoop MapReduce. MapReduce: slower when it comes to large-scale processing.
Apache Spark: stores data in RAM, i.e. in-memory. So,... Continue Reading →
Google Cloud GCloud Commands Cheat Sheet
Google Cloud Config
List projects: gcloud config list, gcloud config list project
Show project info: gcloud compute project-info describe
Switch project: gcloud config set project <project-id>
Set the active account: gcloud config set account <ACCOUNT>
Set default region: gcloud config set compute/region us-west
Set default zone: gcloud config set compute/zone us-west1-b
List configurations: gcloud config configurations list
Activate configuration: gcloud config configurations activate
Google Cloud... Continue Reading →
Read a CSV File with Spark
---------------Spark Interview Questions---------------
How to read a CSV file in Spark?

Method 1:
spark.read.csv("path")
df = spark.read.csv("dbfs:/FileStore/small_zipcode.csv")
df.show()
+---+-------+--------+-------------------+-----+----------+
|_c0|    _c1|     _c2|                _c3|  _c4|       _c5|
+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
|  1|    704|STANDARD|               null|   PR|     30100|
|  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|  5|  76177|STANDARD|               null|   TX|      null|
+---+-------+--------+-------------------+-----+----------+

Method 2:
df = spark.read.format("csv").option("inferSchema", True).option("header", True).option("sep", ",").load("dbfs:/FileStore/small_zipcode.csv")
df.show()
+---+-------+--------+-------------------+-----+----------+
|... Continue Reading →
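The difference between the two methods above is visible even outside Spark: without a header option the first row is treated as ordinary data, while with one it names the columns. A minimal plain-Python analogy using the standard csv module (inline sample data modeled loosely on the small_zipcode excerpt, not the real file):

```python
import csv
import io

# Inline sample where the first row is the header, like small_zipcode.csv
data = """id,zipcode,type,city,state,population
1,704,STANDARD,,PR,30100
2,704,,PASEO COSTA DEL SUR,PR,
"""

# No header handling (analogous to Method 1 with no options):
# the header row comes through as a plain data row.
rows = list(csv.reader(io.StringIO(data)))
print(rows[0])  # ['id', 'zipcode', 'type', 'city', 'state', 'population']

# Header handling (analogous to .option("header", True)):
# each record becomes a dict keyed by column name.
records = list(csv.DictReader(io.StringIO(data)))
print(records[0]["zipcode"])  # '704'
```

Note that csv.DictReader still yields strings; Spark's inferSchema option additionally guesses column types, which this sketch does not attempt.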
Free Spark Course
Don't pay for an Apache Spark course just because Spark is in demand. You can learn it for free here:
1. Install Spark from here:
https://lnkd.in/gx_Dc8ph
https://lnkd.in/gg6-8xDz
2. Learn Spark basics from here:
https://lnkd.in/g-gCpUyi
https://lnkd.in/gkNhMnTZ
https://lnkd.in/gkbVB6YX
2.1 Learn Spark with Scala from here:
https://lnkd.in/gtrZAmn4
2.2 Learn Spark with Python from here:
https://lnkd.in/gQaeSjbH
3. Learn PySpark from here:
https://lnkd.in/g6kyihyW
4. Work on Spark projects from here:
https://lnkd.in/gE8hsyZx
https://lnkd.in/gwWytS-Q
https://lnkd.in/gR7DR6_5
https://lnkd.in/gzngHhrC
https://lnkd.in/gACn6bK8
5. Finally, list your projects here: https://github.com/
I highly recommend... Continue Reading →
Data Masking in PySpark
Hide a credit card number: accept a 16-digit credit card number from the user and display only the last 4 characters of the card number.
Input: 1234567891234567
Output: ************4567
We can use PySpark or plain Python.
Code in PySpark:
---------------------
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring
# Create a SparkSession
spark = SparkSession.builder.appName("HideCreditCard").getOrCreate()
# Sample input credit card number
input_cc_number = "1234567891234567"
# Hide all characters except the last four... Continue Reading →
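The same masking rule is easy to state in plain Python before scaling it out with PySpark. A minimal sketch (the function name mask_card is my own, not from the post):

```python
def mask_card(cc_number: str) -> str:
    """Replace every digit except the last four with '*'.

    Works for any length, not just 16 digits.
    """
    return "*" * (len(cc_number) - 4) + cc_number[-4:]

print(mask_card("1234567891234567"))  # ************4567
```

In a PySpark DataFrame the equivalent logic would typically be expressed column-wise, e.g. combining lpad/substring on the card-number column rather than a Python UDF, which is generally faster.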
Insert, Update and Delete in PySpark
Here's the scenario: we had two data tables, Table_A and Table_B, each containing a "Name" and an "Age" column.
Table_A:
Name | Age
-----------
S1   | 20
S2   | 23
Table_B:
Name | Age
-----------
S1   | 22
S4   | 27
Our mission was to determine the differences between these tables and generate an action for each row: Update, Delete, or Insert. And here's the solution we came up... Continue Reading →
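The scenario above can be sketched in plain Python using dicts keyed by Name (assuming, as the sample data suggests, that Table_B holds the incoming state and Table_A the existing one; the function name diff_actions is my own):

```python
def diff_actions(existing: dict, incoming: dict) -> dict:
    """Compare two {name: age} tables and label each differing name
    with the action needed to turn `existing` into `incoming`."""
    actions = {}
    for name, age in incoming.items():
        if name not in existing:
            actions[name] = "Insert"          # new row
        elif existing[name] != age:
            actions[name] = "Update"          # same key, changed value
    for name in existing:
        if name not in incoming:
            actions[name] = "Delete"          # row no longer present
    return actions

table_a = {"S1": 20, "S2": 23}
table_b = {"S1": 22, "S4": 27}
print(diff_actions(table_a, table_b))
# {'S1': 'Update', 'S4': 'Insert', 'S2': 'Delete'}
```

In PySpark the same classification is usually produced with a full outer join on Name and a CASE-style column comparing the two Age values.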
Spark – BTS
Internal working of Apache Spark (don't forget to save it). Apache Spark works on the principle of in-memory computation, making it up to 100x faster and a highly performant distributed framework. Here is a detailed explanation of what happens internally when a Spark job is executed using the spark-submit command:
Step 1: The client application initiates the execution... Continue Reading →