Top 7 GCP tools to learn for FREE

Top 7 GCP #dataengineering tools to learn for FREE:

1. Google BigQuery
   https://lnkd.in/g4Pvu8aq
2. Google Cloud Dataproc
   https://lnkd.in/gZbJV_8s
   https://lnkd.in/gkDeVqtb
3. Google Cloud Dataprep
   https://lnkd.in/gF4G3uAK
4. Google Cloud Composer
   https://lnkd.in/gjfnYb3w
   https://lnkd.in/grGTQYtT
5. Google Cloud Data Fusion
   https://lnkd.in/gfmxapqP
6. Google Data Studio
   https://lnkd.in/gus75kYW
7. Google Cloud Dataflow
   https://lnkd.in/gyxKXaGU
8. Data warehousing with BigQuery
   https://youtu.be/ZVgt1-LfWW4?si=dPVaNH9LgU-Wfo7s

Complete GCP full course for FREE: https://lnkd.in/gi48NG3z

The resources are short and crisp, and definitely recommended.

Read CSV File by Spark

Spark Interview Questions

How to read a CSV file in Spark?

Method 1: spark.read.csv("path")

    df = spark.read.csv("dbfs:/FileStore/small_zipcode.csv")
    df.show()

    +---+-------+--------+-------------------+-----+----------+
    |_c0|    _c1|     _c2|                _c3|  _c4|       _c5|
    +---+-------+--------+-------------------+-----+----------+
    | id|zipcode|    type|               city|state|population|
    |  1|    704|STANDARD|               null|   PR|     30100|
    |  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
    |  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
    |  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
    |  5|  76177|STANDARD|               null|   TX|      null|
    +---+-------+--------+-------------------+-----+----------+

Method 2:

    df = (spark.read.format("csv")
          .option("inferSchema", True)
          .option("header", True)
          .option("sep", ",")
          .load("dbfs:/FileStore/small_zipcode.csv"))
    df.show()

    +---+-------+--------+-------------------+-----+----------+
    |...
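The difference between the two methods comes down to the `header` and `inferSchema` options: with the defaults, Spark names the columns `_c0`, `_c1`, ... and treats the header line as an ordinary data row, which is what the Method 1 output shows. The sketch below imitates that effect in plain Python with the standard `csv` module, so it runs without a Spark session. `read_csv_rows` and its parameters are hypothetical names for illustration, not Spark APIs, and the type inference is a simplified stand-in (integers only, decided per value rather than per column).

```python
import csv
import io

# A few rows of the sample data shown above (an empty field means "missing").
SAMPLE = """id,zipcode,type,city,state,population
1,704,STANDARD,,PR,30100
2,704,,PASEO COSTA DEL SUR,PR,
3,709,,BDA SAN LUIS,PR,3700
"""


def read_csv_rows(text, header=False, infer_schema=False, sep=","):
    """Hypothetical helper that imitates the effect of the CSV reader options."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return [], []
    if header:
        # The first line supplies the column names and is not returned as data.
        columns, data = rows[0], rows[1:]
    else:
        # No header: default names _c0, _c1, ... and every line counts as data.
        columns = [f"_c{i}" for i in range(len(rows[0]))]
        data = rows

    def convert(value):
        if value == "":
            return None  # missing field, displayed as null by df.show()
        if infer_schema:
            try:
                return int(value)  # simplified inference: integers only
            except ValueError:
                return value
        return value  # without inference every value stays a string

    return columns, [[convert(v) for v in row] for row in data]


# Roughly what Method 1 does: defaults only.
cols1, data1 = read_csv_rows(SAMPLE)
# Roughly what Method 2 does: header and schema inference switched on.
cols2, data2 = read_csv_rows(SAMPLE, header=True, infer_schema=True)

print(cols1, data1[0])
print(cols2, data2[0])
```

With the defaults, the first "data" row is the header line itself; with `header=True` the same line becomes the column names and the numeric fields come back as integers.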

AWS Certification

FREE AWS certificates by Amazon that you can't miss in 2023:

1. Getting Started with Data Analytics on AWS: https://lnkd.in/dwRhRAzM
2. Practical Data Science on the AWS Cloud Specialization: https://lnkd.in/d3-3GZbG
3. Getting Started with AWS Machine Learning: https://lnkd.in/dhAp-Vjh
4. Introduction to Machine Learning on AWS: https://lnkd.in/detfDCWA
5. Hands-on Machine Learning with AWS and NVIDIA: https://lnkd.in/dgGvATq2
6. AWS Fundamentals Specialization: https://lnkd.in/dSV9jhRz
7. Building Modern Python Applications on AWS: https://lnkd.in/dQAinFGy
8. AWS...

Free Spark Course

Don't pay for an Apache Spark course just because it is in demand. You can learn it for free here:

1. Install Spark:
   https://lnkd.in/gx_Dc8ph
   https://lnkd.in/gg6-8xDz
2. Learn Spark basics:
   https://lnkd.in/g-gCpUyi
   https://lnkd.in/gkNhMnTZ
   https://lnkd.in/gkbVB6YX
2.1 Learn Spark with Scala:
   https://lnkd.in/gtrZAmn4
2.2 Learn Spark with Python:
   https://lnkd.in/gQaeSjbH
3. Learn PySpark:
   https://lnkd.in/g6kyihyW
4. Work on Spark projects:
   https://lnkd.in/gE8hsyZx
   https://lnkd.in/gwWytS-Q
   https://lnkd.in/gR7DR6_5
   https://lnkd.in/gzngHhrC
   https://lnkd.in/gACn6bK8
5. Finally, list your projects here:
   https://github.com/

I highly recommend...

Big Data Resources

#Resources I referred to for Big Data technologies

These resources are available for free on YouTube. They helped me crack Cisco, and they can help you crack product-based companies too.

1. Hadoop, Sqoop and Hive concepts by Saif Shaik: https://lnkd.in/ewyYweTJ
2. PySpark concepts in depth by Karunakar Goud: https://lnkd.in/eNtFkxmd
3. Another useful Spark playlist, from Raja's Data Engineering channel: https://lnkd.in/eqiy7dBS
4. Hadoop and Kafka...

Git Guide

A Simplified Guide to Git with FREE learning resources

Git, the powerhouse of version control, transforms how developers manage code history and teamwork. Here's a quick breakdown:

• Distributed Brilliance: Git repositories are self-contained, offering flexibility and stability. Every developer holds a complete project history, fostering autonomy.
• Snapshot Mastery: Git captures "snapshots" of files, enabling easy...

System Design Blogs

30 Blogs to learn 30 System Design Concepts:

1) Content Delivery Network (CDN): https://lnkd.in/gjJrEJeH
2) Caching: https://lnkd.in/gC9piQbJ
3) Distributed Caching: https://lnkd.in/g7WKydNg
4) Latency vs Throughput: https://lnkd.in/g_amhAtN
5) CAP Theorem: https://lnkd.in/g3hmVamx
6) Load Balancing: https://lnkd.in/gQaa8sXK
7) ACID Transactions: https://lnkd.in/gMe2JqaF
8) SQL vs NoSQL: https://lnkd.in/g3WC_yxn
9) Consistent Hashing: https://lnkd.in/gd3eAQKA
10) Database Index: https://lnkd.in/gCeshYVt
11) Rate Limiting: https://lnkd.in/gWsTDR3m
12) Microservices Architecture: https://lnkd.in/gFXUrz_T
13) Strong vs Eventual Consistency: https://lnkd.in/gJ-uXQXZ
14) REST vs RPC:...

SCD 2 with Pyspark

Implementing a slowly changing dimension (SCD Type 2) in PySpark. Earlier we saw it in SQL: https://lnkd.in/dH6j3MWE

    from pyspark.sql.types import (
        StructType, StructField, IntegerType, StringType, BooleanType,
    )

    # Define the schema for the DataFrame
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
        StructField("salary", IntegerType(), True),
        StructField("department", StringType(), True),
        StructField("active", BooleanType(), True),
        StructField("start", StringType(), True),
        StructField("end", StringType(), True)
    ])

    Employee_data = [
        (1, "John", 100, "HR", True, '2023-10-20', None),
        (2, "Alice", 200, "Finance", True, '2023-10-20', None),
        (3, "Bob", 300, "Engineering", True, '2023-10-20', None),
        (4, "Jane",...
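The excerpt stops before the merge logic, so here is a minimal plain-Python sketch of the SCD Type 2 idea, using the same column names as the schema above (`id`, `name`, `salary`, `department`, `active`, `start`, `end`). It is not the post's PySpark implementation: `apply_scd2`, the choice of tracked columns, the load date, and the incoming batch (including the employee "Eve") are assumptions made for illustration. When a tracked attribute changes, the current row is closed (`active` set to False, `end` filled in) and a new active row is appended; ids not seen before are inserted as new active rows.

```python
from typing import Dict, List

Row = Dict[str, object]
TRACKED = ("name", "salary", "department")  # assumed set of tracked attributes


def apply_scd2(history: List[Row], updates: List[Row], load_date: str) -> List[Row]:
    """Return a new history with SCD Type 2 changes applied (hypothetical helper)."""
    result = [dict(row) for row in history]  # copy so the input is not mutated
    # Index the currently active row for each business key.
    current = {row["id"]: row for row in result if row["active"]}

    for update in updates:
        existing = current.get(update["id"])
        if existing is not None:
            if all(existing[col] == update[col] for col in TRACKED):
                continue  # nothing changed, keep the active row as it is
            # Close the old version instead of overwriting it.
            existing["active"] = False
            existing["end"] = load_date
        # Insert the new version (or a brand-new key) as the active row.
        new_row = {col: update[col] for col in ("id",) + TRACKED}
        new_row.update(active=True, start=load_date, end=None)
        result.append(new_row)
        current[update["id"]] = new_row
    return result


history = [
    {"id": 1, "name": "John", "salary": 100, "department": "HR",
     "active": True, "start": "2023-10-20", "end": None},
    {"id": 2, "name": "Alice", "salary": 200, "department": "Finance",
     "active": True, "start": "2023-10-20", "end": None},
]
# Hypothetical incoming batch: John gets a raise, Alice is unchanged, Eve is new.
updates = [
    {"id": 1, "name": "John", "salary": 150, "department": "HR"},
    {"id": 2, "name": "Alice", "salary": 200, "department": "Finance"},
    {"id": 5, "name": "Eve", "salary": 250, "department": "Sales"},
]

new_history = apply_scd2(history, updates, load_date="2023-11-01")
for row in new_history:
    print(row)
```

In PySpark the same steps are typically expressed as a join between the existing dimension and the incoming batch followed by a union of closed, unchanged, and new rows; the sketch only shows the row-level rules.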
