Top 7 GCP #dataengineering tools to learn for FREE:
1. Google BigQuery
https://lnkd.in/g4Pvu8aq
2. Google Cloud Dataproc
https://lnkd.in/gZbJV_8s
https://lnkd.in/gkDeVqtb
3. Google Cloud Dataprep
https://lnkd.in/gF4G3uAK
4. Google Cloud Composer
https://lnkd.in/gjfnYb3w
https://lnkd.in/grGTQYtT
5. Google Cloud Data Fusion
https://lnkd.in/gfmxapqP
6. Google Data Studio
https://lnkd.in/gus75kYW
7. Google Cloud Dataflow
https://lnkd.in/gyxKXaGU
8. Data warehousing with BigQuery
https://youtu.be/ZVgt1-LfWW4?si=dPVaNH9LgU-Wfo7s
Complete GCP full course for FREE:
https://lnkd.in/gi48NG3z
Resources are short and crisp, and definitely recommended.
Read CSV File by Spark
--------------- Spark Interview Questions ------------
How to read a CSV file in Spark?

Method 1:
---------------
spark.read.csv("path")

    df = spark.read.csv("dbfs:/FileStore/small_zipcode.csv")
    df.show()

    +---+-------+--------+-------------------+-----+----------+
    |_c0|    _c1|     _c2|                _c3|  _c4|       _c5|
    +---+-------+--------+-------------------+-----+----------+
    | id|zipcode|    type|               city|state|population|
    |  1|    704|STANDARD|               null|   PR|     30100|
    |  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
    |  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
    |  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
    |  5|  76177|STANDARD|               null|   TX|      null|
    +---+-------+--------+-------------------+-----+----------+

Method 2:
--------------

    df = (
        spark.read.format("csv")
        .option("inferSchema", True)
        .option("header", True)
        .option("sep", ",")
        .load("dbfs:/FileStore/small_zipcode.csv")
    )
    df.show()

    +---+-------+--------+-------------------+-----+----------+
    |... Continue Reading →
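The two methods differ only in the reader options. As a minimal sketch, Method 2's options can be kept in one dictionary and passed through `DataFrameReader.options`. The helper name `read_zipcodes` and the `CSV_OPTIONS` constant are illustrative and not from the post, and the code assumes a `SparkSession` named `spark` already exists, as it does in a Databricks notebook.

```python
# Reader options from Method 2: treat the first row as column names,
# let Spark infer column types, and split fields on commas.
CSV_OPTIONS = {"header": True, "inferSchema": True, "sep": ","}


def read_zipcodes(spark, path, options=None):
    """Read a CSV file into a DataFrame (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    opts = CSV_OPTIONS if options is None else options
    return spark.read.format("csv").options(**opts).load(path)


# Usage inside a notebook where `spark` already exists:
# df = read_zipcodes(spark, "dbfs:/FileStore/small_zipcode.csv")
# df.printSchema()
```

Without the `header` option, Spark names the columns `_c0`, `_c1`, and so on, and treats the header row as data, which is what the Method 1 output shows.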
AWS Certification
FREE AWS Certificate by Amazon that you can't miss in 2023
1. Getting Started with Data Analytics on AWS
https://lnkd.in/dwRhRAzM
2. Practical Data Science on the AWS Cloud Specialization
https://lnkd.in/d3-3GZbG
3. Getting Started with AWS Machine Learning
https://lnkd.in/dhAp-Vjh
4. Introduction to Machine Learning on AWS
https://lnkd.in/detfDCWA
5. Hands-on Machine Learning with AWS and NVIDIA
https://lnkd.in/dgGvATq2
6. AWS Fundamentals Specialization
https://lnkd.in/dSV9jhRz
7. Building Modern Python Applications on AWS
https://lnkd.in/dQAinFGy
8. AWS... Continue Reading →
Free Spark Course
Don't pay for an Apache Spark course just because it is in demand. You can learn for free here:
1. Install Spark from here:
https://lnkd.in/gx_Dc8ph
https://lnkd.in/gg6-8xDz
2. Learn Spark basics from here:
https://lnkd.in/g-gCpUyi
https://lnkd.in/gkNhMnTZ
https://lnkd.in/gkbVB6YX
2.1 Learn Spark with Scala from here:
https://lnkd.in/gtrZAmn4
2.2 Learn Spark with Python from here:
https://lnkd.in/gQaeSjbH
3. Learn PySpark from here:
https://lnkd.in/g6kyihyW
4. Work on Spark projects from here:
https://lnkd.in/gE8hsyZx
https://lnkd.in/gwWytS-Q
https://lnkd.in/gR7DR6_5
https://lnkd.in/gzngHhrC
https://lnkd.in/gACn6bK8
5. Finally, list your projects here:
https://github.com/
I highly recommend... Continue Reading →
Big Data Resources
#Resources I referred to for big data technologies
These resources are available for free on YouTube. They helped me crack Cisco, and they can help you crack product-based companies too.
1. Hadoop, Sqoop and Hive concepts by Saif Shaik:
https://lnkd.in/ewyYweTJ
2. PySpark concepts in depth by Karunakar Goud:
https://lnkd.in/eNtFkxmd
3. Another useful Spark playlist, from Raja's Data Engineering channel:
https://lnkd.in/eqiy7dBS
4. Hadoop and Kafka... Continue Reading →
Git Guide
Simplified Guide to Git with FREE learning resources
Git, the powerhouse of version control, transforms how developers manage code history and teamwork. Here's a quick breakdown:
• Distributed Brilliance: Git repositories are self-contained, offering flexibility and stability. Every developer holds a complete project history, fostering autonomy.
• Snapshot Mastery: Git captures "snapshots" of files, enabling easy... Continue Reading →
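To make the "snapshot" idea concrete: Git stores each version of a file as a blob whose ID is a hash of its content, so identical content always maps to the same object. The following is a minimal sketch of that ID computation for Git's default SHA-1 object format; the function name `git_blob_id` is illustrative.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Identical content yields the same ID, so unchanged files are stored once.
same = git_blob_id(b"print('hi')\n") == git_blob_id(b"print('hi')\n")
changed = git_blob_id(b"print('hi')\n") != git_blob_id(b"print('bye')\n")
print(same, changed)
```

Running `git hash-object <file>` on a file with the same bytes should report the same ID in a repository that uses SHA-1.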
System Design Blogs
30 Blogs to learn 30 System Design Concepts:
1) Content Delivery Network (CDN): https://lnkd.in/gjJrEJeH
2) Caching: https://lnkd.in/gC9piQbJ
3) Distributed Caching: https://lnkd.in/g7WKydNg
4) Latency vs Throughput: https://lnkd.in/g_amhAtN
5) CAP Theorem: https://lnkd.in/g3hmVamx
6) Load Balancing: https://lnkd.in/gQaa8sXK
7) ACID Transactions: https://lnkd.in/gMe2JqaF
8) SQL vs NoSQL: https://lnkd.in/g3WC_yxn
9) Consistent Hashing: https://lnkd.in/gd3eAQKA
10) Database Index: https://lnkd.in/gCeshYVt
11) Rate Limiting: https://lnkd.in/gWsTDR3m
12) Microservices Architecture: https://lnkd.in/gFXUrz_T
13) Strong vs Eventual Consistency: https://lnkd.in/gJ-uXQXZ
14) REST vs RPC:... Continue Reading →
Important Services for Data Engineers provided by AWS, Microsoft Azure & GCP
AWS Lambda: AWS Lambda is a serverless compute service that runs code without provisioning or managing servers; you pay only for actual usage.
Amazon Redshift: Amazon Redshift is a fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze vast amounts of data using SQL and existing BI tools.
AWS Glue: AWS Glue is... Continue Reading →
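For a sense of what "running code without managing servers" looks like in practice, here is a minimal sketch of a Python Lambda handler. The function name `lambda_handler` and the `name` field on the event are illustrative; the real event shape depends on the trigger, and the `statusCode`/`body` response assumes an API Gateway proxy-style invocation.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda calls once per invocation (name is configurable)."""
    # `event` carries the trigger payload; "name" is a hypothetical field.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local smoke test; in AWS the service supplies `event` and `context`.
response = lambda_handler({"name": "data engineer"}, None)
print(response["body"])
```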
SCD 2 with Pyspark
Implementing a slowly changing dimension (SCD Type 2) in PySpark. Earlier we saw it in SQL: https://lnkd.in/dH6j3MWE

    from pyspark.sql.types import (
        StructType, StructField, IntegerType, StringType, BooleanType
    )

    # Define the schema for the DataFrame
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
        StructField("salary", IntegerType(), True),
        StructField("department", StringType(), True),
        StructField("active", BooleanType(), True),
        StructField("start", StringType(), True),
        StructField("end", StringType(), True)
    ])

    # Current dimension rows: active records have no end date
    Employee_data = [
        (1, "John", 100, "HR", True, '2023-10-20', None),
        (2, "Alice", 200, "Finance", True, '2023-10-20', None),
        (3, "Bob", 300, "Engineering", True, '2023-10-20', None),
        (4, "Jane",... Continue Reading →
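Since the excerpt cuts off before the merge step, the following is only a sketch of the general SCD Type 2 rule, written in plain Python rather than PySpark so it runs without a cluster. It is not the post's implementation. It reuses the column names from the schema above; the function name `apply_scd2`, the incoming updates, and the effective date are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
TRACKED = ("name", "salary", "department")  # columns whose changes create a new version


def apply_scd2(dimension: List[Row], updates: List[Row], effective_date: str) -> List[Row]:
    """Return a new dimension with SCD Type 2 history applied."""
    rows = [dict(row) for row in dimension]  # copy so the input is left untouched
    active = {row["id"]: row for row in rows if row["active"]}
    for update in updates:
        current = active.get(update["id"])
        if current is not None:
            if all(current[col] == update[col] for col in TRACKED):
                continue  # nothing changed, keep the current version
            # Close the old version instead of overwriting it.
            current["active"] = False
            current["end"] = effective_date
        new_row = {"id": update["id"], "active": True, "start": effective_date, "end": None}
        new_row.update({col: update[col] for col in TRACKED})
        rows.append(new_row)
        active[update["id"]] = new_row
    return rows


dimension = [
    {"id": 1, "name": "John", "salary": 100, "department": "HR",
     "active": True, "start": "2023-10-20", "end": None},
    {"id": 2, "name": "Alice", "salary": 200, "department": "Finance",
     "active": True, "start": "2023-10-20", "end": None},
]
# Hypothetical incoming records: a raise for John, no change for Alice, a new hire.
updates = [
    {"id": 1, "name": "John", "salary": 150, "department": "HR"},
    {"id": 2, "name": "Alice", "salary": 200, "department": "Finance"},
    {"id": 5, "name": "Eve", "salary": 250, "department": "Sales"},
]
result = apply_scd2(dimension, updates, "2023-10-25")
for row in result:
    print(row)
```

In PySpark the same rule is typically expressed as a join between the current dimension and the incoming batch, followed by a union of the closed, unchanged, and new rows.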
Mastering SCD Type 2: Handling Historical Changes in SQL
Slowly Changing Dimensions (SCD) are a crucial part of data warehousing and analytics. Among the different types of SCD, Type 2 is particularly interesting as it allows us to track historical changes in dimensions such as customer data, product information, and more.
In a recent project, I... Continue Reading →
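As a rough illustration of how Type 2 keeps history in SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table `customer_dim`, its columns, and the sample values are hypothetical, and the post's own SQL may differ. Each change closes the current row and inserts a new one instead of updating in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dim (
        customer_id INTEGER,
        city        TEXT,
        start_date  TEXT,
        end_date    TEXT,     -- NULL while the row is the current version
        is_current  INTEGER
    )
    """
)


def apply_change(conn, customer_id, city, change_date):
    """Record a new version of a customer only when the tracked value changed."""
    row = conn.execute(
        "SELECT city FROM customer_dim WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is not None and row[0] == city:
        return  # no change, keep the current version
    # Step 1: close the current version, if there is one.
    conn.execute(
        "UPDATE customer_dim SET end_date = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    # Step 2: insert the new version as the current row.
    conn.execute(
        "INSERT INTO customer_dim (customer_id, city, start_date, end_date, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, change_date),
    )


apply_change(conn, 1, "Austin", "2023-01-01")  # first load
apply_change(conn, 1, "Austin", "2023-03-01")  # unchanged, ignored
apply_change(conn, 1, "Dallas", "2023-06-01")  # customer moved

history = conn.execute(
    "SELECT customer_id, city, start_date, end_date, is_current "
    "FROM customer_dim ORDER BY start_date"
).fetchall()
for record in history:
    print(record)
```

Filtering on `is_current = 1` returns the latest state, while the full table answers questions such as where a customer lived on a given date.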