GitHub Repos for Developers that will reveal thousands of free resources.
1. The Algorithms: https://lnkd.in/dpzAd_vE
2. freeCodeCamp: https://lnkd.in/diBh4dVy
3. Freely available programming books: https://lnkd.in/d2bwBmU9
4. 100 Days of ML Coding: https://lnkd.in/dz8dDr9U
5. Project-based tutorials: https://lnkd.in/dSiiKHXK
6. Public APIs: https://lnkd.in/dvGamaUM
7. Coding Interview University: https://lnkd.in/dhY5pCxH
8. Developer Roadmap: https://lnkd.in/dJ4wAG2B
9. Computer Science: https://lnkd.in/d2uFXzPz
10. 30 Seconds of Code: https://lnkd.in/dwDNk_VX
11. ... Continue Reading →
Learn Apache Spark Step by Step
Learn Apache Spark Step by Step (Follow the Sequence)
1. Getting started with Apache Spark: https://lnkd.in/gFRpe3-D
2. A quick introduction to the Spark API: https://lnkd.in/g8Y3tdhX
3. Overview of Spark - RDDs, accumulators, broadcast variables: https://lnkd.in/g7fepuFF
4. Spark SQL, Datasets, and DataFrames: https://lnkd.in/g3iZp7zk
5. PySpark - Processing data with Spark in Python: https://lnkd.in/gBnh6PAi
6. Processing data with SQL on the command line: https://lnkd.in/ggnxDaUu
7. Cluster Overview: https://lnkd.in/guCQnJnv
8. Packaging and deploying... Continue Reading →
Databricks Lakehouse Fundamentals
You can try the free Databricks Lakehouse Fundamentals recorded videos and certification. The link is below: https://lnkd.in/gXx2GUH8
#lakehouse #databricks
Basic to medium #Python (pandas) interview questions for an entry-level data analyst role
1. What are the differences between lists and tuples in Python, and how does this distinction relate to Pandas operations?
2. What is a DataFrame in Pandas, and how does it differ from a Series?
3. Can you explain how to handle missing data in Pandas, including the difference between 'fillna()' and 'dropna()'?
4. Describe the process of... Continue Reading →
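A quick sketch of questions 2 and 3, using a small made-up frame (the column names and values are purely illustrative):

```python
import pandas as pd

# A Series is a single labeled column; a DataFrame is a 2-D table
# whose columns are each a Series.
ages = pd.Series([20, None, 23], name="Age")
df = pd.DataFrame({"Name": ["S1", "S2", "S3"], "Age": [20, None, 23]})

# fillna() keeps every row and replaces missing values with a default.
filled = df["Age"].fillna(0)

# dropna() removes the rows containing missing values instead.
dropped = df.dropna(subset=["Age"])

print(len(filled), len(dropped))  # 3 2
```

The choice between the two is the usual interview follow-up: fillna() preserves row count at the cost of inventing values, while dropna() preserves fidelity at the cost of data loss.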
Data Engineering Blogs
75 engineering blogs worth reading to improve your system design:
High Scalability: https://lnkd.in/eQ4eDw4E
Engineering at Meta: https://lnkd.in/e8tiSkEv
AWS Architecture Blog: https://lnkd.in/eEchKJif
All Things Distributed: https://lnkd.in/emXaQDaS
The Netflix Tech Blog: https://lnkd.in/efPuR39b
LinkedIn Engineering Blog: https://lnkd.in/ehaePQth
Uber Engineering Blog: https://eng.uber.com/
Engineering at Quora: https://lnkd.in/em-WkhJd
Pinterest Engineering: https://lnkd.in/esBTntjq
Lyft Engineering Blog: https://eng.lyft.com/
Twitter Engineering Blog: https://lnkd.in/evMFNhEs
Dropbox Engineering Blog: https://dropbox.tech/
... Continue Reading →
Insert, Update and Delete in PySpark
Here's the scenario: we had two tables, Table_A and Table_B, each containing a "Name" and an "Age" column.

Table_A:
Name | Age
-----------
S1   | 20
S2   | 23

Table_B:
Name | Age
-----------
S1   | 22
S4   | 27

Our mission was to determine the differences between these tables and label each row with an action: Update, Delete, or Insert. And here's the solution we came up... Continue Reading →
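The excerpt cuts off before the PySpark code, but the underlying logic is a full outer join on Name: names in both tables with a changed Age become Update, names only in Table_A become Delete, and names only in Table_B become Insert. A minimal plain-Python sketch of that classification (a PySpark version would use DataFrame.join with how="full"):

```python
table_a = {"S1": 20, "S2": 23}  # Name -> Age in the existing table
table_b = {"S1": 22, "S4": 27}  # Name -> Age in the incoming table

actions = {}
for name in sorted(table_a.keys() | table_b.keys()):
    if name in table_a and name in table_b:
        if table_a[name] != table_b[name]:
            actions[name] = "Update"   # present in both, value changed
    elif name in table_a:
        actions[name] = "Delete"       # only in the existing table
    else:
        actions[name] = "Insert"       # only in the incoming table

print(actions)  # {'S1': 'Update', 'S2': 'Delete', 'S4': 'Insert'}
```

Matching the scenario above: S1 changed (20 → 22), S2 disappeared, and S4 is new.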
How to Build an Event-Driven Serverless ETL Pipeline on AWS
ETL => Extract | Transform | Load

An event-driven serverless ETL pipeline is a data processing architecture used to process large amounts of data in real time. Data is processed as soon as it is generated, rather than being stored and processed later, which allows for faster processing and more efficient use of resources. Here are the... Continue Reading →
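On AWS, the usual entry point for such a pipeline is a Lambda function triggered by an S3 ObjectCreated event. A minimal sketch of the handler (the bucket and key names are hypothetical, and a real version would fetch and transform the object with boto3):

```python
import json

def handler(event, context):
    """Triggered by an S3 event notification; extracts what to process."""
    jobs = []
    for record in event.get("Records", []):
        # The S3 notification nests bucket and object info per record.
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Extract step: a real pipeline would fetch the object here, e.g.
        # boto3.client("s3").get_object(Bucket=bucket, Key=key)
        jobs.append({"bucket": bucket, "key": key})
    return {"statusCode": 200, "body": json.dumps(jobs)}

# A sample event shaped like the notification AWS delivers:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-data"},
                "object": {"key": "2024/orders.csv"}}}
    ]
}
print(handler(sample_event, None)["body"])
```

Because the function only runs when an object lands, there is no idle compute: the "serverless" and "event-driven" parts of the architecture come for free from the trigger.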
FREE DATA ENGINEERING COURSES ON CLOUD
Data engineering is the backbone of the modern data-driven world. It's the meticulous process of designing and building systems for collecting, storing, and analyzing data at scale. However, finding comprehensive projects and courses that are also free can be a challenge. To bridge this gap, I've created a list of five end-to-end data engineering courses... Continue Reading →
PySpark UDF
#PySpark_UDF_with_the_help_of_an_example

The most important extension point of Spark SQL & DataFrame is the PySpark UDF (User Defined Function), which is used to expand PySpark's built-in capabilities. UDFs in PySpark work similarly to UDFs in conventional databases. We write a Python function and wrap it in PySpark SQL udf() or register it as a udf and... Continue Reading →
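The wrapping pattern described above, sketched without a live Spark session (the real API is pyspark.sql.functions.udf; convert_case and the sample column are made up for illustration):

```python
# The plain Python function — this is the part you write for a UDF.
def convert_case(s):
    return s.title() if s is not None else None

# In PySpark you would wrap it, declaring the return type:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   convert_case_udf = udf(convert_case, StringType())
#   df.withColumn("Name", convert_case_udf(df["Name"]))

# Conceptually, the wrapped UDF applies the function to each value
# of the column, one row at a time:
column = ["john doe", "jane smith", None]
print([convert_case(v) for v in column])  # ['John Doe', 'Jane Smith', None]
```

The None guard matters: Spark passes nulls through to the Python function, so an unguarded s.title() would raise on any null row.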
Delete Duplicates in a PySpark DataFrame
#Scenario

There are two ways to handle row duplication in PySpark DataFrames. The distinct() function in PySpark is used to drop/remove duplicate rows (all columns) from a DataFrame, while dropDuplicates() is used to drop rows based on one or more columns. Here's an example showing how to utilize the distinct() and dropDuplicates() methods. First, we need... Continue Reading →
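The difference between the two calls can be seen without a Spark session, since pandas has direct analogues: drop_duplicates() with no arguments behaves like PySpark's distinct(), and drop_duplicates(subset=...) like dropDuplicates(["col"]). A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["S1", "S1", "S2", "S2"],
    "Age":  [20,   20,   23,   25],
})

# distinct()-style: drop rows that are duplicates across ALL columns.
all_cols = df.drop_duplicates()

# dropDuplicates(["Name"])-style: keep the first row per Name,
# even when the other columns differ.
by_name = df.drop_duplicates(subset=["Name"])

print(len(all_cols), len(by_name))  # 3 2
```

The (S2, 23) and (S2, 25) rows survive distinct() because they differ in Age, but only the first survives deduplication by Name — which is exactly the distinction the PySpark post is drawing.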