30 Blogs to learn 30 System Design Concepts:
1) Content Delivery Network (CDN): https://lnkd.in/gjJrEJeH
2) Caching: https://lnkd.in/gC9piQbJ
3) Distributed Caching: https://lnkd.in/g7WKydNg
4) Latency vs Throughput: https://lnkd.in/g_amhAtN
5) CAP Theorem: https://lnkd.in/g3hmVamx
6) Load Balancing: https://lnkd.in/gQaa8sXK
7) ACID Transactions: https://lnkd.in/gMe2JqaF
8) SQL vs NoSQL: https://lnkd.in/g3WC_yxn
9) Consistent Hashing: https://lnkd.in/gd3eAQKA
10) Database Index: https://lnkd.in/gCeshYVt
11) Rate Limiting: https://lnkd.in/gWsTDR3m
12) Microservices Architecture: https://lnkd.in/gFXUrz_T
13) Strong vs Eventual Consistency: https://lnkd.in/gJ-uXQXZ
14) REST vs RPC: ... Continue Reading →
Important Services for Data Engineers provided by AWS, Microsoft Azure & GCP
AWS Lambda: a serverless compute service that lets you run code without provisioning or managing servers, paying only for actual usage.
Amazon Redshift: a fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze vast amounts of data using SQL and existing BI tools.
AWS Glue: ... Continue Reading →
SCD 2 with Pyspark
Implementing a slowly changing dimension (SCD Type 2) in PySpark; earlier we saw the SQL version here: https://lnkd.in/dH6j3MWE

# Define the schema for the DataFrame
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("salary", IntegerType(), True),
    StructField("department", StringType(), True),
    StructField("active", BooleanType(), True),
    StructField("start", StringType(), True),
    StructField("end", StringType(), True)
])

Employee_data = [
    (1, "John", 100, "HR", True, '2023-10-20', None),
    (2, "Alice", 200, "Finance", True, '2023-10-20', None),
    (3, "Bob", 300, "Engineering", True, '2023-10-20', None),
    (4, "Jane", ... Continue Reading →
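The excerpt above cuts off after the sample data, but the SCD Type 2 mechanics it builds toward are simple: expire the currently active row for a changed key, then insert a new active row. Here is a minimal pure-Python sketch of that logic, mirroring the post's column names; the `apply_scd2` helper and the change date are illustrative assumptions, not code from the post:

```python
def apply_scd2(history, change, today):
    """Expire the active row for the changed id, then append a new active row.

    history: list of dicts with keys id, name, salary, department,
             active, start, end (mirroring the post's schema).
    change:  dict with the new attribute values for one id.
    today:   effective date of the change, as a 'YYYY-MM-DD' string.
    """
    for row in history:
        if row["id"] == change["id"] and row["active"]:
            row["active"] = False   # expire the old version
            row["end"] = today      # record when it stopped being current
    history.append({**change, "active": True, "start": today, "end": None})
    return history

# Hypothetical example: Bob moves from Engineering to Finance on 2023-11-01
history = [
    {"id": 3, "name": "Bob", "salary": 300, "department": "Engineering",
     "active": True, "start": "2023-10-20", "end": None},
]
apply_scd2(history, {"id": 3, "name": "Bob", "salary": 300,
                     "department": "Finance"}, "2023-11-01")
# history now holds two rows for Bob: the expired Engineering row
# (active=False, end='2023-11-01') and a new active Finance row.
```

In PySpark the same effect is typically achieved with a join between the incoming batch and the current dimension, but the row-level bookkeeping is exactly what this sketch shows.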
Mastering SCD Type 2: Handling Historical Changes in SQL
Mastering SCD Type 2: Handling Historical Changes in SQL
Slowly Changing Dimensions (SCD) are a crucial part of data warehousing and analytics. Among the different types of SCD, Type 2 is particularly interesting, as it allows us to track historical changes in dimensions such as customer data, product information, and more. In a recent project, I... Continue Reading →
Data Engineering Questions – 1
If your #dataengineering experience grows beyond 5 years, expect questions like these in your interviews...
1. Explain the architecture of Spark.
2. How does job execution happen internally?
3. What happens when you fire a Spark job?
4. How did you tune your jobs?
5. Explain the optimizations you used in your project.
6. How did you connect... Continue Reading →
Chatgpt for Interviews
ChatGPT can help you land your dream job twice as fast. Here are 10 powerful ChatGPT prompts that will 10X your interview chances.
1. Customizing Your Resume
ChatGPT prompt: "Can you make changes to my resume to fit the [Job Title] role at [Company]? Here's the job description: [Paste Job Description], and resume: [Paste Resume]."
2. Creating a Professional Summary
ChatGPT prompt: ... Continue Reading →
Database Indexes
Spend 2 minutes on this post, and you'll gain a good understanding of Database Indexing, which might take much longer to learn otherwise!
Imagine managing a large-scale database:
Database Size: 500 GB
Average Query Search Time Without Index: 5 seconds
Number of Records: 50 million
Let's dive into the world of Database Indexing:
1️⃣ What is Database Indexing?
A database index is... Continue Reading →
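To make the index idea concrete before the post's explanation: a query without an index must scan every record, while an index lets the database jump straight to the matching row. A toy Python sketch (an illustration of the concept, not from the post, with the lookup structure modeled as a plain dict):

```python
# A table of 100,000 records, searched by id
records = [{"id": i, "name": f"user{i}"} for i in range(100_000)]

# Without an index: a full table scan, comparing every row (O(n))
def find_scan(records, user_id):
    return [r for r in records if r["id"] == user_id]

# With an "index": built once up front, then each lookup is O(1) --
# roughly how a real index trades extra storage for search speed
index = {r["id"]: r for r in records}

def find_indexed(index, user_id):
    return index.get(user_id)

# Both find the same row; the indexed lookup skips the scan entirely
assert find_scan(records, 99_999)[0] == find_indexed(index, 99_999)
```

Real database indexes are usually B-trees rather than hash maps, so they also support range queries, but the scan-versus-lookup trade-off is the same.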
Data Masking in Pyspark
Hide credit card number: accept a 16-digit credit card number from the user and display only the last 4 characters of the card number.
Input: 1234567891234567
Output: ************4567
We can use PySpark or Python.
Code in PySpark:
---------------------
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring

# Create a SparkSession
spark = SparkSession.builder.appName("HideCreditCard").getOrCreate()

# Sample input credit card number
input_cc_number = "1234567891234567"

# Hide all characters except the last four... Continue Reading →
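The post's PySpark snippet is cut off above; the same masking rule in plain Python (an illustrative equivalent, not the post's code) is just string slicing:

```python
def mask_card(cc_number: str) -> str:
    """Replace all but the last four characters with '*'."""
    return "*" * (len(cc_number) - 4) + cc_number[-4:]

print(mask_card("1234567891234567"))  # ************4567
```

In PySpark the equivalent column expression combines `substring` (to take the last four characters) with a literal run of asterisks via `concat`/`lpad`.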
PySpark: Cleansing Data with Regex
Delving into PySpark: Cleansing Data with Regex Magic!
Example: Transforming Names with Special Characters
Picture yourself in the realm of data, where you've stumbled upon a trove of Indian names. However, these names are shrouded in a layer of noise, with special characters cluttering them.
Step 1: The Challenge
Imagine a dataset of Indian... Continue Reading →
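The excerpt ends before the code, but the core cleansing idea is a regex substitution that keeps letters and drops the noise characters. A minimal plain-Python sketch (the pattern and sample names are assumptions for illustration, not the post's data):

```python
import re

# Hypothetical noisy names, as described in the post
names = ["Rahul!@#", "Pri$ya", " Amit_ "]

def clean_name(raw: str) -> str:
    # Keep letters and spaces, remove every other character, then trim
    return re.sub(r"[^A-Za-z ]", "", raw).strip()

print([clean_name(n) for n in names])  # ['Rahul', 'Priya', 'Amit']
```

In PySpark the same pattern is applied column-wise with `regexp_replace` from `pyspark.sql.functions`, which takes the column, the regex, and the replacement string.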
AWS DE Questions
This post covers AWS data engineering interviews and highlights the most common concepts you can expect to be asked about.
1. Start by providing a concise introduction to your professional projects, emphasizing your role as a data engineer.
2. Share your knowledge of cloud platforms (AWS, GCP, Azure) as it pertains to data engineering.
3. Discuss... Continue Reading →