Dynamic Column handling in file

โ€----------Spark Interview Questions-------------๐Ÿ“Important Note : This scenario is bit complex I would suggest go through it multiple times. (code implementation is in #databricks )๐Ÿ“•how to handle or how to read variable/dynamic number of columns details?id,name,location,emaild,phone1, aman2,abhi,Delhi3,john,chennai,sample123@gmail.com,688080in a scenario we are geeting not complete columnar information but vary from row to row.pyspark code :===============dbutils.fs.put("/dbfs/tmp/dynamic_columns.csv","""id,name,location,emaild,phone1, aman2,abhi,Delhi3,john,chennai,sample123@gmail.com,688080""")now lets... Continue Reading →

Azure Data Engineering by Deepak Goyal

List of all Azure / Data / DevOps / ML Interview Q&A
Save & Share.
1. Azure Data Factory Interview Q&A - https://lnkd.in/dVzCmzcZ
2. Azure Databricks Scenario based Interview Q&A - https://lnkd.in/dUCf8qf8
3. Realtime Azure Data Factory Interview Q&A - https://lnkd.in/ex_Vixh
4. Latest Azure DevOps Interview Q&A - https://lnkd.in/g7PdATm
5. Azure Active Directory Interview Q&A - https://lnkd.in/dtWYXTKN
6. Azure Data Lake Interview Q&A - https://lnkd.in/dgr-uGQB
7. Azure App Service Interview Q&A - https://lnkd.in/dP4Afqkb
8. Azure Data Engineer Interview Q&A - https://lnkd.in/dj_m2yeQ
9. ... Continue Reading →

AWS Solution Architect in 2 months – Road Map

8-week journey toward becoming an AWS Solutions Architect Associate
Here's the breakdown:
🚀 Week 1: AWS Fundamentals
- Introduction to AWS: Discover the basics and the core services that form the backbone of AWS.
- AWS Free Tier Account: Learn how to set up an account to leverage AWS's free offerings.
- AWS Management Console: Navigate the user interface to... Continue Reading →

Spotify Cloud Project

Spotify Stream Analytics 🎥
Built a synthetic data pipeline for real-time music insights, dashboards, and actionable decisions.
🌟 Project Overview:
Addresses limited access to Spotify stream data with a synthetic pipeline. Realistic events stream to Kafka, are processed by Spark, and are stored in Delta Lake. Airflow orchestrates the pipeline, and dbt transforms the data for the dashboards.
📌 Key Features:
Streamlined Infrastructure: Scripts... Continue Reading →
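As a rough illustration of the "realistic events" step, here is a minimal sketch of a synthetic listen-event generator in plain Python. The field names (user_id, song_id, listened_ms, event_time), the sample values, and the function name synthetic_listen_events are all assumptions for illustration, not the project's actual schema. In a pipeline like the one described, each JSON string would be handed to a Kafka producer rather than printed.

```python
import json
import random
from datetime import datetime, timedelta, timezone

# Hypothetical catalogs; a real project would draw these from a larger dataset.
SONGS = ["song_a", "song_b", "song_c"]
USERS = [f"user_{i}" for i in range(1, 6)]


def synthetic_listen_events(n, seed=42):
    """Yield n JSON-encoded listen events with strictly increasing timestamps."""
    rng = random.Random(seed)  # seeded so repeated runs give the same events
    ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for _ in range(n):
        # Advance the clock by 1 to 30 seconds between events.
        ts += timedelta(seconds=rng.randint(1, 30))
        event = {
            "user_id": rng.choice(USERS),
            "song_id": rng.choice(SONGS),
            "listened_ms": rng.randint(5_000, 240_000),
            "event_time": ts.isoformat(),
        }
        yield json.dumps(event)


events = list(synthetic_listen_events(3))
for line in events:
    print(line)
```

Seeding the generator keeps test runs reproducible, which helps when checking downstream Spark or dbt logic against known input.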

Caching in PySpark

Internals of Caching in PySpark
Caching DataFrames in PySpark is a powerful technique to improve query performance. However, there's a subtle difference in how you can cache DataFrames in PySpark. cached_df = orders_df.cache() and orders_df.cache() are two common approaches, and they serve different purposes. The choice between these two depends on your specific use case and whether you... Continue Reading →

Google Cloud Associate Cloud Engineer (ACE) Resources

I receive 10+ DMs daily asking how to start a journey in Google Cloud, so I have curated a complete list of resources for the Google Cloud Associate Cloud Engineer (ACE).
1. Basics of Linux commands - https://lnkd.in/dN5BPhTq
2. File system - https://lnkd.in/dkEAA_qU
3. Linux Files Hierarchy Structure - https://lnkd.in/d8hQR5m4
4. Linux Directory Hierarchy Structure - https://lnkd.in/dWMNd6J9
5. Associate Cloud Engineer... Continue Reading →

Cloud Services in one line

If you are an aspiring Data Engineer, then you must know these cloud services for AWS, Azure, or GCP 👇
Save this post for future reference ...
1️⃣ Amazon Web Services (AWS)
🛠 AWS Data Pipeline: For creating complex data processing workloads.
📊 AWS Glue: Our favourite fully managed ETL service.
💾 Amazon S3: An object storage service... Continue Reading →

Google Cloud Developer’s Cheat Sheet

All Products
Compute
Cloud Run: Serverless for containerized applications
Cloud Functions: Event-driven serverless functions
Compute Engine: VMs, GPUs, TPUs, Disks
Kubernetes Engine (GKE): Managed Kubernetes/containers
App Engine: Managed app platform
Bare Metal Solution: Hardware for specialized workloads
Preemptible VMs: Short-lived compute instances
Shielded VMs: Hardened VMs
Sole-tenant nodes: Dedicated physical servers
Storage
Cloud Filestore: Managed... Continue Reading →
