Internals of Caching in PySpark
Caching DataFrames in PySpark is a powerful technique to improve query performance. However, there's a subtle difference in how you can cache DataFrames in PySpark. cached_df = orders_df.cache() and orders_df.cache() are two common approaches, and they serve different purposes. The choice between these two depends on your specific use case and whether you... Continue Reading →
Google Cloud Associate Cloud Engineer (ACE) Resources
I receive 10+ DMs daily asking how to start a journey in Google Cloud, so I have curated a complete list of resources for the Google Cloud Associate Cloud Engineer (ACE).
1. Basics of Linux commands - https://lnkd.in/dN5BPhTq
2. File system - https://lnkd.in/dkEAA_qU
3. Linux Files Hierarchy Structure - https://lnkd.in/d8hQR5m4
4. Linux Directory Hierarchy Structure - https://lnkd.in/dWMNd6J9
5. Associate Cloud Engineer... Continue Reading →
Big Data Learning Resources
Complete plan to learn Big Data step by step (all free resources included) by Sumit Sir.
1. Learn SQL Basics - https://lnkd.in/g9NEJMVE
SQL is used in a lot of places: Hive, Spark SQL, and RDBMS queries. Joins and windowing functions are very important.
2. Learn Programming/Python for Data Engineering - https://lnkd.in/gr6fFPdU
Learn Python to the extent required for Data Engineers.
3. Learn... Continue Reading →
Cloud Services in one line
If you are an aspiring Data Engineer, then you must know these cloud services in AWS, Azure, or GCP. Save this post for future reference...
1. Amazon Web Services (AWS)
AWS Data Pipeline: For creating complex data processing workloads.
AWS Glue: Our favourite fully managed ETL service.
Amazon S3: An object storage service... Continue Reading →
Google Cloud Developer's Cheat Sheet
All Products
Compute
Cloud Run: Serverless for containerized applications
Cloud Functions: Event-driven serverless functions
Compute Engine: VMs, GPUs, TPUs, Disks
Kubernetes Engine (GKE): Managed Kubernetes/containers
App Engine: Managed app platform
Bare Metal Solution: Hardware for specialized workloads
Preemptible VMs: Short-lived compute instances
Shielded VMs: Hardened VMs
Sole-tenant nodes: Dedicated physical servers
Storage
Cloud Filestore: Managed... Continue Reading →
INTERVIEW QUESTIONS ON APACHE SPARK, PYSPARK FOR DATA ENGINEERS
SET OF 82 QUESTIONS
1. How is Apache Spark different from MapReduce?
Apache Spark | MapReduce
Spark processes data in batches as well as in real time | MapReduce processes data in batches only
Spark can run up to 100 times faster than Hadoop MapReduce for in-memory workloads | Hadoop MapReduce is slower when it comes to large-scale processing
Spark stores data in the RAM i.e. in-memory. So,... Continue Reading →
Google Cloud Compute Engine vs App Engine
Google Cloud Platform provides a wide range of computing services that target broad categories of user needs. It offers six main types of compute options:
App Engine
Compute Engine
Kubernetes Engine
Cloud Functions
Cloud Run
VMware Engine
Now let's talk about some of these services in brief.
Compute Engine
The Compute... Continue Reading →
Google Cloud Platform Services Summary
The complete list of services that form Google Cloud Platform is shown below. While Google offers many other services and APIs, only the services below are covered by the Google Cloud Platform terms of service, service level agreements (if applicable), and support offerings. Offerings identified below as Software or Premium Software are not Services under... Continue Reading →
Google Cloud GCloud Commands Cheat Sheet
Google Cloud Config
PURPOSE | COMMAND
Show current configuration and project | gcloud config list, gcloud config list project
Show project info | gcloud compute project-info describe
Switch project | gcloud config set project <project-id>
Set the active account | gcloud config set account <ACCOUNT>
Set default region | gcloud config set compute/region us-west1
Set default zone | gcloud config set compute/zone us-west1-b
List configurations | gcloud config configurations list
Activate configuration | gcloud config configurations activate
Google Cloud... Continue Reading →
How to Get a Higher Package
Must-have notes:
Never give your expected CTC as a number at the beginning [especially over a phone call].
Don't ask for unrealistic numbers.
Don't show a poor attitude while negotiating.
Don't talk about financial commitments [no company or recruiter is interested in your financial condition].
Don't get into any argument or fight [politely refuse the offer and say thank you for the opportunity].
Never... Continue Reading →