ETL
Extract | Transform | Load
An event-driven serverless ETL pipeline is a data processing architecture for handling large amounts of data in real time.
Data is processed as soon as it arrives, rather than being stored and processed later in a batch.
Because compute runs only when new data arrives, processing finishes sooner and resources are used more efficiently.
Here are the steps involved in building an event-driven serverless ETL pipeline:
Step 1: Data Ingestion
————————————
– The pipeline begins by ingesting data into a scalable data store such as Amazon S3.
– Amazon S3 serves as the primary data store for all your data.
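
As a minimal sketch of the ingestion step, the snippet below uploads a raw file to S3 with boto3's `put_object`. The bucket name, the `raw` prefix, and the date-based key layout are assumptions made for illustration; the pipeline does not require them.

```python
from datetime import date


def build_raw_key(filename: str, day: date, prefix: str = "raw") -> str:
    """Build an object key such as raw/2024/01/15/orders.csv (hypothetical layout)."""
    return f"{prefix}/{day:%Y/%m/%d}/{filename}"


def ingest(s3_client, bucket: str, filename: str, body: bytes, day: date) -> str:
    """Upload one object to the landing bucket and return its key."""
    key = build_raw_key(filename, day)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


# Usage (needs AWS credentials; the bucket name is hypothetical):
#   import boto3
#   ingest(boto3.client("s3"), "my-landing-bucket", "orders.csv",
#          b"id,amount\n1,9.99\n", date.today())
```

Passing the client in as an argument keeps the function easy to test without touching AWS.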
Step 2: Data Cataloging
————————————–
– Next, the ingested data needs to be cataloged based on its schema.
– This is where the AWS Glue Data Catalog comes into play.
– AWS Glue automates and scales this process while applying security access rules.
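
One way to set up the cataloging step is to define a Glue crawler that scans the S3 prefix and writes table definitions into a Data Catalog database. The sketch below builds the arguments for boto3's `create_crawler`; the crawler name, role ARN, database name, and S3 path are all placeholders you would supply, and the schema change policy shown is just one reasonable choice.

```python
def crawler_definition(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Return keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes to read the bucket
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",  # apply schema changes to the table
            "DeleteBehavior": "LOG",                 # only log objects that disappeared
        },
    }


def create_crawler(glue_client, **kwargs) -> None:
    """Create the crawler using a boto3 Glue client passed in by the caller."""
    glue_client.create_crawler(**crawler_definition(**kwargs))


# Usage (needs AWS credentials; all values are hypothetical):
#   import boto3
#   create_crawler(boto3.client("glue"), name="raw-data-crawler",
#                  role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
#                  database="raw_db", s3_path="s3://my-landing-bucket/raw/")
```

The IAM role is where the access rules come in: the crawler can only read what that role allows.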
Step 3: Triggering Data Processing
——————————————————–
– To avoid paying for idle resources, data processing is triggered when data arrives in the S3 bucket, using an AWS Lambda function.
– This function starts an AWS Glue crawler that catalogs the data.
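
A minimal sketch of such a Lambda function is shown below, assuming the crawler name comes from a `CRAWLER_NAME` environment variable (a hypothetical setting). If several files arrive close together the crawler may already be running, so the sketch treats that case as normal rather than as a failure.

```python
import os

# Hypothetical configuration: the crawler name is read from the environment.
CRAWLER_NAME = os.environ.get("CRAWLER_NAME", "raw-data-crawler")


def start_crawler(glue_client, crawler_name: str) -> str:
    """Start the crawler; report 'already_running' instead of raising."""
    try:
        glue_client.start_crawler(Name=crawler_name)
        return "started"
    except glue_client.exceptions.CrawlerRunningException:
        return "already_running"


def lambda_handler(event, context):
    """Lambda entry point, invoked when new data arrives."""
    import boto3  # imported lazily so start_crawler can be tested without AWS

    status = start_crawler(boto3.client("glue"), CRAWLER_NAME)
    return {"crawler": CRAWLER_NAME, "status": status}
```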
Step 4: Managing Large Volumes of Data
————————————————————
– To manage large volumes of S3-triggered invocations, Amazon SQS is used to queue the events.
– This ensures the ETL data pipeline can run jobs in parallel when required.
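
When S3 notifications are routed through SQS, the Lambda function receives a batch of messages, and each message body holds the S3 event as JSON. The sketch below shows one way to pull the bucket and key out of such a batch; the function name is my own, and it assumes the standard S3 notification layout delivered directly to the queue.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an SQS batch of S3 event notifications."""
    objects = []
    for message in event.get("Records", []):
        body = json.loads(message["body"])
        # S3 sends an "s3:TestEvent" message with no "Records" when the
        # notification is first configured, so a missing list is skipped.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
            objects.append((bucket, key))
    return objects
```

Decoding the key matters in practice: an object named `my file(1).csv` shows up in the event as `my+file%281%29.csv`.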
Step 5: Starting the ETL Job
———————————————-
– Once the AWS Glue crawler finishes storing metadata in the AWS Glue Data Catalog, a second Lambda function can be invoked using an Amazon EventBridge event rule.
– This function starts an AWS Glue ETL job that processes the data and writes the output to another Amazon S3 bucket.
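
To make this step concrete, here is a rough sketch of the two pieces involved: an EventBridge event pattern that matches a successful crawler run, and a helper the second Lambda function could call to start the job. The crawler name, job name, and the `--output_path` job argument are hypothetical; check the event fields against the Glue events your account actually emits.

```python
import json


def crawler_succeeded_pattern(crawler_name: str) -> str:
    """EventBridge event pattern matching a successful run of one crawler."""
    return json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Crawler State Change"],
        "detail": {"crawlerName": [crawler_name], "state": ["Succeeded"]},
    })


def start_etl_job(glue_client, job_name: str, output_path: str) -> str:
    """Start the Glue ETL job and return its run ID."""
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments={"--output_path": output_path},  # hypothetical job parameter
    )
    return response["JobRunId"]


# Usage inside the second Lambda function (names are hypothetical):
#   import boto3
#   start_etl_job(boto3.client("glue"), "raw-to-curated", "s3://my-curated-bucket/")
```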
Step 6: Modifying the ETL Job
————————————————
– The ETL job can be modified to achieve objectives such as more granular partitioning, compression, or enrichment of the data.
– The result?
– An event-driven, scalable, and highly automated ETL data pipeline with no servers or underlying infrastructure to manage!
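
To illustrate what granular partitioning and compression mean here, the sketch below groups records into Hive-style `year=/month=/day=` partitions and gzips each group. This is plain Python for illustration only, not Glue job code; a real Glue job would more likely do the same thing in Spark. The `event_time` field and the `part-0000.json.gz` file name are assumptions.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime


def partition_path(record: dict) -> str:
    """Derive a Hive-style partition path from the record's ISO timestamp."""
    ts = datetime.fromisoformat(record["event_time"])  # hypothetical field name
    return f"year={ts:%Y}/month={ts:%m}/day={ts:%d}"


def partition_and_compress(records: list[dict]) -> dict[str, bytes]:
    """Group records by partition and gzip each group as JSON Lines."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_path(record)].append(record)
    return {
        f"{path}/part-0000.json.gz": gzip.compress(
            "\n".join(json.dumps(row) for row in rows).encode("utf-8")
        )
        for path, rows in groups.items()
    }
```

Each key in the returned dictionary is an object key you could write under the output bucket, so queries that filter by date only need to read the matching prefixes.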
Step 7: Notification
———————————-
– Finally, as soon as the ETL job finishes, another EventBridge rule sends an email notification through an Amazon Simple Notification Service (SNS) topic.
– This indicates that your data was successfully processed.
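
The notification step can be wired up roughly as follows: an EventBridge rule matches the job's success event and targets the SNS topic directly. The rule name, job name, topic ARN, and the `notify-email` target ID are placeholders, and the sketch uses boto3's `put_rule` and `put_targets` on an EventBridge client supplied by the caller.

```python
import json


def job_succeeded_pattern(job_name: str) -> str:
    """EventBridge event pattern matching a successful run of one Glue job."""
    return json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {"jobName": [job_name], "state": ["SUCCEEDED"]},
    })


def notify_on_job_success(events_client, rule_name: str, job_name: str, topic_arn: str) -> None:
    """Create a rule for job success and point it at the SNS topic."""
    events_client.put_rule(
        Name=rule_name,
        EventPattern=job_succeeded_pattern(job_name),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "notify-email", "Arn": topic_arn}],  # hypothetical target ID
    )


# Usage (needs AWS credentials; all names are hypothetical):
#   import boto3
#   notify_on_job_success(boto3.client("events"), "etl-job-succeeded", "raw-to-curated",
#                         "arn:aws:sns:us-east-1:123456789012:etl-notifications")
```

For the email to arrive, the topic's access policy must allow EventBridge to publish to it, and the email subscription on the topic must be confirmed.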
I hope you liked this post. Follow me for more technical content on data engineering and AWS Cloud.
#dataengineering #awsdataengineer #bigdata #etl
