Stream CDC data with Amazon Redshift streaming, Amazon MSK and Debezium Connector

Episode 3: Redshift Serverless Streaming Ingestion. Introduction: In the previous episodes, I covered the overall architecture design for this project and Debezium connector configuration for our CDC streaming pipeline. Now I’ll complete the series by diving deep into Amazon Redshift Serverless streaming ingestion, the final piece that enables real-time analytics on your CDC data. This episode focuses on the practical […]

Stream CDC data with Amazon Redshift streaming, Amazon MSK and Debezium Connector

Episode 2: Configuring Debezium Connector for Reliable CDC. Introduction: In the previous episode, I covered the overall architecture and infrastructure setup for our CDC streaming pipeline. Now I’ll dive deep into Debezium, the open-source platform that captures row-level database changes in real time and streams them to Kafka topics through MSK Connect. This episode focuses on the […]

Stream CDC data with Amazon Redshift streaming, Amazon MSK and Debezium Connector

Episode 1: Designing the End-to-End CDC Architecture and IaC Setup. Introduction: In today’s data-driven landscape, organizations need real-time insights from their operational databases to make informed decisions quickly. I developed a comprehensive Change Data Capture (CDC) streaming pipeline that captures database changes from Aurora MySQL and streams them in real time to an Amazon Redshift data warehouse for analytics. This solution […]

QuickSight Quickstart: From Blank AWS Account to Published Dashboards in One Command

The missing link for enterprise QuickSight development. The Problem: Automating QuickSight Is Still Awkward. QuickSight is pay-as-you-go BI, but its deployment story lags behind the rest of AWS. A single dashboard hides dozens of interlocked artefacts: data sources, datasets, analyses, themes, RDS/VPC plumbing, IAM policies, and invisible dependencies such as SPICE refresh schedules. […]

Databricks Asset Bundles

Background. A first look at the tool’s maturity, benefits and current limitations. Databricks offers several ways to deploy resources like jobs, notebooks, and clusters, each suited to different levels of automation, complexity, and control. Whether it’s managing infrastructure, orchestrating jobs, or promoting code across environments, choosing the right deployment tool is […]

The Art of Data Engineering: Applying Sun Tzu’s Principles

How Ancient Wisdom Can Transform Modern Data Practices. Sun Tzu’s “The Art of War” offers timeless wisdom that transcends the battlefield, providing insights applicable to various domains, including data engineering. In this field, success is not merely about coming up with innovative ideas or implementing initial solutions. True success is measured by how effectively these […]

Databricks First Look: August 2024 Release, Deep Dive into Lakehouse Federation

Exploring the Power of Seamless Data Integration and Enhanced Security with Databricks. Introduction: In the fast-evolving landscape of data analytics, staying updated with the latest platform enhancements is crucial. The August 2024 release from Databricks brings a suite of impactful updates designed to boost security, compliance, and performance. Among these, Lakehouse Federation stands out, offering […]

Risk Management in Cloud Data Projects: Strategies for Success

Introduction to Risk Management in Cloud Data Projects. Risk management is a critical component of any cloud data project. As organizations increasingly rely on cloud technologies to store, process, and analyze data, understanding the unique risks associated with these projects becomes essential. Cloud data projects involve various stakeholders and technologies, which introduce complexities in data […]