In today’s data-driven landscape, integrating diverse data sources into a cohesive system is a complex challenge. As an architect, I set out to design a solution that could seamlessly connect on-premises databases, cloud applications and file systems to a centralized data warehouse. Traditional ETL (extract, transform, load) processes often felt rigid and inefficient, struggling to keep pace with the rapid evolution of data ecosystems. My vision was to create an architecture that not only scaled effortlessly but also adapted dynamically to new requirements without constant manual rework.
The result of this vision is a metadata-driven ETL framework built on Azure Data Factory (ADF). Because metadata defines and drives the ETL processes, onboarding a new source or adjusting an existing one becomes a configuration change rather than new pipeline code. In this article, I’ll share the thought process behind this design, the key architectural decisions I made and how I addressed the challenges that arose during its development.
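To make the idea concrete before diving into the design, here is a minimal sketch of what a single metadata entry might look like. The field names (source_type, connection_ref, load_type, watermark_column and so on) are illustrative assumptions rather than the framework’s actual schema; the point is that a generic pipeline reads records like this at run time instead of hard-coding each source.

```python
# Illustrative sketch only: the field names and values below are assumptions,
# not the actual metadata schema of the framework described in this article.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SourceMetadata:
    """One row of a hypothetical control table that drives a generic copy pipeline."""
    source_name: str            # logical name of the source system
    source_type: str            # e.g. "sql_server", "oracle", "salesforce", "sftp"
    connection_ref: str         # name of the linked service / credential to use
    source_object: str          # table, object, or file path to extract
    target_table: str           # destination table in the data warehouse
    load_type: str              # "full" or "incremental"
    watermark_column: Optional[str] = None  # column used for incremental loads
    enabled: bool = True        # toggle a source without touching the pipeline

# A new source becomes a new record, not a new pipeline.
orders_feed = SourceMetadata(
    source_name="erp_orders",
    source_type="sql_server",
    connection_ref="ls_erp_sqlserver",
    source_object="dbo.Orders",
    target_table="stg.Orders",
    load_type="incremental",
    watermark_column="ModifiedDate",
)
```

In ADF, entries like this would typically live in a control table or configuration file that a parameterized pipeline iterates over, for example with a Lookup feeding a ForEach activity; the later sections describe the actual design choices.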
Recognizing the need for a new approach
The proliferation of data sources — ranging from relational databases like SQL Server and Oracle to SaaS platforms like Salesforce and file-based systems like SFTP — exposed the limitations of conventional ETL strategies. Each new source typically required a custom-built pipeline, which quickly became a maintenance burden. Adjusting these pipelines to accommodate shifting requirements was time-consuming and resource-intensive. I realized that a more agile and sustainable approach was essential.