A data integration process that pulls data from one or more source systems, transforms it on a separate processing server, and then loads it into a target repository. This can be accomplished in batches or via streaming.
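As a rough illustration of the definition above, the following is a minimal batch ETL sketch in Python. The source and target databases, table names, and cleansing rules are hypothetical stand-ins rather than a prescribed implementation, and both tables are assumed to already exist.

```python
# Minimal batch ETL sketch: extract from a source store, transform in the
# ETL process itself, then load into a target repository.
# Table and column names are illustrative only.
import sqlite3

def extract(source_path):
    """Pull raw rows from the source system (assumed table: raw_orders)."""
    with sqlite3.connect(source_path) as conn:
        return conn.execute(
            "SELECT id, email, amount FROM raw_orders"
        ).fetchall()

def transform(rows):
    """Cleanse and standardize the extracted rows before loading."""
    cleaned = []
    for row_id, email, amount in rows:
        if email is None:  # drop records that fail a basic quality rule
            continue
        cleaned.append((row_id, email.strip().lower(), round(float(amount), 2)))
    return cleaned

def load(target_path, rows):
    """Write the conformed rows into the target repository (assumed table: orders)."""
    with sqlite3.connect(target_path) as conn:
        conn.executemany(
            "INSERT OR REPLACE INTO orders (id, email, amount) VALUES (?, ?, ?)",
            rows,
        )

if __name__ == "__main__":
    load("warehouse.db", transform(extract("source.db")))
```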
Added Perspectives
Simply speaking, ETL aimed to displace hand-written code for populating data warehouses (and marts) with automated procedures both in the initial build and, more importantly, in the ongoing changes needed as business needs evolved. Procedure design and editing is via graphical drag-and-drop. Metadata describing the steps and actions is stored and reused. This metadata is used either to drive a data processing engine or is transformed into code prior to execution.
- Barry Devlin in Automating Data Preparation in a Digitalized World, Part 1
March 10, 2016 (Blog)
We’ll define Big ETL as having a majority of the following properties (much like the familiar 4 Vs):
- The need to process “really big data” – your data volume is measured in multiple Terabytes or greater.
- The data includes semi-structured or unstructured types – JSON, Avro, etc.
- You are interacting with non-traditional data storage platforms – NoSQL, Hadoop, and other distributed file systems (S3, Gluster, etc.).
- Joe Caserta in Big Data Requires Big ETL
February 26, 2015 (Blog)
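The semi-structured data Caserta mentions changes the shape of the transform step: instead of fixed columns, the code has to cope with nested and missing fields. Below is a small sketch of that idea using Python's standard json module on illustrative newline-delimited records; the field names are assumptions, not a fixed schema.

```python
import json

def transform_json_lines(lines):
    """Flatten newline-delimited JSON events into fixed columns for loading.
    Field names are illustrative; real feeds vary per source."""
    for line in lines:
        event = json.loads(line)
        yield {
            "user_id": event.get("user", {}).get("id"),   # nested field
            "event_type": event.get("type", "unknown"),   # default for missing keys
            "ts": event.get("timestamp"),
        }

raw = [
    '{"user": {"id": 42}, "type": "click", "timestamp": "2015-02-26T12:00:00Z"}',
    '{"type": "view"}',  # missing fields are common in semi-structured feeds
]
print(list(transform_json_lines(raw)))
```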
Extract-Transform-Load (ETL)... is the most widely used data pipeline pattern. From the early 1990’s it was the de facto standard to integrate data into a data warehouse, and it continues to be a common pattern for data warehousing, data lakes, operational data stores, and master data hubs. Data is extracted from a data store such as an operational database, then transformed to cleanse, standardize, and integrate before loading into a target database. ETL processing is executed as scheduled batch processing, and data latency is inherent in batch processing. Mini-batch and micro-batch processing help to reduce data latency but zero-latency ETL is not practical. ETL works well when complex data transformations are required. It is especially well-suited for data integration when all data sources are not ready at the same time. As each individual source is ready, the data source is extracted independently of other sources. When all source data extracts are complete, processing continues with the transformation and loading of the entire set of data.
- Dave Wells in Data Pipeline Design Patterns
May 6, 2020 (Blog)
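A rough sketch of the batch pattern Wells describes: each source is extracted independently as it becomes ready, and transformation and loading run only once every extract is complete. The source names, readiness check, and row shapes below are illustrative placeholders, not a reference implementation.

```python
# Each source is extracted on its own schedule; transform + load run only
# after all extracts for the batch are complete.
from datetime import date

SOURCES = ["orders_db", "crm_export", "billing_feed"]  # hypothetical sources

def source_ready(name, batch_date):
    """Placeholder readiness check (e.g. a control table or file marker)."""
    return True

def extract(name, batch_date):
    """Pull the batch of records for one source (stubbed stand-in rows)."""
    return [{"source": name, "batch": str(batch_date)}]

def transform(rows):
    """Cleanse, standardize, and integrate the combined extract set."""
    return [dict(row, loaded=False) for row in rows]

def load(rows):
    """Write the integrated rows to the target (stubbed as a print)."""
    print(f"loading {len(rows)} rows")

def run_batch(batch_date):
    staged = []
    for name in SOURCES:
        if source_ready(name, batch_date):  # extract each ready source independently
            staged.extend(extract(name, batch_date))
    if staged and all(source_ready(s, batch_date) for s in SOURCES):
        load(transform(staged))             # process the entire set together

run_batch(date.today())
```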