flowchart LR
Source --> Extract --> Transform --> Load
flowchart LR
Source --> Extract --> Load --> Transform --> Load2
Data source
A data source delivers data in one of two modes:
- Batch
  - Data is gathered somewhere first and then processed in one go.
  - Processing runs at intervals.
  - Suited to large volumes of data.
  - Some latency is unavoidable.
- Serial (streaming)
  - Data keeps arriving, and we process it as it arrives.
  - Usually a hub (an IoT hub, for example) collects the data and passes it along.
  - Low latency.
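A minimal sketch of the two modes, assuming a made-up list of sensor readings. The function names `process_batch` and `process_stream` are hypothetical and only illustrate the difference: the batch version needs the whole collection up front, while the streaming version emits a result for every record as it arrives.

```python
from typing import Iterable, Iterator, List


def process_batch(records: List[int]) -> int:
    """Batch: all records are already collected, so process them in one go."""
    return sum(records)


def process_stream(records: Iterable[int]) -> Iterator[int]:
    """Stream: handle each record as it arrives and emit a running total."""
    total = 0
    for record in records:
        total += record
        yield total


# Hypothetical sensor readings, used only for illustration.
readings = [3, 5, 2]

batch_result = process_batch(readings)
stream_results = list(process_stream(iter(readings)))

print(batch_result)
print(stream_results)
```

The batch result is available only after everything has been gathered, which is where the latency comes from. The streaming version always has an up-to-date total.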
ETL
- ETL stands for Extract, Transform, Load.
- ETL was the norm in earlier days, when storage was more expensive.
- So we transformed the data first and only then moved to the load stage.
- Transform is one of:
  - Mapping
  - Wrangling
  - Complex processing (HDInsight, etc.)
- Load stores the result in a DB or in 202404261931 Azure Data warehouse and analytics.
- Finally comes the analyze phase.
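The ETL flow above can be sketched roughly as follows. Everything here is an assumption for illustration: the source rows, the field names, and the plain Python list standing in for the warehouse. The transform step does a little wrangling (trimming whitespace) and a little mapping (Fahrenheit to Celsius) before anything is loaded.

```python
from typing import Dict, List


def extract() -> List[Dict[str, str]]:
    # Hypothetical source rows; a real pipeline would read from a file or an API.
    return [
        {"name": " Alice ", "temp_f": "68"},
        {"name": "Bob", "temp_f": "50"},
    ]


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    # Wrangling: strip stray whitespace. Mapping: convert temp_f to temp_c.
    return [
        {
            "name": row["name"].strip(),
            "temp_c": round((float(row["temp_f"]) - 32) * 5 / 9, 1),
        }
        for row in rows
    ]


def load(rows: List[Dict[str, object]], warehouse: List[Dict[str, object]]) -> None:
    # The list is a stand-in for a database or data warehouse table.
    warehouse.extend(rows)


warehouse: List[Dict[str, object]] = []
load(transform(extract()), warehouse)
print(warehouse)
```

Only the transformed rows reach the warehouse. The original `temp_f` values are not kept anywhere after the run.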
ELT
- ELT stands for Extract, Load, Transform.
- Storage is cheap now.
- So we load the raw data into something like 202404121149 Azure Data Lake before transforming it.
- The benefit is that in future we may want to transform it in some new way.
- If we transform before loading, as in ETL, the original data is lost and cannot be reprocessed.
- With the raw data kept in 202404121149 Azure Data Lake, we can transform it later as needed.
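A rough sketch of the ELT idea, assuming a plain Python list as a stand-in for the data lake and made-up JSON records. The helper names (`load_raw`, `average_temp`, `readings_per_device`) are hypothetical. Because the raw records are stored untouched, a second transformation can be added later without going back to the source.

```python
import json
from typing import Dict, List


def extract() -> List[str]:
    # Hypothetical raw JSON lines as they might arrive from devices.
    return [
        '{"device": "a", "temp_c": 21.5}',
        '{"device": "b", "temp_c": 19.0}',
        '{"device": "a", "temp_c": 22.5}',
    ]


def load_raw(raw: List[str], lake: List[str]) -> None:
    # Store the records exactly as received; no transformation yet.
    lake.extend(raw)


def average_temp(lake: List[str]) -> float:
    # First transformation, run against the stored raw data.
    temps = [json.loads(line)["temp_c"] for line in lake]
    return sum(temps) / len(temps)


def readings_per_device(lake: List[str]) -> Dict[str, int]:
    # A transformation thought of later; it still works on the same raw data.
    counts: Dict[str, int] = {}
    for line in lake:
        device = json.loads(line)["device"]
        counts[device] = counts.get(device, 0) + 1
    return counts


lake: List[str] = []  # stand-in for a data lake
load_raw(extract(), lake)

avg = average_temp(lake)
per_device = readings_per_device(lake)
print(avg)
print(per_device)
```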
references: