The Changing Face of ETL: Event-Driven Architectures for Data Engineers
Photo by rmoff
@rmoff
Slide 2
Photo by Samuel Sianipar on Unsplash
Slide 3
Photo by Khai Sze Ong on Unsplash
Slide 4
Photo by Rainier Ridao on Unsplash
Slide 5
Photo by Rohit Tandon on Unsplash
Slide 6
Photo by Theodore Moore on Unsplash
Slide 7
Photo by Cristian Grecu on Unsplash
Slide 8
Photo by Patrick Fore on Unsplash
It used to be so simple
Slide 9
Photo by Eugenio Mazzone on Unsplash
More Sources
Slide 10
Photo by Tom Barrett on Unsplash
More Targets
Slide 11
Photo by Kirill on Unsplash
More Data
Slide 12
Batches and Buckets
Slide 13
Applications
Respond → an order was placed!
Analytics
Tell Us What Happened → how many orders were placed
Photo by Deva Darshan from Pexels
Slide 14
Slide 15
Photo by NASA on Unsplash
Slide 16
Photo by Mark Kamalov on Unsplash
Events
Slide 17
An event is both:
* Notification
* State transfer
Slide 18
A Customer Experience
Slide 19
A Sensor Reading
Slide 20
Events
Basket: Bread, Tinned Spaghetti
Slide 21
Events
Basket: Bread
ItemAdd Bread
Slide 22
Events
Basket: Bread, Baked Beans
ItemAdd Bread
ItemAdd Baked Beans
Slide 23
Events
Basket: Bread
ItemAdd Bread
ItemAdd Baked Beans
ItemRemove Baked Beans
Slide 24
Events
Basket: Bread, Tinned Spaghetti
ItemAdd Bread
ItemAdd Baked Beans
ItemRemove Baked Beans
ItemAdd Tinned Spaghetti
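The basket slides suggest that current state can be derived by replaying the ItemAdd and ItemRemove events in order. A minimal sketch of that idea follows, assuming an in-memory list of events; the class names BasketEvent and BasketReplay are hypothetical and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names: BasketEvent and BasketReplay are illustrative only.
final class BasketEvent {
    final String type; // "ItemAdd" or "ItemRemove", as on the slides
    final String item;

    BasketEvent(String type, String item) {
        this.type = type;
        this.item = item;
    }
}

final class BasketReplay {
    // Fold the event history, oldest first, into the current basket contents.
    static List<String> replay(List<BasketEvent> events) {
        List<String> basket = new ArrayList<>();
        for (BasketEvent e : events) {
            if (e.type.equals("ItemAdd")) {
                basket.add(e.item);
            } else if (e.type.equals("ItemRemove")) {
                basket.remove(e.item); // removes the first matching item, if present
            }
        }
        return basket;
    }

    public static void main(String[] args) {
        List<BasketEvent> events = List.of(
            new BasketEvent("ItemAdd", "Bread"),
            new BasketEvent("ItemAdd", "Baked Beans"),
            new BasketEvent("ItemRemove", "Baked Beans"),
            new BasketEvent("ItemAdd", "Tinned Spaghetti"));
        System.out.println(replay(events)); // prints [Bread, Tinned Spaghetti]
    }
}
```

The final basket alone cannot tell you that Baked Beans were added and then removed; the event history can.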
Slide 25
Slide 26
Slide 27
Slide 28
What is an Event Streaming Platform?
Producer
Connectors
Consumer
The Log
Connectors
Streaming Engine
Slide 29
Immutable Event Log
Old
New
Messages are added at the end of the log
Slide 30
Consumers have a position all of their own
New
Old
Sally is here
Scan
Slide 31
Consumers have a position all of their own
New
Old
Fred is here
Scan
Sally is here
Scan
Slide 32
Consumers have a position all of their own
George is here
Scan
New
Old
Fred is here
Scan
Sally is here
Scan
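The log slides describe two properties: messages are only appended at the end, and each consumer keeps its own read position. The sketch below models both in memory, under the assumption of a single process and no persistence. EventLog, append and poll are hypothetical names, not a Kafka API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch only; EventLog and its methods are hypothetical names.
final class EventLog {
    private final List<String> messages = new ArrayList<>();        // oldest first
    private final Map<String, Integer> positions = new HashMap<>(); // consumer -> next offset

    // Messages are only ever added at the end; existing entries are never changed.
    void append(String message) {
        messages.add(message);
    }

    // Return everything the named consumer has not yet read and advance only its position.
    List<String> poll(String consumer) {
        int from = positions.getOrDefault(consumer, 0);
        List<String> unread = new ArrayList<>(messages.subList(from, messages.size()));
        positions.put(consumer, messages.size());
        return unread;
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("e0");
        log.append("e1");
        System.out.println("Sally: " + log.poll("Sally")); // Sally: [e0, e1]
        log.append("e2");
        System.out.println("Fred: " + log.poll("Fred"));   // Fred: [e0, e1, e2]
        System.out.println("Sally: " + log.poll("Sally")); // Sally: [e2]
    }
}
```

Because reading does not remove anything from the log, a consumer that arrives later (George, in the slides) can still scan from the oldest message.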
Slide 33
The Connect API
Producer
Connectors
Consumer
The Log
Connectors
Streaming Engine
Slide 34
Streaming Integration with Kafka Connect
Sources: syslog, flat file, CSV, JSON, MQTT
Kafka Connect: Tasks, Workers
Kafka Brokers
Slide 35
Streaming Integration with Kafka Connect
Sinks: Amazon S3, MQTT
Kafka Connect: Tasks, Workers
Kafka Brokers
Slide 36
Streaming Integration with Kafka Connect
Sources: syslog, flat file, CSV, JSON, MQTT
Sinks: Amazon S3, MQTT
Kafka Connect: Tasks, Workers
Kafka Brokers
Slide 37
Stream Processing in Kafka
Producer
Connectors
Consumer
The Log
Connectors
Streaming Engine
Slide 38
Kafka Streams API
final StreamsBuilder builder = new StreamsBuilder();
// Keep only completed orders and write them to a new topic
builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
       .filter((key, order) -> order.getStatus().equals("COMPLETE"))
       .to("complete_orders", Produced.with(stringSerde, ordersSerde));
Slide 39
Stream Processing with KSQL
CREATE STREAM completedOrders AS SELECT * FROM orders WHERE status='COMPLETE';
Slide 40
Photo by Ash from Modern Afflatus on Unsplash
This is Something New
Slide 41
Events in Action
Review events
reviews
Slide 42
Events in Action
Review events
reviews
Operational dashboard
Slide 43
Events in Action
Review events
reviews
Operational dashboard
Data lake
Slide 44
Events in Action
Review events
CREATE STREAM reviews_clean AS SELECT * FROM reviews WHERE id IS NOT NULL;
reviews
reviews_clean
Operational dashboard
Filter out bad data
Data lake
Slide 45
Events in Action
Existing apps
User data
RDBMS txn log
users
Kafka Connect
Kafka
Slide 46
Events in Action
Review events
reviews
users
reviews_clean
Operational dashboard
User data
Join events to users, and filter
Data lake
Slide 47
Events in Action
Review events
CREATE STREAM reviews_clean AS SELECT * FROM reviews WHERE id IS NOT NULL;
CREATE STREAM enriched_reviews AS SELECT * FROM reviews_clean r INNER JOIN users u ON r.userid=u.userid;
reviews
users
reviews_clean
enriched_reviews
Operational dashboard
User data
Join events to users, and filter
Data lake
Slide 48
Events in Action
Notification service
Review events
Operational dashboard
User data
Join events to users, and filter
Data lake
Slide 49
Events in Action
Review events
CREATE STREAM unhappy_vips AS SELECT * FROM enriched_reviews WHERE rating < 3 AND status = 'Platinum';
Notification service
reviews
users
reviews_clean
enriched_reviews
Operational dashboard
unhappy_vips
User data
Join events to users, and filter
Data lake
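The three statements in this walkthrough form a chain: filter out reviews with no id, join each review to its user, then keep low ratings from Platinum users. The sketch below follows the same chain over in-memory lists, purely to make the data flow concrete. It is not how KSQL executes these queries. The Review, EnrichedReview and ReviewPipeline classes are hypothetical, and only the fields id, userid, rating and status are taken from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record shapes; only id, userid, rating and status appear on the slides.
final class Review {
    final String id; // null marks a bad record
    final String userid;
    final int rating;

    Review(String id, String userid, int rating) {
        this.id = id;
        this.userid = userid;
        this.rating = rating;
    }
}

final class EnrichedReview {
    final Review review;
    final String status; // looked up from the users data, e.g. "Platinum"

    EnrichedReview(Review review, String status) {
        this.review = review;
        this.status = status;
    }
}

final class ReviewPipeline {
    // reviews -> reviews_clean: WHERE id IS NOT NULL
    static List<Review> clean(List<Review> reviews) {
        List<Review> out = new ArrayList<>();
        for (Review r : reviews) {
            if (r.id != null) out.add(r);
        }
        return out;
    }

    // reviews_clean -> enriched_reviews: inner join on userid (unmatched reviews are dropped)
    static List<EnrichedReview> enrich(List<Review> clean, Map<String, String> statusByUserid) {
        List<EnrichedReview> out = new ArrayList<>();
        for (Review r : clean) {
            String status = statusByUserid.get(r.userid);
            if (status != null) out.add(new EnrichedReview(r, status));
        }
        return out;
    }

    // enriched_reviews -> unhappy_vips: WHERE rating < 3 AND status = 'Platinum'
    static List<EnrichedReview> unhappyVips(List<EnrichedReview> enriched) {
        List<EnrichedReview> out = new ArrayList<>();
        for (EnrichedReview e : enriched) {
            if (e.review.rating < 3 && "Platinum".equals(e.status)) out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Review> reviews = List.of(
            new Review("r1", "u1", 1),
            new Review(null, "u1", 5),
            new Review("r3", "u2", 2));
        Map<String, String> users = Map.of("u1", "Platinum", "u2", "Gold");
        List<EnrichedReview> vips = unhappyVips(enrich(clean(reviews), users));
        System.out.println(vips.size() + " unhappy VIP review(s)");
    }
}
```

Each intermediate result corresponds to one named stream on the slide, so the dashboard, the data lake and the notification service can each read from whichever stage they need.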
Slide 50
Photo by rmoff
The Power of an Event-Driven Architecture
Slide 51
Not Everything is a Nail
Events
RDBMS
Slide 52
Not Everything is a Nail
Events
Elasticsearch
RDBMS
Slide 53
Not Everything is a Nail
Graph
Events
Elasticsearch
RDBMS
Slide 54
Side-by-Side Tech Evaluation
Events
HDFS
Slide 55
Side-by-Side Tech Evaluation
Events
BigQuery
HDFS
Slide 56
Side-by-Side Tech Evaluation
Snowflake
Events
BigQuery
HDFS
Slide 57
Evolve Data Sources
Producer
On-premises
Consuming App A
Consuming App B
Slide 58
Evolve Data Sources
Producer
On-premises
Producer
Consuming App A
Consuming App B
Cloud
Slide 59
Evolve Data Sources
Consuming App A
Producer
Consuming App B
Cloud
Slide 60
Tight Coupling != Flexible
Orders
RDBMS
Slide 61
Tight Coupling != Flexible
Orders
RDBMS
HDFS
Slide 62
Tight Coupling != Flexible
Orders
RDBMS
HDFS
App
Slide 63
Loose Coupling == Freedom to Evolve
RDBMS
Orders
Slide 64
Loose Coupling == Freedom to Evolve
RDBMS
Orders
HDFS
Slide 65
Loose Coupling == Freedom to Evolve
RDBMS
Orders
App
HDFS
Slide 66
Transform Once, Use Many: Data Cleansing
temp_raw
App
IoT
App
RDBMS
Slide 67
Transform Once, Use Many: Data Cleansing
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
NULL       1551136125  13.11
42         1551138129  13.04
IoT
temp_raw
App
App
RDBMS
Slide 68
Transform Once, Use Many: Data Cleansing
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
NULL       1551136125  13.11
42         1551138129  13.04
IoT
temp_raw
Cleanse
App
App
Cleanse
RDBMS
Cleanse
Slide 69
Transform Once, Use Many: Data Cleansing
temp_raw
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
NULL       1551136125  13.11
42         1551138129  13.04
SENSOR_ID IS NOT NULL
temp_clean
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
42         1551138129  13.04
IoT
App
App
RDBMS
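The cleansing step on this slide is a single filter, SENSOR_ID IS NOT NULL, applied once so that every downstream consumer reads temp_clean instead of repeating the work. A minimal sketch of that filter over the slide's four readings follows. The Reading and Cleanse classes are hypothetical, and treating the third row's missing sensor_id as null is an assumption based on the filter shown.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical record shape mirroring the sensor_id, time_epoch and reading columns.
final class Reading {
    final Integer sensorId; // null when the source row has no sensor_id (assumed)
    final long timeEpoch;
    final double reading;

    Reading(Integer sensorId, long timeEpoch, double reading) {
        this.sensorId = sensorId;
        this.timeEpoch = timeEpoch;
        this.reading = reading;
    }
}

final class Cleanse {
    // temp_raw -> temp_clean: keep only rows where sensor_id is present.
    static List<Reading> tempClean(List<Reading> tempRaw) {
        return tempRaw.stream()
                .filter(r -> r.sensorId != null)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Reading> tempRaw = List.of(
            new Reading(42, 1551136074L, 13.05),
            new Reading(42, 1551136125L, 13.11),
            new Reading(null, 1551136125L, 13.11),
            new Reading(42, 1551138129L, 13.04));
        List<Reading> tempClean = tempClean(tempRaw);
        // Both apps and the RDBMS would read this one cleansed result.
        System.out.println(tempClean.size() + " of " + tempRaw.size() + " rows kept");
    }
}
```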
Slide 70
Photo by rmoff
Say NO to brittle pipelines
Slide 71
Photo by Benjamin Lambert on Unsplash
Slide 72
Latency requirements
Users of the data
Scale
Data fidelity
Photo by Benjamin Lambert on Unsplash
Slide 73
App
App
App
App
cache
monitoring
cache
MQ
DWH
security
MQ
search
Hadoop
Slide 74
App
App
App
App
request-response
changelogs App
App
KAFKA
App
App DWH
Hadoop
messaging OR stream processing
streaming data pipelines
Slide 75
Photo by rmoff
Events model the real world
Slide 76
Event streaming platform
Native stream processing
Data when you need it
Data persistence
Flexibility & scalability
Photo by rmoff
Slide 77
http://cnfl.io/book-bundle
Slide 78
Photo by rmoff
confluent.io/download
http://cnfl.io/book-bundle
http://cnfl.io/slack
Slide 79
Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / [email protected]) can help with introductions on a given sales op
#EOF