The Changing Face of ETL: Event-Driven Architectures for Data Engineers

A presentation at Paris Apache Kafka Meetup in March 2019 in Paris, France by Robin Moffatt

Slide 1

The Changing Face of ETL: Event-Driven Architectures for Data Engineers
Photo by rmoff
@rmoff

Build data pipelines better. Currently they are inflexible, slow, and brittle. Technology now exists for scalable, flexible, low-latency pipelines.

Big Data Tech Warsaw: 30 minutes. Kafka Paris Meetup: 43 minutes.

Slide 2

Photo by Samuel Sianipar on Unsplash

Think about pipelines:
• Traditional ETL: building a data warehouse or data lake (DW/DL)
• Integration: building pipelines to feed other systems, e.g. IoT -> time series, log aggregation, etc.

Slide 3

Pipelines always start simple Photo by Khai Sze Ong on Unsplash

Slide 4

Pipelines increase in number Photo by Rainier Ridao on Unsplash

Slide 5

Photo by Rohit Tandon on Unsplash Pipelines grow larger, more complex

Slide 6

Photo by Theodore Moore on Unsplash Pipelines become intertwined, tightly-coupled; difficult to unravel

Slide 7

Photo by Cristian Grecu on Unsplash We’ve all got skeletons in our pipelines that we’re not proud of

Slide 8

Photo by Patrick Fore on Unsplash
It used to be so simple

There used to be a single database from a single mandated vendor, with a few transactional systems. You loaded it into a single centralised data warehouse.

Slide 9

Photo by Eugenio Mazzone on Unsplash
More Sources

Microservices store data where they want. Diverse technology, on-premises and cloud. SaaS and third-party data.

Slide 10

Photo by Tom Barrett on Unsplash
More Targets

More users of the data. It is no longer just a data warehouse, data mart, or data lake:
• Other analytics platforms (e.g. S3, HDFS, Snowflake, BigQuery)
• Specialised technologies: graph, full-text search, NoSQL

Slide 11

Photo by Kirill on Unsplash
More Data

It seems obvious to mention Big Data, but we build systems the same way we did 40 years ago. There is orders of magnitude more data: IoT, mobile, app-generated. The data sets are more diverse too.

Slide 12

Batches and Buckets

Despite this, we do things the same way we always did: batch. We wait, and then we process data. We land it. We pick it up, we process it. The result is latency. Downstream use is dictated by upstream assumptions.

Slide 13

Photo by Deva Darshan from Pexels
Analytics: Tell Us What Happened → how many orders were placed
Applications: Respond → an order was placed!

Data flows used to be one-way. We need to think beyond our silos: it’s the same data. Historically, the technology was such that you had to have this divide, the OLTP/OLAP compromise. Batch ETL was the inevitable sticking plaster on top of that.

Slide 14

All systems need a way of exchanging data:
• Analytics generates new data, which drives applications
• Applications need contextual data to improve the user experience
• Applications need to get data to analytics at lower latency

Slide 15

Photo by NASA on Unsplash

Ultimately we need a common way to work with data. Systems and teams across a company need to use the same data in a loosely coupled way. This is not a compromise, and it is not crowbarring everything into a shiny new technology. It is adopting a unified platform that makes both apps and analytics better: lower latency, a more flexible architecture, more scalability. The common denominator here is events.

Slide 16

Photo by Mark Kamalov on Unsplash
Events
• All data is built from events
• Events are the lowest granularity of data
• Events describe our business

Slide 17

An event is both:
• Notification
• State transfer

• “We sold something” -> what did we sell, and to whom did we sell it?
• “Someone clicked a link” -> what did they click, and who clicked it?

Slide 18

A Customer Experience

Events model our business. They can describe real-world interactions.

Slide 19

A Sensor Reading

Events can also be generated by machines.

Slide 20

Events
Basket: Bread, Tinned Spaghetti

Usually we start with state and how we’re going to store it. Let’s consider an online retailer. A basket at checkout might look like this. What it doesn’t show is how that basket was created.

Slide 21

Events
ItemAdd Bread
Basket: Bread

An event happened, “something was added to the basket”, and it tells us what was added.

Slide 22

Events
ItemAdd Bread; ItemAdd Baked Beans
Basket: Bread, Baked Beans

Then we add baked beans.

Slide 23

Events
ItemAdd Bread; ItemAdd Baked Beans; ItemRemove Baked Beans
Basket: Bread

Then we change our minds and take the beans out.

Slide 24

Events
ItemAdd Bread; ItemAdd Baked Beans; ItemRemove Baked Beans; ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti

And we add tinned spaghetti instead.

Slide 25

Events
ItemAdd Bread; ItemAdd Baked Beans; ItemRemove Baked Beans; ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti

The stream of events describes the behaviour: the interaction with our business. If we simply capture state (the final basket), we lose the behaviour information.

Slide 26

Events
ItemAdd Bread; ItemAdd Baked Beans; ItemRemove Baked Beans; ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti

From a stream of events you can derive state. It is the same concept as in analytics: you can aggregate up, but not down. From state we cannot discern the events that created it, but from events we can build state.
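
As a minimal sketch of deriving state from events, the basket example can be folded in a few lines of Python. The tuple layout `(event_type, item)` and the function name `basket_from_events` are assumptions made for illustration; only the event names and items come from the slide.

```python
from collections import Counter

# Hypothetical event shape: (event_type, item). Event names and items mirror the slide.
events = [
    ("ItemAdd", "Bread"),
    ("ItemAdd", "Baked Beans"),
    ("ItemRemove", "Baked Beans"),
    ("ItemAdd", "Tinned Spaghetti"),
]


def basket_from_events(events):
    """Fold a stream of basket events into the current basket state."""
    quantities = Counter()
    for event_type, item in events:
        if event_type == "ItemAdd":
            quantities[item] += 1
        elif event_type == "ItemRemove":
            quantities[item] -= 1
    # Only items with a positive quantity are still in the basket.
    return sorted(item for item, qty in quantities.items() if qty > 0)


basket = basket_from_events(events)
print(basket)  # ['Bread', 'Tinned Spaghetti']
```

The reverse is not possible: given only `['Bread', 'Tinned Spaghetti']`, nothing tells us that baked beans were ever added and removed.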

Slide 27

Events
ItemAdd Bread; ItemAdd Baked Beans; ItemRemove Baked Beans; ItemAdd Tinned Spaghetti
Basket: Bread, Tinned Spaghetti

So all we actually need to accurately model our business is the event stream. Everything else can be built from that. This event stream can be implemented using an event streaming platform, like Apache Kafka.

Slide 28

Databases

More interestingly, databases are also streams of events. This might seem unintuitive at first: when most people think about databases, they immediately think of tables.

Slide 29

The Stream/Table Duality

Time (top to bottom)

Stream                      Table
Account ID   Amount         Account ID   Balance
12345        +€50           12345        €50
12345        +€25           12345        €75
12345        -€60           12345        €15
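
The duality can be sketched by replaying the stream of amounts into a table of balances. This is an illustrative Python sketch only; the function name `apply_stream` and the in-memory dict standing in for the table are assumptions, while the account ID and amounts are the ones on the slide.

```python
# Stream of changes in time order: (account_id, amount in euros), as on the slide.
stream = [("12345", 50), ("12345", 25), ("12345", -60)]


def apply_stream(stream):
    """Replay a stream of changes, returning the final table and each intermediate version."""
    table = {}       # account_id -> current balance
    snapshots = []   # copy of the table after each event
    for account_id, amount in stream:
        table[account_id] = table.get(account_id, 0) + amount
        snapshots.append(dict(table))
    return table, snapshots


table, snapshots = apply_stream(stream)
print([s["12345"] for s in snapshots])  # [50, 75, 15]
print(table)                            # {'12345': 15}
```

The table is just the stream aggregated up to a point in time; keep the stream and any earlier version of the table can be rebuilt.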

Slide 30

“The truth is the log. The database is a cache of a subset of the log.”
Pat Helland, “Immutability Changes Everything”
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash

Slide 31

What is an Event Streaming Platform?
Diagram: Producer, Connectors, The Log, Streaming Engine, Connectors, Consumer

Apache Kafka is an event streaming platform. It provides:
• A persisted event stream
• Stream processing
• Integration
It’s a distributed system that scales horizontally, and it is run at very large scale (Netflix, Uber, etc.).

Slide 32

Immutable Event Log
Old -> New
Messages are added at the end of the log

A distributed, append-only, immutable event log. It is persisted. New messages are written to the end. It is arranged as topics, akin to tables in a database.

Slide 33

Topics
Clicks, Orders, Customers
Topics are similar in concept to tables in a database

Slide 34

Partitions
Clicks: P0, P1, P2
Messages are guaranteed to be strictly ordered within a partition
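
To make the per-partition ordering guarantee concrete, here is a hedged Python sketch of key-based partitioning. It is not Kafka's implementation: Kafka's own partitioner uses a different hash, and `zlib.crc32` is used here only as a stable stand-in. The function name `partition_for` and the sample click data are assumptions for illustration.

```python
import zlib

NUM_PARTITIONS = 3  # matches P0, P1, P2 on the slide


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (a stand-in for Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical click events: (user key, page).
clicks = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/cart"),
    ("alice", "/checkout"),
]

# Each partition is its own append-only list.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, page in clicks:
    partitions[partition_for(key)].append((key, page))

# All of one key's messages land in one partition, in the order they were produced.
alice_pages = [page for key, page in partitions[partition_for("alice")] if key == "alice"]
print(alice_pages)  # ['/home', '/cart', '/checkout']
```

Ordering holds within a partition only; there is no ordering guarantee between messages that land in different partitions.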

Slide 35

Messages are just K/V bytes, plus headers + timestamp
Clicks: Header, Timestamp, Key, Value

Slide 36

Messages are just K/V bytes
With great power comes great responsibility
• Avro -> Confluent Schema Registry
• Protobuf
• JSON
• CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf

Slide 37

Consumers have a position all of their own
Old -> New
Sally is here (Scan)

Consumers read from a position in the log and commit when done. Kafka stores their offset.

Slide 38

Consumers have a position all of their own
Old -> New
Fred is here (Scan); Sally is here (Scan)

Other consumers can read the same data. Data is not transient: it is persisted according to the configured retention settings.

Slide 39

Consumers have a position all of their own
Old -> New
George is here (Scan); Fred is here (Scan); Sally is here (Scan)

Message replay. No slow-consumer problem. Topics don’t care who’s reading them. Different consumers can read from different offsets.
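
The idea of independent consumer positions can be sketched with a toy in-memory log. This is not the Kafka client API; the class `Log` and its `append`, `poll`, and `seek` methods are hypothetical names chosen for illustration, and real Kafka tracks committed offsets per consumer group and partition.

```python
class Log:
    """Toy in-memory stand-in for a single-partition topic (not the Kafka client API)."""

    def __init__(self):
        self.messages = []  # append-only list of messages
        self.offsets = {}   # consumer name -> offset of the next message to read

    def append(self, message):
        self.messages.append(message)

    def poll(self, consumer, max_messages=1):
        """Return the next messages for this consumer and record its new offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:start + max_messages]
        self.offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Move a consumer's position, e.g. back to 0 to replay."""
        self.offsets[consumer] = offset


log = Log()
for n in range(5):
    log.append(f"event-{n}")

sally_batch = log.poll("sally", max_messages=3)   # Sally reads ahead
fred_batch = log.poll("fred", max_messages=1)     # Fred is slower; nothing is lost
log.seek("sally", 0)                              # Sally replays from the start
sally_replay = log.poll("sally", max_messages=2)
```

Reading never removes anything from `log.messages`, which is why a slow consumer cannot hold up a fast one and why replay is just a matter of moving an offset.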

Slide 40

The Connect API
Diagram: Producer, Connectors, The Log, Streaming Engine, Connectors, Consumer

Kafka Connect for integration.

Slide 41

Streaming Integration with Kafka Connect
Sources: syslog, flat file, CSV, JSON, MQTT
Kafka Connect (Tasks, Workers)
Kafka Brokers

From databases, flat files, JMS, etc.

Slide 42

Streaming Integration with Kafka Connect
Kafka Brokers
Kafka Connect (Tasks, Workers)
Sinks: Amazon S3, MQTT

To Elasticsearch, HDFS, S3, BigQuery, InfluxDB, Snowflake, etc.

Slide 43

Streaming Integration with Kafka Connect
Sources: syslog, flat file, CSV, JSON, MQTT
Kafka Connect (Tasks, Workers)
Kafka Brokers
Sinks: Amazon S3, MQTT

Build end-to-end pipelines, or integrate with other services. For example, stream from a database to drive a microservice, or take output from an application and stream it to Elasticsearch.

Slide 44

Stream Processing in Kafka
Diagram: Producer, Connectors, The Log, Streaming Engine, Connectors, Consumer

Stream processing: transform messages as they pass through Kafka, and write the results back to another Kafka topic.

Slide 45

Kafka Streams API

    // stringSerde and ordersSerde are assumed to be defined elsewhere in the application
    final StreamsBuilder builder = new StreamsBuilder();
    builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
           .filter((key, order) -> order.getStatus().equals("COMPLETE"))
           .to("complete_orders", Produced.with(stringSerde, ordersSerde));

Kafka Streams is part of Apache Kafka. It is a Java library that lets you integrate stream processing natively into your application.

Slide 46

Stream Processing with KSQL

    CREATE STREAM completedOrders AS
      SELECT * FROM orders
      WHERE status='COMPLETE';

KSQL is a project from Confluent. It lets you build stream processing applications declared in a SQL-like language.

Slide 47

Photo by Ash from Modern Afflatus on Unsplash
This is Something New

Reset your assumptions; squeegee your third eye. Don’t do the same thing just because that’s what you always did.
• You don’t need a database! The event log can rebuild state if you need it, but you don’t always need it, so why add a database?
It is human nature to look for the parallel between a new situation and a familiar one:
• Driving a new car: where’s the handbrake, where’s the accelerator?
• An event stream is different
• Where we’re going, we don’t need cars

Slide 48

Events in Action
Diagram: Review events, reviews

Let’s take the example from earlier of our online retailer. Users can leave reviews for products, and these get streamed into Kafka. You’ll note that we’re not writing them to a database!

Slide 49

Events in Action
Diagram: Review events, reviews, Operational dashboard

We want to do different things with the reviews. We can use Kafka Connect to stream them to Elasticsearch and provide a Kibana dashboard to the customer ops team.

Slide 50

Events in Action
Diagram: Review events, reviews, Operational dashboard, Data lake

We can stream the same data to our data lake.

Slide 51

Events in Action
Diagram: Review events, reviews, Filter out bad data, reviews_clean, Operational dashboard, Data lake

    CREATE STREAM reviews_clean AS
      SELECT * FROM reviews
      WHERE id IS NOT NULL;

We can also tidy up the data as it passes through the system: write the transformed data back to Kafka, and write that to the targets instead.

Slide 52

Events in Action
Diagram: Existing apps, User data (RDBMS), txn log, Kafka Connect, users (Kafka)

Whilst the review data is useful, it only includes the user ID, e.g. “42”. We want to know more about the users, and this information (name, email, loyalty status) is held in a database. Funnily enough, databases are also built on top of immutable event logs: the transaction log! We can mine the transaction log with Kafka Connect into Kafka.

Slide 53

Events in Action
Diagram: Review events, reviews, reviews_clean, User data, users, Join events to users and filter, Operational dashboard, Data lake

The user data is now in a Kafka topic, synchronised with the database in real time. Join events as they arrive with the reference information, to improve the data written to the dashboard and the data lake.

Slide 54

Events in Action
Diagram: Review events, reviews, reviews_clean, User data, users, Join events to users and filter, enriched_reviews, Operational dashboard, Data lake

    CREATE STREAM reviews_clean AS
      SELECT * FROM reviews
      WHERE id IS NOT NULL;

    CREATE STREAM enriched_reviews AS
      SELECT * FROM reviews_clean r
      INNER JOIN users u
      ON r.userid=u.userid;

Use KSQL, or Kafka Streams, to do the transformation. Transform once, use many.
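
What the join does can be sketched in plain Python: the `users` topic is materialised as a table keyed by `userid`, and each review is enriched as it arrives. This is an illustration of the idea, not of how KSQL executes it. The function names `on_user_change` and `enrich`, the sample records, and the `name`, `rating`, and `status` field names are assumptions.

```python
users = {}  # table keyed by userid, kept up to date from the users change stream


def on_user_change(change):
    """Apply one change event from the users topic: latest value per key wins."""
    users[change["userid"]] = change


def enrich(review):
    """Inner join: return the review with user details, or None if the user is unknown."""
    user = users.get(review["userid"])
    if user is None:
        return None
    return {**review, "name": user["name"], "status": user["status"]}


# Hypothetical change events for user 42: the second supersedes the first.
on_user_change({"userid": 42, "name": "Ada", "status": "Gold"})
on_user_change({"userid": 42, "name": "Ada", "status": "Platinum"})

# Hypothetical cleaned reviews; user 99 has no match in the users table.
reviews_clean = [
    {"id": 1, "userid": 42, "rating": 2},
    {"id": 2, "userid": 99, "rating": 5},
]

enriched_reviews = [e for e in map(enrich, reviews_clean) if e is not None]
print(enriched_reviews)
```

Because the enriched stream is written once, every downstream consumer gets the joined data without calling back to the source database.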

Slide 55

Events in Action
Diagram: Review events, User data, Join events to users and filter, Notification service, Operational dashboard, Data lake

We can also drive applications from these events. Let’s imagine we want to alert our ops team if an important customer leaves a bad review. There is no need to write the reviews to a data store for the service to then poll and query. Instead, filter the reviews as they arrive and route them to a new topic. This gives separation of responsibilities: the notification service just subscribes to the topic. The same data can also be sent to the ops dashboard in Elasticsearch.

Slide 56

Events in Action
Diagram: Review events, reviews, reviews_clean, User data, users, Join events to users and filter, enriched_reviews, unhappy_vips, Notification service, Operational dashboard, Data lake

    CREATE STREAM unhappy_vips AS
      SELECT * FROM enriched_reviews
      WHERE rating < 3
        AND status = 'Platinum';
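
The filter-and-route pattern can be sketched as follows. It mirrors the `rating < 3 AND status = 'Platinum'` predicate from the slide; everything else (the function names `is_unhappy_vip` and `route`, the subscriber callback, and the sample reviews) is a hypothetical illustration rather than a Kafka API.

```python
def is_unhappy_vip(review):
    """Same predicate as the KSQL statement: a low rating from a Platinum customer."""
    return review["rating"] < 3 and review["status"] == "Platinum"


def route(enriched_reviews, subscribers):
    """Filter reviews as they arrive and hand each match to every subscriber."""
    unhappy_vips = []
    for review in enriched_reviews:
        if is_unhappy_vip(review):
            unhappy_vips.append(review)
            for notify in subscribers:
                notify(review)
    return unhappy_vips


# Hypothetical enriched reviews.
enriched_reviews = [
    {"id": 1, "rating": 2, "status": "Platinum"},
    {"id": 2, "rating": 1, "status": "Silver"},
    {"id": 3, "rating": 5, "status": "Platinum"},
]

alerts = []  # stands in for the notification service
unhappy_vips = route(enriched_reviews, subscribers=[alerts.append])
print([r["id"] for r in unhappy_vips])  # [1]
```

The notification service only sees the reviews it cares about, and never has to poll a data store.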

Slide 57

Photo by rmoff
The Power of an Event-Driven Architecture

There are some powerful things that an event-first architecture gives you, but just like when you get taken out of the Matrix (your warm, comfortable, current way of doing things), the shock can be extreme. Understand what your pain points are and relate events to those. Be aware of the other potential of events, so as to build the best architecture holistically.

Key benefits:
• Accurate modelling of what happened
• Simplified, more powerful, more flexible architecture
• Data when you need it
• Scale when you need it

Slide 58

Not Everything is a Nail
Diagram: Events, RDBMS

Best tool for the job.

Slide 59

Not Everything is a Nail
Diagram: Events, Elasticsearch, RDBMS

Best tool for the job.

Slide 60

Not Everything is a Nail
Diagram: Events, Graph, Elasticsearch, RDBMS

Best tool for the job.

Slide 61

Side-by-Side Tech Evaluation
Diagram: Events, HDFS

Mix and match to find the best. No slow-consumer problem.

Slide 62

Side-by-Side Tech Evaluation
Diagram: Events, BigQuery, HDFS

Mix and match to find the best. No slow-consumer problem.

Slide 63

Side-by-Side Tech Evaluation
Diagram: Events, Snowflake, BigQuery, HDFS

Mix and match to find the best. No slow-consumer problem.

Slide 64

Evolve Data Sources
Diagram: Producer (on-premises), Consuming App A, Consuming App B

Slide 65

Evolve Data Sources
Diagram: Producer (on-premises), Producer (cloud), Consuming App A, Consuming App B

Slide 66

Evolve Data Sources
Diagram: Producer (cloud), Consuming App A, Consuming App B

Slide 67

Tight Coupling != Flexible
Diagram: Orders, RDBMS

Still use a data lake, sure, but load it through Kafka, so that when you do want that feed of data you’re not trying to retrofit some abomination of a pipeline: HDFS -> Kafka -> app/analytics. “How do I get data from HDFS into Kafka?” -> you’re doing it wrong. (Image idea: pipes taped together with gaffer tape, a nail in a screw hole, etc.)

Slide 68

Tight Coupling != Flexible
Diagram: Orders, RDBMS, HDFS

Slide 69

Tight Coupling != Flexible
Diagram: Orders, RDBMS, HDFS, App

Slide 70

Loose Coupling == Freedom to Evolve
Diagram: RDBMS, Orders

Slide 71

Loose Coupling == Freedom to Evolve
Diagram: RDBMS, Orders, HDFS

Slide 72

Loose Coupling == Freedom to Evolve
Diagram: RDBMS, Orders, App, HDFS

Slide 73

Transform Once, Use Many: Data Cleansing
Diagram: IoT, temp_raw, App, App, RDBMS

Transform once, use many. Cleansing data: filter out bad records.

Slide 74

Transform Once, Use Many: Data Cleansing
Diagram: IoT, temp_raw, App, App, RDBMS

temp_raw
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
null       1551136125  13.11
42         1551138129  13.04

Transform once, use many. Cleansing data: filter out bad records.

Slide 75

Transform Once, Use Many: Data Cleansing
Diagram: IoT, temp_raw, then a separate Cleanse step in front of each App and the RDBMS

temp_raw
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
null       1551136125  13.11
42         1551138129  13.04

Transform once, use many. Cleansing data: filter out bad records.

Slide 76

Transform Once, Use Many: Data Cleansing
Diagram: IoT, temp_raw, SENSOR_ID IS NOT NULL, temp_clean, App, App, RDBMS

temp_raw
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
null       1551136125  13.11
42         1551138129  13.04

temp_clean
sensor_id  time_epoch  reading
42         1551136074  13.05
42         1551136125  13.11
42         1551138129  13.04

Transform once, use many. Cleansing data: filter out bad records.
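
The cleansing step amounts to a single filter applied once, upstream of every consumer. A minimal Python sketch using the readings from the slide follows; the dict layout and the function name `cleanse` are assumptions for illustration, and the missing `sensor_id` is represented as `None`.

```python
# Raw readings from the slide; the third record has no sensor_id.
temp_raw = [
    {"sensor_id": 42, "time_epoch": 1551136074, "reading": 13.05},
    {"sensor_id": 42, "time_epoch": 1551136125, "reading": 13.11},
    {"sensor_id": None, "time_epoch": 1551136125, "reading": 13.11},
    {"sensor_id": 42, "time_epoch": 1551138129, "reading": 13.04},
]


def cleanse(records):
    """Yield only records that have a sensor_id (the SENSOR_ID IS NOT NULL filter)."""
    for record in records:
        if record["sensor_id"] is not None:
            yield record


# The cleaned stream is produced once and shared by every downstream consumer.
temp_clean = list(cleanse(temp_raw))
print([r["reading"] for r in temp_clean])  # [13.05, 13.11, 13.04]
```

Without the shared `temp_clean` stream, each app and the RDBMS load would need its own copy of this filter, and the copies would drift apart over time.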

Slide 77

Transform Once, Use Many: Data Enrichment
Diagram: RDBMS, Events, Join, App 01

Enrich the event stream with user information once, rather than have each application that needs it call back to the source directly.

Slide 78

Transform Once, Use Many: Data Enrichment
Diagram: RDBMS, Events, Join, App 01, Join, App 02, Elasticsearch

Enrich the event stream with user information once, rather than have each application that needs it call back to the source directly.

Slide 79

Transform Once, Use Many: Data Enrichment
Diagram: RDBMS, Events, Join, App 01, Elasticsearch

Enrich the event stream with user information once, rather than have each application that needs it call back to the source directly.

Slide 80

Message Payload Compatibility
Diagram: Producer, Consuming App

Transforming. Schema compatibility.

Slide 81

Message Payload Compatibility
Diagram: Producer, Producer, Consuming App

Transforming. Schema compatibility.

Slide 82

Message Payload Compatibility
Diagram: Producer, Producer, Triangles to Squares, Consuming App

Transforming. Schema compatibility.

Slide 83

Build Resilient Pipelines with Schemas
Diagram: Producer, sales_csv, then App 01 and App 02 each apply the schema themselves
Schema: COL1 = ID INT, COL2 = NAME VARCHAR

CSV -> Avro

Slide 84

Build Resilient Pipelines with Schemas
Diagram: Producer, sales_csv, Apply schema, sales, Schema Registry, App 01, App 02
Schema: COL1 = ID INT, COL2 = NAME VARCHAR

CSV -> Avro
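
Applying the schema once, before the data reaches any consumer, can be sketched like this. The sketch only produces typed Python dicts; it does not serialise to Avro or talk to the Schema Registry, which a real pipeline would do. The function name `apply_schema` and the sample rows are assumptions, while the column names and types come from the slide.

```python
import csv
import io

# Schema from the slide: COL1 -> ID INT, COL2 -> NAME VARCHAR.
SCHEMA = [("ID", int), ("NAME", str)]


def apply_schema(csv_text, schema=SCHEMA):
    """Parse untyped CSV rows into named, typed records, rejecting malformed rows."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != len(schema):
            raise ValueError(f"expected {len(schema)} columns, got {len(row)}: {row}")
        records.append({name: cast(value) for (name, cast), value in zip(schema, row)})
    return records


# Hypothetical contents of the sales_csv topic.
sales = apply_schema("1,Alice\n2,Bob\n")
print(sales)  # [{'ID': 1, 'NAME': 'Alice'}, {'ID': 2, 'NAME': 'Bob'}]
```

With the typed `sales` stream in place, App 01 and App 02 no longer need their own copy of the column definitions.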

Slide 85

Photo by rmoff
Say NO to brittle pipelines
An event streaming platform gives you the freedom to EVOLVE as REQUIREMENTS and TECHNOLOGY change

Slide 86

Photo by Benjamin Lambert on Unsplash
EVOLVE, don’t GAMBLE

How do I know what we want to use? At scale? Next month? Next year?

Slide 87

Photo by Benjamin Lambert on Unsplash
Latency requirements, Users of the data, Scale, Data fidelity

GAMBLE ON:
• Latency requirements
• Number of applications for the data
• Data fidelity / event stream -> behaviour
• Scale

Slide 88

Diagram: App, App, App, App, cache, cache, monitoring, MQ, MQ, DWH, security, search, Hadoop

Here is what I’ve seen. Some apps use an enterprise MQ, and data moves around in batch using custom ETL scripts. Over time, this ad hoc way of connecting every new type of source to every type of destination, where everything talks to everything else, just doesn’t scale.

Slide 89

Diagram: Apps, request-response, changelogs, KAFKA, DWH, Hadoop, messaging OR stream processing, streaming data pipelines

To make event-centric thinking available at a company-wide level is very much why we built Apache Kafka. We had a very particular vision for what a company would look like if you reimagined its use of data around streams of events.

Slide 90

Photo by rmoff
Events model the real world

Don’t be afraid. It’s going to happen. “All I want to know is how many beans I sold” -> events can be aggregated to tell you that. But if and when your business wants to exploit the event stream data to understand customer interactions with the business, then events provide that too. These event streams give a perfect foundation for building “normal” ETL upon, so by adopting them now you prepare yourself for the future.

Slide 91

Photo by rmoff
Event streaming platform
• Data persistence
• Flexibility & scalability
• Native stream processing
• Data when you need it

Slide 92

http://cnfl.io/book-bundle

Slide 93

Photo by rmoff
confluent.io/download
http://cnfl.io/book-bundle
http://cnfl.io/slack
#EOF

Slide 94

Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / [email protected]) can help with introductions on a given sales op
#EOF