A presentation at Strata Data Conference, London in London, UK by Robin Moffatt
The Changing Face of ETL: Event-Driven Architectures for Data Engineers. Photo by rmoff. @rmoff
Build data pipelines better. Currently they are inflexible, slow, and brittle. Technology now exists for scalable, flexible, low-latency pipelines.
Big Data Tech Warsaw: 30 minutes. Kafka Paris Meetup: 43 minutes.
Think about pipelines (Photo by Samuel Sianipar on Unsplash)
• Traditional ETL: building a data warehouse or data lake (DW/DL)
• Integration: building pipelines to feed other systems, e.g. IoT -> time series, log aggregation, etc.
Pipelines always start simple Photo by Khai Sze Ong on Unsplash
Pipelines increase in number Photo by Rainier Ridao on Unsplash
Photo by Rohit Tandon on Unsplash Pipelines grow larger, more complex
Photo by Theodore Moore on Unsplash Pipelines become intertwined, tightly-coupled; difficult to unravel
Photo by Cristian Grecu on Unsplash We’ve all got skeletons in our pipelines of which we’re not proud
@rmoff #stratadata It used to be so simple: a single database from a single mandated vendor, a few transactional systems, and a load into a single centralised data warehouse. Photo by Patrick Fore on Unsplash
More Sources (Photo by Eugenio Mazzone on Unsplash)
• Microservices store data where they want
• Diverse technology, on-premises & cloud
• SaaS & third-party data
More Targets (Photo by Tom Barrett on Unsplash)
• More users of the data
• Not just a data warehouse, data mart, or data lake any more
• Other analytics platforms (e.g. S3, HDFS, Snowflake, BigQuery)
• Specialised technologies: graph, full-text search, NoSQL
More Data (Photo by Kirill on Unsplash)
• It seems obvious to mention Big Data, but we build systems the same way we did 40 years ago
• Orders of magnitude more data: IoT, mobile, app-generated
• More diverse data sets too
Batches and Buckets. Despite this, we do things the same way we always did: batch. We wait, and then we process data. We land it, we pick it up, we process it. The result is latency, and downstream use is dictated by upstream assumptions.
Analytics tell us what happened (how many orders were placed); applications respond (an order was placed!). Data flows used to be one way. We need to think beyond our silos: it’s the same data. Historically, the technology forced this divide, the OLTP/OLAP compromise, and batch ETL was the inevitable sticking plaster on top of it. Photo by Deva Darshan from Pexels
All systems need a way of exchanging data.
• Analytics generates new data which drives applications
• Applications need contextual data to improve the user experience
• Applications need to get data to analytics at lower latency
Ultimately we need a common way to work with data. Systems and teams across a company need to use the same data in a loosely coupled way. That means not compromising, and not crowbarring everything into shiny new technology, but adopting a unified platform that makes both apps and analytics better: lower latency, a more flexible architecture, more scalable. The common denominator here is events. Photo by NASA on Unsplash
Events (Photo by Mark Kamalov on Unsplash)
• All data is built from events
• Events are the lowest granularity of data
• Events describe our business
An event is both a notification and a state transfer:
• “We sold something” -> what did we sell, and to whom did we sell it
• “Someone clicked a link” -> what did they click, and who clicked it
A Customer Experience: events model our business. They can describe real-world interactions.
A Sensor Reading: events can also be generated by machines.
Events. Basket: Bread, Tinned Spaghetti. Usually we start with state and how we’re going to store it. Let’s consider an online retailer. A basket at checkout might look like this. What it doesn’t show is how that basket was created.
Events: ItemAdd Bread. Basket: Bread. An event happened, “something was added to the basket”, and it tells us what was added.
Events: ItemAdd Bread, ItemAdd Baked Beans. Basket: Bread, Baked Beans. Then we add baked beans.
Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans. Basket: Bread. Then we change our minds and take the beans out.
Events: ItemAdd Bread, ItemAdd Baked Beans, ItemRemove Baked Beans, ItemAdd Tinned Spaghetti. Basket: Bread, Tinned Spaghetti. And then we add tinned spaghetti.
The stream of events describes the behaviour: the interaction with our business. If we simply capture state (the final basket) we lose the behaviour information.
From a stream of events you can derive state. It is the same concept as in analytics: you can aggregate up, but not down. From state we cannot discern the events that created it, but from events we can build state.
So all we actually need to accurately model our business is the event stream. Everything else can be built from that. This event stream can be implemented using an event streaming platform, like Apache Kafka.
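As a minimal sketch of "from events we can build state", the basket example can be replayed in plain Java. The class, record, and enum names below are hypothetical and chosen for illustration; nothing here is Kafka API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: BasketReplay and BasketEvent are hypothetical names,
// not part of Kafka or any library.
final class BasketReplay {

    enum Type { ITEM_ADD, ITEM_REMOVE }

    // One immutable fact: what happened, and to which item.
    record BasketEvent(Type type, String item) {}

    // Fold the event stream, in order, into the current basket contents.
    static Set<String> replay(List<BasketEvent> events) {
        Set<String> basket = new LinkedHashSet<>();
        for (BasketEvent e : events) {
            switch (e.type()) {
                case ITEM_ADD -> basket.add(e.item());
                case ITEM_REMOVE -> basket.remove(e.item());
            }
        }
        return basket;
    }

    public static void main(String[] args) {
        List<BasketEvent> events = List.of(
            new BasketEvent(Type.ITEM_ADD, "Bread"),
            new BasketEvent(Type.ITEM_ADD, "Baked Beans"),
            new BasketEvent(Type.ITEM_REMOVE, "Baked Beans"),
            new BasketEvent(Type.ITEM_ADD, "Tinned Spaghetti"));
        System.out.println(replay(events)); // [Bread, Tinned Spaghetti]
    }
}
```

The final basket says nothing about the baked beans, but the event list does. Replaying a prefix of the list gives the basket as it stood at that point in time.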
More interestingly, databases are also streams of events. This might seem unintuitive at first. When most people think about databases, they immediately think of tables.
The Stream/Table Duality [diagram: Table and Stream over Time; columns Account ID, Amount; sample entry 12345, + €50]
“The truth is the log. The database is a cache of a subset of the log.” Pat Helland, Immutability Changes Everything, http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf Photo by Bobby Burch on Unsplash
What is an Event Streaming Platform? [diagram: Producer, Consumer, Connectors, The Log, Streaming Engine]
Apache Kafka is an event streaming platform, providing:
• A persisted event stream
• Stream processing
• Integration
It’s a distributed system that scales horizontally and is used at very large scale (Netflix, Uber, etc.)
Immutable Event Log (Old -> New): a distributed, append-only, immutable event log. It is persisted, and new messages are written to the end of the log.
Topics (e.g. Clicks, Orders, Customers): the log is arranged as topics, which are similar in concept to tables in a database.
Partitions (e.g. Clicks: P0, P1, P2): messages are guaranteed to be strictly ordered within a partition.
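To illustrate why per-partition ordering is useful, here is a toy in-memory topic in which a message key selects the partition, so all messages for one key stay in send order. This is a sketch under assumptions: the class is hypothetical, and `String.hashCode()` stands in for the murmur2 hash of the serialized key that Kafka's default partitioner uses.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy in-memory "topic". Kafka's default partitioner
// hashes the serialized key bytes (murmur2); String.hashCode() is a stand-in.
final class PartitionedTopic {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedTopic(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // The same key always maps to the same partition.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Append to the end of the chosen partition; returns the partition used.
    int send(String key, String value) {
        int p = partitionFor(key);
        partitions.get(p).add(value);
        return p;
    }

    // Read-only snapshot of one partition, oldest message first.
    List<String> read(int partition) {
        return List.copyOf(partitions.get(partition));
    }

    public static void main(String[] args) {
        PartitionedTopic clicks = new PartitionedTopic(3);
        clicks.send("user-42", "viewed /home");
        clicks.send("user-7", "viewed /offers");
        int p = clicks.send("user-42", "clicked /basket");
        // user-42's events sit in one partition, in the order they were sent.
        System.out.println("partition " + p + ": " + clicks.read(p));
    }
}
```

Ordering holds only within a partition; nothing is promised about the relative order of messages that land in different partitions.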
Messages are just K/V bytes, plus headers and a timestamp (e.g. Clicks: Header, Timestamp, Key, Value).
Messages are just K/V bytes, so with great power comes great responsibility. Serialisation options: Avro (-> Confluent Schema Registry), Protobuf, JSON, CSV. https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf
Consumers have a position all of their own [diagram: log from Old to New; Sally, Fred and George each scan from their own position]
• Consumers read from a position in the log and commit when done; Kafka stores their offset
• Other consumers can read the same data
• Data is not transient: it is persisted according to configured retention settings
• Messages can be replayed
• No slow-consumer problem: topics don’t care who’s reading them, and different consumers can read from different offsets
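The consumer behaviour described above can be sketched with a single in-memory log and one stored offset per consumer. This is a simplified model with hypothetical names; real Kafka consumers track offsets per partition and commit them to the brokers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a single-partition, in-memory log. Names are hypothetical;
// real Kafka consumers track offsets per partition and commit them to the broker.
final class EventLog {
    private final List<String> messages = new ArrayList<>();
    private final Map<String, Integer> committedOffsets = new HashMap<>();

    // New messages are only ever appended to the end.
    void append(String message) {
        messages.add(message);
    }

    // Return everything after the consumer's committed offset, then commit.
    List<String> poll(String consumer) {
        int from = committedOffsets.getOrDefault(consumer, 0);
        List<String> batch = List.copyOf(messages.subList(from, messages.size()));
        committedOffsets.put(consumer, messages.size());
        return batch;
    }

    // Replay: move a consumer's position back to an earlier offset.
    void seek(String consumer, int offset) {
        committedOffsets.put(consumer, offset);
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("order-1");
        log.append("order-2");
        System.out.println("Sally: " + log.poll("Sally")); // [order-1, order-2]
        log.append("order-3");
        System.out.println("Fred:  " + log.poll("Fred"));  // [order-1, order-2, order-3]
        System.out.println("Sally: " + log.poll("Sally")); // [order-3]
        log.seek("Sally", 0);
        System.out.println("Sally: " + log.poll("Sally")); // [order-1, order-2, order-3]
    }
}
```

Reading never removes anything from the log, which is why Fred can arrive late and still see every message, and why Sally can rewind and replay.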
The Connect API: Kafka Connect for integration.
Streaming Integration with Kafka Connect [diagram: Sources (syslog) -> Kafka Connect (Workers running Tasks) -> Kafka Brokers -> Kafka Connect (Workers running Tasks) -> Sinks (Amazon S3, Google BigQuery)]
Stream Processing in Kafka: transform messages as they pass through Kafka, and write the results back to another Kafka topic.
Kafka Streams API
    final StreamsBuilder builder = new StreamsBuilder();
    // Read orders, keep only the completed ones, and write them to a new topic
    builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
           .filter((key, order) -> order.getStatus().equals("COMPLETE"))
           .to("complete_orders", Produced.with(stringSerde, ordersSerde));
Kafka Streams is part of Apache Kafka. It is a Java library that lets you integrate stream processing natively into your application.
Stream Processing with KSQL
    CREATE STREAM completedOrders AS SELECT * FROM orders WHERE status='COMPLETE';
KSQL is a project from Confluent. It lets you build stream processing applications declared in a SQL-like language.
This is Something New (Photo by Ash from Modern Afflatus on Unsplash)
Reset your assumptions; squeegee your third eye. Don’t do the same thing just because that’s what you always did.
• You don’t need a database! The event log can rebuild state if you need it, but you don’t always need it, so why add a database? It is human nature to look for the parallel between a new situation and a familiar one.
• Driving a new car: where’s the handbrake, where’s the accelerator?
• An event stream is different
• Where we’re going we don’t need cars
Events in Action [diagram: Review events -> reviews]. Let’s take the example from earlier of our online retailer. Users can leave reviews for products, and these get streamed into Kafka. Note that we’re not writing them to a database!
[diagram: reviews -> Operational dashboard] We want to do different things with the reviews. We can use Kafka Connect to stream them to Elasticsearch and provide a Kibana dashboard to the customer ops team.
[diagram: reviews -> Operational dashboard, Data lake] We can stream the same data to our data lake.
Filter out bad data [diagram: reviews -> reviews_clean -> Operational dashboard, Data lake]
    CREATE STREAM reviews_clean AS SELECT * FROM reviews WHERE id IS NOT NULL;
We can also tidy up the data as it passes through the system: write the transformed data back to Kafka, and send that to the targets instead.
[diagram: Existing apps -> RDBMS (user data) -> txn log -> Kafka Connect -> Kafka topic users] Whilst the review data is useful, it only includes the user id, e.g. “42”. We want to know more about the users, and this information (name, email, loyalty status) is held in a database. Funnily enough, databases are also built on top of immutable event logs: the transaction log! We can mine the transaction log with Kafka Connect and stream it into Kafka.
[diagram: reviews, users, reviews_clean -> Operational dashboard, Data lake] The user data is now in a Kafka topic, synchronised with the database in real time. We can join events as they arrive with this reference information, to improve the data written to the dashboard and data lake.
Join events to users, and filter [diagram: reviews -> reviews_clean; reviews_clean + users -> enriched_reviews -> Operational dashboard, Data lake]
    CREATE STREAM reviews_clean AS SELECT * FROM reviews WHERE id IS NOT NULL;
    CREATE STREAM enriched_reviews AS SELECT * FROM reviews_clean r INNER JOIN users u ON r.userid=u.userid;
Use KSQL, or Kafka Streams, to do the transformation. Transform once, use many.
Notification service. We can also drive applications from these events. Let’s imagine we want to alert our ops team if an important customer leaves a bad review. There is no need to write the reviews to a data store for the service to then poll and query. Instead, filter the reviews as they arrive and route them to a new topic. This separates responsibilities: the notification service just subscribes to the topic. The same data can also be sent to the ops dashboard in Elasticsearch.
[diagram: enriched_reviews -> unhappy_vips -> Notification service]
    CREATE STREAM unhappy_vips AS SELECT * FROM enriched_reviews WHERE rating < 3 AND status = 'Platinum';
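The three KSQL statements in this example (filter, join, filter) can be mirrored in plain Java to show the logic of each step. This is only a sketch: ordinary collections stand in for Kafka topics, the sample records are invented, and field names beyond id, userid, rating and status are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: plain Java collections stand in for Kafka topics.
// Field names beyond id, userid, rating and status are assumptions.
final class ReviewPipeline {
    record Review(Integer id, String userid, int rating) {}
    record User(String userid, String name, String status) {}
    record EnrichedReview(Integer id, String userid, int rating, String name, String status) {}

    // reviews -> reviews_clean: drop records with no id.
    static List<Review> clean(List<Review> reviews) {
        return reviews.stream()
            .filter(r -> r.id() != null)
            .collect(Collectors.toList());
    }

    // reviews_clean + users -> enriched_reviews: inner join on userid.
    static List<EnrichedReview> enrich(List<Review> clean, Map<String, User> users) {
        return clean.stream()
            .filter(r -> users.containsKey(r.userid()))
            .map(r -> {
                User u = users.get(r.userid());
                return new EnrichedReview(r.id(), r.userid(), r.rating(), u.name(), u.status());
            })
            .collect(Collectors.toList());
    }

    // enriched_reviews -> unhappy_vips: rating < 3 AND status = 'Platinum'.
    static List<EnrichedReview> unhappyVips(List<EnrichedReview> enriched) {
        return enriched.stream()
            .filter(e -> e.rating() < 3 && "Platinum".equals(e.status()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Map<String, User> users = Map.of(
            "42", new User("42", "Alice", "Platinum"),
            "7", new User("7", "Bob", "Silver"));
        List<Review> reviews = List.of(
            new Review(1, "42", 1),
            new Review(null, "42", 5),
            new Review(2, "7", 2),
            new Review(3, "42", 4));
        // Only review 1 (a Platinum user, rating 1) reaches the notification topic.
        System.out.println(unhappyVips(enrich(clean(reviews), users)));
    }
}
```

Each step produces a result that any number of downstream consumers could read, which is the "transform once, use many" idea: the dashboard and data lake would take the enriched list, while the notification service would take only the final one.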
The Power of an Event-Driven Architecture (Photo by rmoff)
An event-first architecture gives you some powerful things, but just like being taken out of the Matrix (your warm, comfortable, current way of doing things), the shock can be extreme. Understand what your pain points are and relate events to those, and be aware of the other potential of events so that you build the best architecture holistically.
Key benefits:
• Accurate modelling of what happened
• Simplified, more powerful, more flexible architecture
• Data when you need it
• Scale when you need it
Not Everything is a Nail: use the best tool for the job. [diagram, built up over four slides: Events feeding RDBMS, Elasticsearch, Graph]
Side-by-Side Tech Evaluation: mix & match to find the best. There is no slow-consumer problem. [diagram, built up over three slides: Events feeding HDFS, BigQuery, Snowflake]
Evolve Data Sources [diagram, over three slides: an on-premises Producer feeds Consuming App A and Consuming App B; a cloud Producer is added; only the cloud Producer remains]
Tight Coupling != Flexible [diagram, built up over three slides: Orders, RDBMS, HDFS, App]
Still use a data lake, sure, but load it through Kafka, so that when you do want that feed of data you’re not trying to retrofit some abomination of a pipeline (HDFS -> Kafka -> app/analytics). “How do I get data from HDFS into Kafka?” -> you’re doing it wrong. (Image: pipes taped together with gaffer tape, a nail in a screw hole, etc.)
Loose Coupling == Freedom to Evolve [diagram, built up over three slides: Orders, RDBMS, HDFS, App]
Transform Once, Use Many: Data Cleansing. Cleansing data means filtering out bad records. [diagram: IoT -> temp_raw -> App, App, RDBMS]
temp_raw:
    sensor_id  time_epoch  reading
    42         1551136074  13.05
    42         1551136125  13.11
               1551136125  13.11
    42         1551138129  13.04
Without a shared cleansed stream, each consumer (App, App, RDBMS) applies its own Cleanse step. Instead, filter once with SENSOR_ID IS NOT NULL and write the result to temp_clean for every consumer to read.
temp_clean:
    sensor_id  time_epoch  reading
    42         1551136074  13.05
    42         1551136125  13.11
    42         1551138129  13.04
Transform Once, Use Many: Data Enrichment. Enrich the event stream with user information once, rather than having each application that needs it call back to the source directly. [diagram, over three slides: App 01 and then App 02 / Elasticsearch each join Events to the RDBMS themselves; then a single Join feeds App 01 and Elasticsearch]
Message Payload Compatibility: transforming for schema compatibility. [diagram, built up over three slides: Producer -> Consuming App; a second Producer is added; “Triangles to Squares”]
Build Resilient Pipelines with Schemas (CSV -> Avro)
[diagram: Producer -> sales_csv; App 01 and App 02 each apply the schema themselves: COL1 -> ID INT, COL2 -> NAME VARCHAR]
[diagram: Producer -> sales_csv -> apply schema once (COL1 -> ID INT, COL2 -> NAME VARCHAR) -> sales, with Schema Registry -> App 01, App 02]
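The "apply schema once" step can be sketched as a function that turns an untyped CSV line into a typed record and rejects anything that does not fit. This is a hand-rolled stand-in, not the Avro or Schema Registry API: the class and method names are hypothetical and the sample rows are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hand-rolled stand-in for "apply schema once".
// No Avro or Schema Registry API is used; all names here are hypothetical.
final class SalesSchema {
    // COL1 -> ID INT, COL2 -> NAME VARCHAR
    record Sale(int id, String name) {}

    // Parse one CSV line into a typed record, rejecting anything that does not fit.
    static Sale apply(String csvLine) {
        String[] cols = csvLine.split(",", -1);
        if (cols.length != 2) {
            throw new IllegalArgumentException("expected 2 columns, got " + cols.length);
        }
        // Integer.parseInt throws NumberFormatException, a subclass of
        // IllegalArgumentException, when COL1 is not a valid INT.
        return new Sale(Integer.parseInt(cols[0].trim()), cols[1].trim());
    }

    // sales_csv -> sales: keep the rows that conform, report the ones that do not.
    static List<Sale> applyAll(List<String> salesCsv) {
        List<Sale> sales = new ArrayList<>();
        for (String line : salesCsv) {
            try {
                sales.add(apply(line));
            } catch (IllegalArgumentException e) {
                System.err.println("rejected: " + line);
            }
        }
        return sales;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the second has a non-numeric ID.
        List<String> salesCsv = List.of("1,Widget", "two,Gadget", "3,Gizmo");
        System.out.println(applyAll(salesCsv));
    }
}
```

Once the typed records are written to their own topic, App 01 and App 02 no longer need to know the column order of the CSV, so a change to the raw format is handled in one place.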
Say NO to brittle pipelines. An event streaming platform gives you the freedom to EVOLVE as REQUIREMENTS and TECHNOLOGY change. Photo by rmoff
EVOLVE, don’t GAMBLE. How do I know what we want to use? At scale? Next month? Next year? Photo by Benjamin Lambert on Unsplash
GAMBLE ON: (Photo by Benjamin Lambert on Unsplash)
• Latency requirements
• Number of applications for, and users of, the data
• Data fidelity / event stream -> behaviour
• Scale
[diagram: apps, caches, MQs, monitoring, security, search, DWH and Hadoop, all connected to each other] Here is what I’ve seen. Some apps use an enterprise MQ, and data moves around in batch using custom ETL scripts. Over time, this ad hoc way of connecting every new type of source to every type of destination, where everything talks to everything else, just doesn’t scale.
To make event-centric thinking available at a company-wide level is very much why we built Apache Kafka. We had a very particular vision for what a company would look like if you reimagined its use of data around streams of events.
Events model the real world (Photo by rmoff). Don’t be afraid. It’s going to happen. “All I want to know is how many beans I sold” -> events can be aggregated to tell you that. But when your business wants to exploit the event stream to understand customers’ interactions with it, events provide that too. Event streams give a perfect foundation for building “normal” ETL upon, so by adopting them now you prepare yourself for the future.
Event streaming platform (Photo by rmoff):
• Data persistence
• Flexibility & scalability
• Native stream processing
• Data when you need it
http://cnfl.io/book-bundle
Discount code! KS19Comm25
Photo by rmoff confluent.io/download http://cnfl.io/book-bundle http://cnfl.io/slack EOF @rmoff
Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / [email protected]) can help with introductions on a given sales op
Data integration in architectures built on static, update-in-place datastores inevitably ends up with pathologically high degrees of coupling and poor scalability. This has been the standard practice for decades, as we attempt to build data pipelines on top of databases that do a poor job of modeling the fundamental objects that drive our businesses and systems: events.
Events carry both notification and state, and form a powerful primitive on which to build systems for developers and data engineers alike. Developers benefit from the asynchronous communication that events enable between services, and data engineers benefit from the integration capabilities. Everyone gains from using the standards-based, scalable and resilient streaming platform.
In this talk, we’ll discuss the concepts of events, their relevance to both software engineers and data engineers and their ability to unify architectures in a powerful way. We’ll see how stream processing makes sense in both a microservices and ETL environment, and why analytics, data integration and ETL fit naturally into a streaming world.