Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!

A presentation at QCon London in March 2020 in London, UK by Robin Moffatt

Slide 1

Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline! @rmoff #QConLondon

Slide 2

Kafka is an Event Streaming Platform
[Diagram labels: App (several), KAFKA, DWH, Hadoop, request-response, changelogs, messaging OR stream processing, streaming data pipelines]

Slide 3

What is an Event Streaming Platform?
[Diagram labels: Producer, Consumer, Connectors, The Log, Streaming Engine]

Slide 4

Immutable Event Log
[Diagram labels: Old, New]
Messages are added at the end of the log

Slide 5

Topics
[Diagram labels: Clicks, Orders, Customers]
Topics are similar in concept to tables in a database

Slide 6

Partitions
[Diagram labels: Clicks; P0, P1, P2]
Messages are guaranteed to be strictly ordered within a partition

Slide 7

Partition Leadership and Replication
[Diagram labels: TopicX partitions 1 to 4, three replicas of each, spread across Broker 1 to Broker 4; legend: Leader, Follower]

Slide 8

Partition Leadership and Replication
[Diagram labels: TopicX partitions 1 to 4, three replicas of each, spread across Broker 1 to Broker 4; legend: Leader, Follower]

Slide 9

Partition Leadership and Replication
[Diagram labels: TopicX partitions 1 to 4, three replicas of each, spread across Broker 1 to Broker 4; legend: Leader, Follower]

Slide 10

Producing to Kafka - No Key
[Diagram labels: Time]
Messages will be produced in a round robin fashion

Slide 11

Producing to Kafka - With Key
[Diagram labels: Time; keys A, B, C, D]
hash(key) % numPartitions = N
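The two producing modes from these slides (round robin without a key, hash(key) % numPartitions with a key) can be sketched as below. This is an illustration only: `zlib.crc32` is a stand-in hash rather than the one Kafka's own partitioner uses, and `partition_for`, `round_robin` and `NUM_PARTITIONS` are hypothetical names.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 4  # assumed partition count for the illustration


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Keyed message: hash(key) % numPartitions, so equal keys share a partition."""
    # crc32 is a deterministic stand-in hash; it is not Kafka's hash function.
    return zlib.crc32(key) % num_partitions


def round_robin(num_partitions: int = NUM_PARTITIONS):
    """Unkeyed messages: cycle through the partitions in turn."""
    for i in count():
        yield i % num_partitions


# Messages with the same key always map to the same partition.
keyed = {key: partition_for(key) for key in (b"A", b"B", b"C", b"D")}

# Unkeyed messages are spread evenly, with no per-key ordering.
rr = round_robin()
unkeyed = [next(rr) for _ in range(6)]
```

Because the mapping depends on the partition count, changing the number of partitions changes where a given key lands.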

Slide 12

Messages are just K/V bytes
plus headers + timestamp
[Diagram labels: Clicks; Header, Timestamp, Key, Value]

Slide 13

Messages are just K/V bytes
With great power comes great responsibility
Avro -> Confluent Schema Registry
Protobuf
JSON
CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf

Slide 14

Consumers have a position all of their own
[Diagram labels: Old, New; Sally is here (Scan)]

Slide 15

Consumers have a position all of their own
[Diagram labels: Old, New; Fred is here (Scan); Sally is here (Scan)]

Slide 16

Consumers have a position all of their own
[Diagram labels: Old, New; George is here (Scan); Fred is here (Scan); Sally is here (Scan)]
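A minimal sketch of what "a position all of their own" means: each reader tracks its own offset into the same log, and reading never removes anything. `LogReader` is a hypothetical class for illustration, not a Kafka client API.

```python
from dataclasses import dataclass


@dataclass
class LogReader:
    """A hypothetical consumer that remembers only its own offset."""
    name: str
    offset: int = 0

    def poll(self, log: list, max_records: int = 2) -> list:
        """Return the next records from this reader's position and advance it."""
        records = log[self.offset:self.offset + max_records]
        self.offset += len(records)
        return records


log = ["e0", "e1", "e2", "e3", "e4"]  # reads never modify the log itself
sally, fred = LogReader("Sally"), LogReader("Fred")

sally.poll(log)              # Sally reads e0, e1
sally.poll(log)              # Sally reads e2, e3
fred_first = fred.poll(log)  # Fred still starts from the beginning
```

Sally's progress has no effect on where Fred reads from, which is what lets several applications consume the same topic independently.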

Slide 17

Consuming From Kafka - Single Consumer
[Diagram labels: Partition 1, Partition 2, Partition 3, Partition 4; consumer C]

Slide 18

Consuming From Kafka - Multiple Consumers
[Diagram labels: Partition 1, Partition 2, Partition 3, Partition 4; consumers C1, C2]

Slide 19

Consuming From Kafka - Grouped Consumers
[Diagram labels: Partition 1, Partition 2, Partition 3, Partition 4; CC C1, CC C2]

Slide 20

Consuming From Kafka - Grouped Consumers
[Diagram labels: Partition 1, Partition 2, Partition 3, Partition 4; consumers C1, C2, C3, C4]

Slide 21

Consuming From Kafka - Grouped Consumers
[Diagram labels: Partition 1, Partition 2, Partition 3, Partition 4; consumers C1, C2, C3]

Slide 22

Consuming From Kafka - Grouped Consumers
[Diagram labels: Partition 1, Partition 2, Partition 3, Partition 4; consumers C1, C2, C3]
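The grouped-consumer slides can be read as a partition assignment problem: within a group, each partition is owned by exactly one consumer. The sketch below uses a plain round-robin spread as an assumption for illustration; Kafka's real assignment strategies are more elaborate, and `assign` is a hypothetical helper.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over group members; each partition gets exactly one owner."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [1, 2, 3, 4]
two = assign(partitions, ["C1", "C2"])                    # two partitions each
four = assign(partitions, ["C1", "C2", "C3", "C4"])       # one partition each
three = assign(partitions, ["C1", "C2", "C3"])            # one member takes two
five = assign(partitions, ["C1", "C2", "C3", "C4", "C5"]) # one member sits idle
```

With more consumers than partitions, the extra members receive nothing, so the partition count caps the useful parallelism of a group.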

Slide 23

Free Books!
https://rmoff.dev/ldn-qcon

Slide 24

Streaming Data Pipelines
Photo by Victor Garcia on Unsplash

Slide 25

Database Offload
[Diagram labels: RDBMS, Kafka Connect, Kafka Connect, Amazon S3, HDFS]

Slide 26

Stream Processing with Apache Kafka and ksqlDB
[Diagram labels: order events, RDBMS, CDC, customer, Stream Processing, customer orders]

Slide 27

Real-time Event Stream Enrichment
[Diagram labels: order events, RDBMS, CDC, customer, Stream Processing, customer orders]

Slide 28

Building a streaming data pipeline

Slide 29

Stream Integration + Processing

Slide 30

Integration

Slide 31

Streaming Integration with Kafka Connect
[Diagram labels: Sources (syslog), Kafka Connect, Kafka Brokers]

Slide 32

Streaming Integration with Kafka Connect
[Diagram labels: Kafka Brokers, Kafka Connect, Sinks (Amazon S3, Google BigQuery)]

Slide 33

Streaming Integration with Kafka Connect
[Diagram labels: syslog, Kafka Connect, Kafka Brokers, Amazon S3, Google BigQuery]

Slide 34

Look Ma, No Code!
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
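To show how a configuration like this is typically submitted, the sketch below wraps the slide's settings in the JSON body that Kafka Connect's REST API accepts when creating a connector (a `name` plus a `config` map, posted to the worker's `/connectors` endpoint). The connector name is hypothetical, and the slide's settings are abbreviated, so a working JDBC source would need more than these three keys.

```python
import json

# Connector settings as shown on the slide. A working JDBC source needs further
# settings that the slide omits, so this is not a complete configuration.
config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://asgard:3306/demo",
    "table.whitelist": "sales,orders,customers",
}

# Request body for creating a connector: a name plus the config map.
# "jdbc-source-demo" is a hypothetical connector name.
payload = json.dumps({"name": "jdbc-source-demo", "config": config}, indent=2)
print(payload)
```

The block only builds and prints the payload; sending it is left out so the example runs without a Connect worker.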

Slide 35

Serialisation & Schemas
Avro -> Confluent Schema Registry
Protobuf
JSON
CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf

Slide 36

The Confluent Schema Registry
[Diagram labels: Source, Kafka Connect, Avro Message, Avro Schema, Schema Registry, Avro Message, Kafka Connect, Target]

Slide 37

Extensible
Connector
Transform(s)
Converter

Slide 38

Confluent Hub
hub.confluent.io

Slide 39

Stream Integration + Processing

Slide 40

Stream Processing

Slide 41

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Photo by Franck V. on Unsplash

Slide 42

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

SELECT * FROM WIDGETS WHERE WEIGHT_G > 120

SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE

SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS GROUP BY SENSOR_ID HAVING TEMP>20

CREATE SINK CONNECTOR dw WITH (
  'connector.class' = 'S3Connector',
  'topics' = 'widgets'
  …);

Object store, data warehouse, RDBMS
Photo by Franck V. on Unsplash
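For readers less familiar with SQL, the three queries on this slide correspond roughly to a filter, a count per group, and an average per group with a threshold. The sketch below runs them once over a small in-memory list, whereas ksqlDB would evaluate them continuously as events arrive. The readings are hypothetical data shaped like the slide's sample message.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings shaped like the slide's sample message.
readings = [
    {"sensor_id": "aa-101", "production_line": "w01", "temp_celcius": 23, "widget_weight_g": 100},
    {"sensor_id": "aa-101", "production_line": "w01", "temp_celcius": 25, "widget_weight_g": 130},
    {"sensor_id": "bb-202", "production_line": "w02", "temp_celcius": 18, "widget_weight_g": 125},
]

# SELECT * FROM WIDGETS WHERE WEIGHT_G > 120
heavy = [r for r in readings if r["widget_weight_g"] > 120]

# SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE
count_by_line = defaultdict(int)
for r in readings:
    count_by_line[r["production_line"]] += 1

# SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS GROUP BY SENSOR_ID HAVING TEMP>20
temps_by_sensor = defaultdict(list)
for r in readings:
    temps_by_sensor[r["sensor_id"]].append(r["temp_celcius"])
warm_sensors = {s: mean(t) for s, t in temps_by_sensor.items() if mean(t) > 20}
```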

Slide 43

ksqlDB
The event streaming database purpose-built for stream processing applications.

Slide 44

Stream Processing with ksqlDB
[Diagram labels: Source stream]

Slide 45

Stream Processing with ksqlDB
[Diagram labels: Source stream]

Slide 46

Stream Processing with ksqlDB
[Diagram labels: Source stream]

Slide 47

Stream Processing with ksqlDB
[Diagram labels: Source stream, Analytics]

Slide 48

Stream Processing with ksqlDB
[Diagram labels: Source stream, Applications / Microservices]

Slide 49

Stream Processing with ksqlDB
…SUM(TXN_AMT) GROUP BY AC_ID
[Diagram labels: Source stream, Applications / Microservices; AC_ID=42; AC_ID=42 BALANCE=94.00]

Slide 50

Let’s Build It!
[Diagram labels: Rating events, App, Producer API]
{
  "rating_id": 5313,
  "user_id": 3,
  "stars": 4,
  "route_id": 6975,
  "rating_time": 1519304105213,
  "channel": "web",
  "message": "worst. flight. ever. #neveragain"
}

Slide 51

Let’s Build It!
[Diagram labels: Rating events, App, Producer API; User data, RDBMS, Kafka Connect; ksqlDB (Join events to users, and filter); Consumer API, App, Push notification; Kafka Connect, Elasticsearch, Operational Dashboard; Kafka Connect, Data Lake (SnowflakeDB/S3/HDFS/etc)]

Slide 52

Demo Time!
https://rmoff.dev/qcon-workshop
[Diagram labels: Rating events, App, Producer API; User data, RDBMS, Kafka Connect; ksqlDB (Join events to users, and filter); Consumer API, App, Push notification; Kafka Connect, Elasticsearch, Operational Dashboard; Kafka Connect, Data Lake (SnowflakeDB/S3/HDFS/etc)]

Slide 53

[Diagram labels: Rating events, App, Producer API; User data, RDBMS, Kafka Connect; ksqlDB (Join events to users, and filter); Consumer API, App, Push notification; Kafka Connect, Elasticsearch, Operational Dashboard; Kafka Connect, Data Lake (SnowflakeDB/S3/HDFS/etc)]

Slide 54

[Diagram labels: Rating events, App, Producer API; ratings, ksqlDB (Filter events), poor_ratings; Consumer API, App, Push notification to Slack; User data, RDBMS, Kafka Connect; Kafka Connect, Elasticsearch, Operational Dashboard; Kafka Connect, Data Lake (S3/HDFS/SnowflakeDB etc)]

Slide 55

Filter all ratings where STARS<3
[Diagram labels: Producer API, POOR_RATINGS]
{
  "rating_id": 5313,
  "user_id": 3,
  "stars": 4,
  "route_id": 6975,
  "rating_time": 1519304105213,
  "channel": "web",
  "message": "worst. flight. ever. #neveragain"
}

CREATE STREAM POOR_RATINGS AS
  SELECT * FROM ratings WHERE STARS <3

Slide 56

[Diagram labels: Rating events, App, Producer API; User data, RDBMS, Kafka Connect; Join events to users, and filter; Consumer API, App, Kafka Connect, Push notification to Slack; Kafka Connect, Elasticsearch, Operational Dashboard; Kafka Connect, Data Lake (SnowflakeDB/S3/HDFS/etc)]

Slide 57

[Diagram labels: Producer API; MySQL, Debezium, Kafka Connect]

Slide 58

The Stream/Table Duality
[Diagram labels: Time]

Stream                      Table
Account ID   Amount         Account ID   Balance
12345        + £50          12345        £50
12345        + £25          12345        £75
12345        - £60          12345        £15
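The duality on this slide can be sketched as replaying a stream of changes into a table of current state. The balances shown (£50, £75, £15) imply amounts of +50, +25 and -60, which is what the hypothetical event list below assumes.

```python
from collections import defaultdict

# The stream: an append-only sequence of (account_id, amount) events.
events = [("12345", 50), ("12345", 25), ("12345", -60)]

balances = defaultdict(int)  # the table: latest balance per account
snapshots = []               # table state for the account after each event

for account_id, amount in events:
    balances[account_id] += amount
    snapshots.append(balances[account_id])
```

Replaying the same events from the start always rebuilds the same table, which is why the log can be treated as the source of truth and the table as a derived view.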

Slide 59

"The truth is the log. The database is a cache of a subset of the log."
Pat Helland, Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash

Slide 60

Join each rating to customer data
[Diagram labels: Producer API, Kafka Connect, RATINGS_WITH_CUSTOMER_DATA]
{
  "rating_id": 5313,
  "user_id": 3,
  "stars": 4,
  "route_id": 6975,
  "rating_time": 1519304105213,
  "channel": "web",
  "message": "worst. flight. ever. #neveragain"
}

{
  "id": 3,
  "first_name": "Merilyn",
  "last_name": "Doughartie",
  "email": "[email protected]",
  "gender": "Female",
  "club_status": "platinum",
  "comments": "none"
}

CREATE STREAM RATINGS_WITH_CUSTOMER_DATA AS
  SELECT *
  FROM RATINGS R
       LEFT JOIN CUSTOMERS C
       ON R.USER_ID = C.ID;
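The enrichment step on this slide is a stream-table join: each rating event is looked up against the latest customer row. The sketch below assumes the rating's `user_id` matches the customer's `id`, as the two sample messages suggest; `left_join` and the second rating are hypothetical.

```python
# The table side: latest customer row per id (an in-memory stand-in).
customers = {3: {"id": 3, "first_name": "Merilyn", "club_status": "platinum"}}

# The stream side: rating events as they arrive.
ratings = [
    {"rating_id": 5313, "user_id": 3, "stars": 4},
    {"rating_id": 5314, "user_id": 99, "stars": 1},  # hypothetical: no matching customer
]


def left_join(rating: dict, table: dict) -> dict:
    """Enrich one rating; unmatched ratings are kept with empty customer fields."""
    customer = table.get(rating["user_id"])
    return {
        **rating,
        "first_name": customer["first_name"] if customer else None,
        "club_status": customer["club_status"] if customer else None,
    }


enriched = [left_join(r, customers) for r in ratings]
```

A left join keeps every rating, so events for unknown users still flow downstream rather than being dropped.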

Slide 61

Join each rating to customer data
Filter for just PLATINUM customers
[Diagram labels: Producer API, Kafka Connect, RATINGS_WITH_CUSTOMER_DATA, UNHAPPY_PLATINUM_CUSTOMERS]
{
  "rating_id": 5313,
  "user_id": 3,
  "stars": 4,
  "route_id": 6975,
  "rating_time": 1519304105213,
  "channel": "web",
  "message": "worst. flight. ever. #neveragain"
}

{
  "id": 3,
  "first_name": "Merilyn",
  "last_name": "Doughartie",
  "email": "[email protected]",
  "gender": "Female",
  "club_status": "platinum",
  "comments": "none"
}

CREATE STREAM UNHAPPY_PLATINUM_CUSTOMERS AS
  SELECT *
  FROM RATINGS_WITH_CUSTOMER_DATA
  WHERE STARS < 3
    AND CLUB_STATUS = 'platinum'

Slide 62

Join each rating to customer data
Aggregate per-minute by CLUB_STATUS
[Diagram labels: Producer API, Kafka Connect, RATINGS_WITH_CUSTOMER_DATA, RATINGS_BY_CLUB_STATUS_1MIN]
{
  "rating_id": 5313,
  "user_id": 3,
  "stars": 4,
  "route_id": 6975,
  "rating_time": 1519304105213,
  "channel": "web",
  "message": "worst. flight. ever. #neveragain"
}

{
  "id": 3,
  "first_name": "Merilyn",
  "last_name": "Doughartie",
  "email": "[email protected].com",
  "gender": "Female",
  "club_status": "platinum",
  "comments": "none"
}

CREATE TABLE RATINGS_BY_CLUB_STATUS AS
  SELECT CLUB_STATUS, COUNT(*)
  FROM RATINGS_WITH_CUSTOMER_DATA
  WINDOW TUMBLING (SIZE 1 MINUTES)
  GROUP BY CLUB_STATUS;
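A tumbling window of one minute, as used in this aggregate, assigns every event to exactly one fixed, non-overlapping bucket. The sketch below assumes event time in epoch milliseconds, like the sample `rating_time`; the events, the "bronze" status and the `window_start` helper are hypothetical.

```python
from collections import Counter

WINDOW_MS = 60_000  # SIZE 1 MINUTES


def window_start(ts_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Floor a timestamp to the start of its fixed-size, non-overlapping window."""
    return ts_ms - (ts_ms % size_ms)


# Hypothetical enriched ratings: (event time in epoch ms, club_status).
events = [
    (1519304105213, "platinum"),
    (1519304110000, "platinum"),
    (1519304110500, "bronze"),
    (1519304170000, "platinum"),  # falls into the next one-minute window
]

# COUNT(*) ... WINDOW TUMBLING (SIZE 1 MINUTES) GROUP BY CLUB_STATUS
counts = Counter((window_start(ts), status) for ts, status in events)
```

Each result row is keyed by both the window start and the club status, which is why the output is a table that gains new rows as time moves on.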

Slide 63

Kafka Connect → Elasticsearch

Slide 64

Want to learn more?
CTAs, not CATs (sorry, not sorry)

Slide 65

kafka-summit.org Moffatt30 30% OFF* *Standard Priced Conference pass

Slide 66

Learn Kafka. Start building with Apache Kafka at Confluent Developer. developer.confluent.io

Slide 67

Confluent Community Slack group
cnfl.io/slack

Slide 68

Free Books!
https://rmoff.dev/ldn-qcon

Slide 69

Resources
#EOF
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / [email protected]) can help with introductions on a given sales op

Slide 70

Related Talks
• The Changing Face of ETL: Event-Driven Architectures for Data Engineers
  • 📖 Slides
  • 📽 Recording
• ATM Fraud detection with Kafka and ksqlDB
  • 📖 Slides
  • 👾 Code
  • 📽 Recording
• Embrace the Anarchy: Apache Kafka’s Role in Modern Data Architectures
  • 📖 Slides
  • 📽 Recording
• Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
  • 📖 Slides
  • 👾 Code
  • 📽 Recording
• No More Silos: Integrating Databases and Apache Kafka
  • 📖 Slides
  • 👾 Code (MySQL)
  • 👾 Code (Oracle)
  • 📽 Recording