Building stream processing applications for Apache Kafka using KSQL

A presentation at Copenhagen Kafka Meetup in November 2019 in Copenhagen, Denmark by Robin Moffatt

Slide 1

Slide 1

Building stream processing applications with Apache Kafka @rmoff #KafkaMeetup

Slide 2

Slide 2

Building stream processing applications with Apache Kafka @rmoff #KafkaMeetup

Slide 3

Slide 3

STREAM PROCESSING

Slide 4

Slide 4

PROCESSING STREAM

Slide 5

Slide 5

PROCESSING STREAM a of EVENTS

Slide 6

Slide 6

@rmoff STREAMS ARE of EVENTS EVERYWHERE

Slide 7

Slide 7

@rmoff A Customer Experience Building stream processing applications for Apache Kafka using KSQL

Slide 8

Slide 8

@rmoff A Sale Building stream processing applications for Apache Kafka using KSQL

Slide 9

Slide 9

@rmoff A Sensor Reading Building stream processing applications for Apache Kafka using KSQL

Slide 10

Slide 10

@rmoff An Application Log Entry Building stream processing applications for Apache Kafka using KSQL

Slide 11

Slide 11

@rmoff Databases Building stream processing applications for Apache Kafka using KSQL

Slide 12

Slide 12

@rmoff Immutable event log Building stream processing applications for Apache Kafka using KSQL

Slide 13

Slide 13

Immutable Event Log Old @rmoff @oredev New Messages are added at the end of the log The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 14

Slide 14

@rmoff @oredev Topics Clicks Orders Customers Topics are similar in concept to tables in a database The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 15

Slide 15

@rmoff @oredev Partitions Clicks p0 P1 P2 Messages are guaranteed to be strictly ordered within a partition The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 16

Slide 16

Messages are just K/V bytes @rmoff @oredev plus headers + timestamp Clicks Header Timestamp Key Value The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 17

Slide 17

Messages are just K/V bytes @rmoff @oredev With great power comes great responsibility Avro -> Confluent Schema Registry Protobuf JSON CSV https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 18

Slide 18

@rmoff @oredev Consumers have a position all of their own New Old Sally is here Scan The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 19

Slide 19

@rmoff @oredev Consumers have a position all of their own New Old Fred is here Scan Sally is here Scan The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 20

Slide 20

@rmoff @oredev Consumers have a position all of their own George is here Scan New Old Fred is here Scan Sally is here Scan The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 21

Slide 21

@rmoff @oredev The Connect API Producer Connectors Consumer The Log Connectors Streaming Engine The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 22

Slide 22

@rmoff @oredev Streaming Integration with Kafka Connect Amazon S3 syslog Google BigQuery Tasks Workers Kafka Connect Kafka Brokers The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 23

Slide 23

Stream Processing in Kafka Producer Connectors @rmoff @oredev Consumer The Log Connectors Streaming Engine The Changing Face of ETL: Event-Driven Architectures for Data Engineers

Slide 24

Slide 24

@rmoff #KafkaMeetup Streams of events Time Building stream processing applications for Apache Kafka using KSQL

Slide 25

Slide 25

Stream Processing with KSQL @rmoff #KafkaMeetup Stream: widgets Stream: widgets_red Building stream processing applications for Apache Kafka using KSQL

Slide 26

Slide 26

Stream Processing with KSQL @rmoff #KafkaMeetup Stream: widgets CREATE STREAM widgets_red AS SELECT * FROM widgets WHERE colour=’RED’; Stream: widgets_red Building stream processing applications for Apache Kafka using KSQL

Slide 27

Slide 27

Stream Processing with Kafka Streams @rmoff #KafkaMeetup Stream: widgets final StreamsBuilder builder = new StreamsBuilder() .stream(“widgets”, Consumed.with(stringSerde, widgetsSerde)) .filter( (key, widget) -> widget.getColour().equals(“RED”) ) .to(“widgets_red”, Produced.with(stringSerde, widgetsSerde)); Stream: widgets_red Building stream processing applications for Apache Kafka using KSQL

Slide 28

Slide 28

Stream Processing with KSQL @rmoff #KafkaMeetup Source stream Building stream processing applications for Apache Kafka using KSQL

Slide 29

Slide 29

Stream Processing with KSQL @rmoff #KafkaMeetup Source stream Building stream processing applications for Apache Kafka using KSQL

Slide 30

Slide 30

Stream Processing with KSQL @rmoff #KafkaMeetup Source stream Analytics Applications / Microservices Building stream processing applications for Apache Kafka using KSQL

Slide 31

Slide 31

@rmoff #KafkaMeetup KSQL in action 🚀 https://rmoff.dev/kssf19-ksql-code Building stream processing applications for Apache Kafka using KSQL

Slide 32

Slide 32

@rmoff https://rmoff.dev/kssf19-ksql-code Building stream processing applications for Apache Kafka using KSQL

Slide 33

Slide 33

DEMO https://rmoff.dev/kssf19-ksql-code

Slide 34

Slide 34

Code! @rmoff #KafkaMeetup https://rmoff.dev/kssf19-ksql-code Building stream processing applications for Apache Kafka using KSQL

Slide 35

Slide 35

MQTT + Kafka + KSQL + Elastic = ❤ @rmoff #KafkaMeetup Building stream processing applications for Apache Kafka using KSQL

Slide 36

Slide 36

@rmoff Building stream processing applications for Apache Kafka using KSQL

Slide 37

Slide 37

@rmoff Building stream processing applications for Apache Kafka using KSQL

Slide 38

Slide 38

@rmoff http://confluent.cloud/signup Building stream processing applications for Apache Kafka using KSQL

Slide 39

Slide 39

@rmoff #KafkaMeetup Interacting with KSQL 📬 Building stream processing applications for Apache Kafka using KSQL

Slide 40

Slide 40

KSQL - Confluent Control Center @rmoff #KafkaMeetup Building stream processing applications for Apache Kafka using KSQL

Slide 41

Slide 41

KSQL - CLI @rmoff #KafkaMeetup Building stream processing applications for Apache Kafka using KSQL

Slide 42

Slide 42

KSQL - REST API @rmoff #KafkaMeetup Building stream processing applications for Apache Kafka using KSQL

Slide 43

Slide 43

@rmoff #KafkaMeetup KSQL operations and deployment 💾 Building stream processing applications for Apache Kafka using KSQL

Slide 44

Slide 44

KSQL in Development and Production Interactive KSQL for development and testing @rmoff #KafkaMeetup Headless KSQL for Production REST Desired KSQL queries have been identified “Hmm, let me try out this idea…” Building stream processing applications for Apache Kafka using KSQL

Slide 45

Slide 45

How to run KSQL @rmoff #KafkaMeetup DEB, RPM, ZIP, TAR downloads http://confluent.io/ksql Docker images KSQL Server confluentinc/cp-ksql-server confluentinc/cp-ksql-cli (JVM process) …and many more… Building stream processing applications for Apache Kafka using KSQL

Slide 46

Slide 46

Slide 47

Slide 47

@rmoff #KafkaMeetup Think Applications, not database instances Building stream processing applications for Apache Kafka using KSQL

Slide 48

Slide 48

@rmoff #KafkaMeetup Monitoring KSQL Confluent Control Center JMX https://www.confluent.io/blog/troubleshooting-ksql-part-2 Building stream processing applications for Apache Kafka using KSQL

Slide 49

Slide 49

@rmoff #KafkaMeetup http://cnfl.io/book-bundle Building stream processing applications for Apache Kafka using KSQL

Slide 50

Slide 50

@rmoff #KafkaMeetup #EOF 💬 Join the Confluent Community Slack group at http://cnfl.io/slack https://talks.rmoff.net

Slide 51

Slide 51

@rmoff #KafkaMeetup Related Talks •The Changing Face of ETL: Event-Driven Architectures for Data Engineers •Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline! • 📖 Slides • 📖 Slides • 📽 Recording • 👾 Code • 📽 Recording •ATM Fraud detection with Kafka and KSQL • 📖 Slides •No More Silos: Integrating Databases and Apache Kafka • 👾 Code • 📖 Slides • 📽 Recording • 👾 Code (MySQL) • 👾 Code (Oracle) •Embrace the Anarchy: Apache Kafka’s Role in Modern Data Architectures • 📽 Recording • 📖 Slides • 📽 Recording Building stream processing applications for Apache Kafka using KSQL

Slide 52

Slide 52

Bonus content!

Slide 53

Slide 53

@rmoff #KafkaMeetup KSQL in action 🚀 Building stream processing applications for Apache Kafka using KSQL

Slide 54

Slide 54

@rmoff #KafkaMeetup Filtering with KSQL ORDERS Building stream processing applications for Apache Kafka using KSQL

Slide 55

Slide 55

@rmoff #KafkaMeetup Filtering with KSQL ORDERS KSQL CREATE STREAM ORDERS_NY AS SELECT * FROM ORDERS WHERE ADDRESS->STATE=’New York’; Building stream processing applications for Apache Kafka using KSQL

Slide 56

Slide 56

@rmoff #KafkaMeetup Filtering with KSQL ORDERS KSQL CREATE STREAM ORDERS_NY AS SELECT * FROM ORDERS WHERE ADDRESS->STATE=’New York’; ORDERS_NY Building stream processing applications for Apache Kafka using KSQL

Slide 57

Slide 57

Schema manipulation with KSQL ORDERS @rmoff #KafkaMeetup { “ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address”: { “street”: “243 Utah Way”, “city”: “Orange”, “state”: “California” } } Building stream processing applications for Apache Kafka using KSQL

Slide 58

Slide 58

Schema manipulation with KSQL @rmoff #KafkaMeetup { “ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address”: { “street”: “243 Utah Way”, “city”: “Orange”, “state”: “California” } } ORDERS_NO_ADDRESS_DATA AS ORDERS KSQL CREATE STREAM SELECT ORDERTIME, ORDERID, ITEMID, ORDERUNITS FROM ORDERS; Building stream processing applications for Apache Kafka using KSQL

Slide 59

Slide 59

Schema manipulation with KSQL @rmoff #KafkaMeetup { “ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address”: { “street”: “243 Utah Way”, “city”: “Orange”, “state”: “California” } AS ORDERS_NO_ADDRESS_DATA } ORDERS KSQL CREATE STREAM SELECT TIMESTAMPTOSTRING(ROWTIME, ‘yyyy-MM-dd HH:mm:ss’) AS ORDER_TIMESTAMP, ORDERID, ITEMID, ORDERUNITS FROM ORDERS; ORDERS_NO_ADDRESS_DATA { “order_ts”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5 } Building stream processing applications for Apache Kafka using KSQL

Slide 60

Slide 60

Schema manipulation with KSQL @rmoff #KafkaMeetup { ORDERS } “ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address”: { “street”: “243 Utah Way”, “city”: “Orange”, “state”: “California” } Building stream processing applications for Apache Kafka using KSQL

Slide 61

Slide 61

Schema manipulation with KSQL ORDERS KSQL @rmoff #KafkaMeetup { “ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address”: { “street”: “243 Utah Way”, “city”: “Orange”, “state”: “California” } CREATE STREAM ORDERS_FLAT AS SELECT […] } ADDRESS->STREET AS ADDRESS_STREET, ADDRESS->CITY AS ADDRESS_CITY, ADDRESS->STATE AS ADDRESS_STATE FROM ORDERS; Building stream processing applications for Apache Kafka using KSQL

Slide 62

Slide 62

Schema manipulation with KSQL @rmoff #KafkaMeetup { ORDERS KSQL “ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address”: { “street”: “243 Utah Way”, “city”: “Orange”, “state”: “California” } CREATE STREAM ORDERS_FLAT AS SELECT […] } ADDRESS->STREET AS ADDRESS_STREET, ADDRESS->CITY AS ADDRESS_CITY, ADDRESS->STATE AS ADDRESS_STATE FROM ORDERS; ORDERS_FLAT {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address-street”: “243 Utah Way”, “address-city”: “Orange”, “address-state”: “California”} Building stream processing applications for Apache Kafka using KSQL

Slide 63

Slide 63

Reserialising data with KSQL ORDERS @rmoff #KafkaMeetup {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address-street”: “243 Utah Way”, “address-city”: “Orange”, “address-state”: “California”} Building stream processing applications for Apache Kafka using KSQL

Slide 64

Slide 64

@rmoff #KafkaMeetup Reserialising data with KSQL ORDERS KSQL {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address-street”: “243 Utah Way”, “address-city”: “Orange”, “address-state”: “California”} CREATE STREAM ORDERS_CSV WITH (VALUE_FORMAT=’DELIMITED’) AS SELECT * FROM ORDERS_FLAT; Building stream processing applications for Apache Kafka using KSQL

Slide 65

Slide 65

@rmoff #KafkaMeetup Reserialising data with KSQL ORDERS KSQL {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “address-street”: “243 Utah Way”, “address-city”: “Orange”, “address-state”: “California”} CREATE STREAM ORDERS_CSV WITH (VALUE_FORMAT=’DELIMITED) AS SELECT * FROM ORDERS_FLAT; ORDERS_CSV 1560045914101,24644,Item_0,1,43078 De 1560047305664,24643,Item_29,3,209 Mon 1560057079799,24642,Item_38,18,3 Autu 1560088652051,24647,Item_6,6,82893 Ar 1560105559145,24648,Item_0,12,45896 W 1560108336441,24646,Item_33,4,272 Hef 1560123862235,24641,Item_15,16,0 Dort 1560124799053,24645,Item_12,1,71 Knut Building stream processing applications for Apache Kafka using KSQL

Slide 66

Slide 66

Lookups and Joins with KSQL ORDERS @rmoff #KafkaMeetup {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5} Building stream processing applications for Apache Kafka using KSQL

Slide 67

Slide 67

Lookups and Joins with KSQL @rmoff #KafkaMeetup { “id”: “Item_9”, “make”: “Boyle-McDermott”, “model”: “Apiaceae”, “unit_cost”: 19.9 ITEMS ORDERS } {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5} Building stream processing applications for Apache Kafka using KSQL

Slide 68

Slide 68

@rmoff #KafkaMeetup Lookups and Joins with KSQL { “id”: “Item_9”, “make”: “Boyle-McDermott”, “model”: “Apiaceae”, “unit_cost”: 19.9 ITEMS } ORDERS KSQL CREATE STREAM ORDERS_ENRICHED AS SELECT O., I., O.ORDERUNITS * I.UNIT_COST AS TOTAL_ORDER_VALUE, FROM ORDERS O INNER JOIN ITEMS I ON O.ITEMID = I.ID ; {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5} Building stream processing applications for Apache Kafka using KSQL

Slide 69

Slide 69

@rmoff #KafkaMeetup Lookups and Joins with KSQL { “id”: “Item_9”, “make”: “Boyle-McDermott”, “model”: “Apiaceae”, “unit_cost”: 19.9 ITEMS } ORDERS KSQL CREATE STREAM ORDERS_ENRICHED AS SELECT O., I., O.ORDERUNITS * I.UNIT_COST AS TOTAL_ORDER_VALUE, FROM ORDERS O INNER JOIN ITEMS I ON O.ITEMID = I.ID ; {“ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5} ORDERS_ENRICHED { } “ordertime”: 1560070133853, “orderid”: 67, “itemid”: “Item_9”, “orderunits”: 5, “make”: “Boyle-McDermott”, “model”: “Apiaceae”, “unit_cost”: 19.9, “total_order_value”: 99.5 Building stream processing applications for Apache Kafka using KSQL

Slide 70

Slide 70

@rmoff #KafkaMeetup Connecting to other systems with Kafka Connect KSQL CREATE STREAM ORDERS_ENRICHED AS SELECT […] FROM ORDERS O INNER JOIN ITEMS I ON O.ITEMID = I.ID ; Kafka Connect Building stream processing applications for Apache Kafka using KSQL

Slide 71

Slide 71

@rmoff #KafkaMeetup Stateful Aggregation with KSQL ORDERS Building stream processing applications for Apache Kafka using KSQL

Slide 72

Slide 72

@rmoff #KafkaMeetup Stateful Aggregation with KSQL ORDERS SELECT MAKE, COUNT(*) AS ORDER_COUNT FROM ORDERS_ENRICHED GROUP BY MAKE; Building stream processing applications for Apache Kafka using KSQL

Slide 73

Slide 73

@rmoff #KafkaMeetup Stateful Aggregation with KSQL ORDERS SELECT MAKE, COUNT(*) AS ORDER_COUNT FROM ORDERS_ENRICHED GROUP BY MAKE; Building stream processing applications for Apache Kafka using KSQL

Slide 74

Slide 74

@rmoff #KafkaMeetup Transform data with KSQL - merge streams ORDERS US US UK ORDERS_UK UK Building stream processing applications for Apache Kafka using KSQL

Slide 75

Slide 75

@rmoff #KafkaMeetup Transform data with KSQL - merge streams ORDERS US US INSERT INTO ORDERS_COMBINED SELECT ‘US’ AS SOURCE, ORDERTIME, ITEMID, ORDERUNITS, ADDRESS FROM ORDERS; UK ORDERS_UK UK INSERT INTO ORDERS_COMBINED SELECT ‘UK’ AS SOURCE, ORDERTIME, ITEMID, ORDERUNITS, ADDRESS FROM ORDERS_UK; Building stream processing applications for Apache Kafka using KSQL

Slide 76

Slide 76

@rmoff #KafkaMeetup Transform data with KSQL - merge streams ORDERS US UK US INSERT INTO ORDERS_COMBINED SELECT ‘US’ AS SOURCE, ORDERTIME, ITEMID, ORDERUNITS, ADDRESS US FROM ORDERS; ORDERS_UK UK UK UK INSERT INTO ORDERS_COMBINED SELECT ‘UK’ AS SOURCE, ORDERTIME, ITEMID, ORDERUNITS, ADDRESS US FROM ORDERS_UK; ORDERS_COMBINED Building stream processing applications for Apache Kafka using KSQL

Slide 77

Slide 77

@rmoff #KafkaMeetup Transform data with KSQL - split streams US UK UK US ORDERS_COMBINED Building stream processing applications for Apache Kafka using KSQL

Slide 78

Slide 78

@rmoff #KafkaMeetup Transform data with KSQL - split streams US UK CREATE STREAM ORDERS_US AS SELECT * FROM ORDERS_COMBINED WHERE SOURCE =’US’; UK US ORDERS_COMBINED CREATE STREAM ORDERS_UK AS SELECT * FROM ORDERS_COMBINED WHERE SOURCE =’UK’; Building stream processing applications for Apache Kafka using KSQL

Slide 79

Slide 79

@rmoff #KafkaMeetup Transform data with KSQL - split streams US UK CREATE STREAM ORDERS_US AS SELECT * FROM ORDERS_COMBINED WHERE SOURCE =’US’; US US ORDERS_US US UK ORDERS_COMBINED CREATE STREAM ORDERS_UK AS SELECT * FROM ORDERS_COMBINED WHERE SOURCE =’UK’; UK UK ORDERS_UK Building stream processing applications for Apache Kafka using KSQL