Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline!

A presentation at LISA 18 in October 2018 in Nashville, TN, USA by Robin Moffatt

Slide 1

confluent.io/ksql
Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline!
USENIX Large Installation System Administration Conference (LISA), October 31 2018
@rmoff [email protected]

Slide 2

@rmoff / http://cnfl.io/ksql
$ whoami
• Developer Advocate @ Confluent
• Working in data & analytics since 2001
• Oracle Developer Champion
• Blogging: http://rmoff.net & http://cnfl.io/rmoff
• Twitter: @rmoff
• Geek stuff
• Beer & Fried Breakfasts
https://speakerdeck.com/rmoff/

Slide 3

Kafka

Slide 4

Kafka is a Streaming Platform
[Diagram labels: App (×8); DWH; Hadoop; KAFKA; request-response; changelogs; messaging OR stream processing; streaming data pipelines]

Slide 5

Streaming is not just for realtime

Slide 6

Streaming is for everyone

Slide 7

All data is events

Slide 8

A Dumb Pipeline
[Diagram labels: Logs; HDFS / S3 / BigQuery etc]

Slide 9

A Dumb Pipeline
[Diagram labels: Logs; Logs; HDFS / S3 / BigQuery etc]

Slide 10

Stream Processing with Apache Kafka and KSQL
[Diagram labels: Logs; Stream Processing; All logs; Errors; HDFS / S3 / BigQuery etc]

Slide 11

Real-time Event Stream Enrichment
[Diagram labels: order events; RDBMS; CDC; customer; Stream Processing; customer orders; <y>]

Slide 12

Transform Once, Use Many
[Diagram labels: order events; RDBMS; CDC; customer; Stream Processing; customer orders; <y>; New App <x>]

Slide 13

Transform Once, Use Many
[Diagram labels: order events; RDBMS; CDC; customer; Stream Processing; customer orders; <y>; HDFS / S3 / etc; New App <x>]

Slide 14

Let’s Build It!
[Diagram labels: Rating events; User data; Join events to users, and filter; Push notification; Operational Dashboard; Data Lake]

Slide 15

Let’s Build It!
[Diagram labels: Rating events (App, Producer API); User data (RDBMS); Join events to users, and filter; Push notification (App, Consumer API); Operational Dashboard (Elasticsearch); Data Lake (S3/HDFS/SnowflakeDB etc)]

Slide 16

Kafka Connect
[Diagram labels: Rating events (App, Producer API); User data (RDBMS); Join events to users, and filter; Push notification (App, Consumer API); Operational Dashboard (Elasticsearch); Data Lake (S3/HDFS/SnowflakeDB etc); Kafka Connect (×3)]

Slide 17

Kafka Connect
An API of Apache Kafka, providing reliable and scalable integration of Kafka with other systems, with no coding required.

    {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
      "table.whitelist": "sales,orders,customers"
    }

https://docs.confluent.io/current/connect/

Slide 18

Streaming Integration with Kafka Connect
[Diagram labels: Sources (syslog, flat file, CSV, JSON, MQTT); Kafka Connect (Tasks, Workers); Kafka Brokers]

Slide 19

Streaming Integration with Kafka Connect
[Diagram labels: Sinks (Amazon S3, MQT); Kafka Connect (Tasks, Workers); Kafka Brokers]

Slide 20

Streaming Integration with Kafka Connect
[Diagram labels: Sources (syslog, flat file, CSV, JSON, MQTT); Sinks (Amazon S3, MQT); Kafka Connect (Tasks, Workers); Kafka Brokers]

Slide 21

Confluent Hub
One-stop place to discover and download:
• Connectors
• Transformations
• Converters
hub.confluent.io

Slide 22

Kafka Connect + Schema Registry = WIN
[Diagram labels: RDBMS; Kafka Connect; Avro Message; Avro Schema; Schema Registry; Kafka Connect; Elasticsearch]

Slide 23

Kafka Connect + Schema Registry = WIN
[Diagram labels: RDBMS; Kafka Connect; Avro Message; Avro Schema; Schema Registry; Kafka Connect; Elasticsearch]

Slide 24

Kafka → Elasticsearch

    curl -X "POST" "http://kafka-connect-cp:18083/connectors/" \
         -H "Content-Type: application/json" \
         -d '{
      "name": "es_sink_lisa18",
      "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "topics": "lisa18",
        "key.ignore": "true",
        "schema.ignore": "true",
        "type.name": "kafkaconnect",
        "connection.url": "http://elasticsearch:9200"
      }
    }'
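
The same connector registration can be scripted rather than typed as a curl command. The following is a minimal Python sketch that only builds the HTTP request and does not send it. The host, port, connector name, and settings are taken from the slide; the helper name `build_connector_request` is hypothetical, and `type.name` is set to plain `kafkaconnect`, which is an assumption about the intended value.

```python
import json
import urllib.request

# Connector settings from the slide. "type.name" is assumed to be meant
# as the plain value "kafkaconnect".
ES_SINK_CONFIG = {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "topics": "lisa18",
    "key.ignore": "true",
    "schema.ignore": "true",
    "type.name": "kafkaconnect",
    "connection.url": "http://elasticsearch:9200",
}


def build_connector_request(base_url, name, config):
    """Build (but do not send) a POST to the Kafka Connect REST API.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/connectors/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_connector_request(
    "http://kafka-connect-cp:18083", "es_sink_lisa18", ES_SINK_CONFIG
)
# To actually submit it against a running Connect worker:
#   urllib.request.urlopen(request)
```

Keeping the configuration as a plain dictionary means it can be version-controlled and reused across environments by changing only the base URL.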

Slide 25

Demo Time!
[Diagram labels: Producer API; MySQL; Debezium; Kafka Connect]

Slide 26

Let’s Build It!
[Diagram labels: Rating events (App, Producer API); User data (RDBMS); Join events to users, and filter; Push notification (App, Consumer API); Operational Dashboard (Elasticsearch); Data Lake (S3/HDFS/SnowflakeDB etc); Kafka Connect (×3)]

Slide 27

KSQL
[Diagram labels: Rating events (App, Producer API); User data (RDBMS); KSQL: Join events to users, and filter; Push notification (App, Consumer API); Operational Dashboard (Elasticsearch); Data Lake (S3/HDFS/SnowflakeDB etc); Kafka Connect (×3)]

Slide 28

KSQL is a Declarative Stream Processing Language

Slide 29

KSQL is the Streaming SQL Engine for Apache Kafka

Slide 30

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data

    CREATE STREAM SYSLOG_INVALID_USERS AS
      SELECT HOST, MESSAGE
      FROM SYSLOG
      WHERE MESSAGE LIKE '%Invalid user%';

http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting

Slide 31

KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data

    CREATE STREAM vip_actions AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'Platinum';

Slide 32

KSQL for Anomaly Detection
Identifying patterns or anomalies in real-time data, surfaced in milliseconds

    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;
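
To make the tumbling-window logic concrete, here is a small plain-Python simulation of the same idea: count authorization attempts per card in non-overlapping 5-second windows and keep the counts above 3. It is a batch sketch under stated assumptions, not how KSQL executes the query (KSQL updates the result continuously as events arrive). The function name, event shape, and sample data are all made up for illustration.

```python
from collections import Counter

WINDOW_SIZE_MS = 5_000  # 5-second tumbling windows, as in the KSQL statement


def possible_fraud(attempts, threshold=3):
    """Count attempts per (card_number, window) and keep counts above threshold.

    `attempts` is an iterable of (timestamp_ms, card_number) pairs.
    """
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # A tumbling window is identified by its start time; windows never overlap.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_SIZE_MS)
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# Made-up sample data: card "1111" is tried four times within the first window.
attempts = [
    (1_000, "1111"), (1_500, "1111"), (2_000, "1111"), (4_900, "1111"),
    (5_100, "1111"),  # falls into the next window
    (1_200, "2222"), (3_000, "2222"),
]
flagged = possible_fraud(attempts)
```

Because each event belongs to exactly one window, a burst that straddles a window boundary can be split across two counts; that is a property of tumbling windows in general.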

Slide 33

KSQL for Data Transformation
Make simple derivations of existing topics from the command line

    CREATE STREAM pageviews
      WITH (PARTITIONS=4, VALUE_FORMAT='AVRO') AS
      SELECT * FROM pageviews_json;

Slide 34

KSQL in Development and Production
• Interactive KSQL for development and testing: “Hmm, let me try out this idea…”
• Headless KSQL for Production: desired KSQL queries have been identified
[Diagram label: REST]

Slide 35

Demo Time!
[Diagram labels: Producer API; MySQL; Debezium; Kafka Connect; Kafka Connect; Elasticsearch]

Slide 36

Producer API (rating event):

    {
      "rating_id": 5313,
      "user_id": 3,
      "stars": 4,
      "route_id": 6975,
      "rating_time": 1519304105213,
      "channel": "web",
      "message": "worst. flight. ever. #neveragain"
    }

Filter all ratings where STARS<3 → POOR_RATINGS

    CREATE STREAM POOR_RATINGS AS
      SELECT * FROM ratings WHERE STARS < 3;
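
The filter can be pictured as a function applied to every event as it flows past. This is a minimal plain-Python sketch of that idea, not KSQL itself; the function name `poor_ratings` and the second sample event are made up for illustration, while the first event is an abridged copy of the one on the slide.

```python
def poor_ratings(ratings, max_stars=3):
    """Yield only the rating events with fewer than `max_stars` stars."""
    for rating in ratings:
        if rating["stars"] < max_stars:
            yield rating


ratings = [
    {"rating_id": 5313, "user_id": 3, "stars": 4, "channel": "web"},  # from the slide (abridged)
    {"rating_id": 5314, "user_id": 7, "stars": 1, "channel": "ios"},  # made-up event
]
poor = list(poor_ratings(ratings))
```

Note that the sample event shown on the slide has 4 stars, so it would not pass a `STARS < 3` filter.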

Slide 37

Do you think that’s a table you are querying?

Slide 38

The Stream Table Duality

Stream (events over time):

    Account ID    Amount
    12345         + €50
    12345         + €25
    12345         - €60

Table (state after each event):

    Account ID    Balance
    12345         €50
    12345         €75
    12345         €15

Read more: https://cnfl.io/stream-table-duality
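
The duality can be sketched in a few lines of plain Python: replaying the stream of changes produces the table, and the sequence of table states is itself a stream. This is an illustrative sketch using the figures from the slide; the function name `apply_events` is hypothetical.

```python
def apply_events(events):
    """Fold a stream of (account_id, amount) events into a table of balances.

    Returns the final table and a snapshot of the table after every event.
    """
    table = {}
    snapshots = []
    for account_id, amount in events:
        table[account_id] = table.get(account_id, 0) + amount
        snapshots.append(dict(table))  # copy, so later updates do not alter it
    return table, snapshots


# The three events from the slide: +50, +25, -60 on account 12345.
stream = [(12345, 50), (12345, 25), (12345, -60)]
table, snapshots = apply_events(stream)
```

Replaying the same stream from the beginning always rebuilds the same table, which is why the log can be treated as the source of truth.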

Slide 39

The truth is the log. The database is a cache of a subset of the log.
Pat Helland, Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash

Slide 40

Producer API (rating event):

    {
      "rating_id": 5313,
      "user_id": 3,
      "stars": 4,
      "route_id": 6975,
      "rating_time": 1519304105213,
      "channel": "web",
      "message": "worst. flight. ever. #neveragain"
    }

Kafka Connect (customer record):

    {
      "id": 3,
      "first_name": "Merilyn",
      "last_name": "Doughartie",
      "email": "[email protected]",
      "gender": "Female",
      "club_status": "platinum",
      "comments": "none"
    }

Join each rating to customer data → RATINGS_WITH_CUSTOMER_DATA

    CREATE STREAM RATINGS_WITH_CUSTOMER_DATA AS
      SELECT *
      FROM RATINGS R
      LEFT JOIN CUSTOMERS C ON R.USER_ID = C.ID;
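
Conceptually, this is a stream-table join: each rating event is looked up against the current customer table by the rating's `user_id`. The plain-Python sketch below simulates that lookup with a dictionary. It is an illustration rather than KSQL's implementation; the function name and the second rating are made up, and the customer record is abridged from the slide.

```python
def join_ratings_to_customers(ratings, customers):
    """Left-join each rating event to the customer table on user_id = id."""
    for rating in ratings:
        # .get() returns None when there is no match, mirroring a LEFT JOIN.
        customer = customers.get(rating["user_id"])
        yield {"rating": rating, "customer": customer}


# Customer table keyed by id (record abridged from the slide).
customers = {
    3: {"id": 3, "first_name": "Merilyn", "last_name": "Doughartie",
        "club_status": "platinum"},
}
ratings = [
    {"rating_id": 5313, "user_id": 3, "stars": 4},
    {"rating_id": 5314, "user_id": 99, "stars": 2},  # made up; no matching customer
]
joined = list(join_ratings_to_customers(ratings, customers))
```

With a left join, ratings from unknown users are kept with an empty customer side rather than dropped.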

Slide 41

Producer API (rating event):

    {
      "rating_id": 5313,
      "user_id": 3,
      "stars": 4,
      "route_id": 6975,
      "rating_time": 1519304105213,
      "channel": "web",
      "message": "worst. flight. ever. #neveragain"
    }

Kafka Connect (customer record):

    {
      "id": 3,
      "first_name": "Merilyn",
      "last_name": "Doughartie",
      "email": "[email protected]",
      "gender": "Female",
      "club_status": "platinum",
      "comments": "none"
    }

Join each rating to customer data → RATINGS_WITH_CUSTOMER_DATA
Filter for just PLATINUM customers → UNHAPPY_PLATINUM_CUSTOMERS

    CREATE STREAM UNHAPPY_PLATINUM_CUSTOMERS AS
      SELECT *
      FROM RATINGS_WITH_CUSTOMER_DATA
      WHERE STARS < 3
        AND CLUB_STATUS = 'platinum';

Slide 42

Confluent Open Source : Apache Kafka with a bunch of cool stuff! For free!
[Diagram: Confluent Platform
  Inputs: Log Events; Database Changes; IoT Data; Web Events; …
  Data Integration: Hadoop; Database; Data Warehouse; CRM; …
  Real-time Applications: Transformations; Custom Apps; Analytics; Monitoring; …
  Monitoring & Administration: Confluent Control Center | Security
  Operations: Replicator | Auto Data Balancing
  Data Compatibility: Schema Registry
  SQL Stream Processing: KSQL
  Development and Connectivity: Clients | Connectors | REST Proxy | CLI
  Apache Kafka®: Core | Connect API | Streams API
  CUSTOMER SELF-MANAGED: Datacenter; Public Cloud
  CONFLUENT FULLY-MANAGED: Confluent Cloud]

Slide 43

If you remember one thing… (or three)
• Kafka Connect
  • Integration between Kafka and other data stores
• Kafka
  • Provides stream processing natively
• KSQL
  • Build stream processing apps with just SQL

Slide 44

Free Books!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle

Slide 45

Try it out!
https://cnfl.io/kafka-ksql-elastic

Slide 46

https://www.confluent.io/ksql
http://cnfl.io/slack
@rmoff [email protected]

Slide 47

Useful links
• Embrace the Anarchy : Apache Kafka’s Role in Modern Data Architectures (Recording & Slides)
• Look Ma, no Code! Building Streaming Data Pipelines with Apache Kafka and KSQL
• Steps to Building a Streaming ETL Pipeline with Apache Kafka and KSQL (Recording & Slides)
• https://www.confluent.io/blog/ksql-in-action-real-time-streaming-etl-from-oracle-transactional-data
• https://github.com/confluentinc/ksql/

Slide 48

Resources
#EOF
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / [email protected]) can help with introductions on a given sales op