Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline! @rmoff #javaBinOnline
A presentation at javaBin Online in May 2020 in Oslo, Norway by Robin Moffatt
Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline! @rmoff #javaBinOnline
Kafka is an Event Streaming Platform App App App @rmoff #javaBinOnline App request-response changelogs App App KAFKA App App DWH Hadoop messaging OR stream processing streaming data pipelines Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
What is an Event Streaming Platform? Producer Connectors @rmoff #javaBinOnline Consumer The Log Connectors Streaming Engine Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Immutable Event Log Old @rmoff #javaBinOnline New Messages are added at the end of the log Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Topics @rmoff #javaBinOnline Clicks Orders Customers Topics are similar in concept to tables in a database Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Partitions Clicks @rmoff #javaBinOnline p0 P1 P2 Messages are guaranteed to be strictly ordered within a partition Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Messages are just K/V bytes @rmoff #javaBinOnline plus headers + timestamp Clicks Header Timestamp Key Value Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Messages are just K/V bytes @rmoff #javaBinOnline With great power comes great responsibility Avro -> Confluent Schema Registry Protobuf -> Confluent Schema Registry JSON CSV https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Consumers have a position all of their own Old New Sally is here Scan Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Consumers have a position all of their own Old New Fred is here Scan Sally is here Scan Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Consumers have a position all of their own George is here Scan Old New Fred is here Scan Sally is here Scan Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Free Books! @rmoff #javaBinOnline https://rmoff.dev/javabin Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Streaming Data Pipelines Photo by Victor Garcia on Unsplash @rmoff #javaBinOnline Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Database Offload @rmoff #javaBinOnline Amazon S3 RDBMS Kafka Connect Kafka Connect HDFS Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Stream Processing with Apache Kafka and ksqlDB order events CDC RDBMS customer orders customer Stream Processing Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Real-time Event Stream Enrichment @rmoff #javaBinOnline order events customer orders C D C RDBMS <y> customer Stream Processing Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Building a streaming data pipeline Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Stream Integration + Processing Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Integration Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Streaming Integration with Kafka Connect syslog Sources Kafka Connect Kafka Brokers Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Streaming Integration with Kafka Connect Amazon S3 Sinks Google BigQuery Kafka Connect Kafka Brokers Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Streaming Integration with Kafka Connect Amazon S3 syslog Google BigQuery Kafka Connect Kafka Brokers Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Look Ma, No Code! { “connector.class”: “io.confluent.connect.jdbc.JdbcSourceConnector”, “jdbc:mysql://asgard:3306/demo”, “table.whitelist”: “sales,orders,customers” } https://docs.confluent.io/current/connect/ “connection.url”: Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Messages are just K/V bytes @rmoff #javaBinOnline With great power comes great responsibility Avro -> Confluent Schema Registry Protobuf -> Confluent Schema Registry JSON CSV https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Extensible Connector Transform(s) Converter Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Confluent Hub hub.confluent.io Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Stream Integration + Processing Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Stream Processing Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
ksqlDB @rmoff #javaBinOnline The event streaming database purpose-built for stream processing applications. Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Stream Processing with ksqlDB @rmoff #javaBinOnline Source stream Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Stream Processing with ksqlDB @rmoff #javaBinOnline Source stream Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Stream Processing with ksqlDB @rmoff #javaBinOnline Source stream Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Stream Processing with ksqlDB @rmoff #javaBinOnline Source stream Analytics Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Stream Processing with ksqlDB @rmoff #javaBinOnline Source stream Applications / Microservices Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Stream Processing with ksqlDB …SUM(TXN_AMT) GROUP BY AC_ID AC _I D= 42 BA LA NC AC E= _I 94 D= .0 42 0 Source stream Applications / Microservices Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Let’s Build It! Rating events App Pro d uc e rA PI { “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” } Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Let’s Build It! Rating events App a k f a K t c e n n o C App u s n o C uc e rA PI Kafka Connect a fk t Ka ec n RDBMS Pro d I P A r e m Operational Dashboard Elasticsearch n Co User data Push notification ksqlDB Join events to users, and filter Data Lake SnowflakeDB/ S3/HDFS/etc Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline https://rmoff.dev/qcon-workshop Rating events App Pro d Push notification I P A r e m App u s n o C uc e rA PI Demo Time! a fk t Ka ec n RDBMS Operational Dashboard Elasticsearch n Co User data a k f a K t c e n n o C Kafka Connect ksqlDB Join events to users, and filter Data Lake SnowflakeDB/ S3/HDFS/etc Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Push notification Rating events App Kafka Connect a fk t Ka ec n RDBMS u s n o C uc e rA PI a k f a K t c e n n o C App Operational Dashboard Elasticsearch n Co User data Pro d I P A r e m ksqlDB Join events to users, and filter Data Lake SnowflakeDB/ S3/HDFS/etc Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Push notification to Slack Rating events App Pro d I P A r e m App u s n o C uc e rA PI Kafka Connect Operational Dashboard Elasticsearch n Co a fk t Ka ec n a k f a K t c e n n o C poor_ratings Data ratings ksqlDB User data RDBMS Lake Filter events S3/HDFS/ SnowflakeDB etc Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” @rmoff #javaBinOnline Filter all ratings where STARS<3 POOR_RATINGS } Producer API CREATE STREAM POOR_RATINGS AS SELECT * FROM ratings WHERE STARS <3 Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Push notification to Slack Kafka Connect Rating events App Kafka Connect a fk t Ka ec n RDBMS u s n o C uc e rA PI a k f a K t c e n n o C App Operational Dashboard Elasticsearch n Co User data Pro d I P A r e m Join events to users, and filter Data Lake SnowflakeDB/ S3/HDFS/etc Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Producer API MySQL t c e n n o C a k f Ka m u i z e b e D Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Time The Stream/Table Duality Stream Account ID Amount 12345 + £50 12345
The truth is the log. The database is a cache of a subset of the log. —Pat Helland Immutability Changes Everything http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf Photo by Bobby Burch on Unsplash
{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” } Producer API @rmoff #javaBinOnline Join each rating to customer data RATINGS_WITH_CUSTOMER_DATA t c e n n o C a k f a K { “id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “[email protected]”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none” CREATE STREAM RATINGS_WITH_CUSTOMER_DATA AS SELECT * FROM RATINGS LEFT JOIN CUSTOMERS ON R.ID=C.ID; } Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” } Producer API t c e n n o C a k f a K @rmoff #javaBinOnline Join each rating to customer data RATINGS_WITH_CUSTOMER_DATA Filter for just PLATINUM customers UNHAPPY_PLATINUM_CUSTOMERS { “id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “[email protected]”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none” CREATE STREAM UNHAPPY_PLATINUM_CUSTOMERS AS SELECT * FROM RATINGS_WITH_CUSTOMER_DATA WHERE STARS < 3 } Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
{ “rating_id”: 5313, “user_id”: 3, “stars”: 4, “route_id”: 6975, “rating_time”: 1519304105213, “channel”: “web”, “message”: “worst. flight. ever. #neveragain” @rmoff #javaBinOnline CREATE TABLE RATINGS_BY_CLUB_STATUS AS SELECT CLUB_STATUS, COUNT(*) Join each rating to customer data FROM RATINGS_WITH_CUSTOMER_DATA Producer APWINDOW I TUMBLING RATINGS_WITH_CUSTOMER_DATA (SIZE 1 MINUTES) GROUP BY CLUB_STATUS; } t c e n n o C a k f a K { “id”: 3, “first_name”: “Merilyn”, “last_name”: “Doughartie”, “email”: “[email protected]”, “gender”: “Female”, “club_status”: “platinum”, “comments”: “none” } Aggregate per-minute by CLUB_STATUS RATINGS_BY_CLUB_STATUS_1MIN Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Kafka Connect → Elasticsearch @rmoff #javaBinOnline Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Photo by on @rmoff #javaBinOnline Want to learn more? CTAs, not CATs (sorry, not sorry) Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Learn Kafka. Start building with Apache Kafka at Confluent Developer. developer.confluent.io
Confluent Community Slack group @rmoff #javaBinOnline cnfl.io/slack Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Free Books! @rmoff #javaBinOnline https://rmoff.dev/javabin Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
Resources @rmoff #javaBinOnline #EOF • CDC Spreadsheet • Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC • #partner-engineering on Slack for questions • BD team (#partners / [email protected]) can help with introductions on a given sales op Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #javaBinOnline Related Talks •The Changing Face of ETL: Event-Driven Architectures for Data Engineers : https://rmoff.dev/oredev19-changing-face-of-etl •From Zero to Hero with Kafka Connect: http://rmoff.dev/ksldn19-kafka-connect •An introduction to ksqlDB: https://rmoff.dev/kssf19-ksql-slides •No More Silos: Integrating Databases and Apache Kafka http://rmoff.dev/ksny19-no-more-silos Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!