Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline!

A presentation at NDC Oslo in June 2019 in Oslo, Norway by Robin Moffatt

Slide 1

Apache Kafka® and KSQL in Action: Let’s Build a Streaming Data Pipeline!
@rmoff #NDCOslo
https://cnfl.io/ksql-workshop-prereq

Slide 2

What is an Event Streaming Platform?
Diagram labels: Producer, Connectors, Consumer, The Log, Connectors, Streaming Engine

Slide 3

Immutable Event Log
Messages are added at the end of the log.
Diagram: the log runs from Old to New.

Slide 4

Topics
Clicks, Orders, Customers
Topics are similar in concept to tables in a database.

Slide 5

Partitions
Clicks: P0, P1, P2
Messages are guaranteed to be strictly ordered within a partition.

Slide 6

Partition Leadership and Replication
Diagram: TopicX partitions 1 to 4, three replicas each, spread across Brokers 1 to 4.
Legend: Leader, Follower

Slide 7

Partition Leadership and Replication - node failure
Diagram: TopicX partitions 1 to 4, three replicas each, spread across Brokers 1 to 4.
Legend: Leader, Follower

Slide 8

Producing to Kafka - No Key
Messages will be produced in a round robin fashion.
Diagram: messages arriving over Time.

Slide 9

Producing to Kafka - With Key
hash(key) % numPartitions = N
Diagram: messages with keys A, B, C, D arriving over Time.
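
The two producing modes above can be sketched in a few lines of Python. This is an illustration only, not Kafka's producer code: `partition_for_key` and `round_robin` are hypothetical helper names, and CRC32 stands in for the hash function the real client uses.

```python
import itertools
import zlib


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """hash(key) % numPartitions. CRC32 is an assumption standing in for Kafka's own hash."""
    return zlib.crc32(key) % num_partitions


def round_robin(num_partitions: int):
    """Return an iterator that hands out partition numbers in turn (the no-key case)."""
    return itertools.cycle(range(num_partitions))


# Messages with the same key always land on the same partition.
for key in (b"A", b"B", b"C", b"D", b"A"):
    print(key.decode(), "->", partition_for_key(key, 3))

# Messages without a key are spread across partitions in turn.
rr = round_robin(3)
print([next(rr) for _ in range(5)])  # [0, 1, 2, 0, 1]
```

Because the partition is a pure function of the key, all messages for one key stay in one partition, which is what preserves their relative order.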

Slide 10

Messages are just K/V bytes, plus headers + timestamp
Clicks: Header, Timestamp, Key, Value

Slide 11

Messages are just K/V bytes
With great power comes great responsibility
• Avro -> Confluent Schema Registry
• Protobuf
• JSON
• CSV
https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf

Slide 12

Consumers have a position all of their own
Diagram: the log runs from Old to New. Sally is here (Scan).

Slide 13

Consumers have a position all of their own
Diagram: the log runs from Old to New. Fred is here (Scan). Sally is here (Scan).

Slide 14

Consumers have a position all of their own
Diagram: the log runs from Old to New. George is here (Scan). Fred is here (Scan). Sally is here (Scan).
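
A minimal sketch of the idea that each consumer keeps its own position in a shared, append-only log. The `Log` and `Consumer` classes are hypothetical, in-memory stand-ins and not a Kafka client API.

```python
class Log:
    """An append-only list of messages; reading never removes anything."""

    def __init__(self):
        self._messages = []

    def append(self, message) -> int:
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read(self, offset: int, max_messages: int):
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Tracks its own position; other consumers are unaffected by its reads."""

    def __init__(self, name: str, log: Log):
        self.name = name
        self.log = log
        self.position = 0

    def poll(self, max_messages: int = 1):
        batch = self.log.read(self.position, max_messages)
        self.position += len(batch)
        return batch


log = Log()
for message in ["a", "b", "c", "d", "e"]:
    log.append(message)

sally = Consumer("Sally", log)
fred = Consumer("Fred", log)
sally.poll(3)  # Sally has scanned three messages
fred.poll(1)   # Fred has scanned one
print(sally.position, fred.position)  # 3 1
```

A new consumer such as George would start with its own position and could read the same messages again, since the log itself is never modified by reads.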

Slide 15

Consuming From Kafka - Single Consumer
Diagram: Partitions 1 to 4; a single consumer C.

Slide 16

Consuming From Kafka - Multiple Consumers
Diagram: Partitions 1 to 4; consumers C1 and C2.

Slide 17

Consuming From Kafka - Grouped Consumers
Diagram: Partitions 1 to 4; consumers labelled CC C1 and CC C2.

Slide 18

Consuming From Kafka - Grouped Consumers
Diagram: Partitions 1 to 4; consumers C1, C2, C3, C4.

Slide 19

Consuming From Kafka - Grouped Consumers
Diagram: Partitions 1 to 4; consumers C1, C2, C3, C4.

Slide 20

Consuming From Kafka - Grouped Consumers
Diagram: Partitions 1 to 4; consumers C1, C2, C3, C4.
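
The grouped-consumer slides can be illustrated with a simplified assignment function. This is a sketch under stated assumptions: `assign` is a hypothetical helper that deals partitions out in turn, and it is not the assignment strategy Kafka itself uses.

```python
def assign(partitions, consumers):
    """Share partitions among the members of one consumer group.

    Each partition goes to exactly one consumer. If there are more
    consumers than partitions, the extra consumers receive nothing.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [1, 2, 3, 4]
print(assign(partitions, ["C1", "C2"]))
print(assign(partitions, ["C1", "C2", "C3", "C4"]))
# If C4 leaves the group, its partition is picked up by a remaining member.
print(assign(partitions, ["C1", "C2", "C3"]))
```

Re-running the assignment when membership changes is a stand-in for a rebalance: the work of a departed consumer moves to the consumers that remain.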

Slide 21

The Connect API
Diagram labels: Producer, Connectors, Consumer, The Log, Connectors, Streaming Engine

Slide 22

Streaming Integration with Kafka Connect
Diagram: Sources (syslog), Kafka Connect, Kafka Brokers

Slide 23

Streaming Integration with Kafka Connect
Diagram: Kafka Brokers, Kafka Connect, Sinks (Amazon S3, Google BigQuery)

Slide 24

Streaming Integration with Kafka Connect
Diagram: syslog, Kafka Connect, Kafka Brokers, Amazon S3, Google BigQuery

Slide 25

Stream Processing in Kafka
Diagram labels: Producer, Connectors, Consumer, The Log, Connectors, Streaming Engine

Slide 26

Kafka Streams API

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;

    // stringSerde and ordersSerde are assumed to be defined elsewhere
    final StreamsBuilder builder = new StreamsBuilder();
    builder.stream("orders", Consumed.with(stringSerde, ordersSerde))
           .filter((key, order) -> order.getStatus().equals("COMPLETE"))
           .to("complete_orders", Produced.with(stringSerde, ordersSerde));

Slide 27

Stream Processing with KSQL

    CREATE STREAM completedOrders AS
      SELECT * FROM orders WHERE status='COMPLETE';

Slide 28

http://cnfl.io/book-bundle

Slide 29

A bit of a mess…
Diagram labels: App (x4), cache (x2), monitoring, MQ (x2), DWH, security, search, Hadoop

Slide 30

Kafka is a Streaming Platform
Diagram: Apps, DWH, and Hadoop connected through KAFKA.
Labels: request-response, changelogs, messaging OR stream processing, streaming data pipelines

Slide 31

Analytics - Database Offload
Diagram: RDBMS, CDC, HDFS / S3 / BigQuery etc

Slide 32

Stream Processing with Apache Kafka and KSQL
Diagram labels: order events, RDBMS, CDC, customer, Stream Processing, customer orders

Slide 33

Real-time Event Stream Enrichment
Diagram labels: order events, RDBMS, CDC, customer, Stream Processing, customer orders, <y>

Slide 34

Transform Once, Use Many
Diagram labels: order events, RDBMS, CDC, customer, Stream Processing, customer orders, <y>, New App <x>

Slide 35

Transform Once, Use Many
Diagram labels: order events, RDBMS, CDC, customer, Stream Processing, customer orders, <y>, New App <x>, HDFS / S3 / etc

Slide 36

Let’s Build It!
Diagram labels:
• Rating events: App, Producer API
• User data: RDBMS, Kafka Connect
• KSQL: Join events to users, and filter
• Push notification: App, Consumer API
• Operational Dashboard: Elasticsearch, Kafka Connect
• Data Lake: SnowflakeDB/S3/HDFS/etc, Kafka Connect

Slide 37

Confluent Community Components
Apache Kafka with a bunch of cool stuff! For free!
Diagram (Confluent Platform):
• Inputs: Log Events, Database Changes, IoT Data, Web Events, …
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing
• Data Compatibility: Schema Registry
• SQL Stream Processing: KSQL
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
• Apache Kafka®: Core | Connect API | Streams API
• Data Integration: Hadoop, Database, Data Warehouse, CRM, …
• Real-time Applications: Transformations, Custom Apps, Analytics, Monitoring, …
• CUSTOMER SELF-MANAGED: Datacenter, Public Cloud
• CONFLUENT FULLY-MANAGED: Confluent Cloud

Slide 38

KSQL
Diagram labels:
• Rating events: App, Producer API, ratings
• User data: RDBMS, Kafka Connect
• KSQL: Filter events, poor_ratings
• Push notification to Slack: App, Consumer API
• Operational Dashboard: Elasticsearch, Kafka Connect
• Data Lake: S3/HDFS/SnowflakeDB etc, Kafka Connect

Slide 39

KSQL is the Streaming SQL Engine for Apache Kafka

Slide 40

Filter messages with KSQL

orders:
    01, £10.00, COMPLETE
    05, £10.00, PENDING
    06, £24.00, COMPLETE
    02, £12.33, COMPLETE
    04, £5.50, COMPLETE

    CREATE STREAM completedOrders AS
      SELECT * FROM orders WHERE status='COMPLETE';

completedOrders:
    01, £10.00, COMPLETE
    06, £24.00, COMPLETE
    02, £12.33, COMPLETE
    04, £5.50, COMPLETE
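
What the filtering query does to each message can be sketched in plain Python. This is only an illustration of the semantics, not how KSQL executes: `completed_orders` is a hypothetical function, and the sample rows are illustrative.

```python
def completed_orders(orders):
    """Pass through only the orders whose status is COMPLETE, one message at a time."""
    for order in orders:
        if order["status"] == "COMPLETE":
            yield order


orders = [
    {"id": "01", "amount": "£10.00", "status": "COMPLETE"},
    {"id": "05", "amount": "£10.00", "status": "PENDING"},
    {"id": "06", "amount": "£24.00", "status": "COMPLETE"},
]

for order in completed_orders(orders):
    print(order["id"], order["amount"], order["status"])
```

The generator mirrors the continuous nature of the query: it never sees the whole input at once and emits each matching message as it arrives.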

Slide 41

Drop columns with KSQL

customer:
    {"id":1, "name":"Dana Lidgerton", "card":"5048370182840140"}
    {"id":2, "name":"Milo Wellsman", "card":"3557977885537506"}
    {"id":3, "name":"Dolph Cleeton", "card":"3586303633007251"}

    CREATE STREAM customerNoCC AS
      SELECT ID, NAME FROM customer;

customerNoCC:
    {"id":1, "name":"Dana Lidgerton"}
    {"id":2, "name":"Milo Wellsman"}
    {"id":3, "name":"Dolph Cleeton"}

Slide 42

Stateful aggregation with KSQL

customer:
    {"id":1, "name":"Dana Lidgerton", "country":"UK"}
    {"id":2, "name":"Milo Wellsman", "country":"UK"}
    {"id":3, "name":"Dolph Cleeton", "country":"Germany"}

    CREATE TABLE customersByCountry AS
      SELECT country, COUNT(*) AS customerCount
      FROM customer WINDOW TUMBLING (SIZE 1 HOUR)
      GROUP BY country;

customersByCountry:
    {"country":"UK", "customerCount":2}
    {"country":"Germany", "customerCount":1}

Slide 43

KSQL for Anomaly Detection
Identifying patterns or anomalies in real-time data, surfaced in milliseconds

    CREATE TABLE possible_fraud AS
      SELECT card_number, COUNT(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING COUNT(*) > 3;
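
The tumbling-window count behind this query can be sketched in Python. This is a batch illustration of the windowing arithmetic only, assuming millisecond timestamps; `possible_fraud` here is a hypothetical function, not KSQL's implementation.

```python
from collections import Counter


def possible_fraud(attempts, window_ms=5000, threshold=3):
    """Count attempts per (card_number, window) and keep counts above the threshold.

    attempts: iterable of (card_number, timestamp_ms) pairs.
    A tumbling window is a fixed-size, non-overlapping bucket, so each
    timestamp belongs to exactly one window.
    """
    counts = Counter()
    for card_number, timestamp_ms in attempts:
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    ("1234", 0), ("1234", 1000), ("1234", 2000), ("1234", 3000),  # 4 in one window
    ("5678", 4000), ("5678", 6000), ("5678", 7000), ("5678", 8000),  # split 1 + 3
]
print(possible_fraud(attempts))  # {('1234', 0): 4}
```

Card 5678 also makes four attempts, but they straddle a window boundary, so no single window exceeds the threshold. That is a property of tumbling windows worth keeping in mind when choosing the window size.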

Slide 44

KSQL for Data Transformation
Make simple derivations of existing topics from the command line

    CREATE STREAM pageviews
      WITH (PARTITIONS=4, VALUE_FORMAT='AVRO') AS
      SELECT * FROM pageviews_json;

Slide 45

KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data

    CREATE STREAM vip_actions AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'Platinum';

Slide 46

KSQL in Development and Production
• Interactive KSQL for development and testing (REST): “Hmm, let me try out this idea…”
• Headless KSQL for Production: Desired KSQL queries have been identified

Slide 47

Filter all ratings where STARS<3

Rating event (Producer API):
    {
      "rating_id": 5313,
      "user_id": 3,
      "stars": 4,
      "route_id": 6975,
      "rating_time": 1519304105213,
      "channel": "web",
      "message": "worst. flight. ever. #neveragain"
    }

POOR_RATINGS:
    CREATE STREAM POOR_RATINGS AS
      SELECT * FROM ratings WHERE STARS < 3;

Slide 48

https://cnfl.io/ksql-workshop-prereq
• Make sure you allocate Docker >=8GB memory
    docker system info | grep Memory
• Clone the repo
• Pull the Docker images as instructed in the doc

https://cnfl.io/start-ksql-workshop
3. Start Confluent Platform

Slide 49

https://cnfl.io/start-ksql-workshop
4. KSQL
5. Querying and filtering streams of data
6. Creating a Kafka topic populated by a filtered stream

Slide 50

Let’s Build It!
Diagram labels:
• Rating events: App, Producer API
• User data: RDBMS, Kafka Connect
• KSQL: Join events to users, and filter
• Push notification to Slack: App, Consumer API
• Operational Dashboard: Elasticsearch, Kafka Connect
• Data Lake: SnowflakeDB/S3/HDFS/etc, Kafka Connect

Slide 51

Kafka Connect
Diagram labels:
• Rating events: App, Producer API
• User data: RDBMS, Kafka Connect
• KSQL: Join events to users, and filter
• Push notification to Slack: App, Consumer API
• Operational Dashboard: Elasticsearch, Kafka Connect
• Data Lake: SnowflakeDB/S3/HDFS/etc, Kafka Connect

Slide 52

Streaming Integration with Kafka Connect
Diagram: syslog, Kafka Connect, Kafka Brokers, Amazon S3, Google BigQuery

Slide 53

Kafka Connect
Reliable and scalable integration of Kafka with other systems – no coding required.
✓ Centralized management and configuration
✓ Fault tolerant and automatically load balanced
✓ Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3
✓ Extensible API
✓ Supports CDC ingest of events from RDBMS
✓ Preserves data schema
✓ Single Message Transforms
✓ Part of Apache Kafka, included in Confluent Platform

    {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
      "table.whitelist": "sales,orders,customers"
    }

https://docs.confluent.io/current/connect/
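
A connector configuration like the one above is typically submitted to a Kafka Connect worker over its REST interface as a JSON body carrying a `name` and a `config` object. The sketch below only builds that body and does not send it; `connector_request` is a hypothetical helper and the connector name `jdbc-source-demo` is an assumption made for illustration.

```python
import json


def connector_request(name: str, config: dict) -> str:
    """Build the JSON body for creating a connector (name plus config)."""
    return json.dumps({"name": name, "config": config}, indent=2)


# Configuration values taken from the slide; the connector name is made up.
jdbc_config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
    "table.whitelist": "sales,orders,customers",
}

print(connector_request("jdbc-source-demo", jdbc_config))
```

Keeping the configuration as data rather than code is the point of the "no coding required" claim: adding a new integration means posting a new document, not deploying a new application.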

Slide 54

Kafka Connect + Schema Registry = WIN
Diagram labels: RDBMS, Kafka Connect, Avro Message, Avro Schema, Schema Registry, Kafka Connect, Elasticsearch

Slide 55

Kafka Connect + Schema Registry = WIN
Diagram labels: RDBMS, Kafka Connect, Avro Message, Avro Schema, Schema Registry, Kafka Connect, Elasticsearch

Slide 56

Confluent Hub
One-stop place to discover and download:
• Connectors
• Transformations
• Converters
hub.confluent.io

Slide 57

Demo Time!
Diagram labels: Producer API, MySQL, Debezium, Kafka Connect

Slide 58

Do you think that’s a table you are querying?

Slide 59

The Stream/Table Duality

Stream (over Time):
    Account ID   Amount
    12345        +€50
    12345        +€25
    12345        -€60

Table (after each event):
    Account ID   Balance
    12345        €50
    12345        €75
    12345        €15
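
The relationship between the stream and the table can be sketched in Python: replaying the stream of changes rebuilds the table, and each intermediate state of the table corresponds to one event. `apply_stream` is a hypothetical helper written for this illustration.

```python
def apply_stream(events):
    """Fold a stream of (account_id, amount) events into a table of balances.

    Returns the final table plus a snapshot of the table after every event.
    """
    table = {}
    snapshots = []
    for account_id, amount in events:
        table[account_id] = table.get(account_id, 0) + amount
        snapshots.append(dict(table))
    return table, snapshots


stream = [("12345", 50), ("12345", 25), ("12345", -60)]
table, snapshots = apply_stream(stream)
print(snapshots)  # [{'12345': 50}, {'12345': 75}, {'12345': 15}]
print(table)      # {'12345': 15}
```

The table holds only the latest balance per account, while the stream keeps every change. The table can always be rebuilt from the stream, but not the other way round.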

Slide 60

“The truth is the log. The database is a cache of a subset of the log.”
Pat Helland, Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash

Slide 61

Join each rating to customer data

Rating event (Producer API):
    { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975,
      "rating_time": 1519304105213, "channel": "web",
      "message": "worst. flight. ever. #neveragain" }

Customer record (Kafka Connect):
    { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie",
      "email": "[email protected]", "gender": "Female",
      "club_status": "platinum", "comments": "none" }

RATINGS_WITH_CUSTOMER_DATA:
    CREATE STREAM RATINGS_WITH_CUSTOMER_DATA AS
      SELECT *
      FROM RATINGS R
      LEFT JOIN CUSTOMERS C ON R.USER_ID = C.ID;

Slide 62

Join each rating to customer data: RATINGS_WITH_CUSTOMER_DATA
Filter for just PLATINUM customers: UNHAPPY_PLATINUM_CUSTOMERS

Rating event (Producer API):
    { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975,
      "rating_time": 1519304105213, "channel": "web",
      "message": "worst. flight. ever. #neveragain" }

Customer record (Kafka Connect):
    { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie",
      "email": "[email protected]", "gender": "Female",
      "club_status": "platinum", "comments": "none" }

    CREATE STREAM UNHAPPY_PLATINUM_CUSTOMERS AS
      SELECT *
      FROM RATINGS_WITH_CUSTOMER_DATA
      WHERE STARS < 3
        AND CLUB_STATUS = 'platinum';
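
The join-then-filter pipeline can be sketched in Python as a lookup against a table of customers followed by a predicate. This is an illustration of the semantics only: `enrich` and `unhappy_platinum` are hypothetical functions, and the second and third ratings are made-up sample data.

```python
def enrich(rating, customers):
    """LEFT JOIN: keep every rating, adding club_status when the customer is known."""
    customer = customers.get(rating["user_id"])
    enriched = dict(rating)
    enriched["club_status"] = customer["club_status"] if customer else None
    return enriched


def unhappy_platinum(ratings, customers):
    """Ratings below 3 stars that were left by platinum customers."""
    enriched = (enrich(rating, customers) for rating in ratings)
    return [r for r in enriched if r["stars"] < 3 and r["club_status"] == "platinum"]


# The customer table is keyed by id, like a KSQL table keyed on its join column.
customers = {
    3: {"first_name": "Merilyn", "club_status": "platinum"},
    4: {"first_name": "Hypothetical", "club_status": "bronze"},
}
ratings = [
    {"rating_id": 5313, "user_id": 3, "stars": 4},
    {"rating_id": 5314, "user_id": 3, "stars": 1},  # made-up sample
    {"rating_id": 5315, "user_id": 4, "stars": 1},  # made-up sample
]

print(unhappy_platinum(ratings, customers))
```

Splitting the work into an enrichment step and a filter step matches the two derived streams on the slides: the enriched stream can be reused by other consumers that need different filters.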

Slide 63

Join each rating to customer data: RATINGS_WITH_CUSTOMER_DATA
Aggregate per-minute by CLUB_STATUS: RATINGS_BY_CLUB_STATUS_1MIN

Rating event (Producer API):
    { "rating_id": 5313, "user_id": 3, "stars": 4, "route_id": 6975,
      "rating_time": 1519304105213, "channel": "web",
      "message": "worst. flight. ever. #neveragain" }

Customer record (Kafka Connect):
    { "id": 3, "first_name": "Merilyn", "last_name": "Doughartie",
      "email": "[email protected]", "gender": "Female",
      "club_status": "platinum", "comments": "none" }

    CREATE TABLE RATINGS_BY_CLUB_STATUS AS
      SELECT CLUB_STATUS, COUNT(*)
      FROM RATINGS_WITH_CUSTOMER_DATA
      WINDOW TUMBLING (SIZE 1 MINUTES)
      GROUP BY CLUB_STATUS;

Slide 64

Stream to Elasticsearch

Slide 65

https://cnfl.io/start-ksql-workshop
7. Kafka Connect / Integrating Kafka with a database
8. The Stream/Table duality
9. Joining Data in KSQL
10. Streaming Aggregates
11. Optional: Stream data to Elasticsearch

Slide 66

@rmoff
https://www.confluent.io/ksql
http://cnfl.io/demo-scene
http://cnfl.io/book-bundle
http://cnfl.io/slack

Slide 67

Related Talks
• The Changing Face of ETL: Event-Driven Architectures for Data Engineers
  • 📖 Slides
• Apache Kafka and KSQL in Action : Let’s Build a Streaming Data Pipeline!
  • 📖 Slides • 👾 Code
• ATM Fraud detection with Kafka and KSQL
  • 📽 Recording • 📖 Slides • 👾 Code • 📽 Recording
• No More Silos: Integrating Databases and Apache Kafka
  • 📖 Slides • 👾 Code (MySQL)
• Embrace the Anarchy: Apache Kafka’s Role in Modern Data Architectures
  • 📖 Slides • 👾 Code (Oracle) • 📽 Recording • 📽 Recording

Slide 68

#EOF
Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / [email protected]) can help with introductions on a given sales op