Kafka as a Platform: the Ecosystem from the Ground Up

A presentation at Kafka Summit London 2022 in April 2022 in London, UK by Robin Moffatt

EVENTS • Something happened • What happened

Human-generated events: a sale, a stock movement @rmoff

Machine-generated events: networking, IoT, applications

Immutable Event Log (old → new): events are added at the end of the log
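The append-only idea can be shown with a small in-memory sketch in Go. The `Log` type and its `Append` and `Read` methods are hypothetical and not part of any Kafka client API. The sketch only illustrates the behaviour on the slide: each event goes on the end, gets the next offset, and is never modified afterwards.

```go
package main

import "fmt"

// Log is a hypothetical, in-memory stand-in for a single Kafka partition:
// an append-only sequence of events, each identified by its offset.
type Log struct {
	events []string
}

// Append adds an event at the end of the log and returns its offset.
// Existing entries are never changed or removed.
func (l *Log) Append(event string) int64 {
	l.events = append(l.events, event)
	return int64(len(l.events) - 1)
}

// Read returns the event stored at the given offset, if there is one.
func (l *Log) Read(offset int64) (string, bool) {
	if offset < 0 || offset >= int64(len(l.events)) {
		return "", false
	}
	return l.events[offset], true
}

func main() {
	var eventLog Log
	for _, e := range []string{"sale", "stock movement", "sale"} {
		off := eventLog.Append(e)
		fmt.Printf("appended %q at offset %d\n", e, off)
	}
	if e, ok := eventLog.Read(1); ok {
		fmt.Println("offset 1 holds", e)
	}
}
```

A real partition is persisted to disk and replicated, but the contract is the same: new events only ever go on the end, and offsets increase monotonically.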

Topics (e.g. Clicks, Orders, Customers): topics are similar in concept to tables in a database

Partitions (e.g. Clicks split into p0, p1, p2): messages are guaranteed to be strictly ordered within a partition

Producing data (old → new): messages are added at the end of the log

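package main

import (
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	topic := "test_topic"
	p, err := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer p.Close()

	// Produce is asynchronous: the message is queued and sent in the background
	p.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: 0},
		Value:          []byte("Hello world")}, nil)

	// Wait (up to 15s) for queued messages to be delivered before closing
	p.Flush(15 * 1000)
}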

Producing to Kafka - No Key: messages will be batched and randomly distributed across the partitions (Partitions 1 to 4)

Producing to Kafka - With Key: the partition is chosen as hash(key) % numPartitions = N, so every message with the same key lands in the same partition (in the slide, keys A, B, C, and D go to Partitions 1, 2, 3, and 4)
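The keyed rule `hash(key) % numPartitions` can be sketched in a few lines of Go. This is an illustration, not a client's actual partitioner. It assumes CRC32 from the Go standard library as the hash. librdkafka, which the Go client wraps, offers CRC32-based partitioners, while the Java client's default uses murmur2, so real partition numbers can differ. Partitions here are numbered from 0, whereas the slide labels them 1 to 4.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// partitionFor applies hash(key) % numPartitions.
// CRC32 is an illustrative choice of hash; different clients use different
// hash functions, so the resulting numbers are not portable between them.
func partitionFor(key string, numPartitions int) int {
	return int(crc32.ChecksumIEEE([]byte(key)) % uint32(numPartitions))
}

func main() {
	// The same key always maps to the same partition, so per-key ordering holds.
	for _, key := range []string{"A", "B", "C", "D", "A"} {
		fmt.Printf("key %s -> partition %d\n", key, partitionFor(key, 4))
	}
}
```

Because the mapping depends on `numPartitions`, adding partitions to an existing topic changes where keys land. This is one reason partition counts are usually chosen up front.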

Producers
• A client application
• Puts messages into topics
• Handles partitioning, network protocol
• Java, Go, .NET, C/C++, Python
• Also every other language, plus the REST Proxy if not

Consuming data: access is only sequential. Seek to an offset, then scan forward (old → new)

Consumers have a position of their own: Victoria, Tim, and Rick are each at a different point in the log, and each scans forward independently (old → new)
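Per-consumer positions can be modelled with a map of offsets over a shared slice of events. The `scan` helper and the batch size of 2 are hypothetical. They only show that each reader advances its own offset and leaves the log, and every other reader's position, untouched.

```go
package main

import "fmt"

// scan reads up to max events sequentially, starting at offset from,
// and returns the events read plus the consumer's new position.
func scan(events []string, from, max int) ([]string, int) {
	if from >= len(events) {
		return nil, from // nothing new to read yet
	}
	end := from + max
	if end > len(events) {
		end = len(events)
	}
	return events[from:end], end
}

func main() {
	events := []string{"e0", "e1", "e2", "e3", "e4"}

	// Each consumer tracks its own position (the next offset to read).
	positions := map[string]int{"Victoria": 3, "Tim": 1, "Rick": 0}

	for _, name := range []string{"Victoria", "Tim", "Rick"} {
		read, next := scan(events, positions[name], 2)
		positions[name] = next
		fmt.Printf("%s read %v, now at offset %d\n", name, read, next)
	}
}
```

In Kafka, consumer groups store these positions as committed offsets, which is why a restarted consumer can resume where it left off.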

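package main

import (
	"fmt"

	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	topic := "test_topic"
	cm := kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",
		"group.id":          "test_group"} // illustrative name; consumers sharing a group.id share the partitions
	c, err := kafka.NewConsumer(&cm)
	if err != nil {
		panic(err)
	}
	defer c.Close()
	c.Subscribe(topic, nil)
	for {
		// Poll waits up to 100ms for the next event (nil if none arrived)
		switch ev := c.Poll(100).(type) {
		case *kafka.Message:
			fmt.Printf("✅ Message '%v' received from topic '%v'\n",
				string(ev.Value), *ev.TopicPartition.Topic)
		case kafka.Error:
			fmt.Printf("❌ Error: %v\n", ev)
		}
	}
}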

Consuming From Kafka - Single Consumer: one application (App1) reads from all four partitions

Consuming From Kafka - Multiple Consumers: separate applications (App1 and App2) each read from all four partitions, independently of one another

Consuming From Kafka - Grouped Consumers: the consumers that make up one application (e.g. C1 and C2 in App1) share the partitions between them, with each partition read by exactly one consumer in the group; the partitions are redistributed as consumers join or leave the group
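Splitting partitions across a group can be sketched as a simple round-robin assignment in Go. This is a simplified illustration. Kafka ships several assignment strategies (range, round-robin, sticky, and others), and the `assignRoundRobin` function is hypothetical. The key property is that each partition has exactly one owner within the group.

```go
package main

import "fmt"

// assignRoundRobin spreads partitions across the consumers in a group so
// that each partition is owned by exactly one consumer.
func assignRoundRobin(partitions []int, consumers []string) map[string][]int {
	assignment := make(map[string][]int)
	if len(consumers) == 0 {
		return assignment
	}
	for i, p := range partitions {
		c := consumers[i%len(consumers)]
		assignment[c] = append(assignment[c], p)
	}
	return assignment
}

func main() {
	partitions := []int{1, 2, 3, 4}
	fmt.Println(assignRoundRobin(partitions, []string{"C1", "C2"}))
	// prints map[C1:[1 3] C2:[2 4]]

	// A consumer joins, so the partitions are redistributed (a "rebalance").
	fmt.Println(assignRoundRobin(partitions, []string{"C1", "C2", "C3"}))
	// prints map[C1:[1 4] C2:[2] C3:[3]]
}
```

The model also shows why a topic's partition count limits parallelism within a group. With more consumers than partitions, the extra consumers get nothing to read.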

Consumers
• A client application
• Reads messages from topics
• Horizontally, elastically scalable (if stateless)
• Java, Go, .NET, C/C++, Python, everything else, plus the REST Proxy if not

Partition Leadership and Replication: each of Partitions 1 to 4 is replicated across Brokers 1, 2, and 3; one replica of each partition is the leader, and the others are followers
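The replica layout above can be sketched in Go by rotating the leader across brokers and placing followers on the next brokers along. The `placeReplicas` function is hypothetical. Kafka's own placement logic is more involved and can, for example, be rack-aware. The sketch assumes the replication factor does not exceed the number of brokers.

```go
package main

import "fmt"

// placeReplicas assigns each partition a list of brokers: the first entry is
// the leader and the rest are followers. Starting each partition one broker
// further along spreads leadership (and so load) across the cluster.
// Assumes replicationFactor <= len(brokers), so replicas land on distinct brokers.
func placeReplicas(numPartitions int, brokers []string, replicationFactor int) [][]string {
	placement := make([][]string, numPartitions)
	for p := 0; p < numPartitions; p++ {
		for r := 0; r < replicationFactor; r++ {
			placement[p] = append(placement[p], brokers[(p+r)%len(brokers)])
		}
	}
	return placement
}

func main() {
	brokers := []string{"Broker 1", "Broker 2", "Broker 3"}
	for p, replicas := range placeReplicas(4, brokers, 3) {
		fmt.Printf("Partition %d: leader=%s followers=%v\n", p+1, replicas[0], replicas[1:])
	}
}
```

If a broker fails, one of the followers of each partition it led can take over as leader, which is what makes the log fault-tolerant.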

So far, this is pretty good, but I've not finished yet…

Streaming Pipelines: RDBMS, Amazon S3, HDFS

Streaming Integration with Kafka Connect: sources (e.g. syslog) → Kafka Connect → Kafka brokers → Kafka Connect → sinks (e.g. Amazon, Google)

Look Ma, No Code!
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}

Extensible: Connector → Transform(s) → Converter
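The Connector → Transform(s) → Converter chain can be pictured as function composition. The Go sketch below is only an analogy. Kafka Connect itself runs on the JVM, and the `Record`, `Transform`, and `Converter` types and the `dropField`, `upperField`, and `jsonConverter` helpers are all hypothetical. It follows the source-side order shown on the slide: the connector emits a record, each transform modifies it in turn, and the converter serialises the result.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Record is a hypothetical stand-in for a record emitted by a source connector.
type Record map[string]any

// Transform is a hypothetical single message transform: record in, record out.
type Transform func(Record) Record

// Converter turns the final record into the bytes written to Kafka.
type Converter func(Record) ([]byte, error)

// dropField returns a Transform that removes the named field.
func dropField(name string) Transform {
	return func(r Record) Record {
		delete(r, name)
		return r
	}
}

// upperField returns a Transform that upper-cases a string field.
func upperField(name string) Transform {
	return func(r Record) Record {
		if s, ok := r[name].(string); ok {
			r[name] = strings.ToUpper(s)
		}
		return r
	}
}

// jsonConverter serialises the record as JSON (map keys come out sorted).
func jsonConverter(r Record) ([]byte, error) { return json.Marshal(r) }

// pipeline applies each transform in order, then the converter.
func pipeline(r Record, transforms []Transform, convert Converter) ([]byte, error) {
	for _, t := range transforms {
		r = t(r)
	}
	return convert(r)
}

func main() {
	row := Record{"id": 42, "customer": "alice", "secret": "do-not-share"}
	out, err := pipeline(row, []Transform{dropField("secret"), upperField("customer")}, jsonConverter)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"customer":"ALICE","id":42}
}
```

Each stage is swappable without touching the others. In Connect, this corresponds to changing configuration rather than code.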

Lack of schemas: coupling teams and services. A raw record like the one below carries no field names or types, so every consumer has to know, out of band, what each value means:
2001 2001 Citrus Heights-Sunrise Blvd Citrus_Hghts 60670001 3400293 34 SAC Sacramento SV Sacramento Valley SAC Sacramento County APCD SMA8 Sacramento Metropolitan Area CA 6920 Sacramento 28 6920 13588 7400 Sunrise Blvd 95610 38 41 56 38.6988889 121 16 15.98999977 -121.271111 10 4284781 650345 52

Serialisation & Schemas: Avro 👍, Protobuf 👍, JSON Schema 👍; plain JSON and CSV 😬 (https://rmoff.dev/qcon-schemas)

Schemas: producer → topic → … → consumer, with the Schema Registry alongside
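One way producers and consumers share schemas through the registry is by carrying a schema ID inside each message. Confluent's serialisers prefix the payload with a magic byte (0) and a 4-byte big-endian schema ID. Consumers use that ID to fetch the writer's schema from the Schema Registry. The Go sketch below shows only that framing. The `frame` and `unframe` names are hypothetical, and the payload bytes stand in for real Avro, Protobuf, or JSON Schema data. Protobuf framing adds further index bytes that are not modelled here.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const magicByte = 0x0

// frame prepends the magic byte and schema ID to a serialised payload.
func frame(schemaID uint32, payload []byte) []byte {
	buf := make([]byte, 5, 5+len(payload))
	buf[0] = magicByte
	binary.BigEndian.PutUint32(buf[1:5], schemaID)
	return append(buf, payload...)
}

// unframe extracts the schema ID and payload from a framed message.
func unframe(msg []byte) (uint32, []byte, error) {
	if len(msg) < 5 || msg[0] != magicByte {
		return 0, nil, errors.New("message is not in the expected schema-ID framing")
	}
	return binary.BigEndian.Uint32(msg[1:5]), msg[5:], nil
}

func main() {
	msg := frame(42, []byte("serialised record"))
	id, payload, err := unframe(msg)
	fmt.Println(id, string(payload), err) // 42 serialised record <nil>
}
```

Only a small ID travels with each message, not the whole schema, so the per-message overhead stays at five bytes.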

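import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

// stringSerde and widgetsSerde are assumed to be defined elsewhere
StreamsBuilder builder = new StreamsBuilder();
builder.stream("widgets", Consumed.with(stringSerde, widgetsSerde))
       .filter((key, widget) -> "RED".equals(widget.getColour()))
       .to("widgets_red", Produced.with(stringSerde, widgetsSerde));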

Stream Processing with ksqlDB: Stream: widgets → ksqlDB → Stream: widgets_red
CREATE STREAM widgets_red AS SELECT * FROM widgets WHERE colour='RED';
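The filter itself (`WHERE colour='RED'`) is a stateless, per-event operation. A plain-Go analogy is a goroutine that reads from one channel and forwards only matching events to another. This is not how ksqlDB or Kafka Streams is implemented, and the `Widget` type and `filterRed` function are hypothetical. Channels stand in for the `widgets` and `widgets_red` topics.

```go
package main

import "fmt"

// Widget is a hypothetical event type standing in for records on the widgets topic.
type Widget struct {
	ID     int
	Colour string
}

// filterRed forwards only red widgets from in to the returned channel,
// mirroring WHERE colour='RED'. Each event is handled on its own, with no state.
func filterRed(in <-chan Widget) <-chan Widget {
	out := make(chan Widget)
	go func() {
		defer close(out)
		for w := range in {
			if w.Colour == "RED" {
				out <- w
			}
		}
	}()
	return out
}

func main() {
	widgets := make(chan Widget)
	go func() {
		defer close(widgets)
		for i, c := range []string{"RED", "BLUE", "RED"} {
			widgets <- Widget{ID: i, Colour: c}
		}
	}()
	for w := range filterRed(widgets) {
		fmt.Printf("widgets_red <- %+v\n", w)
	}
}
```

Because no state is kept between events, this kind of step scales out simply by running more instances, each on its own share of the partitions.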

…FROM WIDGETS WHERE WEIGHT_G > 120
SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE
SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS GROUP BY SENSOR_ID HAVING TEMP>20
…'connector.class' = 'S3Connector', 'topics' = 'widgets' …);
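Unlike the filter, the aggregations above are stateful. A streaming `AVG(...) GROUP BY SENSOR_ID` keeps a running sum and count per key and emits an updated result as each event arrives. The Go sketch below shows that idea under simplifying assumptions. The `Reading` and `avgState` types and the `update` function are hypothetical, and the state lives in an in-memory map. ksqlDB keeps equivalent state in local stores backed by Kafka topics. The `HAVING TEMP>20` condition is applied to each updated average.

```go
package main

import "fmt"

// Reading is a hypothetical sensor event (SENSOR_ID, TEMP_CELCIUS).
type Reading struct {
	SensorID    string
	TempCelsius float64
}

// avgState is the running aggregate kept per group key.
type avgState struct {
	sum   float64
	count int
}

// update folds one reading into its sensor's running aggregate and returns
// the new average, as a streaming aggregation emits an update per event.
func update(state map[string]*avgState, r Reading) float64 {
	s, ok := state[r.SensorID]
	if !ok {
		s = &avgState{}
		state[r.SensorID] = s
	}
	s.sum += r.TempCelsius
	s.count++
	return s.sum / float64(s.count)
}

func main() {
	state := map[string]*avgState{}
	readings := []Reading{{"s1", 18}, {"s2", 25}, {"s1", 24}, {"s2", 19}}
	for _, r := range readings {
		if avg := update(state, r); avg > 20 { // HAVING TEMP > 20
			fmt.Printf("sensor %s: avg %.1f\n", r.SensorID, avg)
		}
	}
}
```

For this input, s1's first reading (18.0) is suppressed, and the output then shows s2 at 25.0, s1 at 21.0, and s2 at 22.0, one line per qualifying update.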

ksqlDB or Kafka Streams? (Photo by Ramiz Dedaković on Unsplash)

Standing on the Shoulders of Streaming Giants: ksqlDB (including ksqlDB UDFs) is powered by Kafka Streams, which is in turn powered by the Producer and Consumer APIs. Moving up the stack gives ease of use; moving down gives flexibility.

Apache Kafka: the Log, Producer, Consumer, Connectors, Streaming Engine

The wider platform adds Security, Schema Registry, ksqlDB, REST Proxy, and Confluent Control Center

Fully Managed Kafka as a Service: use code RMOFF200 for free money! (additional $200 towards your bill 😄)
T&C: https://www.confluent.io/confluent-cloud-promo-disclaimer

Confluent Developer (developer.confluent.io)
• Deep-dive articles
• Over 10 hours of free courses: Apache Kafka 101, Kafka Connect 101, Kafka Streams 101, ksqlDB 101, Inside ksqlDB, Spring Framework and Kafka, Building Data Pipelines with Kafka, Event Sourcing with Kafka, Data Mesh 101
• Plus: Hands-on Quick Starts and Client Language Guides + Event Streaming Patterns + More

#EOF @rmoff rmoff.dev/talks youtube.com/rmoff

Robin Moffatt
@rmoff


Kafka has become a key data infrastructure technology, and we all have at least a vague sense that it is a messaging system, but what else is it? How can an overgrown message bus be getting this much buzz? Well, because Kafka is merely the center of a rich streaming data platform that invites detailed exploration.

In this talk, we’ll look at the entire streaming platform provided by Apache Kafka and the Confluent community components. Starting with a lonely key-value pair, we’ll build up topics, partitioning, replication, and low-level Producer and Consumer APIs. We’ll group consumers into elastically scalable, fault-tolerant application clusters, then layer on more sophisticated stream processing APIs like Kafka Streams and ksqlDB. We’ll help teams collaborate around data formats with schema management. We’ll integrate with legacy systems without writing custom code. By the time we’re done, the open-source project we thought was Big Data’s answer to message queues will have become an enterprise-grade streaming platform, all in 45 minutes.

Resources

The following resources were mentioned during the presentation or are useful additional information.

🎥 Recording
☁️Confluent Cloud☁️

Managed Apache Kafka, ksqlDB, and Schema Registry. Use code RMOFF200 when you sign up!
Kafka Internals - free training

Full-length course written by Jun Rao, one of the original creators of Apache Kafka.
Apache Kafka 101

Free training course
Confluent Developer

The pre-eminent resource for learning Apache Kafka. There are free training courses, event streaming patterns, deep-dive articles, and language-specific client programming guides. Check it out!

Buzz and feedback

Here’s what was said about this presentation on social media.

Great session on the Kafka Ecosystem @rmoff #kafkasummit pic.twitter.com/nzRaaloq83
— DT® (@tanoe09) April 25, 2022
#kafkasummit First time speaking in person since I spoke at @QCon in London back in 2020. Feels VERY strange, and I’d be lying if I said I wasn’t nervous! pic.twitter.com/Poh9PgTIL6
— Robin Moffatt 🍻🏃🥓 (@rmoff) April 25, 2022
New to @apachekafka, or want a recap of what it is and its surrounding ecosystem? In his talk, @rmoff shares Kafka's concepts and capabilities from the ground up. #KafkaSummit https://t.co/k6620ZMsFg pic.twitter.com/QQAiUGK8Kv
— Confluent (@confluentinc) April 25, 2022
An @rmoff talking about the Kafka Ecosystem at #kafkasummit pic.twitter.com/TDdMHJOql2
— Tim Berglund (@tlberglund) April 25, 2022
"Life doesn't happen in batch" - more wise words from @rmoff https://t.co/BzOcWRtMXH
— Steve Cellini (@randomthinking) April 25, 2022
Confluent's @rmoff just started the afternoon round session here at @apachekafka #kafkasummit and the room is packed as expected. pic.twitter.com/C2olHB7WL8
— Hans-Peter Grahsl 🕊 (@hpgrahsl) April 25, 2022
#speakerselfie #kafkasummit #streamingselfie #devrel pic.twitter.com/U5FmTa7hi1
— Robin Moffatt 🍻🏃🥓 (@rmoff) April 25, 2022
