Kafka as a Platform: the Ecosystem from the Ground Up Robin Moffatt @rmoff

EVENTS @rmoff

EVENTS • Something happened • What happened @rmoff

Human generated events A Sale A Stock movement @rmoff

Machine generated events Networking IoT Applications @rmoff

EVENTS are EVERYWHERE @rmoff

EVENTS are ^very POWERFUL @rmoff

K V

LOG @rmoff

Immutable Event Log Old New Events are added at the end of the log @rmoff
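
To make the append-only idea concrete, here is a minimal sketch in Go (not Kafka code, purely an illustration): a log is a sequence that is only ever appended to, and every record is addressed by its offset.

package main

import "fmt"

// An illustrative append-only log: records are only added at the end,
// existing records never change, and each record has a stable offset.
type Log struct {
	records []string
}

// Append adds a record at the end and returns its offset.
func (l *Log) Append(record string) int {
	l.records = append(l.records, record)
	return len(l.records) - 1
}

// Read returns the record at a given offset.
func (l *Log) Read(offset int) string {
	return l.records[offset]
}

func main() {
	var l Log
	l.Append("event-1")
	offset := l.Append("event-2")
	fmt.Println(offset, l.Read(offset)) // 1 event-2
}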

TOPICS @rmoff

Topics Clicks Orders Customers Topics are similar in concept to tables in a database @rmoff

PARTITIONS @rmoff

Partitions Clicks p0 p1 p2 Messages are guaranteed to be strictly ordered within a partition @rmoff

PUB / SUB @rmoff

Producing data Old New Messages are added at the end of the log @rmoff

partition 0 … partition 1 producer … partition 2 … Partitioned Topic

package main

import (
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	topic := "test_topic"

	// Error handling elided for brevity
	p, _ := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	defer p.Close()

	// Produce is asynchronous: the message is enqueued for delivery
	p.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: 0},
		Value:          []byte("Hello world")}, nil)

	// Wait (up to 15s) for outstanding messages to be delivered before closing
	p.Flush(15 * 1000)
}

Producing to Kafka - No Key Time Partition 1 Partition 2 Partition 3 Messages will be produced in a round-robin fashion Partition 4 @rmoff

Producing to Kafka - With Key Time Partition 1 A Partition 2 B hash(key) % numPartitions = N Partition 3 C Partition 4 D @rmoff
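
A sketch of the idea behind hash(key) % numPartitions: the same key always lands on the same partition. Note this uses FNV-1a purely for illustration; Kafka's Java producer actually uses murmur2, and librdkafka-based clients have their own default partitioner.

package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to a partition deterministically:
// equal keys always go to the same partition.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a() // illustrative hash, not Kafka's murmur2
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	for _, key := range []string{"A", "B", "C", "D"} {
		fmt.Printf("key %s -> partition %d\n", key, partitionFor(key, 4))
	}
}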

Producers partition 0 … partition 1 producer … partition 2 … Partitioned Topic • A client application • Puts messages into topics • Handles partitioning, network protocol • Java, Go, .NET, C/C++, Python • Also every other language

PUB / SUB @rmoff

Consuming data - access is only sequential Read to offset & scan Old New @rmoff

Consumers have a position of their own Old New Sally is here Scan @rmoff

Consumers have a position of their own Old New Fred is here Scan Sally is here Scan @rmoff

Consumers have a position of their own Rick is here Scan Old New Fred is here Scan Sally is here Scan @rmoff

// Example config: a consumer needs a group.id, and the events channel
// must be enabled for the <-c.Events() pattern below. Values shown
// are illustrative for a local setup.
cm := kafka.ConfigMap{
	"bootstrap.servers":        "localhost:9092",
	"group.id":                 "rmoff_learning_go",
	"go.events.channel.enable": true}

c, _ := kafka.NewConsumer(&cm)
defer c.Close()
c.Subscribe(topic, nil)

for {
	select {
	case ev := <-c.Events():
		switch ev.(type) {
		case *kafka.Message:
			km := ev.(*kafka.Message)
			fmt.Printf("✅ Message '%v' received from topic '%v'\n",
				string(km.Value), string(*km.TopicPartition.Topic))
		}
	}
}

Consuming From Kafka - Single Consumer Partition 1 Partition 2 C Partition 3 Partition 4 @rmoff

Consuming From Kafka - Multiple Consumers Partition 1 Partition 2 Partition 3 C1 C2 Partition 4 @rmoff

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 C1 C1 C2 Partition 4 @rmoff

CONSUMERS CONSUMER GROUP COORDINATOR CONSUMER GROUP

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 Ci Cii Ciii Civ Partition 4 @rmoff

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 Ci Cii Ciii Partition 4 @rmoff

Consumers partition 0 … partition 1 … consumer A consumer A consumer A partition 2 … Partitioned Topic consumer B • A client application • Reads messages from topics • Horizontally, elastically scalable (if stateless) • Java, Go, .NET, C/C++, Python, everything else
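
A minimal sketch of joining a consumer group with confluent-kafka-go: every instance started with the same group.id shares the topic's partitions, and Kafka rebalances automatically as instances come and go. The broker address, group name, and topic are assumptions for a local setup.

package main

import (
	"fmt"

	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	// All instances configured with the same group.id form one consumer
	// group and divide the topic's partitions between them.
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",
		"group.id":          "widget-readers", // illustrative group name
		"auto.offset.reset": "earliest",
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	c.Subscribe("widgets", nil)
	for {
		msg, err := c.ReadMessage(-1) // block until a message arrives
		if err != nil {
			continue
		}
		fmt.Printf("partition %d: %s\n",
			msg.TopicPartition.Partition, string(msg.Value))
	}
}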

BROKERS and REPLICATION @rmoff

Leader Partition Leadership and Replication Follower Partition 1 Partition 2 Partition 3 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff

Leader Partition Leadership and Replication Follower Partition 1 Partition 1 Partition 1 Partition 2 Partition 2 Partition 2 Partition 3 Partition 3 Partition 3 Partition 4 Partition 4 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff

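The replication factor is set per topic. A hedged sketch using confluent-kafka-go's AdminClient to create a four-partition topic replicated across three brokers; the topic name and broker address are illustrative.

package main

import (
	"context"
	"fmt"
	"time"

	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	a, err := kafka.NewAdminClient(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer a.Close()

	// Each of the 4 partitions gets a leader plus two follower replicas,
	// spread across the brokers in the cluster.
	results, err := a.CreateTopics(
		context.Background(),
		[]kafka.TopicSpecification{{
			Topic:             "widgets",
			NumPartitions:     4,
			ReplicationFactor: 3}},
		kafka.SetAdminOperationTimeout(30*time.Second))
	if err != nil {
		panic(err)
	}
	fmt.Println(results)
}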
DEMO @rmoff Photo by Raoul Droog on Unsplash

So far, this is Pretty good but I’ve not finished yet… @rmoff

Streaming Pipelines Amazon S3 RDBMS HDFS @rmoff

Evolve processing from old systems to new Existing App New App <x> RDBMS @rmoff

Streaming Integration with Kafka Connect syslog Sources Kafka Connect Kafka Brokers @rmoff

Streaming Integration with Kafka Connect Amazon Sinks Google Kafka Connect Kafka Brokers @rmoff

Streaming Integration with Kafka Connect Amazon syslog Google Kafka Connect Kafka Brokers @rmoff

Look Ma, No Code!

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}
@rmoff
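
Connectors are configured, not coded: JSON like the above is submitted to Kafka Connect's REST API. A minimal sketch, assuming a Connect worker on localhost:8083 (the default port); the connector name "jdbc-source-demo" is illustrative.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// POST the connector name and config to the Connect REST API.
	payload := []byte(`{
	  "name": "jdbc-source-demo",
	  "config": {
	    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
	    "connection.url": "jdbc:mysql://asgard:3306/demo",
	    "table.whitelist": "sales,orders,customers"
	  }
	}`)

	resp, err := http.Post("http://localhost:8083/connectors",
		"application/json", bytes.NewBuffer(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // 201 Created on success
}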

Extensible Connector Transform(s) Converter @rmoff

hub.confluent.io @rmoff

Single Kafka Connect node S3 Task #1 JDBC Task #1 JDBC Task #2 Kafka Connect cluster Worker Worker Offsets Config Status @rmoff

Kafka Connect - scalable and fault-tolerant S3 Task #1 JDBC Task #1 Kafka Connect cluster JDBC Task #2 Worker Worker Offsets Config Status @rmoff

Automatic fault tolerance S3 Task #1 JDBC Task #1 JDBC Task #2 Kafka Connect cluster Worker Worker Offsets Config Status @rmoff

K V

K V Wait… what's this?

How do you serialise your data? JSON Avro Protobuf Schema JSON CSV @rmoff

APIs are contracts between services {user_id: 53, address: "2 Elm st."} Profile service Quote service {user_id: 53, quote: 580} @rmoff

But not all services {user_id: 53, address: "2 Elm st."} Profile service Quote service {user_id: 53, quote: 580} @rmoff

And naturally… {user_id: 53, address: "2 Elm st."} Profile service Quote service Profile database Stream processing @rmoff

Schemas are about how teams work together {user_id: 53, timestamp: 1497842472} new Date(timestamp) Profile service Quote service Profile database create table ( user_id number, timestamp number) Stream processing @rmoff

Things change… {user_id: 53, timestamp: "June 28, 2017 4:00pm"} Profile service Quote service Profile database Stream processing @rmoff

Moving fast and breaking things {user_id: 53, timestamp: "June 28, 2017 4:00pm"} Profile service Quote service Profile database Stream processing @rmoff

Lack of schemas – Coupling teams and services 2001 2001 Citrus Heights-Sunrise Blvd Citrus_Hghts 60670001 3400293 34 SAC Sacramento SV Sacramento Valley SAC Sacramento County APCD SMA8 Sacramento Metropolitan Area CA 6920 Sacramento 28 6920 13588 7400 Sunrise Blvd 95610 38 41 56 38.6988889 121 16 15.98999977 -121.271111 10 4284781 650345 52 @rmoff

Serialisation & Schemas JSON Avro Protobuf Schema JSON CSV @rmoff

Serialisation & Schemas JSON Avro Protobuf Schema JSON CSV 👍 👍 👍 😬 https://rmoff.dev/qcon-schemas @rmoff

Schemas Schema Registry Topic producer … consumer

Actually an internal topic! Schemas Schema Registry Topic producer … consumer

Producers contain serializers

// Serializers for the message key and value
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://schema-registry:8081");
…
Producer<String, LogLine> producer = new KafkaProducer<String, LogLine>(props);

@rmoff
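
Under the covers the Avro serializer registers the value schema with the Schema Registry and embeds only a small schema ID in each message. A hedged sketch of the registration call it makes, via the registry's REST API; the subject name follows the default TopicNameStrategy, and the LogLine schema shown is illustrative.

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Register a schema for the value of test_topic; the registry
	// returns an ID that serialized messages carry instead of the schema.
	payload := []byte(`{"schema": "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":[{\"name\":\"message\",\"type\":\"string\"}]}"}`)

	resp, err := http.Post(
		"http://schema-registry:8081/subjects/test_topic-value/versions",
		"application/vnd.schemaregistry.v1+json",
		bytes.NewBuffer(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status) // e.g. 200 OK with {"id": 1}
}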

partition 0 … partition 1 … consumer A consumer A consumer A partition 2 … consumer B Partitioned Topic @rmoff

consumer A consumer A consumer A @rmoff

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Photo by Franck V. on Unsplash @rmoff

Streams of events Time @rmoff

Stream Processing Stream: widgets Stream: widgets_red @rmoff

Stream Processing with Kafka Streams

Stream: widgets

final StreamsBuilder builder = new StreamsBuilder();
builder
    .stream("widgets", Consumed.with(stringSerde, widgetsSerde))
    .filter((key, widget) -> widget.getColour().equals("RED"))
    .to("widgets_red", Produced.with(stringSerde, widgetsSerde));

Stream: widgets_red @rmoff

Streams Application Streams Application Streams Application @rmoff

Stream Processing with ksqlDB

Stream: widgets

CREATE STREAM widgets_red AS
  SELECT * FROM widgets WHERE colour='RED';

Stream: widgets_red @rmoff

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Photo by Franck V. on Unsplash @rmoff

SELECT * FROM WIDGETS WHERE WEIGHT_G > 120

SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE

SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS
  GROUP BY SENSOR_ID HAVING TEMP > 20

-- Stream out to an object store, data warehouse, or RDBMS
CREATE SINK CONNECTOR dw WITH (
  'connector.class' = 'S3Connector',
  'topics' = 'widgets'
  …);

Photo by Franck V. on Unsplash @rmoff

Stream Processing with ksqlDB Source stream @rmoff

Stream Processing with ksqlDB Source stream Analytics @rmoff

Stream Processing with ksqlDB Source stream Applications / Microservices @rmoff

Stream Processing with ksqlDB …SUM(TXN_AMT) GROUP BY AC_ID (query: AC_ID=42 → result: AC_ID=42 BALANCE=94.00) Source stream Applications / Microservices @rmoff
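
Applications can ask ksqlDB for the current value of an aggregate over its REST API, e.g. a pull query against the materialized balance. A sketch assuming ksqlDB on localhost:8088 and a table named BALANCES (both illustrative).

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// A pull query: fetch the current balance for one account key.
	payload := []byte(`{"ksql": "SELECT BALANCE FROM BALANCES WHERE AC_ID=42;", "streamsProperties": {}}`)

	resp, err := http.Post("http://localhost:8088/query",
		"application/vnd.ksql.v1+json", bytes.NewBuffer(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}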

ksqlDB or Kafka Streams? @rmoff Photo by Ramiz Dedaković on Unsplash

Standing on the Shoulders of Streaming Giants: ksqlDB (with ksqlDB UDFs) is powered by Kafka Streams, which in turn is powered by the Producer and Consumer APIs. Ease of use increases up the stack; flexibility increases down it. @rmoff

DEMO @rmoff Photo by Raoul Droog on Unsplash

Summary @rmoff

K V @rmoff

The Log @rmoff

Producer Consumer The Log @rmoff

Producer Consumer The Log Connectors @rmoff

Producer Consumer The Log Connectors Streaming Engine @rmoff

Apache Kafka Producer Consumer The Log Connectors Streaming Engine @rmoff

Producer Security Schema Registry Consumer The Log Streaming Engine ksqlDB REST Proxy Connectors Confluent Control Center

EVENTS are EVERYWHERE @rmoff

EVENTS are ^very POWERFUL @rmoff

Learn Kafka. Start building with Apache Kafka at Confluent Developer. developer.confluent.io

@rmoff #EOF rmoff.dev/talks