Kafka as a Platform: the Ecosystem from the Ground Up

A presentation at GOTOpia in November 2020 by Robin Moffatt

Slide 1

Kafka as a Platform: the Ecosystem from the Ground Up
Robin Moffatt | #GOTOpia | @rmoff

Slide 2

EVENTS
@rmoff | #GOTOpia | @confluentinc

Slide 3

EVENTS

Slide 4

EVENTS
• Something happened
• What happened

Slide 5

Human generated events: A Sale, A Stock movement

Slide 6

Machine generated events: IoT, Networking, Applications

Slide 7

EVENTS are EVERYWHERE

Slide 8

EVENTS are very POWERFUL

Slide 9

Slide 10

Slide 11

K V

Slide 12

LOG

Slide 13

K V

Slide 14

K V

Slide 15

K V

Slide 16

K V

Slide 17

K V

Slide 18

K V

Slide 19

K V

Slide 20

Immutable Event Log
Events are added at the end of the log (old → new)
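The append-only log on this slide can be sketched in a few lines of Go. This is an illustrative toy, not Kafka's actual storage engine: events only ever go in at the end, each gets a monotonically increasing offset, and nothing is overwritten.

```go
package main

import "fmt"

// Log is a toy append-only event log: records are only ever added
// at the end, and each record is identified by its offset.
type Log struct {
	records [][]byte
}

// Append adds an event at the end of the log and returns its offset.
func (l *Log) Append(event []byte) int {
	l.records = append(l.records, event)
	return len(l.records) - 1
}

// Read returns the event stored at the given offset; old events
// stay readable forever (in Kafka, until retention expires them).
func (l *Log) Read(offset int) []byte {
	return l.records[offset]
}

func main() {
	var l Log
	first := l.Append([]byte("old event"))
	second := l.Append([]byte("new event"))
	fmt.Println(first, second) // offsets only ever grow
	fmt.Println(string(l.Read(first)))
}
```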

Slide 21

TOPICS

Slide 22

Topics: Clicks, Orders, Customers
Topics are similar in concept to tables in a database

Slide 23

PARTITIONS

Slide 24

Partitions: Clicks → P0, P1, P2
Messages are guaranteed to be strictly ordered within a partition

Slide 25

PUB / SUB

Slide 26

PUB / SUB

Slide 27

Producing data
Messages are added at the end of the log (old → new)

Slide 28

Partitioned Topic: producer → partition 0 / partition 1 / partition 2

Slide 29

package main

import (
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

func main() {
	topic := "test_topic"
	p, _ := kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092"})
	defer p.Close()

	p.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: 0},
		Value:          []byte("Hello world")}, nil)
}

Slide 30

Producing to Kafka - No Key
Messages will be produced in a round-robin fashion across partitions 1-4

Slide 31

Producing to Kafka - With Key
hash(key) % numPartitions = N
Keys A, B, C, D → partitions 1, 2, 3, 4
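The hash(key) % numPartitions rule is easy to sketch. Note that Kafka's Java producer actually hashes keys with murmur2; the stdlib FNV-1a hash below is just a stand-in to show the property that matters: the same key always lands on the same partition.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionForKey sketches keyed partitioning: hash the key, then
// take it modulo the partition count. (Kafka's default partitioner
// uses murmur2; FNV-1a here is an illustrative stand-in.)
func partitionForKey(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	// The same key always maps to the same partition, so ordering
	// per key is preserved.
	for _, k := range []string{"A", "B", "A"} {
		fmt.Printf("key %s -> partition %d\n", k, partitionForKey(k, 4))
	}
}
```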

Slide 32

Producers
• A client application
• Puts messages into topics
• Handles partitioning, network protocol
• Java, Go, .NET, C/C++, Python
• Every other language too, plus REST proxy if not

Slide 33

PUB / SUB

Slide 34

Consuming data - access is only sequential
Read to offset & scan (old → new)

Slide 35

Consumers have a position of their own
Sally is here

Slide 36

Consumers have a position of their own
Fred is here, Sally is here

Slide 37

Consumers have a position of their own
Rick is here, Fred is here, Sally is here
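Each consumer tracking its own position amounts to little more than a map from consumer name to offset; the shared log itself is untouched by reads. The sketch below is a toy illustration of that idea, not the real mechanics (Kafka stores committed offsets in the internal __consumer_offsets topic).

```go
package main

import "fmt"

// offsets tracks each consumer's own position in a shared log.
type offsets map[string]int

// next returns the record a consumer would read next, then advances
// only that consumer's position. Other consumers are unaffected.
func next(log []string, pos offsets, consumer string) (string, bool) {
	i := pos[consumer]
	if i >= len(log) {
		return "", false // caught up: nothing new yet
	}
	pos[consumer] = i + 1
	return log[i], true
}

func main() {
	log := []string{"e0", "e1", "e2"}
	pos := offsets{}
	next(log, pos, "Sally")
	next(log, pos, "Sally")
	next(log, pos, "Fred")
	// Sally has read two events, Fred one; neither affects the other.
	fmt.Println("Sally is at", pos["Sally"], "- Fred is at", pos["Fred"])
}
```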

Slide 38

c, _ := kafka.NewConsumer(&cm)
defer c.Close()

c.Subscribe(topic, nil)

for {
	select {
	case ev := <-c.Events():
		switch ev.(type) {
		case *kafka.Message:
			km := ev.(*kafka.Message)
			fmt.Printf("✅ Message '%v' received from topic '%v'\n",
				string(km.Value), string(*km.TopicPartition.Topic))
		}
	}
}

Slide 39

Consuming From Kafka - Single Consumer
One consumer C reads from partitions 1-4

Slide 40

Consuming From Kafka - Multiple Consumers
Consumers C1 and C2 read from partitions 1-4

Slide 41

Consuming From Kafka - Grouped Consumers
Group C1 (two instances) and consumer C2 across partitions 1-4

Slide 42

Consuming From Kafka - Grouped Consumers
C1, C2, C3, C4: one consumer per partition

Slide 43

Consuming From Kafka - Grouped Consumers
C1, C2, C3 across partitions 1-4

Slide 44

Consuming From Kafka - Grouped Consumers
C1, C2, C3 rebalanced across partitions 1-4
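The rebalancing shown across these slides boils down to spreading partitions over the live members of the group. Here is a minimal round-robin assignment sketch; the real consumer-group protocol negotiates pluggable assignors (range, round-robin, sticky), so treat this as an illustration of the outcome, not the protocol.

```go
package main

import "fmt"

// assign spreads partitions over group members round-robin: a sketch
// of what a consumer-group rebalance achieves. Each partition is
// owned by exactly one consumer in the group.
func assign(numPartitions int, consumers []string) map[string][]int {
	out := make(map[string][]int)
	for p := 0; p < numPartitions; p++ {
		c := consumers[p%len(consumers)]
		out[c] = append(out[c], p)
	}
	return out
}

func main() {
	// Four partitions over three consumers: one consumer gets two.
	fmt.Println(assign(4, []string{"C1", "C2", "C3"}))
	// If a consumer leaves, re-running assign with the survivors
	// models the rebalance.
	fmt.Println(assign(4, []string{"C1", "C2"}))
}
```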

Slide 45

Consumers
• A client application
• Reads messages from topics
• Horizontally, elastically scalable (if stateless)
• Java, Go, .NET, C/C++, Python, everything else, plus REST proxy if not

Slide 46

BROKERS and REPLICATION

Slide 47

Partition Leadership and Replication
Leader partitions 1-4 spread across Brokers 1, 2, and 3

Slide 48

Partition Leadership and Replication
Each partition has a leader plus follower replicas across Brokers 1, 2, and 3

Slide 49

Partition Leadership and Replication
Each partition has a leader plus follower replicas across Brokers 1, 2, and 3
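The leader/follower placement in these slides follows a pattern worth sketching: put partition p's leader on one broker and its followers on the next brokers round the ring, so replicas of a partition never share a broker. This is a deliberate simplification of Kafka's actual replica-assignment logic.

```go
package main

import "fmt"

// replicas sketches leader/follower placement: the leader for a
// partition goes on broker (partition % numBrokers), and each
// follower on the next broker round the ring. A simplification of
// Kafka's real replica assignment, shown for illustration only.
func replicas(partition, numBrokers, replicationFactor int) []int {
	out := make([]int, 0, replicationFactor)
	for r := 0; r < replicationFactor; r++ {
		out = append(out, (partition+r)%numBrokers)
	}
	return out
}

func main() {
	// Four partitions, three brokers, replication factor 3:
	// every broker holds a copy of every partition, but each
	// partition's leader is spread around.
	for p := 0; p < 4; p++ {
		fmt.Printf("partition %d -> brokers %v (leader first)\n",
			p, replicas(p, 3, 3))
	}
}
```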

Slide 50

So far, this is pretty good

Slide 51

So far, this is pretty good… but I've not finished yet

Slide 52

Streaming Pipelines: Amazon S3, RDBMS, HDFS

Slide 53

Evolve processing from old systems to new
Existing App + RDBMS → New App <x>

Slide 54

Slide 55

Streaming Integration with Kafka Connect
Sources (e.g. syslog) → Kafka Connect → Kafka Brokers

Slide 56

Streaming Integration with Kafka Connect
Kafka Brokers → Kafka Connect → Sinks (e.g. Amazon, Google)

Slide 57

Streaming Integration with Kafka Connect
Sources and sinks (e.g. syslog, Amazon, Google) ↔ Kafka Connect ↔ Kafka Brokers

Slide 58

Look Ma, No Code!

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}

Slide 59

Extensible
Connector → Transform(s) → Converter

Slide 60

hub.confluent.io

Slide 61

K V

Slide 62

K V

Slide 63

K V

Slide 64

K V

Slide 65

K V
Wait… what's this?

Slide 66

Lack of schemas – coupling teams and services
2001 2001 Citrus Heights-Sunrise Blvd Citrus_Hghts 60670001 3400293 34 SAC Sacramento SV Sacramento Valley SAC Sacramento County APCD SMA8 Sacramento Metropolitan Area CA 6920 Sacramento 28 6920 13588 7400 Sunrise Blvd 95610 38 41 56 38.6988889 121 16 15.98999977 -121.271111 10 4284781 650345 52

Slide 67

Serialisation & Schemas
JSON, Avro, Protobuf, JSON Schema, CSV

Slide 68

Serialisation & Schemas
Avro 👍, Protobuf 👍, JSON Schema 👍, CSV 😬
https://rmoff.dev/qcon-schemas

Slide 69

Schemas
producer → Topic → consumer, with a Schema Registry

Slide 70

Partitioned Topic: partitions 0, 1, 2 → consumer A (three instances), consumer B

Slide 71

consumer A, consumer A, consumer A

Slide 72

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

Photo by Franck V. on Unsplash

Slide 73

Streams of events (Time →)

Slide 74

Stream Processing
Stream: widgets → Stream: widgets_red

Slide 75

Stream Processing with Kafka Streams
Stream: widgets → Stream: widgets_red

final StreamsBuilder builder = new StreamsBuilder();
builder
    .stream("widgets", Consumed.with(stringSerde, widgetsSerde))
    .filter((key, widget) -> widget.getColour().equals("RED"))
    .to("widgets_red", Produced.with(stringSerde, widgetsSerde));

Slide 76

Streams Application, Streams Application, Streams Application

Slide 77

Stream Processing with ksqlDB
Stream: widgets → Stream: widgets_red

CREATE STREAM widgets_red AS
  SELECT * FROM widgets WHERE colour='RED';

Slide 78

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

Photo by Franck V. on Unsplash

Slide 79

SELECT * FROM WIDGETS WHERE WEIGHT_G > 120;

SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE;

SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS
  GROUP BY SENSOR_ID HAVING TEMP > 20;

CREATE SINK CONNECTOR dw WITH (
  'connector.class' = 'S3Connector',
  'topics' = 'widgets'
  …);
-- to object store, data warehouse, RDBMS

Photo by Franck V. on Unsplash

Slide 80

DEMO
Photo by Raoul Droog on Unsplash

Slide 81

Summary

Slide 82

Slide 83

K V

Slide 84

K V

Slide 85

The Log

Slide 86

Producer, Consumer, The Log

Slide 87

Producer, Consumer, Connectors, The Log

Slide 88

Producer, Consumer, Connectors, The Log, Streaming Engine

Slide 89

Apache Kafka: Producer, Consumer, Connectors, The Log, Streaming Engine

Slide 90

Confluent Platform: ksqlDB, Schema Registry, plus Producer, Consumer, Connectors, The Log, Streaming Engine

Slide 91

Free Books! https://rmoff.dev/gotopia

Slide 92

Fully Managed Kafka as a Service
$200 USD off your bill each calendar month for the first three months when you sign up: https://rmoff.dev/ccloud
Free money! Use code 60DEVADV for an additional $60 towards your bill 😄
T&C: https://www.confluent.io/confluent-cloud-promo-disclaimer

Slide 93

Learn Kafka. Start building with Apache Kafka at Confluent Developer.
developer.confluent.io

Slide 94

#EOF
@rmoff
rmoff.dev/talks
youtube.com/rmoff