Kafka as a Platform: the Ecosystem from the Ground Up

A presentation by Robin Moffatt at the Budapest Data Forum, September 2020, Budapest, Hungary

Slide 1

Kafka as a Platform: the Ecosystem from the Ground Up Robin Moffatt | #BudapestData | @rmoff

Slide 2

$ whoami • Robin Moffatt (@rmoff) • Senior Developer Advocate at Confluent (Apache Kafka, not Wikis 😉) • Working in data & analytics since 2001 • Oracle ACE Director (Alumnus) http://rmoff.dev/talks · http://rmoff.dev/blog · http://rmoff.dev/youtube @rmoff | #BudapestData | @confluentinc

Slide 3

EVENTS @rmoff | #BudapestData | @confluentinc

Slide 4

EVENTS @rmoff | #BudapestData | @confluentinc

Slide 5

EVENTS • Notification • State change

Slide 6

Human generated events A Sale A Stock movement @rmoff | #BudapestData | @confluentinc

Slide 7

Machine generated events IoT Networking Applications @rmoff | #BudapestData | @confluentinc

Slide 8

EVENTS are EVERYWHERE @rmoff | #BudapestData | @confluentinc

Slide 9

EVENTS are (very) POWERFUL @rmoff | #BudapestData | @confluentinc

Slide 10

Slide 11

Slide 12

K V

Slide 13

LOG @rmoff | #BudapestData | @confluentinc

Slide 14

K V

Slide 15

K V

Slide 16

K V

Slide 17

K V

Slide 18

K V

Slide 19

K V

Slide 20

K V

Slide 21

Immutable Event Log Old New Events are added at the end of the log @rmoff | #BudapestData | @confluentinc

Slide 22

TOPICS @rmoff | #BudapestData | @confluentinc

Slide 23

Topics Clicks Orders Customers Topics are similar in concept to tables in a database @rmoff | #BudapestData | @confluentinc
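Topics can be auto-created on first use, but creating them explicitly gives control over partition count and replication. A minimal sketch using the Java AdminClient; the topic name, partition count, and replication factor here are illustrative:

Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    // "orders": 3 partitions, each replicated to 3 brokers
    NewTopic orders = new NewTopic("orders", 3, (short) 3);
    admin.createTopics(Collections.singletonList(orders)).all().get();
}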

Slide 24

PARTITIONS @rmoff | #BudapestData | @confluentinc

Slide 25

Partitions Clicks p0 p1 p2 Messages are guaranteed to be strictly ordered within a partition @rmoff | #BudapestData | @confluentinc

Slide 26

PUB / SUB @rmoff | #BudapestData | @confluentinc

Slide 27

PUB / SUB @rmoff | #BudapestData | @confluentinc

Slide 28

Producing data Old New Messages are added at the end of the log @rmoff | #BudapestData | @confluentinc

Slide 29

partition 0 … partition 1 producer … partition 2 … Partitioned Topic

Slide 30

try (KafkaProducer<String, Payment> producer = new KafkaProducer<String, Payment>(props)) {
    for (long i = 0; i < 10; i++) {
        final String orderId = "id" + Long.toString(i);
        final Payment payment = new Payment(orderId, 1000.00d);
        final ProducerRecord<String, Payment> record =
            new ProducerRecord<String, Payment>("transactions", payment.getId().toString(), payment);
        producer.send(record);
    }
} catch (final Exception e) {
    e.printStackTrace();
}
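The props passed to the producer are off-slide; a minimal sketch of what they might contain (the broker and Schema Registry addresses are placeholders, and Avro serialization of the Payment value is assumed):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://localhost:8081");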

Slide 31

Producing to Kafka - No Key Time Partition 1 Partition 2 Partition 3 Partition 4 Messages are produced in a round-robin fashion @rmoff | #BudapestData | @confluentinc

Slide 32

Producing to Kafka - With Key Time hash(key) % numPartitions = N Partition 1 A Partition 2 B Partition 3 C Partition 4 D @rmoff | #BudapestData | @confluentinc
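The partition is therefore a pure function of the key: the same key always lands on the same partition, as long as the partition count doesn't change. A sketch of the idea (Kafka's real default partitioner hashes the serialized key bytes with murmur2, via org.apache.kafka.common.utils.Utils):

int partitionFor(byte[] keyBytes, int numPartitions) {
    // same key bytes -> same hash -> same partition
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}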

Slide 33

Producers partition 0 … partition 1 producer … partition 2 … Partitioned Topic • A client application • Puts messages into topics • Handles partitioning, network protocol • Java, Go, .NET, C/C++, Python • Also every other language

Slide 34

PUB / SUB @rmoff | #BudapestData | @confluentinc

Slide 35

Consuming data - access is only sequential Read to offset & scan Old New @rmoff | #BudapestData | @confluentinc

Slide 36

Consumers have a position of their own Old New Sally is here (scan) @rmoff | #BudapestData | @confluentinc

Slide 37

Consumers have a position of their own Old New Fred is here (scan) Sally is here (scan) @rmoff | #BudapestData | @confluentinc

Slide 38

Consumers have a position of their own Old New Rick is here (scan) Fred is here (scan) Sally is here (scan) @rmoff | #BudapestData | @confluentinc

Slide 39

try (final KafkaConsumer<String, Payment> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList(TOPIC));
    while (true) {
        ConsumerRecords<String, Payment> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, Payment> record : records) {
            String key = record.key();
            Payment value = record.value();
            System.out.printf("key = %s, value = %s%n", key, value);
        }
    }
}
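Again the props are off-slide; a minimal sketch (group.id is what ties consumers together into a group, and Avro deserialization matching the producer is assumed):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "payments-app"); // consumers sharing a group.id divide the partitions between them
props.put("auto.offset.reset", "earliest"); // where to start when the group has no committed offset
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "http://localhost:8081");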

Slide 40

partition 0 … partition 1 … partition 2 … Partitioned Topic consumer A

Slide 41

partition 0 … consumer A partition 1 … partition 2 … Partitioned Topic consumer B

Slide 42

Consuming From Kafka - Multiple Consumers C1 Partition 1 Partition 2 Partition 3 C2 Partition 4 @rmoff | #BudapestData | @confluentinc

Slide 43

Consuming From Kafka - Grouped Consumers C1 C1 Partition 1 Partition 2 Partition 3 C2 Partition 4 @rmoff | #BudapestData | @confluentinc

Slide 44

CONSUMERS CONSUMER GROUP COORDINATOR CONSUMER GROUP

Slide 45

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 C1 C2 C3 C4 Partition 4 @rmoff | #BudapestData | @confluentinc

Slide 46

Consuming From Kafka - Grouped Consumers Partition 1 Partition 2 Partition 3 C1 C2 C3 Partition 4 @rmoff | #BudapestData | @confluentinc

Slide 47

Consuming From Kafka - Grouped Consumers Partition 1 C1 Partition 2 Partition 3 C2 C3 Partition 4 @rmoff | #BudapestData | @confluentinc

Slide 48

Consumers partition 0 … partition 1 … consumer A consumer A consumer A partition 2 … Partitioned Topic consumer B • A client application • Reads messages from topics • Horizontally, elastically scalable (if stateless) • Java, Go, .NET, C/C++, Python, everything else

Slide 49

BROKERS and REPLICATION @rmoff | #BudapestData | @confluentinc

Slide 50

Partition Leadership and Replication (Leader / Follower) Partition 1 Partition 2 Partition 3 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff | #BudapestData | @confluentinc

Slide 51

Partition Leadership and Replication (Leader / Follower) Partition 1 Partition 1 Partition 1 Partition 2 Partition 2 Partition 2 Partition 3 Partition 3 Partition 3 Partition 4 Partition 4 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff | #BudapestData | @confluentinc

Slide 52

Partition Leadership and Replication (Leader / Follower) Partition 1 Partition 1 Partition 1 Partition 2 Partition 2 Partition 2 Partition 3 Partition 3 Partition 3 Partition 4 Partition 4 Partition 4 Broker 1 Broker 2 Broker 3 @rmoff | #BudapestData | @confluentinc
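Durability is a joint contract between topic and producer settings. A sketch of values that are commonly paired with a replicated topic like this one (the config names are real Kafka settings; the values are illustrative):

// Topic (or broker default): keep 3 copies, require at least 2 to be in sync
// replication.factor = 3
// min.insync.replicas = 2

// Producer: only treat a write as successful once all in-sync replicas have it
props.put("acks", "all");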

Slide 53

DEMO Photo by Raoul Droog on Unsplash @rmoff | #BudapestData | @confluentinc

Slide 54

So far, this is Pretty good @rmoff | #BudapestData | @confluentinc

Slide 55

So far, this is Pretty good but I’ve not finished yet… @rmoff | #BudapestData | @confluentinc

Slide 56

Streaming Pipelines Amazon S3 RDBMS HDFS @rmoff | #BudapestData | @confluentinc

Slide 57

Evolve processing from old systems to new Existing New App <x> App RDBMS @rmoff | #BudapestData | @confluentinc

Slide 58

Slide 59

Streaming Integration with Kafka Connect syslog Sources Kafka Connect Kafka Brokers @rmoff | #BudapestData | @confluentinc

Slide 60

Streaming Integration with Kafka Connect Amazon Sinks Google Kafka Connect Kafka Brokers @rmoff | #BudapestData | @confluentinc

Slide 61

Streaming Integration with Kafka Connect Amazon syslog Google Kafka Connect Kafka Brokers @rmoff | #BudapestData | @confluentinc

Slide 62

Look Ma, No Code!

{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "connection.url": "jdbc:mysql://asgard:3306/demo",
  "table.whitelist": "sales,orders,customers"
}

@rmoff | #BudapestData | @confluentinc
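That configuration is typically submitted to a Kafka Connect worker over its REST API. A sketch of the request, assuming a worker listening on connect:8083 and a hypothetical connector name:

POST http://connect:8083/connectors
{
  "name": "jdbc-source-demo",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://asgard:3306/demo",
    "table.whitelist": "sales,orders,customers"
  }
}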

Slide 63

Extensible Connector Transform(s) Converter @rmoff | #BudapestData | @confluentinc

Slide 64

hub.confluent.io @rmoff | #BudapestData | @confluentinc

Slide 65

Single Kafka Connect node S3 Task #1 JDBC Task #1 JDBC Task #2 Kafka Connect cluster Worker Worker Offsets Config Status @rmoff | #BudapestData | @confluentinc

Slide 66

Kafka Connect - scalable and fault-tolerant S3 Task #1 JDBC Task #1 Kafka Connect cluster JDBC Task #2 Worker Worker Offsets Config Status @rmoff | #BudapestData | @confluentinc

Slide 67

Automatic fault tolerance S3 Task #1 JDBC Task #1 JDBC Task #2 Kafka Connect cluster Worker Worker Offsets Config Status @rmoff | #BudapestData | @confluentinc

Slide 68

K V

Slide 69

K V

Slide 70

K V

Slide 71

K V

Slide 72

K V Wait… what's this?

Slide 73

How do you serialise your data? JSON Avro Protobuf JSON Schema CSV @rmoff | #BudapestData | @confluentinc

Slide 74

APIs are contracts between services {user_id: 53, address: "2 Elm st."} Profile service Quote service {user_id: 53, quote: 580} @rmoff | #BudapestData | @confluentinc

Slide 75

But not all services {user_id: 53, address: "2 Elm st."} Profile service Quote service {user_id: 53, quote: 580} @rmoff | #BudapestData | @confluentinc

Slide 76

And naturally… {user_id: 53, address: "2 Elm st."} Profile service Quote service Profile database Stream processing @rmoff | #BudapestData | @confluentinc

Slide 77

Schemas are about how teams work together {user_id: 53, timestamp: 1497842472} new Date(timestamp) Profile service Quote service Profile database Stream processing create table ( user_id number, timestamp number) @rmoff | #BudapestData | @confluentinc

Slide 78

Things change… {user_id: 53, timestamp: "June 28, 2017 4:00pm"} Profile service Quote service Profile database Stream processing @rmoff | #BudapestData | @confluentinc

Slide 79

Moving fast and breaking things {user_id: 53, timestamp: "June 28, 2017 4:00pm"} Profile service Quote service Profile database Stream processing @rmoff | #BudapestData | @confluentinc

Slide 80

Lack of schemas – Coupling teams and services 2001 2001 Citrus Heights-Sunrise Blvd Citrus_Hghts 60670001 3400293 34 SAC Sacramento SV Sacramento Valley SAC Sacramento County APCD SMA8 Sacramento Metropolitan Area CA 6920 Sacramento 28 6920 13588 7400 Sunrise Blvd 95610 38 41 56 38.6988889 121 16 15.98999977 -121.271111 10 4284781 650345 52 @rmoff | #BudapestData | @confluentinc

Slide 81

Serialisation & Schemas JSON Avro Protobuf JSON Schema CSV @rmoff | #BudapestData | @confluentinc

Slide 82

Serialisation & Schemas JSON Avro Protobuf JSON Schema CSV 👍 👍 👍 😬 https://qconnewyork.com/system/files/presentation-slides/qcon_17_-_schemas_and_apis.pdf @rmoff | #BudapestData | @confluentinc

Slide 83

Schemas Schema Registry Topic producer … consumer

Slide 84

Schemas (actually an internal topic!) Schema Registry Topic producer … consumer

Slide 85

Producers contain serializers

props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
props.put("schema.registry.url", "http://schema-registry:8081");
…
KafkaProducer<String, LogLine> producer = new KafkaProducer<String, LogLine>(props);

@rmoff | #BudapestData | @confluentinc
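Consumers mirror this with deserializers. A sketch of the matching consumer-side settings; specific.avro.reader is the Confluent property that deserializes into generated classes such as LogLine rather than GenericRecord:

props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "http://schema-registry:8081");
props.put("specific.avro.reader", "true");

KafkaConsumer<String, LogLine> consumer = new KafkaConsumer<>(props);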

Slide 86

partition 0 consumer A … consumer A partition 1 … consumer A partition 2 … consumer B Partitioned Topic @rmoff | #BudapestData | @confluentinc

Slide 87

consumer A consumer A consumer A @rmoff | #BudapestData | @confluentinc

Slide 88

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Photo by Franck V. on Unsplash @rmoff | #BudapestData | @confluentinc

Slide 89

Streams of events Time @rmoff | #BudapestData | @confluentinc

Slide 90

Stream Processing Stream: widgets Stream: widgets_red @rmoff | #BudapestData | @confluentinc

Slide 91

Stream Processing with Kafka Streams

Stream: widgets

final StreamsBuilder builder = new StreamsBuilder();
builder.stream("widgets", Consumed.with(stringSerde, widgetsSerde))
       .filter((key, widget) -> widget.getColour().equals("RED"))
       .to("widgets_red", Produced.with(stringSerde, widgetsSerde));

Stream: widgets_red

@rmoff | #BudapestData | @confluentinc
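To run that topology you wrap it in a KafkaStreams instance; a minimal sketch, where the application.id (which also acts as the consumer group id) and the broker address are placeholders:

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "widget-filter");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

// close cleanly on shutdown
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));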

Slide 92

Streams Application Streams Application Streams Application @rmoff | #BudapestData | @confluentinc

Slide 93

Stream Processing with ksqlDB

Stream: widgets

ksqlDB:
CREATE STREAM widgets_red AS
    SELECT * FROM widgets WHERE colour='RED';

Stream: widgets_red

@rmoff | #BudapestData | @confluentinc
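The widgets source stream has to be declared over its backing topic first; a sketch, assuming Avro-encoded values so that ksqlDB can read the column definitions from the Schema Registry:

CREATE STREAM widgets WITH (
    KAFKA_TOPIC  = 'widgets',
    VALUE_FORMAT = 'AVRO'
);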

Slide 94

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}
Photo by Franck V. on Unsplash @rmoff | #BudapestData | @confluentinc

Slide 95

{
  "reading_ts": "2020-02-14T12:19:27Z",
  "sensor_id": "aa-101",
  "production_line": "w01",
  "widget_type": "acme94",
  "temp_celcius": 23,
  "widget_weight_g": 100
}

SELECT * FROM WIDGETS WHERE WEIGHT_G > 120

SELECT COUNT(*) FROM WIDGETS GROUP BY PRODUCTION_LINE

SELECT AVG(TEMP_CELCIUS) AS TEMP FROM WIDGETS GROUP BY SENSOR_ID HAVING TEMP > 20

CREATE SINK CONNECTOR dw WITH (
    'connector.class' = 'S3Connector',
    'topics' = 'widgets'
    …);
(Object store, data warehouse, RDBMS)

Photo by Franck V. on Unsplash @rmoff | #BudapestData | @confluentinc
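Aggregations like the COUNT above are usually materialized as a table that ksqlDB keeps continuously up to date as new events arrive; a sketch:

CREATE TABLE widgets_per_line AS
    SELECT production_line,
           COUNT(*) AS widget_count
    FROM widgets
    GROUP BY production_line
    EMIT CHANGES;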

Slide 96

Stream Processing with ksqlDB Source stream @rmoff | #BudapestData | @confluentinc

Slide 97

Stream Processing with ksqlDB Source stream @rmoff | #BudapestData | @confluentinc

Slide 98

Stream Processing with ksqlDB Source stream @rmoff | #BudapestData | @confluentinc

Slide 99

Stream Processing with ksqlDB Source stream Analytics @rmoff | #BudapestData | @confluentinc

Slide 100

Stream Processing with ksqlDB Source stream Applications / Microservices @rmoff | #BudapestData | @confluentinc

Slide 101

Stream Processing with ksqlDB …SUM(TXN_AMT) GROUP BY AC_ID AC_ID=42 → BALANCE=94.00 Source stream Applications / Microservices @rmoff | #BudapestData | @confluentinc
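Because the aggregate is materialized inside ksqlDB, an application can fetch the current value on demand with a pull query; a sketch against a hypothetical account_balances table keyed on AC_ID:

SELECT ac_id, balance
FROM account_balances
WHERE ac_id = 42;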

Slide 102

ksqlDB or Kafka Streams? @rmoff | #BudapestData | @confluentinc Photo by Ramiz Dedaković on Unsplash

Slide 103

Standing on the Shoulders of Streaming Giants, from ease of use to flexibility: ksqlDB (powered by Kafka Streams) · ksqlDB UDFs · Kafka Streams (powered by the Producer and Consumer APIs) @rmoff | #BudapestData | @confluentinc

Slide 104

DEMO Photo by Raoul Droog on Unsplash @rmoff | #BudapestData | @confluentinc

Slide 105

Summary @rmoff | #BudapestData | @confluentinc

Slide 106

@rmoff | #BudapestData | @confluentinc

Slide 107

K V @rmoff | #BudapestData | @confluentinc

Slide 108

K V @rmoff | #BudapestData | @confluentinc

Slide 109

The Log @rmoff | #BudapestData | @confluentinc

Slide 110

Producer Consumer The Log @rmoff | #BudapestData | @confluentinc

Slide 111

Producer Consumer Connectors The Log @rmoff | #BudapestData | @confluentinc

Slide 112

Producer Consumer Connectors The Log Streaming Engine @rmoff | #BudapestData | @confluentinc

Slide 113

Apache Kafka Producer Consumer Connectors The Log Streaming Engine @rmoff | #BudapestData | @confluentinc

Slide 114

EVENTS are EVERYWHERE @rmoff | #BudapestData | @confluentinc

Slide 115

EVENTS are (very) POWERFUL @rmoff | #BudapestData | @confluentinc

Slide 116

Free Books! https://rmoff.dev/budapestdata @rmoff | #BudapestData | @confluentinc

Slide 117

Fully Managed Kafka as a Service. Sign up at https://rmoff.dev/690 for $50 USD off your bill each calendar month for the first three months. Free money! Promo code 60DEVADV gives an additional $60 towards your bill 😄 * Limited availability. Activate by 11th September 2020. Expires after 90 days of activation. Any unused promo value on the expiration date will be forfeited.

Slide 118

Learn Kafka. Start building with Apache Kafka at Confluent Developer. developer.confluent.io

Slide 119

Confluent Community Slack group cnfl.io/slack @rmoff | #BudapestData | @confluentinc

Slide 120

@rmoff #BudapestData #EOF rmoff.dev/talks