🐲 Here be Dragons^H^H Stacktraces β€” Flink SQL for Non-Java Developers

A presentation at Kafka Summit 2024 in March 2024 in London, UK by Robin Moffatt

Slide 1

Slide 1

@rmoff / 19 Mar 2024 / #kafkasum t 🐲 Here be Dragons Stacktraces Flink SQL for Non-Java Developers i m Robin Moffatt, Principal DevEx Engineer @ Decodable

Slide 2

Slide 2

@decodableco @rmoff / #kafkasum m i i m Actual footage of a SQL Developer looking at Apache Flink for the first t e t

Slide 3

Slide 3

i @rmoff / #kafkasum m @decodableco t

Slide 4

Slide 4

@decodableco @rmoff / #kafkasum i m What Is Apache Flink? t

Slide 5

Slide 5

@decodableco @rmoff / #kafkasum i m A Brief History of Flink t

Slide 6

Slide 6

@decodableco @rmoff / #kafkasum i m or t

Slide 7

Slide 7

i @rmoff / #kafkasum m @decodableco t

Slide 8

Slide 8

@decodableco @rmoff / #kafkasum Back in the t e of dinosaurs Hadoop β€’ Started life as a research project in 2 called Stratosphere. β€’ This was the t 1, 1 0 m m i i i m e of MapReduce. Java and Scala were the only way to do this. t

Slide 9

Slide 9

@rmoff / #kafkasum Flink is a big project β€’ Flink β€’ Stateful Functions β€’ β€’ Kubernetes Operator β€’ CDC Connector i on (incubating) m m i β€’ Pa L M @decodableco t

Slide 10

Slide 10

@decodableco @rmoff / #kafkasum i m Capabilities t

Slide 11

Slide 11

@decodableco @rmoff / #kafkasum Connect to Lots of Source and Target Systems JDBC CDC Kinesis i m DynamoDB Object stores Kinesis Firehose t

Slide 12

Slide 12

@decodableco @rmoff / #kafkasum Stateful and Stateless computations β€’ Filtering SELECT * FROM myStream WHERE foo=42 β€’ Joining SELECT a., b. FROM myStream a INNER JOIN myLookup b ON a.id=b.foo_id β€’ Transfor ng SELECT cost * tax_rate AS total_cost FROM myStream β€’ Pattern matching SELECT * FROM myStream MATCH_RECOGNIZE ( PART ON BY id ORDER BY user_action_t […] I T I i m M i m β€’ …and a whole lot more m SELECT SU (order_value) AS total_order_values FROM orders i β€’ Aggregations e t

Slide 13

Slide 13

@decodableco @rmoff / #kafkasum Batch and Strea i m i m Bounded and Unbounded ng t

Slide 14

Slide 14

@decodableco @rmoff / #kafkasum i m How Does Flink Work? t

Slide 15

Slide 15

@decodableco @rmoff / #kafkasum t i m << magic πŸͺ„ πŸ§™ >>

Slide 16

Slide 16

@decodableco @rmoff / #kafkasum i m // https: t nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/

Slide 17

Slide 17

@decodableco @rmoff / #kafkasum t n e p O s ’ t e L : e n i g n E L Q S s ’ ! k m n o o R Fli e n i g n E the r e h t l a W o T πŸ—£ y a d s e u πŸ“† T m p 0 3 : 5 ⏰ 2 m o o R t u o k a e r πŸ—Ί B i m m i // https: nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/

Slide 18

Slide 18

@decodableco @rmoff / #kafkasum t Running Flink Works on my machine… $ ./bin/start-cluster.sh Starting cluster. Starting standalonesession daemon on host asgard08. Starting taskexecutor daemon on host asgard08. e.taskexecutor. askManagerRunner e.entrypoint.StandaloneSessionClusterEntrypoint T m m i i i m $ jps -l 14656 org.apache.flink.runt 14379 org.apache.flink.runt

Slide 19

Slide 19

@decodableco @rmoff / #kafkasum i m Using Flink t

Slide 20

Slide 20

@decodableco @rmoff / #kafkasum It’s not just Java β€’ PyFlink β€’ added in 1.9. in 2 9 in 2 8 β€’ Flink SQL 1 1 0 0 0 0 i m β€’ Added in 1.5. t

Slide 21

Slide 21

@decodableco @rmoff / #kafkasum i m Flink SQL t

Slide 22

Slide 22

@decodableco @rmoff / #kafkasum SQL Language Support β€’ Built on Apache Calcite β€’ Common Table Expression (C ) ( β€’ Set-based operations β€’ Joins β€’ Aggregations T I W E T i m β€’ And lots more… H) t

Slide 23

Slide 23

@decodableco @rmoff / #kafkasum Running Flink SQL β€’ SQL Client β€’ SQL Gateway β€’ RE API β€’ Hive β€’ JDBC Driver i m T S β€’ From Java or Python t

Slide 24

Slide 24

@decodableco @rmoff / #kafkasum t Flink SQL Client $ ./bin/sql-client.sh Welcome! Enter β€˜HELP;’ to list all available commands. β€˜QU ;’ to exit. Command history file path: /opt/ flink/.flink-sql-history Flink SQL> β–’β–“β–ˆβ–ˆβ–“β–ˆβ–ˆβ–’ β–“β–ˆβ–ˆβ–ˆβ–ˆβ–’β–’β–ˆβ–“β–’β–“β–ˆβ–ˆβ–ˆβ–“β–’ β–“β–ˆβ–ˆβ–ˆβ–“β–‘β–‘ β–’β–’β–’β–“β–ˆβ–ˆβ–’ β–’ β–‘β–ˆβ–ˆβ–’ β–’β–’β–“β–“β–ˆβ–“β–“β–’β–‘ β–’β–ˆβ–ˆβ–ˆβ–ˆ β–ˆβ–ˆβ–’ β–‘β–’β–“β–ˆβ–ˆβ–ˆβ–’ β–’β–ˆβ–’β–ˆβ–’ β–‘β–“β–ˆ β–ˆβ–ˆβ–ˆ β–“β–‘β–’β–ˆβ–ˆ β–“β–ˆ β–’β–’β–’β–’β–’β–“β–ˆβ–ˆβ–“β–‘β–’β–‘β–“β–“β–ˆ β–ˆβ–‘ β–ˆ β–’β–’β–‘ β–ˆβ–ˆβ–ˆβ–“β–“β–ˆ β–’β–ˆβ–’β–’β–’ β–ˆβ–ˆβ–ˆβ–ˆβ–‘ β–’β–“β–ˆβ–“ β–ˆβ–ˆβ–’β–’β–’ β–“β–ˆβ–ˆβ–ˆβ–’ β–‘β–’β–ˆβ–“β–“β–ˆβ–ˆ β–“β–ˆβ–’ β–“β–ˆβ–’β–“β–ˆβ–ˆβ–“ β–‘β–ˆβ–‘ β–“β–‘β–’β–“β–ˆβ–ˆβ–ˆβ–ˆβ–’ β–ˆβ–ˆ β–’β–ˆ β–ˆβ–“β–‘β–’β–ˆβ–’β–‘β–’β–ˆβ–’ β–ˆβ–ˆβ–ˆβ–“β–‘β–ˆβ–ˆβ–“ β–“β–ˆ β–ˆ β–ˆβ–“ β–’β–“β–ˆβ–“β–“β–ˆβ–’ β–‘β–ˆβ–ˆβ–“ β–‘β–ˆβ–‘ β–ˆ β–ˆβ–’ β–’β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–’ β–ˆβ–ˆβ–“β–‘β–’ β–ˆβ–ˆβ–ˆβ–‘ β–‘ β–ˆβ–‘ β–“ β–‘β–ˆ β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–’β–‘β–‘ β–‘β–ˆβ–‘β–“ β–“β–‘ β–ˆβ–ˆβ–“β–ˆ β–’β–’β–“β–’ β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–‘ β–’β–ˆβ–’ β–’β–“ β–“β–ˆβ–ˆβ–“ β–’β–ˆβ–ˆβ–“ β–“β–ˆ β–ˆβ–“β–ˆ β–‘β–’β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–“β–’β–‘ β–ˆβ–ˆβ–’β–’ β–ˆ β–’ β–“β–ˆβ–’ β–“β–ˆβ–“ β–“β–ˆ β–ˆβ–ˆβ–“ β–‘β–“β–“β–“β–“β–“β–“β–“β–’ β–’β–ˆβ–ˆβ–“ β–‘β–ˆβ–’ β–“β–ˆ β–ˆ β–“β–ˆβ–ˆβ–ˆβ–“β–’β–‘ β–‘β–“β–“β–“β–ˆβ–ˆβ–ˆβ–“ β–‘β–’β–‘ β–“β–ˆ β–ˆβ–ˆβ–“ β–ˆβ–ˆβ–’ β–‘β–’β–“β–“β–ˆβ–ˆβ–ˆβ–“β–“β–“β–“β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–’ β–“β–ˆβ–ˆβ–ˆ β–ˆ β–“β–ˆβ–ˆβ–ˆβ–’ β–ˆβ–ˆβ–ˆ β–‘β–“β–“β–’β–‘β–‘ β–‘β–“β–ˆβ–ˆβ–ˆβ–ˆβ–“β–‘ β–‘β–’β–“β–’ β–ˆβ–“ β–ˆβ–“β–’β–’β–“β–“β–ˆβ–ˆ β–‘β–’β–’β–‘β–‘β–‘β–’β–’β–’β–’β–“β–ˆβ–ˆβ–“β–‘ β–ˆβ–“ β–ˆβ–ˆ β–“β–‘β–’β–ˆ β–“β–“β–“β–“β–’β–‘β–‘ β–’β–ˆβ–“ β–’β–“β–“β–ˆβ–ˆβ–“ β–“β–’ β–’β–’β–“ β–“β–ˆβ–“ β–“β–’β–ˆ β–ˆβ–“β–‘ β–‘β–’β–“β–“β–ˆβ–ˆβ–’ β–‘β–“β–ˆβ–’ β–’β–’β–’β–‘β–’β–’β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–’ β–ˆβ–ˆβ–‘ β–“β–ˆβ–’β–ˆβ–’ β–’β–“β–“β–’ β–“β–ˆ β–ˆβ–‘ β–‘β–‘β–‘β–‘ β–‘β–ˆβ–’ β–“β–ˆ β–’β–ˆβ–“ β–‘ β–ˆβ–‘ β–’β–ˆ β–ˆβ–“ β–ˆβ–“ β–ˆβ–ˆ β–ˆβ–‘ β–“β–“ β–’β–ˆβ–“β–“β–“β–’β–ˆβ–‘ β–ˆβ–“ β–‘β–“β–ˆβ–ˆβ–‘ β–“β–’ β–“β–ˆβ–“β–’β–‘β–‘β–‘β–’β–“β–ˆβ–‘ β–’β–ˆ β–ˆβ–ˆ β–“β–ˆβ–“β–‘ β–’ β–‘β–’β–ˆβ–’β–ˆβ–ˆβ–’ β–“β–“ β–“β–ˆβ–’ β–’β–ˆβ–“β–’β–‘ β–’β–’ β–ˆβ–’β–ˆβ–“β–’β–’β–‘β–‘β–’β–ˆβ–ˆ β–‘β–ˆβ–ˆβ–’ β–’β–“β–“β–’ β–“β–ˆβ–ˆβ–“β–’β–ˆβ–’ β–‘β–“β–“β–“β–“β–’β–ˆβ–“ β–‘β–“β–ˆβ–ˆβ–’ β–“β–‘ β–’β–ˆβ–“β–ˆ β–‘β–‘β–’β–’β–’ β–’β–“β–“β–“β–“β–“β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–‘β–‘β–“β–“ β–“β–‘β–’β–ˆβ–‘ T I i m ______ _ _ _ _____ ____ _ _____ _ _ _ BETA | | () | | / |/ __ | | / | () | | | | | | _ __ | | __ | ( | | | | | | | | | ___ _ __ | | | | | | | β€˜ | |/ / _ | | | | | | | | | |/ _ \ β€˜ | | | | | | | | | | < ____) | || | |____ | || | | _/ | | | | || |||| |||_\ |/ ___| ___|||_|| ||__|

Slide 25

Slide 25

@decodableco @rmoff / #kafkasum D i m // M E https: O github.com/decodableco/examples/kafka-iceberg t

Slide 26

Slide 26

@decodableco @rmoff / #kafkasum i m A Few Useful Settings t

Slide 27

Slide 27

@decodableco @rmoff / #kafkasum Runt S e Mode β€˜execution.runt β€’ strea e-mode’ = β€˜strea ng [default] i m m i i m m i i m T E β€’ batch ng’; t

Slide 28

Slide 28

@decodableco @rmoff / #kafkasum Result Mode S β€˜sql-client.execution.result-mode’ = β€˜table’; β€’ table [default] β€’ changelog i m T E β€’ tableau t

Slide 29

Slide 29

@decodableco @rmoff / #kafkasum Colour Scheme S β€˜sql-client.display.color-schema’ = β€˜Chester’; i m T E β€’ Because why not?! t

Slide 30

Slide 30

@decodableco @rmoff / #kafkasum Changing the defaults Setting up a SQL Client initialisation file β€’ Create a SQL file: $ cat init.sql S β€˜execution.runt e-mode’ = β€˜batch’; S β€˜sql-client.execution.result-mode’ = β€˜tableau’; β€’ Launch SQL Client as a parameter: th the -i flag and pass the file i w m i i m T T E E ./bin/sql-client.sh -i init.sql t

Slide 31

Slide 31

@decodableco @rmoff / #kafkasum Sub tting SQL as a job β€’ SQL Client $ ./bin/sql-client.sh β€”file ~/my_query.sql β€’ SQL Gateway curl β€”location β€˜localhost:8083/sessions/42/statements’ \ β€”header β€˜Content-Type: application/json’ \ β€”header β€˜Accept: application/json’ \ β€”data β€˜{ β€œstatement”: β€œSELECT * FROM foo;” }’ β€’ Application mode support: FLIP-316: Support application mode for i m i m SQL Gateway t

Slide 32

Slide 32

@decodableco @rmoff / #kafkasum i m Some of the Gnarly Stuff t

Slide 33

Slide 33

@decodableco @rmoff / #kafkasum The Joy of JARs β€’ For each connector, format, and catalog you need to install dependencies. β€’ All of these are i m available as JARs (Java ARchive) t

Slide 34

Slide 34

@decodableco @rmoff / #kafkasum This ght jar a bit… Could not execute SQL statement. Reason: java.lang.ClassNotFoundException org.apache.flink.core.fs.UnsupportedFileSy ste chemeException: Could not find a file system plementation for scheme β€˜s3’ m i m i i i m m S m Could not find any factory for identifier β€˜hive’ that plements β€˜org.apache.flink.table.factories.CatalogF actory’ in the classpath. t

Slide 35

Slide 35

@decodableco @rmoff / #kafkasum Finding JARs β€’ Usually the docs ll tell you which JAR you need. β€’ JARs are very specific to the versions of the tools that i w i m you’re using. t

Slide 36

Slide 36

i @rmoff / #kafkasum m @decodableco t

Slide 37

Slide 37

i @rmoff / #kafkasum m @decodableco t

Slide 38

Slide 38

i @rmoff / #kafkasum m @decodableco t

Slide 39

Slide 39

$ tree /opt/flink/lib @rmoff / #kafkasum β”œβ”€β”€ aws β”‚ β”œβ”€β”€ aws-java-sdk-bundle-1.12.648.jar β”‚ └── hadoop-aws-3.3.4.jar β”œβ”€β”€ flink-cep-1.18.1.jar β”œβ”€β”€ flink-parquet_2.12-1.18.1.jar β”œβ”€β”€ flink-table-runt e-1.18.1.jar β”œβ”€β”€ hadoop β”‚ β”œβ”€β”€ commons-configuration2-2.1.1.jar β”‚ β”œβ”€β”€ commons-logging-1.1.3.jar β”‚ β”œβ”€β”€ hadoop-auth-3.3.4.jar β”œβ”€β”€ hive β”‚ └── flink-sql-connector-hive-3.1.3_2.12-1.18.1.jar β”œβ”€β”€ iceberg β”‚ └── iceberg-flink-runt β”œβ”€β”€ kafka e-1.18-1.5. .jar 0 0 m i m └── flink-sql-connector-kafka-3.1. -1.18.jar i i β”‚ m @decodableco t

Slide 40

Slide 40

@decodableco @rmoff / #kafkasum i m Don’t forget to restart! t

Slide 41

Slide 41

@decodableco @rmoff / #kafkasum i m Tables, Connectors, and Catalogs t

Slide 42

Slide 42

@decodableco @rmoff / #kafkasum Tables CREA ( TABLE t_k_orders T T T T S S S m i m T E I T orderid RING, customerid RING, ordernumber IN , product RING, discountpercent INT ) W H ( β€˜connector’ β€˜topic’ β€˜properties.bootstrap.servers’ β€˜scan.startup. ode’ β€˜format’ ); = = = = = β€˜kafka’, β€˜orders’, β€˜broker:29092’, β€˜earliest-offset’, β€˜json’ t

Slide 43

Slide 43

@decodableco @rmoff / #kafkasum This used to be s ple β€’ The data and information about the data was all stored in the database β€’ Information Schema β€’ System Catalog m i i m β€’ Data Dictionary Views t

Slide 44

Slide 44

@decodableco @rmoff / #kafkasum m i i m Now it’s not so s ple t

Slide 45

Slide 45

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

Slide 46

Slide 46

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

Slide 47

Slide 47

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

Slide 48

Slide 48

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

Slide 49

Slide 49

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

Slide 50

Slide 50

@decodableco @rmoff / #kafkasum t Flink Catalogs T I i m E T CREA CATALOG c_hive W H ( β€˜type’ = β€˜hive’, β€˜hive-conf-dir’ = β€˜./conf’);

Slide 51

Slide 51

@decodableco @rmoff / #kafkasum Flink Catalogs i m table.catalog-store.kind: file table.catalog-store.file.path: ./conf/catalogs t

Slide 52

Slide 52

@decodableco @rmoff / #kafkasum D i m // M E https: O github.com/decodableco/examples/kafka-iceberg t

Slide 53

Slide 53

@decodableco @rmoff / #kafkasum i m In Conclusion… t

Slide 54

Slide 54

@decodableco @rmoff / #kafkasum Flink SQL is Fun! But there’s a bit of a learning curve β€’ Run ad-hoc queries th the SQL Client β€’ Understand JAR dependencies for connectors, catalogs, formats, etc β€’ Don’t be put off by the docs - there i i w i m there if you look hard enough s SQL content t

Slide 55

Slide 55

@decodableco @rmoff / #kafkasum i m decodable.co/blog t

Slide 56

Slide 56

@rmoff / #kafkasum t OF i m i m @rmoff / 19 Mar 2024 / #kafkasum E

@decodableco t