@rmoff / 19 Mar 2024 / #kafkasum t 🐲 Here be Dragons Stacktraces Flink SQL for Non-Java Developers i m Robin Moffatt, Principal DevEx Engineer @ Decodable

@decodableco @rmoff / #kafkasum m i i m Actual footage of a SQL Developer looking at Apache Flink for the first t e t

i @rmoff / #kafkasum m @decodableco t

@decodableco @rmoff / #kafkasum i m What Is Apache Flink? t

@decodableco @rmoff / #kafkasum i m A Brief History of Flink t

@decodableco @rmoff / #kafkasum i m or t

i @rmoff / #kafkasum m @decodableco t

@decodableco @rmoff / #kafkasum Back in the t e of dinosaurs Hadoop β€’ Started life as a research project in 2 called Stratosphere. β€’ This was the t 1, 1 0 m m i i i m e of MapReduce. Java and Scala were the only way to do this. t

@rmoff / #kafkasum Flink is a big project β€’ Flink β€’ Stateful Functions β€’ β€’ Kubernetes Operator β€’ CDC Connector i on (incubating) m m i β€’ Pa L M @decodableco t

@decodableco @rmoff / #kafkasum i m Capabilities t

@decodableco @rmoff / #kafkasum Connect to Lots of Source and Target Systems JDBC CDC Kinesis i m DynamoDB Object stores Kinesis Firehose t

@decodableco @rmoff / #kafkasum Stateful and Stateless computations β€’ Filtering SELECT * FROM myStream WHERE foo=42 β€’ Joining SELECT a., b. FROM myStream a INNER JOIN myLookup b ON a.id=b.foo_id β€’ Transfor ng SELECT cost * tax_rate AS total_cost FROM myStream β€’ Pattern matching SELECT * FROM myStream MATCH_RECOGNIZE ( PART ON BY id ORDER BY user_action_t […] I T I i m M i m β€’ …and a whole lot more m SELECT SU (order_value) AS total_order_values FROM orders i β€’ Aggregations e t

@decodableco @rmoff / #kafkasum Batch and Strea i m i m Bounded and Unbounded ng t

@decodableco @rmoff / #kafkasum i m How Does Flink Work? t

@decodableco @rmoff / #kafkasum t i m << magic πŸͺ„ πŸ§™ >>

@decodableco @rmoff / #kafkasum i m // https: t nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/

@decodableco @rmoff / #kafkasum t n e p O s ’ t e L : e n i g n E L Q S s ’ ! k m n o o R Fli e n i g n E the r e h t l a W o T πŸ—£ y a d s e u πŸ“† T m p 0 3 : 5 ⏰ 2 m o o R t u o k a e r πŸ—Ί B i m m i // https: nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/

@decodableco @rmoff / #kafkasum t Running Flink Works on my machine… $ ./bin/start-cluster.sh Starting cluster. Starting standalonesession daemon on host asgard08. Starting taskexecutor daemon on host asgard08. e.taskexecutor. askManagerRunner e.entrypoint.StandaloneSessionClusterEntrypoint T m m i i i m $ jps -l 14656 org.apache.flink.runt 14379 org.apache.flink.runt

@decodableco @rmoff / #kafkasum i m Using Flink t

@decodableco @rmoff / #kafkasum It’s not just Java β€’ PyFlink β€’ added in 1.9. in 2 9 in 2 8 β€’ Flink SQL 1 1 0 0 0 0 i m β€’ Added in 1.5. t

@decodableco @rmoff / #kafkasum i m Flink SQL t

@decodableco @rmoff / #kafkasum SQL Language Support β€’ Built on Apache Calcite β€’ Common Table Expression (C ) ( β€’ Set-based operations β€’ Joins β€’ Aggregations T I W E T i m β€’ And lots more… H) t

@decodableco @rmoff / #kafkasum Running Flink SQL β€’ SQL Client β€’ SQL Gateway β€’ RE API β€’ Hive β€’ JDBC Driver i m T S β€’ From Java or Python t

@decodableco @rmoff / #kafkasum t Flink SQL Client $ ./bin/sql-client.sh Welcome! Enter β€˜HELP;’ to list all available commands. β€˜QU ;’ to exit. Command history file path: /opt/ flink/.flink-sql-history Flink SQL> β–’β–“β–ˆβ–ˆβ–“β–ˆβ–ˆβ–’ β–“β–ˆβ–ˆβ–ˆβ–ˆβ–’β–’β–ˆβ–“β–’β–“β–ˆβ–ˆβ–ˆβ–“β–’ β–“β–ˆβ–ˆβ–ˆβ–“β–‘β–‘ β–’β–’β–’β–“β–ˆβ–ˆβ–’ β–’ β–‘β–ˆβ–ˆβ–’ β–’β–’β–“β–“β–ˆβ–“β–“β–’β–‘ β–’β–ˆβ–ˆβ–ˆβ–ˆ β–ˆβ–ˆβ–’ β–‘β–’β–“β–ˆβ–ˆβ–ˆβ–’ β–’β–ˆβ–’β–ˆβ–’ β–‘β–“β–ˆ β–ˆβ–ˆβ–ˆ β–“β–‘β–’β–ˆβ–ˆ β–“β–ˆ β–’β–’β–’β–’β–’β–“β–ˆβ–ˆβ–“β–‘β–’β–‘β–“β–“β–ˆ β–ˆβ–‘ β–ˆ β–’β–’β–‘ β–ˆβ–ˆβ–ˆβ–“β–“β–ˆ β–’β–ˆβ–’β–’β–’ β–ˆβ–ˆβ–ˆβ–ˆβ–‘ β–’β–“β–ˆβ–“ β–ˆβ–ˆβ–’β–’β–’ β–“β–ˆβ–ˆβ–ˆβ–’ β–‘β–’β–ˆβ–“β–“β–ˆβ–ˆ β–“β–ˆβ–’ β–“β–ˆβ–’β–“β–ˆβ–ˆβ–“ β–‘β–ˆβ–‘ β–“β–‘β–’β–“β–ˆβ–ˆβ–ˆβ–ˆβ–’ β–ˆβ–ˆ β–’β–ˆ β–ˆβ–“β–‘β–’β–ˆβ–’β–‘β–’β–ˆβ–’ β–ˆβ–ˆβ–ˆβ–“β–‘β–ˆβ–ˆβ–“ β–“β–ˆ β–ˆ β–ˆβ–“ β–’β–“β–ˆβ–“β–“β–ˆβ–’ β–‘β–ˆβ–ˆβ–“ β–‘β–ˆβ–‘ β–ˆ β–ˆβ–’ β–’β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–’ β–ˆβ–ˆβ–“β–‘β–’ β–ˆβ–ˆβ–ˆβ–‘ β–‘ β–ˆβ–‘ β–“ β–‘β–ˆ β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–’β–‘β–‘ β–‘β–ˆβ–‘β–“ β–“β–‘ β–ˆβ–ˆβ–“β–ˆ β–’β–’β–“β–’ β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–‘ β–’β–ˆβ–’ β–’β–“ β–“β–ˆβ–ˆβ–“ β–’β–ˆβ–ˆβ–“ β–“β–ˆ β–ˆβ–“β–ˆ β–‘β–’β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–“β–’β–‘ β–ˆβ–ˆβ–’β–’ β–ˆ β–’ β–“β–ˆβ–’ β–“β–ˆβ–“ β–“β–ˆ β–ˆβ–ˆβ–“ β–‘β–“β–“β–“β–“β–“β–“β–“β–’ β–’β–ˆβ–ˆβ–“ β–‘β–ˆβ–’ β–“β–ˆ β–ˆ β–“β–ˆβ–ˆβ–ˆβ–“β–’β–‘ β–‘β–“β–“β–“β–ˆβ–ˆβ–ˆβ–“ β–‘β–’β–‘ β–“β–ˆ β–ˆβ–ˆβ–“ β–ˆβ–ˆβ–’ β–‘β–’β–“β–“β–ˆβ–ˆβ–ˆβ–“β–“β–“β–“β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–“β–’ β–“β–ˆβ–ˆβ–ˆ β–ˆ β–“β–ˆβ–ˆβ–ˆβ–’ β–ˆβ–ˆβ–ˆ β–‘β–“β–“β–’β–‘β–‘ β–‘β–“β–ˆβ–ˆβ–ˆβ–ˆβ–“β–‘ β–‘β–’β–“β–’ β–ˆβ–“ β–ˆβ–“β–’β–’β–“β–“β–ˆβ–ˆ β–‘β–’β–’β–‘β–‘β–‘β–’β–’β–’β–’β–“β–ˆβ–ˆβ–“β–‘ β–ˆβ–“ β–ˆβ–ˆ β–“β–‘β–’β–ˆ β–“β–“β–“β–“β–’β–‘β–‘ β–’β–ˆβ–“ β–’β–“β–“β–ˆβ–ˆβ–“ β–“β–’ β–’β–’β–“ β–“β–ˆβ–“ β–“β–’β–ˆ β–ˆβ–“β–‘ β–‘β–’β–“β–“β–ˆβ–ˆβ–’ β–‘β–“β–ˆβ–’ β–’β–’β–’β–‘β–’β–’β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–’ β–ˆβ–ˆβ–‘ β–“β–ˆβ–’β–ˆβ–’ β–’β–“β–“β–’ β–“β–ˆ β–ˆβ–‘ β–‘β–‘β–‘β–‘ β–‘β–ˆβ–’ β–“β–ˆ β–’β–ˆβ–“ β–‘ β–ˆβ–‘ β–’β–ˆ β–ˆβ–“ β–ˆβ–“ β–ˆβ–ˆ β–ˆβ–‘ β–“β–“ β–’β–ˆβ–“β–“β–“β–’β–ˆβ–‘ β–ˆβ–“ β–‘β–“β–ˆβ–ˆβ–‘ β–“β–’ β–“β–ˆβ–“β–’β–‘β–‘β–‘β–’β–“β–ˆβ–‘ β–’β–ˆ β–ˆβ–ˆ β–“β–ˆβ–“β–‘ β–’ β–‘β–’β–ˆβ–’β–ˆβ–ˆβ–’ β–“β–“ β–“β–ˆβ–’ β–’β–ˆβ–“β–’β–‘ β–’β–’ β–ˆβ–’β–ˆβ–“β–’β–’β–‘β–‘β–’β–ˆβ–ˆ β–‘β–ˆβ–ˆβ–’ β–’β–“β–“β–’ β–“β–ˆβ–ˆβ–“β–’β–ˆβ–’ β–‘β–“β–“β–“β–“β–’β–ˆβ–“ β–‘β–“β–ˆβ–ˆβ–’ β–“β–‘ β–’β–ˆβ–“β–ˆ β–‘β–‘β–’β–’β–’ β–’β–“β–“β–“β–“β–“β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–‘β–‘β–“β–“ β–“β–‘β–’β–ˆβ–‘ T I i m ______ _ _ _ _____ ____ _ _____ _ _ _ BETA | | () | | / |/ __ | | / | () | | | | | | _ __ | | __ | ( | | | | | | | | | ___ _ __ | | | | | | | β€˜ | |/ / _ | | | | | | | | | |/ _ \ β€˜ | | | | | | | | | | < ____) | || | |____ | || | | _/ | | | | || |||| |||_\ |/ ___| ___|||_|| ||__|

@decodableco @rmoff / #kafkasum D i m // M E https: O github.com/decodableco/examples/kafka-iceberg t

@decodableco @rmoff / #kafkasum i m A Few Useful Settings t

@decodableco @rmoff / #kafkasum Runt S e Mode β€˜execution.runt β€’ strea e-mode’ = β€˜strea ng [default] i m m i i m m i i m T E β€’ batch ng’; t

@decodableco @rmoff / #kafkasum Result Mode S β€˜sql-client.execution.result-mode’ = β€˜table’; β€’ table [default] β€’ changelog i m T E β€’ tableau t

@decodableco @rmoff / #kafkasum Colour Scheme S β€˜sql-client.display.color-schema’ = β€˜Chester’; i m T E β€’ Because why not?! t

@decodableco @rmoff / #kafkasum Changing the defaults Setting up a SQL Client initialisation file β€’ Create a SQL file: $ cat init.sql S β€˜execution.runt e-mode’ = β€˜batch’; S β€˜sql-client.execution.result-mode’ = β€˜tableau’; β€’ Launch SQL Client as a parameter: th the -i flag and pass the file i w m i i m T T E E ./bin/sql-client.sh -i init.sql t

@decodableco @rmoff / #kafkasum Sub tting SQL as a job β€’ SQL Client $ ./bin/sql-client.sh β€”file ~/my_query.sql β€’ SQL Gateway curl β€”location β€˜localhost:8083/sessions/42/statements’ \ β€”header β€˜Content-Type: application/json’ \ β€”header β€˜Accept: application/json’ \ β€”data β€˜{ β€œstatement”: β€œSELECT * FROM foo;” }’ β€’ Application mode support: FLIP-316: Support application mode for i m i m SQL Gateway t

@decodableco @rmoff / #kafkasum i m Some of the Gnarly Stuff t

@decodableco @rmoff / #kafkasum The Joy of JARs β€’ For each connector, format, and catalog you need to install dependencies. β€’ All of these are i m available as JARs (Java ARchive) t

@decodableco @rmoff / #kafkasum This ght jar a bit… Could not execute SQL statement. Reason: java.lang.ClassNotFoundException org.apache.flink.core.fs.UnsupportedFileSy ste chemeException: Could not find a file system plementation for scheme β€˜s3’ m i m i i i m m S m Could not find any factory for identifier β€˜hive’ that plements β€˜org.apache.flink.table.factories.CatalogF actory’ in the classpath. t

@decodableco @rmoff / #kafkasum Finding JARs β€’ Usually the docs ll tell you which JAR you need. β€’ JARs are very specific to the versions of the tools that i w i m you’re using. t

i @rmoff / #kafkasum m @decodableco t

i @rmoff / #kafkasum m @decodableco t

i @rmoff / #kafkasum m @decodableco t

$ tree /opt/flink/lib @rmoff / #kafkasum β”œβ”€β”€ aws β”‚ β”œβ”€β”€ aws-java-sdk-bundle-1.12.648.jar β”‚ └── hadoop-aws-3.3.4.jar β”œβ”€β”€ flink-cep-1.18.1.jar β”œβ”€β”€ flink-parquet_2.12-1.18.1.jar β”œβ”€β”€ flink-table-runt e-1.18.1.jar β”œβ”€β”€ hadoop β”‚ β”œβ”€β”€ commons-configuration2-2.1.1.jar β”‚ β”œβ”€β”€ commons-logging-1.1.3.jar β”‚ β”œβ”€β”€ hadoop-auth-3.3.4.jar β”œβ”€β”€ hive β”‚ └── flink-sql-connector-hive-3.1.3_2.12-1.18.1.jar β”œβ”€β”€ iceberg β”‚ └── iceberg-flink-runt β”œβ”€β”€ kafka e-1.18-1.5. .jar 0 0 m i m └── flink-sql-connector-kafka-3.1. -1.18.jar i i β”‚ m @decodableco t

@decodableco @rmoff / #kafkasum i m Don’t forget to restart! t

@decodableco @rmoff / #kafkasum i m Tables, Connectors, and Catalogs t

@decodableco @rmoff / #kafkasum Tables CREA ( TABLE t_k_orders T T T T S S S m i m T E I T orderid RING, customerid RING, ordernumber IN , product RING, discountpercent INT ) W H ( β€˜connector’ β€˜topic’ β€˜properties.bootstrap.servers’ β€˜scan.startup. ode’ β€˜format’ ); = = = = = β€˜kafka’, β€˜orders’, β€˜broker:29092’, β€˜earliest-offset’, β€˜json’ t

@decodableco @rmoff / #kafkasum This used to be s ple β€’ The data and information about the data was all stored in the database β€’ Information Schema β€’ System Catalog m i i m β€’ Data Dictionary Views t

@decodableco @rmoff / #kafkasum m i i m Now it’s not so s ple t

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

@decodableco @rmoff / #kafkasum i m Flink Catalogs t

@decodableco @rmoff / #kafkasum t Flink Catalogs T I i m E T CREA CATALOG c_hive W H ( β€˜type’ = β€˜hive’, β€˜hive-conf-dir’ = β€˜./conf’);

@decodableco @rmoff / #kafkasum Flink Catalogs i m table.catalog-store.kind: file table.catalog-store.file.path: ./conf/catalogs t

@decodableco @rmoff / #kafkasum D i m // M E https: O github.com/decodableco/examples/kafka-iceberg t

@decodableco @rmoff / #kafkasum i m In Conclusion… t

@decodableco @rmoff / #kafkasum Flink SQL is Fun! But there’s a bit of a learning curve β€’ Run ad-hoc queries th the SQL Client β€’ Understand JAR dependencies for connectors, catalogs, formats, etc β€’ Don’t be put off by the docs - there i i w i m there if you look hard enough s SQL content t

@decodableco @rmoff / #kafkasum i m decodable.co/blog t

@rmoff / #kafkasum t OF i m i m @rmoff / 19 Mar 2024 / #kafkasum E

@decodableco t