A presentation at Kafka Summit 2024 in London, UK by Robin Moffatt
@rmoff / 19 Mar 2024 / #KafkaSummit

🐲 Here be Dragons Stacktraces: Flink SQL for Non-Java Developers

Robin Moffatt, Principal DevEx Engineer @ Decodable
Actual footage of a SQL Developer looking at Apache Flink for the first time
What Is Apache Flink?
A Brief History of Flink
or
Back in the time of dinosaurs Hadoop

• Started life as a research project in 2010 called Stratosphere.
• This was the time of MapReduce. Java and Scala were the only way to do this.
Flink is a big project

• Flink
• Stateful Functions
• ML
• Kubernetes Operator
• CDC Connectors
• Paimon (incubating)
Capabilities
Connect to Lots of Source and Target Systems

JDBC, CDC, Kinesis, DynamoDB, Object stores, Kinesis Firehose
Stateful and Stateless computations

• Filtering:
  SELECT * FROM myStream WHERE foo=42
• Joining:
  SELECT a.*, b.* FROM myStream a INNER JOIN myLookup b ON a.id=b.foo_id
• Transforming:
  SELECT cost * tax_rate AS total_cost FROM myStream
• Aggregations:
  SELECT SUM(order_value) AS total_order_values FROM orders
• Pattern matching:
  SELECT * FROM myStream MATCH_RECOGNIZE ( PARTITION BY id ORDER BY user_action_time […]
• …and a whole lot more
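The MATCH_RECOGNIZE snippet on the slide is truncated. As a rough sketch of what a complete pattern-matching query can look like in Flink SQL (the stream, columns, and pattern here are hypothetical, loosely following the shape of the examples in the Flink docs):

```sql
-- Hypothetical sketch: detect a dip-and-recovery in a per-id counter.
-- myStream, user_action_time, and action_cnt are illustrative names only;
-- the ORDER BY column must be a time attribute of the stream.
SELECT *
FROM myStream
    MATCH_RECOGNIZE (
        PARTITION BY id
        ORDER BY user_action_time
        MEASURES
            START_ROW.user_action_time AS start_time,
            LAST(UP.user_action_time)  AS end_time
        ONE ROW PER MATCH
        AFTER MATCH SKIP PAST LAST ROW
        PATTERN (START_ROW DOWN+ UP)
        DEFINE
            DOWN AS DOWN.action_cnt < PREV(DOWN.action_cnt),
            UP   AS UP.action_cnt   > PREV(UP.action_cnt)
    ) AS T;
```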
Batch and Streaming

Bounded and Unbounded
How Does Flink Work?
<< magic 🪄 >>
// https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/
📣 Flink's SQL Engine: Let's Open the Engine Room! (Timo Walther)
⏰ 5:30pm Tuesday, 📺 Breakout Room 2

// https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/
Running Flink

Works on my machine…

$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host asgard08.
Starting taskexecutor daemon on host asgard08.

$ jps -l
14656 org.apache.flink.runtime.taskexecutor.TaskManagerRunner
14379 org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint
Using Flink
It's not just Java

• PyFlink: added in 1.9.0 in 2019
• Flink SQL: added in 1.5.0 in 2018
Flink SQL
SQL Language Support

• Built on Apache Calcite
• Common Table Expressions (CTEs) (WITH)
• Set-based operations
• Joins
• Aggregations
• And lots more…
Running Flink SQL

• SQL Client
• SQL Gateway
• REST API
• Hive
• JDBC Driver
• From Java or Python
Flink SQL Client

$ ./bin/sql-client.sh
Welcome! Enter 'HELP;' to list all available commands. 'QUIT;' to exit.
Command history file path: /opt/flink/.flink-sql-history

[ASCII-art squirrel and "Flink SQL" BETA banner]

Flink SQL>
// DEMO: https://github.com/decodableco/examples/kafka-iceberg
A Few Useful Settings
Runtime Mode

SET 'execution.runtime-mode' = 'streaming';

• streaming [default]
• batch
Result Mode

SET 'sql-client.execution.result-mode' = 'table';

• table [default]
• changelog
• tableau
Colour Scheme

SET 'sql-client.display.color-schema' = 'Chester';

• Because why not?!
Changing the defaults

Setting up a SQL Client initialisation file

• Create a SQL file:
$ cat init.sql
SET 'execution.runtime-mode' = 'batch';
SET 'sql-client.execution.result-mode' = 'tableau';

• Launch SQL Client with the -i flag and pass the file as a parameter:
./bin/sql-client.sh -i init.sql
Submitting SQL as a job

• SQL Client
$ ./bin/sql-client.sh --file ~/my_query.sql

• SQL Gateway
curl --location 'localhost:8083/sessions/42/statements' \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --data '{ "statement": "SELECT * FROM foo;" }'

• Application mode support: FLIP-316: Support application mode for SQL Gateway
Some of the Gnarly Stuff
The Joy of JARs

• For each connector, format, and catalog you need to install dependencies.
• All of these are available as JARs (Java ARchives)
This might jar a bit…

Could not execute SQL statement. Reason: java.lang.ClassNotFoundException

org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'

Could not find any factory for identifier 'hive' that implements 'org.apache.flink.table.factories.CatalogFactory' in the classpath.
Finding JARs

• Usually the docs will tell you which JAR you need.
• JARs are very specific to the versions of the tools that you're using.
$ tree /opt/flink/lib
├── aws
│   ├── aws-java-sdk-bundle-1.12.648.jar
│   └── hadoop-aws-3.3.4.jar
├── flink-cep-1.18.1.jar
├── flink-parquet_2.12-1.18.1.jar
├── flink-table-runtime-1.18.1.jar
├── hadoop
│   ├── commons-configuration2-2.1.1.jar
│   ├── commons-logging-1.1.3.jar
│   └── hadoop-auth-3.3.4.jar
├── hive
│   └── flink-sql-connector-hive-3.1.3_2.12-1.18.1.jar
├── iceberg
│   └── iceberg-flink-runtime-1.18-1.5.0.jar
└── kafka
    └── flink-sql-connector-kafka-3.1.0-1.18.jar
Don't forget to restart!
Tables, Connectors, and Catalogs
Tables

CREATE TABLE t_k_orders (
  orderid         STRING,
  customerid      STRING,
  ordernumber     INT,
  product         STRING,
  discountpercent INT
) WITH (
  'connector'                    = 'kafka',
  'topic'                        = 'orders',
  'properties.bootstrap.servers' = 'broker:29092',
  'scan.startup.mode'            = 'earliest-offset',
  'format'                       = 'json'
);
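Once a table like this is registered, it is queried with ordinary SQL, and results can be written out with INSERT INTO. A hedged sketch (the sink table t_orders_summary is made up for illustration and would need its own CREATE TABLE with a connector):

```sql
-- Ad-hoc query against the Kafka-backed table defined above
SELECT product, discountpercent
FROM t_k_orders
WHERE discountpercent > 0;

-- Continuously maintain an aggregate in a hypothetical sink table
INSERT INTO t_orders_summary
SELECT product, COUNT(*) AS order_ct
FROM t_k_orders
GROUP BY product;
```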
This used to be simple

• The data and the information about the data were all stored in the database
• Information Schema
• System Catalog
• Data Dictionary Views
Now it's not so simple
Flink Catalogs
Flink Catalogs

CREATE CATALOG c_hive WITH (
  'type'          = 'hive',
  'hive-conf-dir' = './conf');
Flink Catalogs

table.catalog-store.kind: file
table.catalog-store.file.path: ./conf/catalogs
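With a catalog store configured like this, catalogs you create are persisted to files and come back in later sessions, instead of vanishing with the in-memory default. A sketch of the flow (reusing the c_hive catalog shown on the previous slide):

```sql
-- Created once; with a file catalog store it survives a SQL Client restart
CREATE CATALOG c_hive WITH (
  'type'          = 'hive',
  'hive-conf-dir' = './conf'
);

-- In a later session:
SHOW CATALOGS;
USE CATALOG c_hive;
SHOW TABLES;
```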
In Conclusion…
Flink SQL is Fun! But there's a bit of a learning curve

• Run ad-hoc queries with the SQL Client
• Understand JAR dependencies for connectors, catalogs, formats, etc
• Don't be put off by the docs - there is SQL content there if you look hard enough
decodable.co/blog
EOF

@rmoff / 19 Mar 2024 / #KafkaSummit
Apache Flink might be the belle of the ball at the moment, but that doesn't stop it from being baffling to learn at times. A platform steeped in its history as a Java project, it can be daunting for the humble data engineer equipped with only some SQL and their wits to navigate. And that's a shame, because with Flink SQL you can do some rather useful things with streams (and batches) of data just using SQL - no coding required!
Join me as I map out the components of Flink, explore the bits that you do—and don't—need to be familiar with to begin to work with Flink SQL. We'll explore together the murky undergrowth of catalogs and connectors, clients and Calcite—and take in a spot of architecture along the way to give us a proper understanding of what happens when we run a SQL statement on Flink.
By the end of this talk, you'll have learnt how to run Flink locally, submit SQL jobs, integrate with various systems through source and sink connectors, and use Flink SQL for effective data transformation.
Whether you're a seasoned data engineer or just starting out, you'll leave with the confidence to hit the ground running with Flink SQL for yourself.
The following resources were mentioned during the presentation or are useful additional information.
Here's what was said about this presentation on social media.
Now on stage at #KafkaSummit: fellow Decoder @rmoff, talking about Flink for non-Java developers. Good stuff! pic.twitter.com/Vh6AwlXzor
— Gunnar Morling (@gunnarmorling) March 19, 2024
Great demo of FlinkSQL from @rmoff #KafkaSummit pic.twitter.com/mJxDISZn3i
— Bill Bejeck (@bbejeck) March 19, 2024
Really cool demo from @rmoff at #KafkaSummit: using Flink SQL to ingest streaming data into S3 with #ApacheIceberg as the table format, so it can be queried using #DuckDB. pic.twitter.com/GQKd7sjmDH
— Gunnar Morling (@gunnarmorling) March 19, 2024
Fantastic talk by @rmoff, and the first time @ShadowTrafficIO has been used to power a conference demo. #KafkaSummit https://t.co/QAltYDCimj
— Michael Drogalis (@MichaelDrogalis) March 19, 2024
Where is the SQL in #ApacheFlink SQL?

Hidden under a layer of Java!

Great intro from @rmoff at #KafkaSummit pic.twitter.com/0278i06nof

— Francesco Tisiot (@FTisiot) March 19, 2024
@rmoff give some background on @ApacheFlink #KafkaSummit pic.twitter.com/dN4puadqHE
— Bill Bejeck (@bbejeck) March 19, 2024
Really looking forward to @rmoff's talk, Here be dragons! Flink SQL for non-Java developers. #KafkaSummit
— Dave Klein (@daveklein) March 19, 2024