[ZEPPELIN-6284] Added Flink 1.20 interpreter. #5032
Neuw84 wants to merge 10 commits into apache:master from
Conversation
@Neuw84 Thank you for your contribution. By the way, the Flink tests are failing now. Could you please check and fix them?
# Conflicts:
#   flink/flink1.20-shims/src/main/java/org/apache/zeppelin/flink/Flink120Shims.java
#   flink/flink1.20-shims/src/main/java/org/apache/zeppelin/flink/shims120/CollectStreamTableSink.java
@jongyoul I have fixed the Copilot comments and added the corresponding test file. If everything looks good, let me know whether I need to clean up the commit log.
Pull Request Overview
This PR adds support for Apache Flink 1.20 interpreter to Zeppelin, bringing compatibility with the latest LTS release of Flink 1.x series.
Key changes include:
- Creation of Flink 1.20 shims implementation with necessary adapters and utilities
- Addition of Python environment configuration for Flink 1.20 testing
- Updates to build configuration and CI workflows to include Flink 1.20 support
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| testing/env_python_3_with_flink_120.yml | New conda environment configuration for Python 3 with Flink 1.20.0 |
| flink/pom.xml | Adds flink1.20-shims module and version property |
| flink/flink1.20-shims/ | New module containing Flink 1.20 specific implementations and adapters |
| flink/flink-shims/src/main/java/org/apache/zeppelin/flink/FlinkShims.java | Updates shim loader to recognize Flink 1.20 |
| flink/flink-scala-2.12/pom.xml | Updates default Flink version and adds Flink 1.20 profile |
| flink/README.md | Updates documentation to include flink1.20-shims module |
| .github/workflows/core.yml | Updates CI to test with Flink 1.20 |
Comments suppressed due to low confidence (2)
flink/flink1.20-shims/src/main/java/org/apache/zeppelin/flink/PrintUtils.java:109
- Using a Unicode character literal '\uea60' (60000) for milliseconds conversion is unclear and error-prone. Replace with the explicit constant 60000 or a named constant like MILLISECONDS_PER_MINUTE.
formattedArray[i] =
flink/flink1.20-shims/src/main/java/org/apache/zeppelin/flink/PrintUtils.java:110
- Using a Unicode character literal '\uea60' (60000) for milliseconds conversion is unclear and error-prone. Replace with the explicit constant 60000 or a named constant like MILLISECONDS_PER_MINUTE.
formattedTimestamp(array.get(i), elementType, sessionTimeZone);
public static int localTimeToUnixDate(LocalTime time) {
    return time.getHour() * 3600000 + time.getMinute() * '\uea60' + time.getSecond() * 1000 + time.getNano() / 1000000;
}
Using a Unicode character literal '\uea60' (60000) for milliseconds conversion is unclear and error-prone. Replace with the explicit constant 60000 or a named constant like MILLISECONDS_PER_MINUTE.
Suggested change:
-    return time.getHour() * 3600000 + time.getMinute() * '\uea60' + time.getSecond() * 1000 + time.getNano() / 1000000;
+    return time.getHour() * 3600000 + time.getMinute() * MILLISECONDS_PER_MINUTE + time.getSecond() * 1000 + time.getNano() / 1000000;
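For illustration, here is a self-contained sketch of the method with named constants, behaving identically to the char-literal version ('\uea60' has code point 60000). The class name and constant names are illustrative, not taken from the PR:

```java
import java.time.LocalTime;

public class TimeUtils {
    // Named constants make the unit conversions explicit; the original code
    // used the char literal '\uea60', whose numeric value is 60000.
    private static final int MILLIS_PER_HOUR = 3_600_000;
    private static final int MILLIS_PER_MINUTE = 60_000;
    private static final int MILLIS_PER_SECOND = 1_000;
    private static final int NANOS_PER_MILLI = 1_000_000;

    public static int localTimeToUnixDate(LocalTime time) {
        return time.getHour() * MILLIS_PER_HOUR
                + time.getMinute() * MILLIS_PER_MINUTE
                + time.getSecond() * MILLIS_PER_SECOND
                + time.getNano() / NANOS_PER_MILLI;
    }

    public static void main(String[] args) {
        // 01:01:01.001 -> 3600000 + 60000 + 1000 + 1 = 3661001 ms into the day
        System.out.println(localTimeToUnixDate(LocalTime.of(1, 1, 1, 1_000_000)));
    }
}
```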
Hmm, looking at the logs I see this weird error. It seems the cluster cannot be launched, but I'm a little lost here. Any help is appreciated, as the exception does not provide much info. The code that is present here does work, at least for the Scala and SQL interfaces (checked with some notebooks) with Java 11.
Hopefully this will fix the NPE with log4j.
Seems that the error comes from an NPE in Log4j. Flink 1.20.2 seems to address the issue: https://flink.apache.org/2025/07/10/apache-flink-1.20.2-release-announcement/
Log4j issue solved with the upgrade to 1.20.2; now I am getting a fancy Scala error with some version mismatch.
to reproduce them. Tried to fix it, but without luck yet. Using a docker-compose setup to connect Zeppelin to a Flink cluster on 1.20.2, this code works as expected (Scala and SQL interfaces).
@zjffdu Could you please help fix the error, if you could spare some time to review it?
After some research I wasn't able to identify the offending library. Spent some time trying to fix the Scala versions via Maven, without luck. The code works well, as said in my previous comment; it should be something in the scala-reflect/compiler used in the tests, but I have no clue yet.
Reamer left a comment
We are now using a newer JDK version and should also be using the new Java tools.
Avoid Scala reflection issues in testing.
} else {
    String flinkHome = System.getenv("FLINK_HOME");
    if (flinkHome != null && !flinkHome.isBlank()) {
        pyDir = Paths.get(flinkHome).resolve("opt").resolve("python");
Why do you use Paths.resolve() in various places?
I would have shortened it to Paths.get(flinkHome, "opt", "python").
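For illustration, a small standalone sketch (class name and the sample FLINK_HOME value are hypothetical) showing that the chained resolve() calls and the varargs Paths.get overload produce the same path:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathStyles {
    public static void main(String[] args) {
        String flinkHome = "/opt/flink"; // example value standing in for FLINK_HOME

        // Chained resolve(), as written in the PR:
        Path chained = Paths.get(flinkHome).resolve("opt").resolve("python");

        // Varargs form suggested in the review; builds the same path in one call:
        Path varargs = Paths.get(flinkHome, "opt", "python");

        System.out.println(chained.equals(varargs));
    }
}
```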
Refactoring of the code.
After digging and digging into the Scala libraries, I think the error is not solvable this way. I don't see any Scala library collisions, and I tried forcing many versions. It seems they changed how you need to create the environments, and that should be updated in ScalaInterpreter. We should have one code path for 1.17 and below and one for 1.20 now.
File "/usr/share/miniconda/envs/python_3_with_flink/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 796, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
This is why I abandoned the previous PR ... recompiling Flink with a higher Scala version does not work either ... the key point here is FLINK-32560 added a
FLINK-36327 replaces the
That would be great!
I don't know if you have noticed, but the error isn't actually fatal. If you run the cell a second time, it surprisingly works without any problems.
I also noticed that, but I have no idea why, due to my limited knowledge of Scala internals :(
I think there are two directions now:
The problem is that Flink 2.0 deletes all the Scala APIs, which would mean we need to rewrite the Flink interpreter and see what the ideal options are there. The SQL interface is OK, but then... the Python + Java REPL? It would need a huge re-architecture. I will try to spend more time on this (Flink 1.20), but I can't promise anything; any help would be appreciated. As it is now, it works, but I would like this to be merged so we have at least LTS support before migrating to Flink 2.0.
@Neuw84 If it's okay with you, I'd like to help too.
Go for it. The code is working well, as I have been using it heavily for a re:Invent workshop I am building for next month. It is a matter of solving those Scala collisions in testing.
…1.19/1.20 support
- Remove deprecated flink1.15-shims, flink1.16-shims, flink1.17-shims modules
- Add new flink1.19-shims and flink1.20-shims modules (Flink 1.19.3 / 1.20.3)
- Fix CatalogStoreHolder requirement in Flink 1.19+ CatalogManager.Builder
- Fix Scala reflection error with ImplicitExpressionConversions by:
- Using TableEnvironment.create() / StreamTableEnvironment.create() directly
instead of custom TableEnvFactory (referenced from PR apache#5032)
- Adding bindWithRetry() for Scala REPL bind operations
- Using explicit imports instead of wildcard org.apache.flink.table.api._
- Update flink.scala.version from 2.12.7 to 2.12.18
- Update DownloadUtils to remove flink-scala jar from FLINK_HOME/lib for 1.19+
- Update CI workflow matrix from [115,116,117] to [119,120]
- Update integration tests and conda env files for Flink 1.19/1.20
- Update documentation to reflect new supported versions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
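The bindWithRetry() helper mentioned in the commit message is not shown in this thread. As a sketch of the underlying idea (illustrative Java with hypothetical names; the real helper lives in the Scala interpreter), an operation that fails once with a transient error, like the first REPL bind discussed above, can simply be retried:

```java
import java.util.concurrent.Callable;

public class RetryUtil {
    // Hypothetical sketch: retry an operation a bounded number of times,
    // rethrowing the last exception if every attempt fails.
    public static <T> T withRetry(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails on the first call and succeeds on the second, mimicking the
        // "run the cell a second time and it works" behaviour from the thread.
        String result = withRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("transient bind failure");
            }
            return "bound";
        }, 3);
        System.out.println(result);
    }
}
```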
What is this PR for?
Add Flink 1.20 support for the Apache Flink interpreter. Being the latest 1.x release and the LTS one, it is important to support it. For example, we can use the latest Iceberg versions with it.
What type of PR is it?
Feature
Todos
What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-6284
How should this be tested?
Screenshots (if appropriate)
Questions: