feat: add support for url_encode, url_decode, and try_url_decode #4231
andygrove merged 4 commits into apache:main
Thanks @parthchandra, could you add the tests that I was adding in #4152? They covered some edge cases.
Sure. Do we want to add parse_url to this PR as well?
I don't mind either way. We can do that as a separate PR if that is easier.
…pache#4152) Squashed from PR apache#4152 (url-functions branch). Co-Authored-By: Andy Grove <andygrove73@gmail.com>
Squashed and merged commits from #4152.
Note to reviewers: Spark's
("encode", UrlCodec.getClass) -> CometUrlEncodeStaticInvoke,
("decode", UrlCodec.getClass) -> CometUrlDecodeStaticInvoke)
Suggested change:
- ("encode", UrlCodec.getClass) -> CometUrlEncodeStaticInvoke,
- ("decode", UrlCodec.getClass) -> CometUrlDecodeStaticInvoke)
+ ("url_encode", UrlCodec.getClass) -> CometUrlEncodeStaticInvoke,
+ ("url_decode", UrlCodec.getClass) -> CometUrlDecodeStaticInvoke)
?
Looking at the Spark source for UrlEncode / UrlDecode, the rewrite uses:
StaticInvoke(UrlCodec.getClass, dataType, "encode", Seq(child), ...)
StaticInvoke(UrlCodec.getClass, dataType, "decode", Seq(child, Literal(failOnError)), ...)
The third argument is the JVM method name on UrlCodec, which is literally "encode" and "decode". The user-facing SQL names url_encode / url_decode come from prettyName, not functionName.
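To illustrate why the keys have to be the JVM method names, here is a minimal sketch of a lookup keyed by (method name, target class). Everything in it is a hypothetical stand-in: `UrlCodecStandIn`, `EncodeHandler`, `DecodeHandler`, and `StaticInvokeLookup` are not the real Spark or Comet types; only the shape of the map mirrors the lines under review.

```scala
// Hypothetical stand-in for Spark's UrlCodec object (assumption, not the real class).
object UrlCodecStandIn

// Hypothetical stand-ins for the Comet serde handlers.
sealed trait Handler
case object EncodeHandler extends Handler
case object DecodeHandler extends Handler

object StaticInvokeLookup {
  // Keyed by (JVM method name, target class), the same shape as the map in the diff.
  val handlers: Map[(String, Class[_]), Handler] = Map(
    ("encode", UrlCodecStandIn.getClass) -> EncodeHandler,
    ("decode", UrlCodecStandIn.getClass) -> DecodeHandler)

  // functionName is whatever the StaticInvoke node carries as its third argument.
  def lookup(functionName: String, target: Class[_]): Option[Handler] =
    handlers.get((functionName, target))
}
```

With this shape, a lookup for "encode" finds a handler, while a lookup for the SQL-facing name "url_encode" finds nothing, which is the mismatch the suggested change would have introduced.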
@andygrove I merged your PR but
andygrove left a comment
LGTM. The user guide should also be updated to list these as supported; that can be a separate PR.
Thanks @parthchandra!
Which issue does this PR close?
Closes #4155
Rationale for this change
try_url_decode was returning incorrect results
What changes are included in this PR?
Add support for url_encode, url_decode, and try_url_decode (Spark 4.0+) by wiring up the existing datafusion-spark UDFs and handling the StaticInvoke serde for UrlCodec.encode / UrlCodec.decode.
try_url_decode was silently dropping the failOnError=false flag, causing Comet to error on malformed input where Spark returns NULL.
How are these changes tested?
SQL file tests:
- expressions/url/url_decode.sql: valid inputs, malformed input errors
- expressions/url/url_encode.sql: valid inputs, multibyte UTF-8
- expressions/url/try_url_decode.sql: same inputs as url_decode, malformed input returns NULL
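The difference between url_decode and try_url_decode comes down to how the failOnError flag is honored. The sketch below models that behavior with `java.net.URLDecoder` from the JDK. It is an illustration only: `UrlDecodeSketch` and its `decode` signature are hypothetical, `None` stands in for SQL NULL, and this is not the Comet or datafusion-spark implementation.

```scala
import java.net.URLDecoder

// Hypothetical helper (assumption): models the failOnError semantics only.
object UrlDecodeSketch {
  // failOnError = true  behaves like url_decode: malformed input raises an error.
  // failOnError = false behaves like try_url_decode: malformed input yields None (SQL NULL).
  def decode(input: String, failOnError: Boolean): Option[String] =
    try Some(URLDecoder.decode(input, "UTF-8"))
    catch {
      // URLDecoder reports bad percent-escapes with IllegalArgumentException.
      case e: IllegalArgumentException =>
        if (failOnError) throw e else None
    }
}
```

Dropping the flag is equivalent to always calling `decode(input, failOnError = true)`, so a malformed value such as "%ZZ" raises an error instead of producing NULL.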