Skip to content

feat: add support for url_encode, url_decode, and try_url_decode#4231

Merged
andygrove merged 4 commits intoapache:mainfrom
parthchandra:try_url_decode
May 7, 2026
Merged

feat: add support for url_encode, url_decode, and try_url_decode#4231
andygrove merged 4 commits intoapache:mainfrom
parthchandra:try_url_decode

Conversation

@parthchandra
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #4155

Rationale for this change

try_url_decode was returning incorrect results

What changes are included in this PR?

How are these changes tested?

SQL file tests -
expressions/url/url_decode.sql — valid inputs, malformed input errors
expressions/url/url_encode.sql — valid inputs, multibyte UTF-8
expressions/url/try_url_decode.sql — same inputs as url_decode, malformed input returns NULL

@parthchandra parthchandra marked this pull request as draft May 5, 2026 20:26
@parthchandra parthchandra marked this pull request as ready for review May 5, 2026 21:48
@parthchandra parthchandra requested review from andygrove and comphead and removed request for comphead May 5, 2026 21:48
@andygrove
Copy link
Copy Markdown
Member

Thanks @parthchandra, could you add the tests that I was adding in #4152 ? They covered some edge cases

@parthchandra
Copy link
Copy Markdown
Contributor Author

Thanks @parthchandra, could you add the tests that I was adding in #4152 ? They covered some edge cases

Sure. Do we want to add parse_url to this PR as well?

@andygrove
Copy link
Copy Markdown
Member

I don't mind either way. We can do that as separate PR if easier

…pache#4152)

Squashed from PR apache#4152 (url-functions branch).

Co-Authored-By: Andy Grove <andygrove73@gmail.com>
@parthchandra parthchandra changed the title feat: add support for url_encode, url_decode, and try_url_decode feat: add support for parse_url, url_encode, url_decode, and try_url_decode May 6, 2026
@parthchandra
Copy link
Copy Markdown
Contributor Author

Thanks @parthchandra, could you add the tests that I was adding in #4152 ? They covered some edge cases

Squashed and merged commits from #4152.

@parthchandra
Copy link
Copy Markdown
Contributor Author

Note to reviewers: Spark's encode and url_encode are unrelated functions.
encode(string, charset) - converts a string to BINARY using a character set
`url-encode - percent encoding for URLs.

Comment on lines +40 to +41
("encode", UrlCodec.getClass) -> CometUrlEncodeStaticInvoke,
("decode", UrlCodec.getClass) -> CometUrlDecodeStaticInvoke)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
("encode", UrlCodec.getClass) -> CometUrlEncodeStaticInvoke,
("decode", UrlCodec.getClass) -> CometUrlDecodeStaticInvoke)
("url_encode", UrlCodec.getClass) -> CometUrlEncodeStaticInvoke,
("url_decode", UrlCodec.getClass) -> CometUrlDecodeStaticInvoke)

?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the Spark source for UrlEncode / UrlDecode, the rewrite uses:

StaticInvoke(UrlCodec.getClass, dataType, "encode", Seq(child), ...)
StaticInvoke(UrlCodec.getClass, dataType, "decode", Seq(child, Literal(failOnError)), ...)

The third argument is the JVM method name on UrlCodec, which is literally "encode" and "decode". The user-facing SQL names url_encode / url_decode come from prettyName, not functionName.

@parthchandra parthchandra changed the title feat: add support for parse_url, url_encode, url_decode, and try_url_decode feat: add support for url_encode, url_decode, and try_url_decode May 6, 2026
@parthchandra
Copy link
Copy Markdown
Contributor Author

@andygrove I merged your PR but parse_url from that PR failed in CI. I've backed it out, and it is tracked in #4156

Copy link
Copy Markdown
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. user guide should also be updated to list these as supported. can be separate PR.

Thanks @parthchandra!

@andygrove andygrove merged commit 06d2469 into apache:main May 7, 2026
175 of 177 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in Comet Development May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

url_decode: try_url_decode (Spark 4.0) errors instead of returning NULL on malformed input

3 participants