feat: support ngram token filter #10
Merged
Signed-off-by: Mingzhuo Yin <yinmingzhuo@gmail.com>
Pull Request Overview
This PR introduces an ngram token filter similar to Elasticsearch's implementation and updates some dependency versions.
- Added a new ngram token filter implementation with configuration and validation.
- Updated dependency versions in Cargo.toml and the setup script.
- Extended documentation and tests to cover the new ngram token filter.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/setup.sh | Bumped pgrx version to 0.14.1 for proper integration. |
| tests/sqllogictest/ngram.slt | Added tests verifying the ngram token filter behavior. |
| src/token_filter/ngram.rs | Introduced the ngram filter implementation and configuration. |
| src/token_filter/mod.rs | Registered the new ngram filter in the token filter module. |
| src/lib.rs | Modified _PG_init to use extern "C-unwind" as required by pgrx bump. |
| docs/00-reference.md | Documented the options for the ngram token filter. |
| Cargo.toml | Updated versions of lindera, pgrx, and removed pg12 support. |
| .github/workflows/check.yml | Updated workflow to use the latest sccache action version. |
Comments suppressed due to low confidence (1)
Cargo.toml:13
- The removal of the pg12 feature is not mentioned in the PR description; please confirm if dropping support for pg12 is intentional.
pg12 = ["pgrx/pg12", "pgrx-tests/pg12"]
impl TokenFilter for Ngram {
    fn apply(&self, token: String) -> Vec<String> {
        let mut results = Vec::new();
        let len = token.len();
If the token length is less than min_gram, the loop 'for i in 0..=(len - self.config.min_gram)' will underflow and panic. Consider adding a guard that returns an empty vector when token.len() < self.config.min_gram.
Suggested change
-         let len = token.len();
+         let len = token.len();
+         if len < self.config.min_gram {
+             return results;
+         }
Comment on lines +64 to +69
    pub fn new(config: NgramConfig) -> Self {
        if let Err(e) = config.validate() {
            panic!("Invalid NgramConfig: {}", e);
        }

        Ngram { config }
Rather than panicking on an invalid configuration, consider returning a Result to allow clients to handle configuration errors gracefully.
Suggested change
-     pub fn new(config: NgramConfig) -> Self {
-         if let Err(e) = config.validate() {
-             panic!("Invalid NgramConfig: {}", e);
-         }
-         Ngram { config }
+     pub fn new(config: NgramConfig) -> Result<Self, ValidationError> {
+         if let Err(e) = config.validate() {
+             return Err(e);
+         }
+         Ok(Ngram { config })
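A self-contained sketch of the fallible constructor follows. The types here are stand-ins, not the PR's definitions: the fields of `NgramConfig`, the shape of `ValidationError`, and the specific validation rules are all assumptions made for illustration. The `?` operator is used as a shorter equivalent of the `if let Err(e) ... return Err(e)` form in the suggestion.

```rust
use std::fmt;

/// Hypothetical stand-in for the PR's config type; field names are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct NgramConfig {
    pub min_gram: usize,
    pub max_gram: usize,
}

/// Hypothetical error type; the PR's `ValidationError` may look different.
#[derive(Debug, PartialEq)]
pub struct ValidationError(String);

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for ValidationError {}

impl NgramConfig {
    /// Assumed rules: `min_gram` is at least 1 and does not exceed `max_gram`.
    pub fn validate(&self) -> Result<(), ValidationError> {
        if self.min_gram == 0 {
            return Err(ValidationError("min_gram must be at least 1".into()));
        }
        if self.min_gram > self.max_gram {
            return Err(ValidationError("min_gram must not exceed max_gram".into()));
        }
        Ok(())
    }
}

#[derive(Debug)]
pub struct Ngram {
    config: NgramConfig,
}

impl Ngram {
    /// Propagates a validation failure to the caller instead of panicking.
    pub fn new(config: NgramConfig) -> Result<Self, ValidationError> {
        config.validate()?;
        Ok(Ngram { config })
    }

    pub fn config(&self) -> &NgramConfig {
        &self.config
    }
}
```

A caller can then decide how to surface the error, for example by matching on the `Result` and reporting the message, rather than having the filter constructor abort.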
close #9
Update:
- ngram token filter. Its config is the same as Elasticsearch's.
- pgrx to 0.14.1
- sccache-action to 0.0.9, https://gh.io/gha-cache-sunset
- lindera to 0.42.2; the previous CDN URL of the lindera dictionary is down