Cache aware for pd#1379
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a cache-aware routing policy for the prefill-decode master, utilizing a high-performance radix tree implemented in Rust with PyO3 bindings to route requests to cache-hot nodes. Key feedback highlights critical issues, including an undefined self._lock attribute in CacheAwarePolicy that will cause server crashes, potential deadlocks in the Rust tree implementation when holding read locks during child removal, and a missing update_nodes override in LoadBalancedCacheAwareSelector that prevents worker initialization. Additionally, the reviewer noted a potential ValueError when selecting from empty workers and pointed out that the cumulative processed_prompt_len load metric lacks a decay mechanism, which will degrade load balancing accuracy over time.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if node.children.is_empty() && node.tenant_last_access_time.is_empty() { | ||
| if let Some(ref parent) = parent_opt { | ||
| if let Some(fc) = node.text.read().unwrap().first_char() { | ||
| parent.children.remove(&fc); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Holding the read lock on node.text (via node.text.read().unwrap()) while calling parent.children.remove(&fc) creates a deadlock risk. A concurrent insert thread can hold the write lock on parent.children and wait for the write lock on node.text, while this thread holds the read lock on node.text and waits for the write lock on parent.children. To prevent this deadlock, retrieve fc first and release the read lock before removing the child from the parent.
| if node.children.is_empty() && node.tenant_last_access_time.is_empty() { | |
| if let Some(ref parent) = parent_opt { | |
| if let Some(fc) = node.text.read().unwrap().first_char() { | |
| parent.children.remove(&fc); | |
| } | |
| } | |
| } | |
| if node.children.is_empty() && node.tenant_last_access_time.is_empty() { | |
| if let Some(ref parent) = parent_opt { | |
| let fc = node.text.read().unwrap().first_char(); | |
| if let Some(fc) = fc { | |
| parent.children.remove(&fc); | |
| } | |
| } | |
| } |
| if curr.children.is_empty() && curr.tenant_last_access_time.is_empty() { | ||
| if let Some(ref parent) = parent_opt { | ||
| if let Some(fc) = curr.text.read().unwrap().first_char() { | ||
| parent.children.remove(&fc); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Holding the read lock on curr.text (via curr.text.read().unwrap()) while calling parent.children.remove(&fc) creates a deadlock risk with concurrent insert operations. To prevent this deadlock, retrieve fc first and release the read lock before removing the child from the parent.
| if curr.children.is_empty() && curr.tenant_last_access_time.is_empty() { | |
| if let Some(ref parent) = parent_opt { | |
| if let Some(fc) = curr.text.read().unwrap().first_char() { | |
| parent.children.remove(&fc); | |
| } | |
| } | |
| } | |
| if curr.children.is_empty() && curr.tenant_last_access_time.is_empty() { | |
| if let Some(ref parent) = parent_opt { | |
| let fc = curr.text.read().unwrap().first_char(); | |
| if let Some(fc) = fc { | |
| parent.children.remove(&fc); | |
| } | |
| } | |
| } |
| def update_load(self, prompt_len: int): | ||
| self.processed_prompt_len += prompt_len |
There was a problem hiding this comment.
The processed_prompt_len load metric is strictly cumulative and is never decremented or decayed when requests finish. Over time, this means a worker that historically processed many requests will always be considered highly loaded, even if it is currently completely idle. Consider implementing a decay mechanism or decrementing the load when requests complete to ensure accurate active load balancing.
support cache aware for prefill selection.