Fix slow local startup caused by secret manager initialization#6314
Conversation
…alizations, and convert transport to rest.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a significant performance regression in local Flask server startup observed after updating the grpcio library. By transitioning the Secret Manager client to a singleton and switching its transport layer to REST, the changes eliminate persistent connection overhead and resolve fork-related issues, ensuring a smoother development experience. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a global _secret_client variable in server/__init__.py to cache and reuse the GCP Secret Manager client across function calls, initializing it with the REST transport when first needed. There are no review comments to address, and I have no additional feedback to provide.
juliawu
left a comment
There was a problem hiding this comment.
Thank you for the much needed fix!
Issue
b/516817315
Description
Following an update to
grpciofrom1.76.0to1.80.0, we started to see a major lag in Flask server startup when running locally on MacOS.For reference, the grpcio update was in Website: 6172. It was effected because of a major vulnerability in a library that used
grpcioas a dependency.This change likely altered behavior in terms of default timeouts, retry policies, etc, in such a way that the observed delay began to manifest.
The Fix
There are two parts to this fix. One is to refactor the secret manager client to a singleton. The pattern used here is similar to that found in
server/lib/vertex_ai.py(without the threading locking as this only occurs on server start up).This does solve the problem in itself, but produced warnings because of how Flask works in debug mode. It forks a child worker for actually serving the requests (with the master monitoring the file system for changes, among other things). The gRPC connections are long-lived and are forked along with the child worker, causing a warning in the underlying library that this is not safe.
The second part to the fix is to switch the transport to rest. This fixes the problem even without the refactoring (as well as the above issue with refactoring, because the rest requests are stateless and do not maintain a persistent connection, and so have no fork safety issues.