-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix(bigquery): retry getTable server errors #13383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,6 +20,7 @@ | |
| import static com.google.common.base.Preconditions.checkArgument; | ||
| import static java.net.HttpURLConnection.HTTP_NOT_FOUND; | ||
|
|
||
| import com.google.api.client.http.HttpResponseException; | ||
| import com.google.api.core.BetaApi; | ||
| import com.google.api.core.InternalApi; | ||
| import com.google.api.gax.paging.Page; | ||
|
|
@@ -1123,11 +1124,15 @@ && getOptions().getOpenTelemetryTracer() != null) { | |
| new Callable<com.google.api.services.bigquery.model.Table>() { | ||
| @Override | ||
| public com.google.api.services.bigquery.model.Table call() throws IOException { | ||
| return bigQueryRpc.getTableSkipExceptionTranslation( | ||
| completeTableId.getProject(), | ||
| completeTableId.getDataset(), | ||
| completeTableId.getTable(), | ||
| optionsMap); | ||
| try { | ||
| return bigQueryRpc.getTableSkipExceptionTranslation( | ||
| completeTableId.getProject(), | ||
| completeTableId.getDataset(), | ||
| completeTableId.getTable(), | ||
| optionsMap); | ||
| } catch (HttpResponseException e) { | ||
| throw new BigQueryException(e); | ||
| } | ||
|
Comment on lines
+1127
to
+1135
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for this. IIUC the root of the issue is that the default configuration isn't retrying on a I think this will work, but I have a small concern about behavioral compatibility now that we are throwing a new exception (it may be a small possibility, but I'll need to do a bit more digging into this). BigQuery is an old library and I think it is possible that users may have written a custom ResultRetryAlgorithm to handle these cases since we weren't natively supporting it (e.g. ResultRetryAlgorithm catch specifically for One idea I have in mind is something for max compatibility is like: And if this makes sense, we use this default for the relevant RPCs (getTable, ... , etc).
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @lqiu96 i think that makes sense. I agree the current wrapping can change what a custom ResultRetryAlgorithm sees. I am planning to change the approach like this:
Rough shape of the change ( Same to what you proposed ) : I will also add a regression test for the custom retry case, so we verify custom algorithms still see the raw HttpResponseException. Let me know if this is good and i can make these changes |
||
| } | ||
| }, | ||
| getOptions().getRetrySettings(), | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this fix correctly addresses the retry behavior for
getTableby wrappingHttpResponseExceptioninBigQueryException, the same pattern of using*SkipExceptionTranslationmethods (such asgetModelSkipExceptionTranslation,getRoutineSkipExceptionTranslation, and potentiallygetDatasetSkipExceptionTranslation) is used elsewhere inBigQueryImpl. These other methods likely suffer from the same issue where transient server errors (like HTTP 503) are not retried. Consider applying the same try-catch wrapping to those methods as well to ensure consistent retry behavior across all resource retrieval APIs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lqiu96 As per the gemini's suggestion does below scope make sense?
the current change only covers
getTable, butgetDataset,getModelandgetRoutineuse the same pattern - the*SkipExceptionTranslationRPC call runs insideBigQueryRetryHelper.runWithRetries(...), so a rawHttpResponseExceptioncan avoid the normalBigQueryExceptiontranslation before retry handling.So my plan is to keep the follow-up scoped to those resource getter paths only:
getDatasetgetModelgetRoutineI am not planning to wrap every
*SkipExceptionTranslationcall in this PR, since that would widen the behavior change beyond this retry issue.I will add matching tests for the three paths with a first 503 response and a successful second response, similar to the new
getTablecoverage.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think that makes sense. We can create the necessary follow up for the other relevant RPCs. For now, let's try this with getTable.