Skip to content

fix(bigquery): retry getTable server errors#13383

Open
goutamadwant wants to merge 1 commit into
googleapis:mainfrom
goutamadwant:fix/bigquery-12885-retry-503
Open

fix(bigquery): retry getTable server errors#13383
goutamadwant wants to merge 1 commit into
googleapis:mainfrom
goutamadwant:fix/bigquery-12885-retry-503

Conversation

@goutamadwant

Copy link
Copy Markdown

Translate HTTP response exceptions thrown by BigQuery.getTable before retry evaluation so the existing BigQueryException retryability rules can retry transient server errors such as HTTP 503.

This keeps the change scoped to getTable. It does not change job retry behavior or raw network IOException handling.

Fixes #12885

Tests:

mvn -pl java-bigquery/google-cloud-bigquery -Dtest=BigQueryImplTest#testGetTableFailureShouldRetryServerErrors -DskipITs test
mvn -pl java-bigquery/google-cloud-bigquery -Dtest=BigQueryImplTest -DskipITs test
mvn -pl java-bigquery/google-cloud-bigquery -DskipITs test
mvn -pl java-bigquery/google-cloud-bigquery -DskipITs package

Translate HTTP response exceptions thrown by getTable before retry evaluation so existing BigQueryException retryability handles transient server errors.

Keep the change scoped to getTable and leave raw network IOExceptions on the existing retry path.

Fixes googleapis#12885
@goutamadwant goutamadwant requested review from a team as code owners June 6, 2026 21:20
@google-cla

google-cla Bot commented Jun 6, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request wraps the call to getTableSkipExceptionTranslation in BigQueryImpl with a try-catch block to catch HttpResponseException and rethrow it as a BigQueryException. It also adds a corresponding unit test testGetTableFailureShouldRetryServerErrors to verify that 503 server errors are successfully retried. There are no review comments, so I have no feedback to provide.

@lqiu96 lqiu96 requested review from jinseopkim0 and lqiu96 June 8, 2026 21:44
@lqiu96

lqiu96 commented Jun 8, 2026

Copy link
Copy Markdown
Member

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request wraps the getTableSkipExceptionTranslation call in BigQueryImpl with a try-catch block to catch HttpResponseException and throw a BigQueryException, enabling transient server errors (such as HTTP 503) to be retried. It also adds a unit test to verify this retry behavior. The reviewer suggests applying this same try-catch wrapping to other *SkipExceptionTranslation methods (e.g., for models, routines, and datasets) to ensure consistent retry behavior across all resource retrieval APIs.

Comment on lines +1133 to +1135
} catch (HttpResponseException e) {
throw new BigQueryException(e);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While this fix correctly addresses the retry behavior for getTable by wrapping HttpResponseException in BigQueryException, the same pattern of using *SkipExceptionTranslation methods (such as getModelSkipExceptionTranslation, getRoutineSkipExceptionTranslation, and potentially getDatasetSkipExceptionTranslation) is used elsewhere in BigQueryImpl. These other methods likely suffer from the same issue where transient server errors (like HTTP 503) are not retried. Consider applying the same try-catch wrapping to those methods as well to ensure consistent retry behavior across all resource retrieval APIs.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lqiu96 As per the gemini's suggestion does below scope make sense?

the current change only covers getTable, but getDataset, getModel and getRoutine use the same pattern - the *SkipExceptionTranslation RPC call runs inside BigQueryRetryHelper.runWithRetries(...), so a raw HttpResponseException can avoid the normal BigQueryException translation before retry handling.

So my plan is to keep the follow-up scoped to those resource getter paths only:

  • getDataset
  • getModel
  • getRoutine

I am not planning to wrap every *SkipExceptionTranslation call in this PR, since that would widen the behavior change beyond this retry issue.

I will add matching tests for the three paths with a first 503 response and a successful second response, similar to the new getTable coverage.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think that makes sense. We can create the necessary follow up for the other relevant RPCs. For now, let's try this with getTable.

Comment on lines +1127 to +1135
try {
return bigQueryRpc.getTableSkipExceptionTranslation(
completeTableId.getProject(),
completeTableId.getDataset(),
completeTableId.getTable(),
optionsMap);
} catch (HttpResponseException e) {
throw new BigQueryException(e);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. IIUC the root of the issue is that the default configuration isn't retrying on a HttpResponseException due to how it's configured in BaseService.

I think this will work, but I have a small concern about behavioral compatibility now that we are throwing a new exception (it may be a small possibility, but I'll need to do a bit more digging into this). BigQuery is an old library and I think it is possible that users may have written a custom ResultRetryAlgorithm to handle these cases since we weren't natively supporting it (e.g. ResultRetryAlgorithm catch specifically for HttpResponseException and parse out the status code). I don't want to wrap the exception and have the behavior changing for existing users.

One idea I have in mind is something for max compatibility is like:

private ResultRetryAlgorithm<Object> getDefaultRetryAlgorithm() {
  final ResultRetryAlgorithm<?> configuredAlgorithm = getOptions().getResultRetryAlgorithm();
  // 1. If the user configured a custom algorithm, respect it completely (do not apply fallback)
  if (configuredAlgorithm != BigQueryBaseService.DEFAULT_BIGQUERY_EXCEPTION_HANDLER) {
    ResultRetryAlgorithm<Object> customAlgorithm = (ResultRetryAlgorithm<Object>) configuredAlgorithm;
    return customAlgorithm;
  }
  // 2. If they are using the default handler, wrap it to add transient HTTP retries (like 503)
  return new ResultRetryAlgorithm<Object>() {
    @Override
    public TimedAttemptSettings createNextAttempt(
        Throwable previousThrowable, Object previousResponse, TimedAttemptSettings previousSettings) {
      return null; // Delegate timing to TimedRetryAlgorithm
    }
    @Override
    public boolean shouldRetry(Throwable previousThrowable, Object previousResponse) {
      // Check default transient HTTP rules first
      if (previousThrowable instanceof HttpResponseException) {
        int statusCode = ((HttpResponseException) previousThrowable).getStatusCode();
        if (statusCode == 500 || statusCode == 502 || statusCode == 503 || statusCode == 504) {
          return true;
        }
      }
      
      // Fall back to default ExceptionHandler rules (SocketException, ConnectException, etc.)
      ResultRetryAlgorithm<Object> delegate = (ResultRetryAlgorithm<Object>) configuredAlgorithm;
      return delegate.shouldRetry(previousThrowable, previousResponse);
    }
  };
}

And if this makes sense, we use this default for the relevant RPCs (getTable, ... , etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Default retry logic does not retry HTTP 503 errors from BigQuery API on BigQuery.getTable()

2 participants