@@ -20,6 +20,7 @@
import static com.google.common.base.Preconditions.checkArgument;
import static java.net.HttpURLConnection.HTTP_NOT_FOUND;

+import com.google.api.client.http.HttpResponseException;
import com.google.api.core.BetaApi;
import com.google.api.core.InternalApi;
import com.google.api.gax.paging.Page;
@@ -1123,11 +1124,15 @@ && getOptions().getOpenTelemetryTracer() != null) {
new Callable<com.google.api.services.bigquery.model.Table>() {
@Override
public com.google.api.services.bigquery.model.Table call() throws IOException {
-return bigQueryRpc.getTableSkipExceptionTranslation(
-    completeTableId.getProject(),
-    completeTableId.getDataset(),
-    completeTableId.getTable(),
-    optionsMap);
+try {
+  return bigQueryRpc.getTableSkipExceptionTranslation(
+      completeTableId.getProject(),
+      completeTableId.getDataset(),
+      completeTableId.getTable(),
+      optionsMap);
+} catch (HttpResponseException e) {
+  throw new BigQueryException(e);
+}
Comment on lines +1133 to +1135

Contributor

medium

While this fix correctly addresses the retry behavior for getTable by wrapping HttpResponseException in BigQueryException, the same pattern of using *SkipExceptionTranslation methods (such as getModelSkipExceptionTranslation, getRoutineSkipExceptionTranslation, and potentially getDatasetSkipExceptionTranslation) is used elsewhere in BigQueryImpl. These other methods likely suffer from the same issue where transient server errors (like HTTP 503) are not retried. Consider applying the same try-catch wrapping to those methods as well to ensure consistent retry behavior across all resource retrieval APIs.
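To make the retry gap concrete, here is a minimal, dependency-free sketch of the mechanism this comment describes. Every name is a hypothetical stand-in rather than the real API: `HttpStatusException` plays `HttpResponseException`, `TranslatedServiceException` plays `BigQueryException` with a retryable flag, and `DEFAULT_PREDICATE` is a toy predicate that, like the default handler described in this thread, does not recognize the raw HTTP exception. Under those assumptions, a raw 503 fails on the first attempt while the wrapped 503 is retried.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Toy model of why wrapping the raw HTTP exception changes retry behavior (hypothetical names). */
final class WrapForRetrySketch {

  /** Hypothetical stand-in for HttpResponseException: a checked exception with a status code. */
  static final class HttpStatusException extends IOException {
    final int statusCode;

    HttpStatusException(int statusCode) {
      super("HTTP " + statusCode);
      this.statusCode = statusCode;
    }
  }

  /** Hypothetical stand-in for BigQueryException: unchecked, with a retryable flag. */
  static final class TranslatedServiceException extends RuntimeException {
    final boolean retryable;

    TranslatedServiceException(HttpStatusException cause) {
      super(cause);
      // Assumption for the sketch: 5xx responses are classified as retryable.
      this.retryable = cause.statusCode >= 500;
    }
  }

  /** Toy "default" predicate: it recognizes translated exceptions but not the raw HTTP one. */
  static final Predicate<Throwable> DEFAULT_PREDICATE =
      t -> t instanceof TranslatedServiceException && ((TranslatedServiceException) t).retryable;

  /** Minimal retry loop: re-runs the call while the predicate accepts the failure. */
  static <T> T runWithRetries(Callable<T> call, Predicate<Throwable> shouldRetry, int maxAttempts)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts || !shouldRetry.test(e)) {
          throw e;
        }
      }
    }
  }

  /** Fake getTable RPC: a 503 on the first call, success afterwards; calls[0] counts invocations. */
  static Callable<String> flakyGetTable(int[] calls, boolean wrap) {
    return () -> {
      calls[0]++;
      if (calls[0] == 1) {
        HttpStatusException raw = new HttpStatusException(503);
        if (wrap) {
          throw new TranslatedServiceException(raw); // the wrapping done in this PR
        }
        throw raw; // what the *SkipExceptionTranslation call surfaces
      }
      return "table";
    };
  }
}
```

For example, `runWithRetries(flakyGetTable(calls, true), DEFAULT_PREDICATE, 3)` returns `"table"` after two calls, whereas passing `false` rethrows the raw `HttpStatusException` after one call.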

Author

@lqiu96 Following Gemini's suggestion, does the scope below make sense?

The current change only covers getTable, but getDataset, getModel and getRoutine use the same pattern: the *SkipExceptionTranslation RPC call runs inside BigQueryRetryHelper.runWithRetries(...), so a raw HttpResponseException can bypass the normal BigQueryException translation before retry handling.

So my plan is to keep the follow-up scoped to those resource getter paths only:

  • getDataset
  • getModel
  • getRoutine

I am not planning to wrap every *SkipExceptionTranslation call in this PR, since that would widen the behavior change beyond this retry issue.

I will add matching tests for the three paths that return a 503 on the first call and succeed on the second, similar to the new getTable coverage.
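The planned tests follow the same shape as the new getTable test: Mockito's `thenThrow(...).thenReturn(...)` scripts a 503 followed by a success, and `verify(..., times(2))` checks the call count. As a dependency-free sketch of running that one scenario across the three getters, here is a hypothetical `ScriptedRpc` fake that replays outcomes in order and counts calls, driven by a toy retry loop that retries a stand-in `ServerError`. None of these names exist in the library.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical, dependency-free version of the "503 first, success second" test for several getters. */
final class TransientFailureScenario {

  /** Hypothetical stand-in for a 5xx error that the retry path is expected to retry. */
  static final class ServerError extends RuntimeException {
    ServerError(int code) {
      super("HTTP " + code);
    }
  }

  /** Replays scripted outcomes in order, like Mockito's thenThrow(...).thenReturn(...). */
  static final class ScriptedRpc implements Callable<String> {
    private final Deque<Object> outcomes = new ArrayDeque<>();
    int calls = 0;

    ScriptedRpc thenThrow(RuntimeException e) {
      outcomes.add(e);
      return this;
    }

    ScriptedRpc thenReturn(String value) {
      outcomes.add(value);
      return this;
    }

    @Override
    public String call() {
      calls++;
      Object next = outcomes.poll();
      if (next instanceof RuntimeException) {
        throw (RuntimeException) next;
      }
      return (String) next;
    }
  }

  /** Toy retry loop: retries ServerError up to maxAttempts times. */
  static String getWithRetry(ScriptedRpc rpc, int maxAttempts) {
    for (int attempt = 1; ; attempt++) {
      try {
        return rpc.call();
      } catch (ServerError e) {
        if (attempt >= maxAttempts) {
          throw e;
        }
      }
    }
  }

  /** Runs the same scenario for each getter name and returns the observed call counts. */
  static Map<String, Integer> runScenario(String... getters) {
    Map<String, Integer> callCounts = new LinkedHashMap<>();
    for (String getter : getters) {
      ScriptedRpc rpc =
          new ScriptedRpc().thenThrow(new ServerError(503)).thenReturn(getter + "-result");
      String result = getWithRetry(rpc, 3);
      if (!result.equals(getter + "-result")) {
        throw new IllegalStateException("unexpected result for " + getter);
      }
      callCounts.put(getter, rpc.calls);
    }
    return callCounts;
  }
}
```

Here `runScenario("getDataset", "getModel", "getRoutine")` reports two calls per getter, which is the same assertion `verify(..., times(2))` makes in the real test.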

Member

Thanks, I think that makes sense. We can create the necessary follow-ups for the other relevant RPCs. For now, let's try this with getTable.

Comment on lines +1127 to +1135

Member

Thanks for this. IIUC the root of the issue is that the default configuration isn't retrying on an HttpResponseException due to how it's configured in BaseService.

I think this will work, but I have a small concern about behavioral compatibility now that we are throwing a new exception (it may be a small possibility, but I'll need to dig into this a bit more). BigQuery is an old library, and users may have written a custom ResultRetryAlgorithm to handle these cases since we weren't natively supporting them (e.g. a ResultRetryAlgorithm that checks specifically for HttpResponseException and parses out the status code). I don't want to wrap the exception and change the behavior for existing users.
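A minimal sketch of this compatibility risk, assuming a user wrote a custom `shouldRetry` that matches the raw HTTP exception directly. `HttpStatusException` and `WrappedServiceException` are hypothetical stand-ins for `HttpResponseException` and `BigQueryException`, and `USER_SHOULD_RETRY` is an invented example of such user code. The same 503 is retried when it arrives raw but silently stops matching once it is wrapped, because `instanceof` only inspects the outer exception.

```java
import java.io.IOException;
import java.util.function.BiPredicate;

/** Hypothetical illustration of the compatibility risk of wrapping the raw HTTP exception. */
final class CustomAlgorithmCompatibility {

  /** Hypothetical stand-in for HttpResponseException. */
  static final class HttpStatusException extends IOException {
    final int statusCode;

    HttpStatusException(int statusCode) {
      super("HTTP " + statusCode);
      this.statusCode = statusCode;
    }
  }

  /** Hypothetical stand-in for BigQueryException wrapping the HTTP failure. */
  static final class WrappedServiceException extends RuntimeException {
    WrappedServiceException(Throwable cause) {
      super(cause);
    }
  }

  /**
   * A custom shouldRetry(previousThrowable, previousResponse) that a user might have written:
   * it matches the raw HTTP exception directly and reads its status code.
   */
  static final BiPredicate<Throwable, Object> USER_SHOULD_RETRY =
      (throwable, response) ->
          throwable instanceof HttpStatusException
              && ((HttpStatusException) throwable).statusCode == 503;
}
```

Nothing fails loudly in this case: the user's predicate simply returns `false`, so the retry stops without any error pointing at the changed exception type.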

One idea I have for maximum compatibility is something like:

@SuppressWarnings("unchecked") // the configured algorithm is treated as a ResultRetryAlgorithm<Object>
private ResultRetryAlgorithm<Object> getDefaultRetryAlgorithm() {
  final ResultRetryAlgorithm<?> configuredAlgorithm = getOptions().getResultRetryAlgorithm();
  // 1. If the user configured a custom algorithm, respect it completely (do not apply fallback)
  if (configuredAlgorithm != BigQueryBaseService.DEFAULT_BIGQUERY_EXCEPTION_HANDLER) {
    return (ResultRetryAlgorithm<Object>) configuredAlgorithm;
  }
  // The default handler, consulted for everything that is not a transient HTTP status
  final ResultRetryAlgorithm<Object> delegate = (ResultRetryAlgorithm<Object>) configuredAlgorithm;
  // 2. If they are using the default handler, wrap it to add transient HTTP retries (like 503)
  return new ResultRetryAlgorithm<Object>() {
    @Override
    public TimedAttemptSettings createNextAttempt(
        Throwable previousThrowable, Object previousResponse, TimedAttemptSettings previousSettings) {
      return null; // Delegate timing to TimedRetryAlgorithm
    }
    @Override
    public boolean shouldRetry(Throwable previousThrowable, Object previousResponse) {
      // Check default transient HTTP rules first
      if (previousThrowable instanceof HttpResponseException) {
        int statusCode = ((HttpResponseException) previousThrowable).getStatusCode();
        if (statusCode == 500 || statusCode == 502 || statusCode == 503 || statusCode == 504) {
          return true;
        }
      }

      // Fall back to default ExceptionHandler rules (SocketException, ConnectException, etc.)
      return delegate.shouldRetry(previousThrowable, previousResponse);
    }
  };
}

If this makes sense, we can use this default for the relevant RPCs (getTable, ..., etc.).
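The selection logic in this proposal can be exercised without gax using a reduced, hypothetical model: `RetryDecider` stands in for `ResultRetryAlgorithm` (only `shouldRetry`), `HttpStatusException` for `HttpResponseException`, and `DEFAULT_HANDLER` for `BigQueryBaseService.DEFAULT_BIGQUERY_EXCEPTION_HANDLER`, assumed here to retry only `ConnectException`. It checks the two properties the proposal relies on: a custom algorithm comes back as the identical instance, and only the default handler gains the 500/502/503/504 check before delegating.

```java
import java.io.IOException;
import java.net.ConnectException;

/** Simplified model of the proposed default-only fallback (hypothetical types, not gax). */
final class DefaultOnlyFallbackSketch {

  /** Hypothetical stand-in for ResultRetryAlgorithm, reduced to its shouldRetry decision. */
  interface RetryDecider {
    boolean shouldRetry(Throwable previousThrowable, Object previousResponse);
  }

  /** Hypothetical stand-in for HttpResponseException. */
  static final class HttpStatusException extends IOException {
    final int statusCode;

    HttpStatusException(int statusCode) {
      super("HTTP " + statusCode);
      this.statusCode = statusCode;
    }
  }

  /** Stand-in for the default handler; assumed here to retry connection failures only. */
  static final RetryDecider DEFAULT_HANDLER =
      (previousThrowable, previousResponse) -> previousThrowable instanceof ConnectException;

  /** Mirrors getDefaultRetryAlgorithm(): custom passes through, default gains 5xx retries. */
  static RetryDecider selectRetryAlgorithm(RetryDecider configured) {
    // 1. A user-configured algorithm is returned as-is, so it keeps seeing raw exceptions.
    if (configured != DEFAULT_HANDLER) {
      return configured;
    }
    // 2. Only the default handler is wrapped with the transient HTTP check.
    return (previousThrowable, previousResponse) -> {
      if (previousThrowable instanceof HttpStatusException) {
        int code = ((HttpStatusException) previousThrowable).statusCode;
        if (code == 500 || code == 502 || code == 503 || code == 504) {
          return true;
        }
      }
      // Anything else falls back to the default handler's own rules.
      return configured.shouldRetry(previousThrowable, previousResponse);
    };
  }
}
```

The identity comparison (`!=`) means any other instance, even one with equivalent rules, is treated as custom and left untouched.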

Author

Thanks @lqiu96, I think that makes sense. I agree the current wrapping can change what a custom ResultRetryAlgorithm sees.

I plan to change the approach as follows:

  • leave getTableSkipExceptionTranslation throwing the original HttpResponseException
  • if the user configured a custom ResultRetryAlgorithm, pass it through unchanged
  • only when the configured algorithm is the default BigQuery exception handler, wrap it with a small default-only retry check for transient HTTP response codes
  • keep this PR scoped to getTable and leave other RPCs for a follow-up

Rough shape of the change (same as what you proposed):

ResultRetryAlgorithm<?> getTableRetryAlgorithm() {
  ResultRetryAlgorithm<?> configured = getOptions().getResultRetryAlgorithm();

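  if (configured != BigQueryBaseService.DEFAULT_BIGQUERY_EXCEPTION_HANDLER) {
    return configured;
  }

  // The default handler, used as the delegate for everything except transient HTTP codes
  @SuppressWarnings("unchecked")
  final ResultRetryAlgorithm<Object> defaultHandler = (ResultRetryAlgorithm<Object>) configured;

  return new ResultRetryAlgorithm<Object>() {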
    @Override
    public boolean shouldRetry(Throwable previousThrowable, Object previousResponse) {
      if (previousThrowable instanceof HttpResponseException) {
        int code = ((HttpResponseException) previousThrowable).getStatusCode();
        if (code == 500 || code == 502 || code == 503 || code == 504) {
          return true;
        }
      }

      return defaultHandler.shouldRetry(previousThrowable, previousResponse);
    }

    @Override
    public TimedAttemptSettings createNextAttempt(
        Throwable previousThrowable, Object previousResponse, TimedAttemptSettings previousSettings) {
      return defaultHandler.createNextAttempt(previousThrowable, previousResponse, previousSettings);
    }
  };
}

I will also add a regression test for the custom retry case to verify that custom algorithms still see the raw HttpResponseException.

Let me know if this looks good and I'll make these changes.
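The regression test for the custom retry case could take roughly the following shape, sketched here without gax or Mockito. The types are hypothetical stand-ins (`RetryDecider` for `ResultRetryAlgorithm`, `HttpStatusException` for `HttpResponseException`), and the toy loop models the planned behavior in which the RPC throws the raw exception and a custom algorithm is passed through unchanged. A recording spy captures what the algorithm is asked about, so the test can assert that it saw the raw exception exactly once.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch of the custom-algorithm regression test: record what the algorithm sees. */
final class CustomAlgorithmRegressionSketch {

  /** Hypothetical stand-in for HttpResponseException. */
  static final class HttpStatusException extends IOException {
    final int statusCode;

    HttpStatusException(int statusCode) {
      super("HTTP " + statusCode);
      this.statusCode = statusCode;
    }
  }

  /** Hypothetical stand-in for ResultRetryAlgorithm, reduced to its shouldRetry decision. */
  interface RetryDecider {
    boolean shouldRetry(Throwable previousThrowable, Object previousResponse);
  }

  /** Spy "custom" algorithm: records every throwable it is consulted about and retries it. */
  static final class RecordingDecider implements RetryDecider {
    final List<Throwable> seen = new ArrayList<>();

    @Override
    public boolean shouldRetry(Throwable previousThrowable, Object previousResponse) {
      seen.add(previousThrowable);
      return true;
    }
  }

  /** Toy retry loop that passes each failure to the decider unchanged (no wrapping). */
  static <T> T runWithRetries(Callable<T> call, RetryDecider decider, int maxAttempts)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return call.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts || !decider.shouldRetry(e, null)) {
          throw e;
        }
      }
    }
  }

  /** Fake getTable RPC: throws a raw 503 on the first call, then succeeds. */
  static Callable<String> flakyGetTable(int[] calls) {
    return () -> {
      if (++calls[0] == 1) {
        throw new HttpStatusException(503);
      }
      return "table";
    };
  }
}
```

In the real test, the spy role would be played by a custom `ResultRetryAlgorithm` set on the options, and the assertion would check the type of the throwable it received.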

}
},
getOptions().getRetrySettings(),
@@ -36,6 +36,10 @@
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

+import com.google.api.client.googleapis.json.GoogleJsonError;
+import com.google.api.client.googleapis.json.GoogleJsonResponseException;
+import com.google.api.client.http.HttpHeaders;
+import com.google.api.client.http.HttpResponseException;
import com.google.api.gax.paging.Page;
import com.google.api.services.bigquery.model.ErrorProto;
import com.google.api.services.bigquery.model.GetQueryResultsResponse;
@@ -935,6 +939,37 @@ void testGetTable() throws IOException {
.getTableSkipExceptionTranslation(PROJECT, DATASET, TABLE, EMPTY_RPC_OPTIONS);
}

+@Test
+void testGetTableFailureShouldRetryServerErrors() throws IOException {
+  GoogleJsonError error = new GoogleJsonError();
+  error.setMessage("Visibility check was unavailable. Please retry the request");
+  error.setCode(503);
+  GoogleJsonError.ErrorInfo errorInfo = new GoogleJsonError.ErrorInfo();
+  errorInfo.setReason("backendError");
+  error.setErrors(ImmutableList.of(errorInfo));
+
+  when(bigqueryRpcMock.getTableSkipExceptionTranslation(
+          PROJECT, DATASET, TABLE, EMPTY_RPC_OPTIONS))
+      .thenThrow(new GoogleJsonResponseException(serverErrorResponse(), error))
+      .thenReturn(TABLE_INFO_WITH_PROJECT.toPb());
+
+  bigquery =
+      options.toBuilder()
+          .setRetrySettings(ServiceOptions.getDefaultRetrySettings())
+          .build()
+          .getService();
+
+  Table table = bigquery.getTable(DATASET, TABLE);
+
+  assertEquals(new Table(bigquery, new TableInfo.BuilderImpl(TABLE_INFO_WITH_PROJECT)), table);
+  verify(bigqueryRpcMock, times(2))
+      .getTableSkipExceptionTranslation(PROJECT, DATASET, TABLE, EMPTY_RPC_OPTIONS);
+}
+
+private static HttpResponseException.Builder serverErrorResponse() {
+  return new HttpResponseException.Builder(503, "Service Unavailable", new HttpHeaders());
+}
+
@Test
void testGetModel() throws IOException {
when(bigqueryRpcMock.getModelSkipExceptionTranslation(