@@ -88,17 +88,15 @@ private void throwTronError(String strategy, String params, String servlet, Exc
@Override
protected void service(HttpServletRequest req, HttpServletResponse resp)
throws ServletException, IOException {

RuntimeData runtimeData = new RuntimeData(req);
GlobalRateLimiter.acquire(runtimeData);

RuntimeData runtimeData = new RuntimeData(req);
IRateLimiter rateLimiter = container.get(KEY_PREFIX_HTTP, getClass().getSimpleName());

boolean acquireResource = true;
// Check per-endpoint first to avoid consuming global IP/QPS quota for requests
// that would be rejected by the per-endpoint limiter anyway.
boolean perEndpointAcquired = rateLimiter == null || rateLimiter.tryAcquire(runtimeData);
boolean acquireResource = perEndpointAcquired && GlobalRateLimiter.tryAcquire(runtimeData);
(commenting on line 98 because the issue's line 113 is outside the diff hunk)

[SHOULD] HTTP rejection should set a 429 status, not an implicit 200

The PR description says "Over-limit requests now return HTTP 429", but this branch only writes an error JSON body via resp.getWriter().println(...) and never calls resp.setStatus(...). The HTTP status therefore stays at the default 200: client SDKs cannot back off based on the status code, gateways and Prometheus templates classify rate-limited requests as successes, and the behavior is asymmetric with the gRPC path, which returns RESOURCE_EXHAUSTED.

Suggestion: call resp.setStatus(HttpServletResponse.SC_TOO_MANY_REQUESTS) here (and consider a Retry-After header), or update the PR description to remove the 429 claim.
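The fix this comment asks for can be sketched as a diff against the reject branch (the branch itself is outside the shown hunk, so the surrounding lines here are an assumption based on the `Util.printErrorMsg(...)` call quoted above):

```diff
   if (!acquireResource) {
+    resp.setStatus(HttpServletResponse.SC_TOO_MANY_REQUESTS); // 429
+    resp.setHeader("Retry-After", "1"); // optional back-off hint for clients
     resp.getWriter().println(Util.printErrorMsg(
         new IllegalAccessException("lack of computing resources")));
   }
```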


if (rateLimiter != null) {
acquireResource = rateLimiter.acquire(runtimeData);
}
String contextPath = req.getContextPath();
String url = Strings.isNullOrEmpty(req.getServletPath())
? MetricLabels.UNDEFINED : contextPath + req.getServletPath();
@@ -119,7 +117,9 @@ protected void service(HttpServletRequest req, HttpServletResponse resp)
} catch (Exception unexpected) {
logger.error("Http Api {}, Method:{}. Error:", url, req.getMethod(), unexpected);
} finally {
if (rateLimiter instanceof IPreemptibleRateLimiter && acquireResource) {
// Release whenever the per-endpoint permit was acquired (covers both the normal
// completion path and the case where GlobalRateLimiter rejected the request).
if (rateLimiter instanceof IPreemptibleRateLimiter && perEndpointAcquired) {
(commenting on line 122 because the issue's line 113 is outside the diff hunk)

[NIT] IllegalAccessException is the wrong exception type for rate limiting

Util.printErrorMsg(new IllegalAccessException("lack of computing resources")) makes the response body contain java.lang.IllegalAccessException: lack of computing resources. IllegalAccessException is the standard JDK exception for reflective access failures and has nothing to do with throttling, so the body misleads SDK authors: it reads like an internal server bug rather than a client-facing throttle signal. Since this branch is being rewritten anyway, now is the cheap moment to swap it.

Suggestion: use a clearer message or a dedicated RateLimitedException (or fold this into the setStatus(429) change so the body just says "rate limit exceeded").
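One possible minimal swap, sketched as a diff (RuntimeException is a placeholder here; a dedicated RateLimitedException would be cleaner if one is introduced):

```diff
-      resp.getWriter().println(Util.printErrorMsg(
-          new IllegalAccessException("lack of computing resources")));
+      resp.getWriter().println(Util.printErrorMsg(
+          new RuntimeException("rate limit exceeded, please retry later")));
```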

((IPreemptibleRateLimiter) rateLimiter).release();
}
}
@@ -5,8 +5,10 @@
import com.google.common.cache.CacheBuilder;
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;
import org.tron.core.config.args.Args;

@Slf4j
public class GlobalRateLimiter {

private static double QPS = Args.getInstance().getRateLimiterGlobalQps();
(commenting on line 14 because the issue's line 16 is outside the diff hunk)

[NIT] IP_QPS <= 0 misconfiguration causes a fail-closed warn storm

If node.rateLimiter.global.IP.qps is misconfigured to 0 or a negative value, RateLimiter.create(IP_QPS) throws IllegalArgumentException. With the new fail-closed behavior, every new IP triggers the loader, which fails and emits a warning — so under hot traffic the log floods and every client is rejected. The root cause (a bad config value) is buried inside an in-flight code path instead of being reported at startup.

Suggestion: validate IP_QPS > 0 (and QPS > 0) at static initialization or Args parsing, and fail-fast if invalid, so misconfiguration surfaces before the first request.
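A fail-fast guard of the kind suggested can be sketched in plain Java (RateLimiterConfig and requirePositiveQps are hypothetical names, not part of java-tron):

```java
// Hypothetical sketch: validate QPS settings once at startup so a bad
// node.rateLimiter.global.IP.qps value fails fast at boot instead of
// throwing inside cache loaders on every request.
final class RateLimiterConfig {

  /** Returns qps unchanged, or throws if it cannot back a RateLimiter. */
  static double requirePositiveQps(String name, double qps) {
    // Guava's RateLimiter.create rejects qps <= 0 (and NaN) at creation
    // time; surface the same failure early, naming the offending key.
    if (!(qps > 0)) {  // also catches NaN, since NaN > 0 is false
      throw new IllegalArgumentException(
          "Invalid " + name + "=" + qps + ": must be a positive number");
    }
    return qps;
  }
}
```

Calling this from Args parsing (or the static initializer) for both QPS and IP_QPS would surface the misconfiguration before the first request.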

@@ -18,18 +20,24 @@ public class GlobalRateLimiter {

private static RateLimiter rateLimiter = RateLimiter.create(QPS);

public static void acquire(RuntimeData runtimeData) {
rateLimiter.acquire();
public static boolean tryAcquire(RuntimeData runtimeData) {
String ip = runtimeData.getRemoteAddr();
if (Strings.isNullOrEmpty(ip)) {
return;
if (!Strings.isNullOrEmpty(ip)) {
RateLimiter r;
try {
// cache.get is atomic: only one loader executes per key under concurrent requests,
// preventing multiple RateLimiter instances from being created for the same IP.
r = cache.get(ip, () -> RateLimiter.create(IP_QPS));
[NIT] cache.get atomicity claim has no concurrent regression test

The new cache.get(ip, () -> RateLimiter.create(IP_QPS)) (and the symmetric one in IPQpsStrategy.tryAcquire) relies on Guava's documented loader atomicity to fix the original getIfPresent + put race. The behavior is correct, but no test asserts it — only the documentation does. A future Guava upgrade or an accidental refactor back to getIfPresent + put would silently regress.

Suggestion: add a small concurrent test (CountDownLatch + 2+ threads + AtomicInteger counting loader invocations) that asserts the loader runs at most once per key under concurrent access.
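The suggested test can be sketched with stdlib primitives; ConcurrentHashMap.computeIfAbsent is used here as a stand-in for Guava's Cache.get(key, loader), since both promise at-most-once loader execution per key under concurrent access (the class and method names are hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the suggested concurrent regression test (hypothetical names).
final class LoaderAtomicityTest {

  /** Races N threads on one key; returns how often the loader actually ran. */
  static int loaderInvocations(int threads) {
    ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();
    AtomicInteger loaderCalls = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    CountDownLatch done = new CountDownLatch(threads);
    for (int i = 0; i < threads; i++) {
      new Thread(() -> {
        try {
          start.await();  // line every thread up for a simultaneous race
          cache.computeIfAbsent("203.0.113.7", ip -> {
            loaderCalls.incrementAndGet();
            return new Object();  // stands in for RateLimiter.create(IP_QPS)
          });
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          done.countDown();
        }
      }).start();
    }
    start.countDown();
    try {
      done.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return loaderCalls.get();
  }
}
```

A regression back to getIfPresent + put would make the returned count exceed 1 under contention, which is exactly what the assertion should pin down.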

} catch (Exception e) {
logger.warn("Failed to load IP rate limiter for {}, denying request: {}",
ip, e.getMessage());
return false;
}
if (!r.tryAcquire()) {
return false;
}
}
RateLimiter r = cache.getIfPresent(ip);
if (r == null) {
r = RateLimiter.create(IP_QPS);
cache.put(ip, r);
}
r.acquire();
return rateLimiter.tryAcquire();
}

}
@@ -104,44 +104,49 @@ public <ReqT, RespT> Listener<ReqT> interceptCall(ServerCall<ReqT, RespT> call,
IRateLimiter rateLimiter = container
.get(KEY_PREFIX_RPC, call.getMethodDescriptor().getFullMethodName());

RuntimeData runtimeData = new RuntimeData(call);
GlobalRateLimiter.acquire(runtimeData);

boolean acquireResource = true;
Listener<ReqT> listener = new ServerCall.Listener<ReqT>() {};

if (rateLimiter != null) {
acquireResource = rateLimiter.acquire(runtimeData);
RuntimeData runtimeData = new RuntimeData(call);
// Check per-endpoint first to avoid consuming global IP/QPS quota for requests
// that would be rejected by the per-endpoint limiter anyway.
boolean perEndpointAcquired = rateLimiter == null || rateLimiter.tryAcquire(runtimeData);
boolean acquireResource = perEndpointAcquired && GlobalRateLimiter.tryAcquire(runtimeData);

if (!acquireResource) {
// Release the per-endpoint permit when global rejected, to avoid semaphore leak.
if (rateLimiter instanceof IPreemptibleRateLimiter && perEndpointAcquired) {
((IPreemptibleRateLimiter) rateLimiter).release();
}
call.close(Status.fromCode(Code.RESOURCE_EXHAUSTED), new Metadata());
[NIT] No metrics on rate-limit reject paths

The early-return path here (and the symmetric HTTP one in RateLimiterServlet.service) closes the call with RESOURCE_EXHAUSTED but emits no metrics. MetricsUtil.meterMark(NET_API_FAIL_QPS) only fires on startCall exceptions, so operators cannot observe rate-limit hit rate to tune thresholds. The PR claims observability as a benefit but reject metrics are still missing.

Suggestion: add a MetricsUtil.meterMark(...) (reuse NET_API_FAIL_QPS or introduce NET_API_RATE_LIMIT_REJECT_QPS) on both reject branches; can be a follow-up if out of scope.
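A minimal version of the suggestion, expressed as a diff against the new reject branch shown above (reusing the existing NET_API_FAIL_QPS key; a dedicated reject key would slot in the same way):

```diff
   if (!acquireResource) {
     if (rateLimiter instanceof IPreemptibleRateLimiter && perEndpointAcquired) {
       ((IPreemptibleRateLimiter) rateLimiter).release();
     }
+    MetricsUtil.meterMark(MetricsKey.NET_API_FAIL_QPS);
     call.close(Status.fromCode(Code.RESOURCE_EXHAUSTED), new Metadata());
```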

return listener;
}

Listener<ReqT> listener = new ServerCall.Listener<ReqT>() {
};

try {
if (acquireResource) {
call.setMessageCompression(true);
ServerCall.Listener<ReqT> delegate = next.startCall(call, headers);

listener = new SimpleForwardingServerCallListener<ReqT>(delegate) {
@Override
public void onComplete() {
// must release the permit to avoid the leak of permit.
if (rateLimiter instanceof IPreemptibleRateLimiter) {
((IPreemptibleRateLimiter) rateLimiter).release();
}
call.setMessageCompression(true);
ServerCall.Listener<ReqT> delegate = next.startCall(call, headers);

listener = new SimpleForwardingServerCallListener<ReqT>(delegate) {
@Override
public void onComplete() {
// must release the permit to avoid the leak of permit.
if (rateLimiter instanceof IPreemptibleRateLimiter) {
((IPreemptibleRateLimiter) rateLimiter).release();
}
}

@Override
public void onCancel() {
// must release the permit to avoid the leak of permit.
if (rateLimiter instanceof IPreemptibleRateLimiter) {
((IPreemptibleRateLimiter) rateLimiter).release();
}
@Override
public void onCancel() {
// must release the permit to avoid the leak of permit.
if (rateLimiter instanceof IPreemptibleRateLimiter) {
((IPreemptibleRateLimiter) rateLimiter).release();
}
};
} else {
call.close(Status.fromCode(Code.RESOURCE_EXHAUSTED), new Metadata());
}
}
};
} catch (Exception e) {

[SHOULD] catch block on startCall failure should close the gRPC call

When next.startCall(call, headers) throws, this catch block releases the permit, marks metrics, and logs, but never invokes call.close(...). The returned listener is the empty new ServerCall.Listener<ReqT>() {} from line 107, so the client receives no status / trailers and only fails when the transport-level deadline fires. This is a pre-existing issue, but since the catch block is being rewritten in this PR (release added), it is a good moment to close it properly.

Suggestion: call call.close(Status.fromCode(Code.INTERNAL).withDescription("rpc handler init failed"), new Metadata()); in the catch block, and add a test that verifies call.close is invoked on startCall exceptions.
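The suggested close can be sketched as a diff against the rewritten catch block (the description string is illustrative):

```diff
   } catch (Exception e) {
+    call.close(Status.fromCode(Code.INTERNAL)
+        .withDescription("rpc handler init failed"), new Metadata());
     // next.startCall() failed — release the permit that was already acquired.
     if (rateLimiter instanceof IPreemptibleRateLimiter) {
       ((IPreemptibleRateLimiter) rateLimiter).release();
     }
```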

// next.startCall() failed — release the permit that was already acquired.
if (rateLimiter instanceof IPreemptibleRateLimiter) {
((IPreemptibleRateLimiter) rateLimiter).release();
}
String grpcFailMeterName = MetricsKey.NET_API_DETAIL_FAIL_QPS
+ call.getMethodDescriptor().getFullMethodName();
MetricsUtil.meterMark(MetricsKey.NET_API_FAIL_QPS);
@@ -12,7 +12,7 @@ public DefaultBaseQqsAdapter(String paramString) {
}

@Override
public boolean acquire(RuntimeData data) {
return strategy.acquire();
public boolean tryAcquire(RuntimeData data) {
return strategy.tryAcquire();
}
}
@@ -17,8 +17,8 @@ public void release() {
}

@Override
public boolean acquire(RuntimeData data) {
return strategy.acquire();
public boolean tryAcquire(RuntimeData data) {
return strategy.tryAcquire();
}

}
@@ -12,8 +12,8 @@ public IPQPSRateLimiterAdapter(String paramString) {
}

@Override
public boolean acquire(RuntimeData data) {
return strategy.acquire(data.getRemoteAddr());
public boolean tryAcquire(RuntimeData data) {
return strategy.tryAcquire(data.getRemoteAddr());
}

}
@@ -4,6 +4,6 @@

public interface IRateLimiter {

boolean acquire(RuntimeData data);
boolean tryAcquire(RuntimeData data);

}
@@ -12,8 +12,8 @@ public QpsRateLimiterAdapter(String paramString) {
}

@Override
public boolean acquire(RuntimeData data) {
return strategy.acquire();
public boolean tryAcquire(RuntimeData data) {
return strategy.tryAcquire();
}

}
@@ -3,17 +3,11 @@
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;


@Slf4j
public class GlobalPreemptibleStrategy extends Strategy {

public static final String STRATEGY_PARAM_PERMIT = "permit";
public static final int DEFAULT_PERMIT_NUM = 1;
public static final int DEFAULT_ACQUIRE_TIMEOUT = 2;

private Semaphore sp;

public GlobalPreemptibleStrategy(String paramString) {
@@ -29,20 +23,13 @@ protected Map<String, ParamItem> defaultParam() {
return map;
}

public boolean acquire() {

try {
if (!sp.tryAcquire(DEFAULT_ACQUIRE_TIMEOUT, TimeUnit.SECONDS)) {
throw new RuntimeException();
}

} catch (InterruptedException e) {
logger.error("acquire permit with error: {}", e.getMessage());
Thread.currentThread().interrupt();
} catch (RuntimeException e1) {
return false;
}
return true;
// Non-blocking: immediately rejects if no permit is available.
// Intentional change from the previous tryAcquire(2, TimeUnit.SECONDS) behaviour:
// blocking the caller for up to 2 s ties up Netty IO / gRPC executor threads and
// masks overload rather than shedding it. All rate-limiting in this stack is now
// non-blocking to keep the thread model consistent with GlobalRateLimiter.
public boolean tryAcquire() {
return sp.tryAcquire();
}

public void release() {
@@ -6,7 +6,9 @@
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class IPQpsStrategy extends Strategy {

public static final String STRATEGY_PARAM_IPQPS = "qps";
@@ -19,14 +21,18 @@ public IPQpsStrategy(String paramString) {
super(paramString);
}

public boolean acquire(String ip) {
RateLimiter limiter = ipLimiter.getIfPresent(ip);
if (limiter == null) {
limiter = newRateLimiter();
ipLimiter.put(ip, limiter);
public boolean tryAcquire(String ip) {
RateLimiter limiter;
try {
// cache.get is atomic: only one loader executes per key under concurrent requests,
// preventing multiple RateLimiter instances from being created for the same IP.
limiter = ipLimiter.get(ip, this::newRateLimiter);
} catch (Exception e) {
logger.warn("Failed to load IP rate limiter for {}, denying request: {}",
ip, e.getMessage());
return false;
}
limiter.acquire();
return true;
return limiter.tryAcquire();
}

private RateLimiter newRateLimiter() {
@@ -26,8 +26,7 @@ protected Map<String, ParamItem> defaultParam() {
return map;
}

public boolean acquire() {
rateLimiter.acquire();
return true;
public boolean tryAcquire() {
return rateLimiter.tryAcquire();
}
}