Add FFE first remote config request test #7023
Draft
leoromanovsky wants to merge 5 commits into
Motivation
Customers can have an Agent running for a long time before a tracer with Feature Flags enabled starts. In that shape, the Agent may not have an FFE_FLAGS cache yet. The tracer needs to advertise FFE on its first Remote Config request so the Agent can use the new-client backend fetch path immediately. If FFE_FLAGS only appears on a later poll, the application can keep serving default flag values during startup.
There is a second startup shape we also want covered: the app starts while the Agent/RC endpoint is unavailable, then the Agent comes online after the tracer has already made its first request. In that case the provider must recover without an app restart once FFE_FLAGS is delivered.
We observed different behavior and corrected it in the Java client: DataDog/dd-trace-java#11465. The Go tracer is the reference shape: it subscribes FFE during tracer Remote Config startup in https://github.com/DataDog/dd-trace-go/blob/3ded6653e44aeb0d27bd5944e1e8033775473768/ddtrace/tracer/remote_config.go#L508-L512, and the OpenFeature bridge registers FFE_FLAGS with the FFE capability in https://github.com/DataDog/dd-trace-go/blob/3ded6653e44aeb0d27bd5944e1e8033775473768/internal/openfeature/rc_subscription.go#L42-L71.
Changes
This adds shared FFE system-test coverage under the existing FEATURE_FLAGGING_AND_EXPERIMENTATION scenario.
Test_FFE_First_Remote_Config_Request checks the first tracer /v0.7/config request captured by the library interface. It requires the client product list to include FFE_FLAGS, and it requires the advertised capabilities to include FFE_FLAG_CONFIGURATION_RULES. The test intentionally has no setup call to /ffe, so it checks the startup Remote Config subscription, not a subscription that appears only after the first flag-evaluation endpoint is hit.
Test_FFE_RC_Down_Then_Up makes the recovery case obvious. The proxy first returns 503 for /v0.7/config; the test waits until the tracer sees that failure and verifies a flag evaluation returns the in-code default. Then the test restores Remote Config by publishing an FFE_FLAGS config and evaluates the same flag again. A recovered provider returns the delivered value, on; a provider stuck in the startup error state keeps returning the default and fails the test.
The manifests now keep the new assertions active where the behavior is implemented and mark observed gaps explicitly. Java is gated to v1.63.0-SNAPSHOT. Python recovery is gated to v4.11.0-dev, while Python first-request subscription remains marked as bug (FFL-2339) until the dd-trace-py fix lands. Dotnet recovery is also marked as bug (FFL-2339) because the SDK currently returns null instead of the in-code default while RC is unavailable.
Decisions
This is intentionally a cross-language system test, not a Java-only assertion. The bug shape is about the tracer/provider Remote Config contract: FFE must be subscribed early enough, and a provider that starts before config is available must still recover when config arrives later.
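For reviewers who want the "subscribed early enough" half of the contract in concrete form, here is a minimal sketch of the kind of check Test_FFE_First_Remote_Config_Request performs. It is not the code in this PR. It assumes the captured request body carries a client object with a products list and a base64-encoded big-endian capabilities bitmask; the helper names and the capability bit position are hypothetical and passed in by the caller.

```python
import base64


def capability_bits(encoded):
    """Decode a base64 big-endian bitmask into the set of bit positions that are set."""
    value = int.from_bytes(base64.b64decode(encoded), "big")
    return {i for i in range(value.bit_length()) if (value >> i) & 1}


def first_request_advertises_ffe(requests, ffe_capability_bit):
    """Return True only if the FIRST captured request already advertises FFE.

    `requests` is a list of decoded /v0.7/config request bodies in capture order
    (assumed shape). `ffe_capability_bit` is a placeholder for the bit assigned
    to FFE_FLAG_CONFIGURATION_RULES; the real value is not stated in this PR.
    """
    if not requests:
        return False
    client = requests[0].get("client", {})
    has_product = "FFE_FLAGS" in client.get("products", [])
    has_capability = ffe_capability_bit in capability_bits(client.get("capabilities", ""))
    return has_product and has_capability


# Usage with a placeholder capability bit of 3.
PLACEHOLDER_BIT = 3
mask = base64.b64encode((1 << PLACEHOLDER_BIT).to_bytes(1, "big")).decode()
early = [{"client": {"products": ["FFE_FLAGS"], "capabilities": mask}}]
late = [
    {"client": {"products": ["APM_TRACING"], "capabilities": ""}},
    {"client": {"products": ["FFE_FLAGS"], "capabilities": mask}},
]
early_ok = first_request_advertises_ffe(early, PLACEHOLDER_BIT)
late_ok = first_request_advertises_ffe(late, PLACEHOLDER_BIT)
```

Only the first request is inspected on purpose: a tracer that adds FFE_FLAGS on a later poll, as in the second list, is exactly the shape the test is meant to reject.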
The recovery test simulates the Agent/RC outage through the system-tests proxy instead of physically stopping and starting the Agent container. That keeps the test focused on the behavior SDKs can control: first RC fetch fails, a later RC payload succeeds, and evaluations switch from defaults to the delivered flag value without restarting the app.
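The down-then-up sequence above can be sketched with in-memory stand-ins. This is an illustration of the expected behavior only, not the system-tests proxy or any SDK's provider: FakeRemoteConfig, SketchProvider, and the flag key my-flag are hypothetical names invented for the example.

```python
class FakeRemoteConfig:
    """Stand-in for the proxy: answers 503 until a config is published."""

    def __init__(self):
        self._flags = None

    def publish(self, flags):
        self._flags = dict(flags)

    def poll(self):
        if self._flags is None:
            return 503, None
        return 200, self._flags


class SketchProvider:
    """Toy provider that must recover once a later poll succeeds."""

    def __init__(self, rc):
        self._rc = rc
        self._flags = {}

    def poll_once(self):
        status, flags = self._rc.poll()
        if status == 200:
            self._flags = flags
        # On failure, keep the current state and stay eligible for later polls
        # instead of latching into a permanent startup error.

    def evaluate(self, key, default):
        # Fall back to the in-code default while no config has been delivered.
        return self._flags.get(key, default)


rc = FakeRemoteConfig()
provider = SketchProvider(rc)

provider.poll_once()                          # first RC fetch fails with 503
before = provider.evaluate("my-flag", "off")  # in-code default

rc.publish({"my-flag": "on"})                 # Remote Config is restored
provider.poll_once()                          # later poll succeeds
after = provider.evaluate("my-flag", "off")   # delivered value, no restart
```

A provider that latched into an error state on the first 503 would still return the default after the second poll, which is the failure the recovery test is designed to catch.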