Changing run level doesn't block node manager#22
Conversation
There was a problem hiding this comment.
Pull request overview
This PR restructures run-level transition handling so that changing run level no longer blocks the classy_node process, while also consolidating “site/cluster identity” access through the main classy API and improving traceability/testing around run-level transitions.
Changes:
- Introduces
classy_rl_changer(supervised worker) to perform run-level transitions and executeon_change_run_levelhooks outside ofclassy_node. - Switches cluster/site getters to
classy:the_site/0andclassy:the_cluster/0, backed bypersistent_termupdates fromclassy_table’son_updatecallback. - Updates tests, trace event handling, documentation, and minor API additions (
classy_autocluster:candidates/0, ignore_anvl_build/).
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/classy_SUITE.erl | Updates tests to use classy:the_cluster/0, adjusts sleeps and run-level event attribution, and whitelists new RL changer error/timeout trace kinds. |
| src/classy.erl | Exposes the_site/0 + the_cluster/0 and routes at_lower_level/2 through the new run-level changer. |
| src/classy_vote_participant.erl | Switches site lookup to classy:the_site/0. |
| src/classy_vote_coordinator.erl | Switches site lookup to classy:the_site/0. |
| src/classy_uid.erl | Switches site lookup to classy:the_site/0. |
| src/classy_sup.erl | Adds classy_rl_changer as a supervised worker ahead of classy_node. |
| src/classy_rl_changer.erl | New gen_server that applies run-level changes and runs hooks with timeout/kill behavior. |
| src/classy_node.erl | Removes inline run-level management; publishes site/cluster via persistent_term; delegates RL changes to classy_rl_changer. |
| src/classy_internal.hrl | Adds new trace kind macros for run-level hook failures/timeouts. |
| src/classy_builtin_hooks.erl | Adds local to run-level trace event payload. |
| src/classy_autocluster.erl | Exposes candidates/0 helper API. |
| src/classy_autoclean.erl | Uses classy:the_site/0 and classy:the_cluster/0. |
| doc/classy.texi | Documents run_level_timeout and run_level_grace_period settings. |
| .gitignore | Ignores _anvl_build/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sync_set_run_level(Level) -> | ||
| classy_rl_changer:set(Level), | ||
| classy_rl_changer:at_lower_level(Level, fun() -> ok end). | ||
|
|
There was a problem hiding this comment.
The only process that is supposed to call set is this one. So it cannot lower anything lower than expected.
4cc7a76 to
ccc1121
Compare
ccc1121 to
1dfaac3
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (4)
src/classy_autocluster.erl:124
- This process traps exits (
process_flag(trap_exit, true)), so supervisor shutdown will arrive as an{'EXIT', _, shutdown}message. Without handling it, the process will ignore shutdown and be force-killed after the supervisor timeout, potentially delaying shutdown and skipping normal cleanup.
-doc false.
handle_info(#to_discover{}, S) ->
{noreply, handle_discover(S)};
handle_info(Info, S) ->
?tp(warning, ?classy_unknown_event,
#{ kind => info
, content => Info
, server => ?MODULE
}),
{noreply, S}.
src/classy_uid.erl:155
- This gen_server traps exits, so supervisor shutdown will be delivered as an
{'EXIT', _, shutdown}message. The currenthandle_info/2will log it as an unknown event and continue running, forcing the supervisor to kill the process and potentially skipping cleanup/ordering guarantees.
handle_info(Info, S) ->
?tp(warning, ?classy_unknown_event,
#{ kind => info
, content => Info
, server => ?MODULE
src/classy_vote_coordinator.erl:176
- This
gen_statemtraps exits, so supervisor shutdown is delivered as aninfoevent{'EXIT', _, shutdown}. Without a dedicated handler, shutdown will be treated as an unknown event and the process will not terminate gracefully (it will be force-killed by the supervisor).
handle_event(enter, OldStage, Stage, D) ->
enter(OldStage, Stage, D);
handle_event({call, ReplyTo}, VoteData = #c_vote{}, Stage, D) ->
handle_vote(ReplyTo, Stage, VoteData, D);
handle_event(state_timeout, ?state_timeout, Stage, D) ->
handle_state_timeout(Stage, D);
%% Common:
handle_event(ET, Event, State, _Data) ->
%% TODO: put ID and MFAs into error messages
?tp(warning, ?classy_unknown_event,
#{ kind => ET
, content => Event
, state => State
, server => ?MODULE
}),
keep_state_and_data.
src/classy_vote_participant.erl:190
- This
gen_statemtraps exits, so supervisor shutdown is delivered as aninfoevent{'EXIT', _, shutdown}. Without handling it, shutdown is ignored and logged as an unknown event, and the supervisor must force-kill the process.
handle_event(enter, OldStage, Stage, D) ->
enter(OldStage, Stage, D);
handle_event(state_timeout, ?state_timeout, ?s_prepare, D) ->
%% Perform the actual vote:
do_real_vote(D);
handle_event({call, From}, #c_outcome{} = Outcome, ?s_wait_outcome, D) ->
do_receive_outcome(From, Outcome, D);
handle_event(ET, Event, State, _Data) ->
%% TODO: put ID and MFAs into error messages
?tp(warning, ?classy_unknown_event,
#{ kind => ET
, content => Event
, state => State
, server => ?MODULE
}),
keep_state_and_data.
| handle_info(_Info, State = #state{}) -> | ||
| {noreply, State}. |
| @@ -329,18 +292,17 @@ handle_info(Info, S) -> | |||
| {noreply, S}. | |||
Isolate parts that interact with the business apps into a separate process to avoid blocking classy's own workers.