Skip to content

Changing run level doesn't block node manager#22

Open
ieQu1 wants to merge 11 commits into
masterfrom
dev/nonblocking-run-levels
Open

Changing run level doesn't block node manager#22
ieQu1 wants to merge 11 commits into
masterfrom
dev/nonblocking-run-levels

Conversation

@ieQu1

@ieQu1 ieQu1 commented Jun 19, 2026

Copy link
Copy Markdown
Member

Isolate parts that interact with the business apps into a separate process to avoid blocking classy's own workers.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures run-level transition handling so that changing run level no longer blocks the classy_node process, while also consolidating “site/cluster identity” access through the main classy API and improving traceability/testing around run-level transitions.

Changes:

  • Introduces classy_rl_changer (supervised worker) to perform run-level transitions and execute on_change_run_level hooks outside of classy_node.
  • Switches cluster/site getters to classy:the_site/0 and classy:the_cluster/0, backed by persistent_term updates from classy_table’s on_update callback.
  • Updates tests, trace event handling, documentation, and minor API additions (classy_autocluster:candidates/0, ignore _anvl_build/).

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/classy_SUITE.erl Updates tests to use classy:the_cluster/0, adjusts sleeps and run-level event attribution, and whitelists new RL changer error/timeout trace kinds.
src/classy.erl Exposes the_site/0 + the_cluster/0 and routes at_lower_level/2 through the new run-level changer.
src/classy_vote_participant.erl Switches site lookup to classy:the_site/0.
src/classy_vote_coordinator.erl Switches site lookup to classy:the_site/0.
src/classy_uid.erl Switches site lookup to classy:the_site/0.
src/classy_sup.erl Adds classy_rl_changer as a supervised worker ahead of classy_node.
src/classy_rl_changer.erl New gen_server that applies run-level changes and runs hooks with timeout/kill behavior.
src/classy_node.erl Removes inline run-level management; publishes site/cluster via persistent_term; delegates RL changes to classy_rl_changer.
src/classy_internal.hrl Adds new trace kind macros for run-level hook failures/timeouts.
src/classy_builtin_hooks.erl Adds local to run-level trace event payload.
src/classy_autocluster.erl Exposes candidates/0 helper API.
src/classy_autoclean.erl Uses classy:the_site/0 and classy:the_cluster/0.
doc/classy.texi Documents run_level_timeout and run_level_grace_period settings.
.gitignore Ignores _anvl_build/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/classy_rl_changer.erl Outdated
Comment thread src/classy_rl_changer.erl Outdated
Comment thread src/classy_rl_changer.erl Outdated
Comment thread src/classy_rl_changer.erl Outdated
Comment thread src/classy_node.erl
Comment on lines +647 to +650
sync_set_run_level(Level) ->
classy_rl_changer:set(Level),
classy_rl_changer:at_lower_level(Level, fun() -> ok end).

@ieQu1 ieQu1 Jun 19, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only process that is supposed to call set is this one. So it cannot lower anything lower than expected.

@ieQu1 ieQu1 force-pushed the dev/nonblocking-run-levels branch from 4cc7a76 to ccc1121 Compare June 19, 2026 16:35
@ieQu1 ieQu1 force-pushed the dev/nonblocking-run-levels branch from ccc1121 to 1dfaac3 Compare June 19, 2026 16:56
@ieQu1 ieQu1 requested a review from Copilot June 19, 2026 16:57

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.

Comment thread src/classy_autocluster.erl
Comment thread test/classy_SUITE.erl Outdated
@ieQu1 ieQu1 marked this pull request as ready for review June 19, 2026 17:12
Comment thread src/classy_rl_changer.erl Outdated
@ieQu1 ieQu1 requested a review from thalesmg June 19, 2026 20:24
Comment thread src/classy_rl_changer.erl Outdated
Comment thread src/classy_rl_changer.erl Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 18 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (4)

src/classy_autocluster.erl:124

  • This process traps exits (process_flag(trap_exit, true)), so supervisor shutdown will arrive as an {'EXIT', _, shutdown} message. Without handling it, the process will ignore shutdown and be force-killed after the supervisor timeout, potentially delaying shutdown and skipping normal cleanup.
-doc false.
handle_info(#to_discover{}, S) ->
  {noreply, handle_discover(S)};
handle_info(Info, S) ->
  ?tp(warning, ?classy_unknown_event,
      #{ kind => info
       , content => Info
       , server => ?MODULE
       }),
  {noreply, S}.

src/classy_uid.erl:155

  • This gen_server traps exits, so supervisor shutdown will be delivered as an {'EXIT', _, shutdown} message. The current handle_info/2 will log it as an unknown event and continue running, forcing the supervisor to kill the process and potentially skipping cleanup/ordering guarantees.
handle_info(Info, S) ->
  ?tp(warning, ?classy_unknown_event,
      #{ kind => info
       , content => Info
       , server => ?MODULE

src/classy_vote_coordinator.erl:176

  • This gen_statem traps exits, so supervisor shutdown is delivered as an info event {'EXIT', _, shutdown}. Without a dedicated handler, shutdown will be treated as an unknown event and the process will not terminate gracefully (it will be force-killed by the supervisor).
handle_event(enter, OldStage, Stage, D) ->
  enter(OldStage, Stage, D);
handle_event({call, ReplyTo}, VoteData = #c_vote{}, Stage, D) ->
  handle_vote(ReplyTo, Stage, VoteData, D);
handle_event(state_timeout, ?state_timeout, Stage, D) ->
  handle_state_timeout(Stage, D);
%% Common:
handle_event(ET, Event, State, _Data) ->
  %% TODO: put ID and MFAs into error messages
  ?tp(warning, ?classy_unknown_event,
      #{ kind    => ET
       , content => Event
       , state   => State
       , server  => ?MODULE
       }),
  keep_state_and_data.

src/classy_vote_participant.erl:190

  • This gen_statem traps exits, so supervisor shutdown is delivered as an info event {'EXIT', _, shutdown}. Without handling it, shutdown is ignored and logged as an unknown event, and the supervisor must force-kill the process.
handle_event(enter, OldStage, Stage, D) ->
  enter(OldStage, Stage, D);
handle_event(state_timeout, ?state_timeout, ?s_prepare, D) ->
  %% Perform the actual vote:
  do_real_vote(D);
handle_event({call, From}, #c_outcome{} = Outcome, ?s_wait_outcome, D) ->
  do_receive_outcome(From, Outcome, D);
handle_event(ET, Event, State, _Data) ->
  %% TODO: put ID and MFAs into error messages
  ?tp(warning, ?classy_unknown_event,
      #{ kind    => ET
       , content => Event
       , state   => State
       , server  => ?MODULE
       }),
  keep_state_and_data.

Comment thread src/classy_membership.erl
Comment on lines 392 to 393
handle_info(_Info, State = #state{}) ->
{noreply, State}.
Comment thread src/classy_autoclean.erl
Comment thread src/classy_node.erl
Comment on lines 283 to 292
@@ -329,18 +292,17 @@ handle_info(Info, S) ->
{noreply, S}.
Comment thread src/classy_node.erl
Comment thread src/classy_rl_changer.erl
Comment thread src/classy_rl_changer.erl
Comment thread src/classy_rl_changer.erl
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants