feat: enhance logging capabilities and configuration #393
niteshpurohit wants to merge 16 commits into
- Introduced structured logging support to improve log readability and parsing.
- Added access and error log file paths to the configuration for better log management.
- Implemented logging for runtime events, including worker lifecycle and health state changes.
- Created separate methods for logging access events and runtime errors.
- Updated runtime configuration to include the new logging options, configurable via environment variables and Ruby options.
- Enhanced the control plane to serve metrics and stats endpoints, providing observability into the system's performance.

closes: #124
closes: #150
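Structured logs here are one JSON object per line, so every string field must be JSON-escaped before it is written. A minimal sketch of that escaping, plus an access line built from the fields that appear in the PR's excerpt (`component`, `timestamp`, `method`, `target`, `status`). `json_quoted` and `structured_access_line` are hypothetical names; this is not the PR's `escaped_log_value` implementation:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns `value` as a quoted JSON string literal.
std::string json_quoted(const std::string &value)
{
    std::string out = "\"";
    for (const unsigned char c : value)
    {
        switch (c)
        {
        case '"': out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (c < 0x20)
            {
                // Remaining control characters must use the \u00XX form in JSON.
                char buf[7];
                std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            }
            else
            {
                out += static_cast<char>(c);
            }
        }
    }
    out += '"';
    return out;
}

// Hypothetical: one structured access-log line; every string field is escaped.
std::string structured_access_line(const std::string &timestamp, const std::string &method,
                                   const std::string &target, int status)
{
    return "{\"component\":\"access\",\"timestamp\":" + json_quoted(timestamp) +
           ",\"method\":" + json_quoted(method) + ",\"target\":" + json_quoted(target) +
           ",\"status\":" + std::to_string(status) + "}";
}
```

Escaping the request target matters most: it is client-controlled, and an unescaped quote or newline would break line-oriented log parsers.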
Pull request overview
This PR expands Vajra’s operator-facing observability surface by adding configurable access/error log destinations, introducing worker health state tracking, and exposing native control-plane endpoints for stats and Prometheus-style metrics.
Changes:
- Added new runtime configuration options (`access_log`, `error_log`, `structured_logs`, `stats_path`, `metrics_endpoint`) and plumbed them from Ruby → native runtime.
- Implemented runtime/access/error logging helpers and added access logging for both app and control-plane responses.
- Added worker health state tracking plus control-plane `/stats` (JSON) and `/metrics` (text) endpoints, with new E2E coverage.
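For the `/metrics` endpoint, a sketch of rendering per-state worker counts in the Prometheus text exposition format. The `WorkerHealthState` values, the `vajra_workers` metric name, and both function names are assumptions for illustration, not the PR's actual enum or output:

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical health states; the enum in worker_pool.hpp may differ.
enum class WorkerHealthState { Starting, Healthy, Degraded, Unhealthy };

const char *health_label(WorkerHealthState state)
{
    switch (state)
    {
    case WorkerHealthState::Starting: return "starting";
    case WorkerHealthState::Healthy: return "healthy";
    case WorkerHealthState::Degraded: return "degraded";
    case WorkerHealthState::Unhealthy: return "unhealthy";
    }
    return "unknown";
}

// Emits one gauge sample per state, including states with zero workers.
std::string render_worker_metrics(const std::vector<WorkerHealthState> &workers)
{
    constexpr std::array<WorkerHealthState, 4> kStates = {
        WorkerHealthState::Starting, WorkerHealthState::Healthy,
        WorkerHealthState::Degraded, WorkerHealthState::Unhealthy};
    std::array<std::size_t, 4> counts{};
    for (const WorkerHealthState state : workers)
    {
        // Enum values 0..3 double as indices into `counts`.
        ++counts[static_cast<std::size_t>(state)];
    }
    std::ostringstream out;
    out << "# HELP vajra_workers Number of workers in each health state.\n"
        << "# TYPE vajra_workers gauge\n";
    for (std::size_t i = 0; i < kStates.size(); ++i)
    {
        out << "vajra_workers{state=\"" << health_label(kStates[i]) << "\"} " << counts[i] << "\n";
    }
    return out.str();
}
```

Emitting zero-valued samples for every state keeps each series present across scrapes, which makes alerting on a state such as `unhealthy` simpler.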
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| gems/vajra/spec/support/documented_server_options.rb | Updates documented/native option fixtures to include the new logging + control-plane config. |
| gems/vajra/spec/e2e/vajra/configuration_spec.rb | Adds E2E coverage for configured stats/metrics endpoints and adjusts an ordering assertion. |
| gems/vajra/lib/vajra.rb | Extends documented/native start option key lists to include new logging/control-plane options. |
| gems/vajra/ext/vajra/vajra.hpp | Extends native start signature to accept logging + control-plane parameters. |
| gems/vajra/ext/vajra/vajra.cpp | Passes newly loaded runtime config fields into the native start call. |
| gems/vajra/ext/vajra/runtime/worker_pool.hpp | Adds worker health state enum and shared telemetry/transition counters. |
| gems/vajra/ext/vajra/runtime/runtime_logging.hpp | Adds declarations for logging configuration and new access/error logging helpers. |
| gems/vajra/ext/vajra/runtime/runtime_logging.cpp | Implements file-backed logging helpers, access/error logging, and enriches lifecycle logs. |
| gems/vajra/ext/vajra/runtime/runtime_config.hpp | Extends RuntimeConfig with logging + control-plane fields. |
| gems/vajra/ext/vajra/runtime/runtime_config.cpp | Loads/validates new options from Ruby/env and returns them in RuntimeConfig. |
| gems/vajra/ext/vajra/runtime/native_runtime.hpp | Adds health policy data and a method to refresh worker health. |
| gems/vajra/ext/vajra/runtime/native_runtime.cpp | Implements health refresh, tracks worker telemetry, wires control-plane config, configures logging. |
| gems/vajra/ext/vajra/request/request_processor.cpp | Logs access events and handles control-plane responses before Rack execution. |
| gems/vajra/ext/vajra/request/request_executor.hpp | Adds control_response virtual hook for control-plane handling. |
| gems/vajra/ext/vajra/request/request_executor.cpp | Provides default control_response implementation returning nullopt. |
| gems/vajra/ext/vajra/rack/rack_request_executor.hpp | Adds control-plane config plumbing and default stats/metrics payload hooks. |
| gems/vajra/ext/vajra/rack/rack_request_executor.cpp | Implements control-plane request matching and emits stats/metrics payloads. |
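The table describes a `control_response` hook that returns `nullopt` by default and is overridden to match control-plane paths before Rack runs. A minimal sketch of that shape, assuming a hypothetical `ControlResponse` struct and a `(method, target)` signature; the PR's real signatures and payload hooks are not shown here:

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical response type.
struct ControlResponse
{
    int status;
    std::string content_type;
    std::string body;
};

class RequestExecutor
{
public:
    virtual ~RequestExecutor() = default;

    // Default hook: no control-plane handling, so the request falls through to Rack.
    virtual std::optional<ControlResponse> control_response(const std::string &method,
                                                            const std::string &target) const
    {
        (void)method;
        (void)target;
        return std::nullopt;
    }
};

class RackRequestExecutor : public RequestExecutor
{
public:
    RackRequestExecutor(std::string stats_path, std::string metrics_endpoint)
        : stats_path_(std::move(stats_path)), metrics_endpoint_(std::move(metrics_endpoint)) {}

    std::optional<ControlResponse> control_response(const std::string &method,
                                                    const std::string &target) const override
    {
        if (method != "GET") return std::nullopt;
        // Ignore any query string when matching the configured paths.
        const std::string path = target.substr(0, target.find('?'));
        if (!stats_path_.empty() && path == stats_path_)
            return ControlResponse{200, "application/json", stats_payload()};
        if (!metrics_endpoint_.empty() && path == metrics_endpoint_)
            return ControlResponse{200, "text/plain; version=0.0.4", metrics_payload()};
        return std::nullopt;
    }

protected:
    // Placeholder payload hooks; real values would come from worker telemetry.
    virtual std::string stats_payload() const { return "{}"; }
    virtual std::string metrics_payload() const { return ""; }

private:
    std::string stats_path_;
    std::string metrics_endpoint_;
};
```

Returning `std::optional` lets the request processor try the control plane first and fall back to Rack on `nullopt` without a separate "is this a control request?" check.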
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (3)
gems/vajra/ext/vajra/runtime/runtime_logging.cpp:392
`logging_config.structured_logs` is accessed without synchronization here. Since `configure_runtime_logging` can reset the config/streams under `logging_mutex`, this should read the flag (and any derived state) under the same mutex (or via atomics) to avoid undefined behavior in multi-threaded logging.
```cpp
void Vajra::runtime::log_runtime_error(const std::string &message)
{
    if (logging_config.structured_logs)
    {
        write_error_line(
            "{\"component\":\"error\",\"timestamp\":\"" + utc_timestamp() + "\",\"message\":" +
            escaped_log_value(message) + "}");
        return;
    }
    // ...
}
```
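The first suggested fix, reading the flag under `logging_mutex`, can be sketched as follows. `LoggingConfig`, `configure_logging`, `structured_logs_snapshot`, and `format_error_line` are hypothetical stand-ins for the PR's logging state and helpers:

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins for the PR's logging state.
struct LoggingConfig
{
    bool structured_logs = false;
    std::string access_log;
    std::string error_log;
};

std::mutex logging_mutex;
LoggingConfig logging_config;

// Writer: replaces the config while holding the mutex.
void configure_logging(bool structured_logs, std::string access_log, std::string error_log)
{
    const std::lock_guard<std::mutex> lock(logging_mutex);
    logging_config.structured_logs = structured_logs;
    logging_config.access_log = std::move(access_log);
    logging_config.error_log = std::move(error_log);
}

// Reader: copies the flag under the same mutex, so it never races with the writer.
bool structured_logs_snapshot()
{
    const std::lock_guard<std::mutex> lock(logging_mutex);
    return logging_config.structured_logs;
}

// Branches on the snapshot; JSON escaping of `message` is omitted for brevity.
std::string format_error_line(const std::string &message)
{
    if (structured_logs_snapshot())
    {
        return "{\"component\":\"error\",\"message\":\"" + message + "\"}";
    }
    return "[error] " + message;
}
```

The snapshot can go stale if the config is replaced between the read and the write; if the line format and the destination stream must always agree, hold the lock across both.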
gems/vajra/ext/vajra/runtime/runtime_logging.cpp:407
`logging_config.structured_logs` is read without locking, but is written under `logging_mutex` in `configure_runtime_logging`. To avoid a data race during concurrent access logging, read a synchronized snapshot (or make the flag atomic) before branching.
```cpp
void Vajra::runtime::log_access_event(const std::string &method, const std::string &target, int status_code)
{
    if (logging_config.structured_logs)
    {
        std::ostringstream line;
        line << "{\"component\":\"access\""
             << ",\"timestamp\":\"" << utc_timestamp() << "\""
             << ",\"method\":" << escaped_log_value(method)
             << ",\"target\":" << escaped_log_value(target)
             << ",\"status\":" << status_code
             // ...
```
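The alternative this comment mentions, making the flag atomic, looks roughly like this. `structured_logs_flag`, `set_structured_logs`, and `access_line` are hypothetical; the atomic protects only the flag, so the log streams themselves would still need `logging_mutex`:

```cpp
#include <atomic>
#include <string>

// Hypothetical: the flag lives in an atomic so hot logging paths can branch without a mutex.
std::atomic<bool> structured_logs_flag{false};

void set_structured_logs(bool enabled)
{
    structured_logs_flag.store(enabled, std::memory_order_release);
}

// JSON escaping of `method` and `target` is omitted for brevity.
std::string access_line(const std::string &method, const std::string &target, int status)
{
    // Load once so the whole line is formatted under a single, consistent decision.
    const bool structured = structured_logs_flag.load(std::memory_order_acquire);
    if (structured)
    {
        return "{\"component\":\"access\",\"method\":\"" + method + "\",\"target\":\"" + target +
               "\",\"status\":" + std::to_string(status) + "}";
    }
    return method + " " + target + " " + std::to_string(status);
}
```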
gems/vajra/ext/vajra/runtime/native_runtime.cpp:1434
The runtime writes log lines (potentially to buffered `std::ofstream`s) and then forks workers shortly afterwards. Any buffered-but-unflushed data at fork time can be duplicated (flushed by both parent and child) or lost. Consider flushing log streams after the boot banner/configure step, or deferring opening/initializing file streams until after `fork()` in each process.
```cpp
const bool debug_logging = debug_logging_enabled(config.log_level);
{
    const std::lock_guard<std::mutex> lock(server_mutex_);
    health_policy_ = health_policy_for(config);
    debug_logging_.store(debug_logging, std::memory_order_release);
}
configure_runtime_logging(config.structured_logs, config.access_log, config.error_log);
log_runtime_banner_start(config.host, config.port, config.workers, config.min_threads, config.max_threads);
const BootContractResult master_boot_result = BootContract::run(
    BootContractConfig{config.port, config.max_request_head_bytes, kMasterPreloadRuntimeRole});
BootContract::ensure_ready(master_boot_result);
std::vector<std::shared_ptr<SharedWorkerState>> booted_worker_states;
```
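The first option in the comment above, flushing before forking, can be sketched as below. `fork_with_flushed_log` and `wait_for_child` are hypothetical helpers, and the sketch assumes POSIX `fork()` and `waitpid()`:

```cpp
#include <fstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Flush buffered log output before fork(), so the child does not inherit a copy of
// the parent's unwritten buffer and write it a second time.
pid_t fork_with_flushed_log(std::ofstream &log)
{
    log.flush();
    return fork();
}

// Reap a forked child; returns its raw wait status, or -1 if waitpid fails.
int wait_for_child(pid_t pid)
{
    int status = 0;
    return waitpid(pid, &status, 0) == pid ? status : -1;
}
```

The other option, opening the file streams only after `fork()` in each process, avoids sharing a buffer at all, at the cost of every worker reopening its log files.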
Pull request overview
Copilot reviewed 24 out of 25 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
gems/vajra/spec/e2e/vajra/configuration_spec.rb:165
`wait_for_runtime_output` is now implemented both here and in `spec/e2e/vajra/support/http_helpers.rb` (included via `VajraE2EHttpHelpers`). Keeping two copies increases drift risk; consider removing this local definition and using the shared helper method instead.
```ruby
def wait_for_runtime_output(output, runtime_output, pattern, count: 1, timeout: 2)
  Timeout.timeout(timeout) do
    loop do
      runtime_output << read_available_output(output)
      break if runtime_output.scan(pattern).size >= count
      sleep 0.01
    end
  end
end
```
Pull request overview
Copilot reviewed 25 out of 26 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
gems/vajra/ext/vajra/runtime/native_runtime.cpp:2061
`NativeRuntime::stop()` no longer calls `stop_worker_processes()`. As a result, calling `Vajra.stop` can leave worker processes running (and the runtime may not actually shut down cleanly), since only the server is stopped here. Consider stopping worker processes (and/or waiting for them to exit) as part of `stop()` again, consistent with the shutdown path in `start()`.
```cpp
void Vajra::runtime::NativeRuntime::stop()
{
    const bool had_runtime = runtime_running();
    if (had_runtime)
    {
        begin_runtime_shutdown();
    }
    Vajra::Server *server = nullptr;
    std::shared_ptr<Vajra::Server> server_handle;
    {
        std::lock_guard<std::mutex> lock(server_mutex_);
        server_handle = server_instance_;
        server = server_handle.get();
    }
    if (server != nullptr)
    {
        server->stop();
    }
}
```
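Restoring worker shutdown in `stop()` amounts to signalling the workers and then reaping them. A sketch of that step with POSIX `kill()` and `waitpid()`; `terminate_and_reap_workers` is a hypothetical helper and does not describe what the PR's `stop_worker_processes()` actually does:

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Returns the number of workers that were successfully reaped.
int terminate_and_reap_workers(const std::vector<pid_t> &worker_pids, int signal_number = SIGTERM)
{
    // Signal every worker first so they shut down in parallel.
    for (const pid_t pid : worker_pids)
    {
        kill(pid, signal_number);
    }
    // Then block until each one has exited, retrying if interrupted by a signal.
    int reaped = 0;
    for (const pid_t pid : worker_pids)
    {
        int status = 0;
        pid_t result = -1;
        do
        {
            result = waitpid(pid, &status, 0);
        } while (result == -1 && errno == EINTR);
        if (result == pid)
        {
            ++reaped;
        }
    }
    return reaped;
}
```

A production version would likely bound the wait with a timeout and escalate to `SIGKILL` for workers that ignore `SIGTERM`.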
- Introduced tracing capabilities using OpenTelemetry to monitor request and runtime lifecycle spans.
- Added configuration options for enabling tracing, specifying the tracing endpoint, and setting the service name.
- Implemented methods to manage tracing state and lifecycle callbacks.
- Enhanced logging to include tracing status and details in worker lifecycle events.
- Created a new Tracing module to encapsulate tracing logic and state management.
- Added tests to ensure proper functionality and handling of tracing options.
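The tracing options interact: enabling tracing is only useful with an endpoint to export spans to. A minimal sketch of the kind of check a config loader could apply, assuming hypothetical `TracingOptions` fields and a `"vajra"` default service name (neither is taken from the PR):

```cpp
#include <optional>
#include <string>

// Hypothetical option set mirroring the commit message: an enable flag, an
// exporter endpoint, and a service name.
struct TracingOptions
{
    bool enabled = false;
    std::string endpoint;
    std::string service_name;
};

// Returns a human-readable error if the options are inconsistent, std::nullopt otherwise.
std::optional<std::string> tracing_config_error(const TracingOptions &options)
{
    if (options.enabled && options.endpoint.empty())
    {
        return std::string("tracing is enabled but no tracing endpoint is configured");
    }
    return std::nullopt;
}

// Falls back to a hypothetical default service name when none is configured.
std::string effective_service_name(const TracingOptions &options)
{
    return options.service_name.empty() ? std::string("vajra") : options.service_name;
}
```

Failing at configuration time, rather than when the first span is exported, surfaces a missing endpoint during boot instead of as silently dropped traces.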
```rbs
module Vajra
  module Internal
    module Tracing
      type start_options = Hash[Symbol, bool | String]
      TRACE_STATE: TraceState

      def self.install_from_start_options!: (start_options) -> bool
      def self.with_request_span: [T] (Hash[String, String]) { () -> T } -> T
      # ...
    end
  end
end
```
```cpp
#if defined(__APPLE__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
#endif
pid = fork();
// ...
```
closes: #124
closes: #137
closes: #150