
OBE-10327: rejection reports for splunk hec / in general#114

Open
akshayakumar-t wants to merge 8 commits into master from OBE-10327_log_hec_api_call


@akshayakumar-t akshayakumar-t commented Jun 15, 2026

Contributor

Summary

Extracts rejection-report infrastructure from the Elasticsearch sink into a shared
src/sinks/util/rejection_report.rs module, then ports the same capability to both
Splunk HEC sinks (logs + metrics).

Before: Only the ES sink logged details about rejected batches. HEC silently dropped
rejections with no structured log output and no counter.

After: Both ES and HEC emit a structured error! log on rejection/error, increment
a hec_rejected{endpoint=...} counter, and expose a rejection_report config field
that controls how much detail is logged.

What changed

New: src/sinks/util/rejection_report.rs

  • RejectionReport enum (moved from elasticsearch/mod.rs, re-exported there via
    pub use so existing configs and imports are unaffected):

    • stats (default) — increment counters only, no bodies logged
    • response — also log the HTTP response body
    • request_response — log both the request payload and response body; use with small
      batch sizes during debugging only, as event payloads may contain sensitive data
  • RejectionContext trait — sink-specific plug-in points: error_code,
    error_message, record_rejection. Each sink implements this to wire in its own
    counters and response parsing without touching the shared logging logic.

  • emit_rejection_error — generic free function that handles all three
    RejectionReport branches, decompresses the request body when needed, and emits a
    structured error! log.
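The split between the shared function and the per-sink trait can be sketched roughly as below. This is a minimal, std-only illustration and not the code from the PR: the method signatures, the `http_response_{status}` default code format, and the `CountingContext` test double are all assumptions, the function returns the log line as a `String` instead of calling `error!`, and request decompression is left out.

```rust
use std::cell::Cell;

/// How much detail to log on rejection (variant names follow the PR text).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum RejectionReport {
    #[default]
    Stats,
    Response,
    RequestResponse,
}

impl RejectionReport {
    /// Only RequestResponse mode needs a copy of the request payload.
    pub fn needs_request(self) -> bool {
        matches!(self, RejectionReport::RequestResponse)
    }
}

/// Sink-specific hooks. The signatures here are assumptions for illustration.
pub trait RejectionContext {
    /// Hypothetical default format; the PR does not spell out the real one.
    fn error_code(&self, status: u16) -> String {
        format!("http_response_{status}")
    }
    fn error_message(&self, status: u16, body: &[u8]) -> String;
    fn record_rejection(&self);
}

/// Shared logic: bump the sink's counter in every mode, then add bodies
/// to the log line according to the configured mode.
pub fn emit_rejection_error<C: RejectionContext>(
    mode: RejectionReport,
    ctx: &C,
    status: u16,
    request: Option<&[u8]>,
    response: &[u8],
) -> String {
    ctx.record_rejection();
    let mut line = format!(
        "code={} message={:?}",
        ctx.error_code(status),
        ctx.error_message(status, response)
    );
    if mode != RejectionReport::Stats {
        line.push_str(&format!(" response={:?}", String::from_utf8_lossy(response)));
    }
    if mode.needs_request() {
        if let Some(req) = request {
            line.push_str(&format!(" request={:?}", String::from_utf8_lossy(req)));
        }
    }
    line
}

/// Hypothetical test double that counts record_rejection calls.
pub struct CountingContext {
    pub rejected: Cell<u64>,
}

impl RejectionContext for CountingContext {
    fn error_message(&self, status: u16, _body: &[u8]) -> String {
        format!("status {status}")
    }
    fn record_rejection(&self) {
        self.rejected.set(self.rejected.get() + 1);
    }
}
```

With this shape, a new sink only has to supply a context type; the mode handling lives in one place.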

Elasticsearch (elasticsearch/service.rs)

  • Added ElasticsearchRejectionContext implementing RejectionContext; holds the
    existing Telemetry{rejected, indexed} counters.
  • emit_bad_response_error removed; get_event_status now delegates to
    emit_rejection_error.
  • err_summary signature changed from (&Response<Bytes>) to (u16, &Bytes),
    decoupling it from the HTTP response type.

Splunk HEC (splunk_hec/common/service.rs)

  • Added HecRejectionContext implementing RejectionContext; holds a single
    rejected: Counter. Parses Splunk's JSON error body ({"text":"..."}) to surface
    a human-readable message; falls back to the status code.
  • HecService<S> gains three new fields: rej_rpt, compression, context.
  • call() clones the request body before sending (only when needs_request() is true,
    i.e. RequestResponse mode) and calls emit_rejection_error on 4xx/5xx.
  • 5xx responses downgrade RequestResponse → Response (request payload is not useful
    for server-side failures).
  • ResponseExt trait extended with status_code() -> u16 so the generic service can
    expose the HTTP status without depending on http::Response<Bytes> directly.
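The status handling described in this list could look roughly like the following. It is a sketch under assumptions rather than the PR's code: `effective_report` and `event_status` are hypothetical helper names, the local `EventStatus` enum stands in for Vector's own type, and treating every non-4xx/5xx status as `Delivered` is a simplification.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RejectionReport {
    Stats,
    Response,
    RequestResponse,
}

/// Stand-in for the event status type used by the sink (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EventStatus {
    Delivered,
    Rejected,
    Errored,
}

/// On 5xx, RequestResponse is downgraded to Response because the request
/// payload does not help diagnose a server-side failure. Other modes and
/// other status codes are left unchanged.
pub fn effective_report(mode: RejectionReport, status: u16) -> RejectionReport {
    match (mode, status) {
        (RejectionReport::RequestResponse, 500..=599) => RejectionReport::Response,
        _ => mode,
    }
}

/// 4xx maps to Rejected and 5xx to Errored, independent of the report mode.
pub fn event_status(status: u16) -> EventStatus {
    match status {
        400..=499 => EventStatus::Rejected,
        500..=599 => EventStatus::Errored,
        // Simplification: everything else is treated as delivered here.
        _ => EventStatus::Delivered,
    }
}
```

Keeping the two decisions in separate functions mirrors the tested property that the report mode never influences the event status.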

Config (logs/config.rs, metrics/config.rs)

Both HEC sink configs gain:

# Controls how much detail is logged when Splunk HEC rejects a batch.
# Options: stats (default), response, request_response
rejection_report = "stats"
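As an illustration of how the three option strings and the default might resolve to a mode, here is a std-only sketch. The PR's tests mention a serde roundtrip, so the real type presumably derives serde traits instead; the `FromStr` impl and the `resolve` helper below are hypothetical, and the "normal" alias is omitted because the description does not say which variant it maps to.

```rust
use std::str::FromStr;

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum RejectionReport {
    #[default]
    Stats,
    Response,
    RequestResponse,
}

impl FromStr for RejectionReport {
    type Err = String;

    /// Accepts exactly the three option strings listed in the config comment.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "stats" => Ok(Self::Stats),
            "response" => Ok(Self::Response),
            "request_response" => Ok(Self::RequestResponse),
            other => Err(format!("unknown rejection_report value: {other}")),
        }
    }
}

/// Hypothetical helper: an absent field falls back to the default (stats).
pub fn resolve(configured: Option<&str>) -> Result<RejectionReport, String> {
    match configured {
        Some(s) => s.parse(),
        None => Ok(RejectionReport::default()),
    }
}
```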
                                                                                                                                                                                                         
Tests

- rejection_report.rs: serde roundtrip (including "normal" alias), needs_request
  behaviour, record_rejection called once per invocation across all modes,
  decompression path, default error_code formatting.
- service.rs (HEC): 4xx → Rejected, 5xx → Errored, RequestResponse mode does
  not affect event status on either.

Behavioral notes

- category field removed from logs: the old emit_bad_response_error emitted
  category="es_rej_rpt". This field is dropped; Vector's structured logging already
  attaches component_kind, component_id, and component_type to every log line,
  making a redundant category unnecessary. Update any log queries filtering on
  category="es_rej_rpt" to use component_type="elasticsearch" instead.
- Sensitive data warning: request_response mode logs the full event payload.
  Do not enable it in production if events contain PII, credentials, or other sensitive
  data.

akshayakumar-t and others added 7 commits June 15, 2026 12:08
…ice rejection paths

Co-Authored-By: Akshaya's Agent <akshaya.kumar+agent@sentinelone.com>
Co-Authored-By: Akshaya's Agent <akshaya.kumar+agent@sentinelone.com>
Sinks no longer emit a category field in rejection logs — vector's own
structured logging already identifies the component. The trait method is
removed so implementations are not forced to provide one.

Co-Authored-By: Akshaya's Agent <akshaya.kumar+agent@sentinelone.com>
- Inline rejected counter directly into HecRejectionContext, removing
  the Telemetry wrapper struct that collided with the ES Telemetry name
- Parse Splunk JSON error body in error_message to surface the text field
- Remove #OBSERVO_STYLE_TELEMETRY# cross-reference comment

Co-Authored-By: Akshaya's Agent <akshaya.kumar+agent@sentinelone.com>
…itialisers

Co-Authored-By: Akshaya's Agent <akshaya.kumar+agent@sentinelone.com>
Co-Authored-By: Akshaya's Agent <akshaya.kumar+agent@sentinelone.com>
- Use typed SplunkErrorBody struct instead of serde_json::Value to
  parse the Splunk error body — avoids full allocation for one field
- Replace &*context with context.as_ref() for consistency
- Add unit tests for HecRejectionContext::error_message covering the
  text-field-present and fallback paths

Co-Authored-By: Akshaya's Agent <akshaya.kumar+agent@sentinelone.com>