Add built-in CEF access-log format #2418
Introduce the "cef" logformat, emitting ArcSight Common Event Format lines for SIEM ingestion. The format is rendered directly by Squid so that CEF-reserved bytes are escaped per the spec and derived values not exposed to logformat (notably severity) can be included.
Header: Vendor=Squid, Product="Squid Cache", DeviceVersion=VERSION, SignatureID=<Squid cache code>, Name="Proxy Request", Severity derived from LogTags and error category (preferring proxy signals over upstream HTTP status). Extensions cover client/server addressing, request/response sizing, timing, user, URL, hierarchy code, content-type, and error reason; HTTP status is exposed via cn2/cn2Label=HttpStatus. Header pipe/backslash and extension =/CR/LF are escaped per the CEF Implementation Standard.
Also add %squid::hostname and %squid::version logformat tokens so administrators can replicate the built-in shape via a custom logformat, in consideration of the above-mentioned restrictions, when their SIEM schema needs adjustments.
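For illustration, the escaping rules named in the description can be sketched roughly as follows. This is a minimal standalone sketch, not the PR's code: the function names EscapeCefHeader and EscapeCefExtension are hypothetical, std::string stands in for Squid's buffer types, and the extension helper also escapes backslashes, which the CEF specification requires in addition to the =/CR/LF cases listed above.

```cpp
#include <string>

// Hypothetical helper (name not taken from the PR).
// CEF header fields: prefix backslash and pipe with a backslash.
std::string EscapeCefHeader(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (const char c : in) {
        if (c == '\\' || c == '|')
            out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper (name not taken from the PR).
// CEF extension values: escape backslash and '=', and encode
// CR and LF as the two-character sequences \r and \n.
std::string EscapeCefExtension(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (const char c : in) {
        switch (c) {
        case '\\':
            out += "\\\\";
            break;
        case '=':
            out += "\\=";
            break;
        case '\r':
            out += "\\r";
            break;
        case '\n':
            out += "\\n";
            break;
        default:
            out += c;
        }
    }
    return out;
}
```

Note that the two contexts differ: a pipe needs escaping only in the header, and an equals sign only in extension values.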
This comment was marked as resolved.
Due to the strict description and title format requirements, I stripped all linked sources and examples from the description. If anybody is interested in them, see the initial version of the description.
yadij left a comment
This is an initial review. There are likely more details that need to be attended to after the conversion to SBufStream is done.
Rename FormatSquidCEF.cc to FormatSiemCef.cc and Log::Format::SquidCEF to Log::Format::SiemCef, dropping the ArcSight reference now that CEF is an open standard. Update the cf.data.pre description accordingly. Switch the output buffer from SBuf to SBufStream so header and extension fields can be written via stream operators. Collapse FieldWriter's typed methods (str/literal/integer) into a templated put() plus putStr() for escaped strings, and move the extension-escape helper inside FieldWriter as a private static. Make file-local helpers static and UpperCamelCase (cefTransport -> CefTransport, cefSeverity -> CefSeverity) per Squid's file-local function policy.
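The FieldWriter shape described in this commit message might look roughly like the sketch below. It is a hypothetical reconstruction, not the PR's code: std::ostream stands in for SBufStream, std::string for SBuf, and the member names (startField, EscapeExtension, out_, wroteField_) and the ExampleExtension function are invented for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical reconstruction of the described design.
class FieldWriter
{
public:
    explicit FieldWriter(std::ostream &os): out_(os) {}

    // key=value for values that need no escaping (e.g., integers)
    template <typename Value>
    void put(const char *key, const Value &value) {
        startField(key);
        out_ << value;
    }

    // key=value with CEF extension escaping applied to the value
    void putStr(const char *key, const std::string &value) {
        startField(key);
        out_ << EscapeExtension(value);
    }

private:
    // writes the space separator (after the first field) and "key="
    void startField(const char *key) {
        if (wroteField_)
            out_ << ' ';
        out_ << key << '=';
        wroteField_ = true;
    }

    // private static escape helper, as the commit message describes
    static std::string EscapeExtension(const std::string &in) {
        std::string out;
        for (const char c : in) {
            if (c == '\\' || c == '=')
                out += '\\';
            if (c == '\r')
                out += "\\r";
            else if (c == '\n')
                out += "\\n";
            else
                out += c;
        }
        return out;
    }

    std::ostream &out_;
    bool wroteField_ = false;
};

// Usage example with made-up values
std::string ExampleExtension()
{
    std::ostringstream os;
    FieldWriter w(os);
    w.put("rt", 1700000000000LL);
    w.putStr("request", "http://example.com/?a=b");
    return os.str();
}
```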
Force-pushed from 11e6297 to fea3807
Explanation for the force-push: I forgot to add the name change of the function/format in two files.
Running ./test-builds.sh highlighted a function signature error; this commit fixes it. In addition, the position of the SIEM CEF format was changed to align with its position in similar files.
Thank you for working on improving CEF/SIEM support.
Introduce the "cef" logformat, emitting ArcSight Common Event Format
lines for SIEM ingestion.
Adding a new hard-coded logformat is a showstopper IMO: Squid should get rid of the few remaining legacy hard-coded logformat implementations, not add new ones. The number of real problems that hard-coded logformats solve does not justify their development and maintenance overhead, especially since most important use cases usually require some environment-specific customizations that hard-coded logformats do not support well.
In cf.data.pre, this PR currently says: "If the built-in "cef" format does not fit your SIEM schema, you can build a CEF-shaped line yourself with logformat." If that claim is accurate, then let's remove the built-in logformat. Documenting (in cf.data.pre) a logformat configuration matching a popular or common SIEM schema sounds like a good idea.
The format is rendered directly by Squid so that CEF-reserved bytes are escaped per the spec
I hope this part can be implemented in Squid code by adding support for an additional escaping mechanism (that could be applied to other logformat %codes). Do you think that is feasible? What are the examples of the missing escaping mechanisms? Or does the problem below make additional escaping mechanisms in Squid unnecessary because the helper will implement them?
... and derived values not exposed to logformat (notably severity) can be included.
Defining what transactions represent a "problem" and determining that problem "severity" does not belong to Squid code. Different Squid admins are very likely to classify different transactions differently. If tight integration with Squid is desirable, that transaction/log analysis should be done via annotation ACLs, in an external ACL helper, or in an access log daemon.
N.B. I have not reviewed the proposed low-level code changes yet. I wanted to log this blocker ASAP to reduce the work on the parts of code that I think should be removed from this PR. Let's focus on resolving the high-level concerns/questions above first.
That is a fair point, and I already expected that I would be required to port all values in this format to custom logformat codes so that it can be fully replicated. In the end, the different values, especially in the header, are very unlikely to include any reserved characters. To fully replicate the current format as is, the formatter would require the following additions:
With these additions (without optimized naming), the logformat would look like this (at least I think so; I'm losing the overview at this length):
However, choosing from a list of predefined log formats would still be nice. As far as I understand the codebase (which is new to me), the cleanest way would be to replace all built-in formats with a lookup table for custom formats. But I'm far from qualified with regard to the Squid project to make or suggest such a decision.
Here I need to disagree. Yes, different admins might have different opinions about severity. However, this is a technical value calculated by the server as an initial default. Offering such a default would save admins from replicating a set of ACLs just to derive it. It also enables easier initial rating within a SIEM for further analysis (e.g. anomaly detection on Squid severity to highlight potential issues or unusual behavior).
On this we disagree. Some formats can be built-in as
The shell-escape quoting would be useful here. Provided a mechanism was added to support
When the log format can actually be represented using a squid.conf Problem is that the Apache and Squid native formats are slightly different in output for some values that are not represented properly by the custom
    out << '|';
    appendHeader(out, cacheCode);
    out << "|Proxy Request|" << CefSeverity(*al) << '|';
The cacheCode string is a set of alphanumeric tags with _ delimiter. It does not contain any of the special CEF header characters and thus does not need filtering.
-    out << '|';
-    appendHeader(out, cacheCode);
-    out << "|Proxy Request|" << CefSeverity(*al) << '|';
+    out << '|' << cacheCode << "|Proxy Request|" << CefSeverity(*al) << '|';
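To show where the cache code sits in the resulting line, here is a minimal sketch of the header layout given in the PR description. The function name BuildCefHeader and its parameters are hypothetical, and std::string replaces Squid's stream types. The leading "CEF:0" is the format-version prefix from the CEF specification. Escaping is omitted on the assumption that neither the version string nor the cache code contains reserved characters.

```cpp
#include <string>

// Hypothetical sketch of the header layout:
// CEF:0|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|
std::string BuildCefHeader(const std::string &version,
                           const std::string &cacheCode,
                           const int severity)
{
    std::string line = "CEF:0|Squid|Squid Cache|";
    line += version;      // DeviceVersion
    line += '|';
    line += cacheCode;    // SignatureID; alphanumeric tags joined by '_'
    line += "|Proxy Request|";
    line += std::to_string(severity);
    line += '|';          // extension fields follow this final pipe
    return line;
}
```

For example, BuildCefHeader("7.0", "TCP_MISS", 3) yields "CEF:0|Squid|Squid Cache|7.0|TCP_MISS|Proxy Request|3|"; the version and severity values here are made up.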
    /* Time (rt = receipt time; start/end mark activity boundaries) */
    if (al->cache.start_time.tv_sec > 0) {
        w.put("rt", startMs);
If this code is to stay, I believe it is better to design the FieldWriter class with (key, value) constructor parameters and a std::ostream &operator <<(std::ostream &os) method that does the stream output.
So it can be used like this:
-        w.put("rt", startMs);
+        out << FieldWriter("rt", startMs);
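One way the suggested design could be realized is sketched below. This is an assumption-laden illustration, not the reviewer's or the PR's code: the stream output is done by a hypothetical print() member plus a non-member operator <<, because `out << FieldWriter(...)` needs the stream on the left-hand side. The value is formatted eagerly into a std::string for simplicity, and CEF escaping and field separators are left to the caller.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical (key, value) field object that can be streamed.
class FieldWriter
{
public:
    template <typename Value>
    FieldWriter(const char *aKey, const Value &aValue): key_(aKey) {
        // simplification: format the value once, at construction time
        std::ostringstream os;
        os << aValue;
        value_ = os.str();
    }

    // writes key=value; no escaping is applied in this sketch
    std::ostream &print(std::ostream &os) const {
        return os << key_ << '=' << value_;
    }

private:
    const char *key_;
    std::string value_;
};

inline std::ostream &
operator <<(std::ostream &os, const FieldWriter &field)
{
    return field.print(os);
}
```

With this shape, a call such as `out << FieldWriter("rt", startMs) << ' ' << FieldWriter("cn2", 200);` writes both fields through ordinary stream chaining.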
Introduce the "cef" logformat, emitting ArcSight Common Event Format
lines for SIEM ingestion. The format is rendered directly by Squid
so that CEF-reserved bytes are escaped per the spec and derived values
not exposed to logformat (notably severity) can be included.
Header: Vendor=Squid, Product="Squid Cache", DeviceVersion=VERSION,
SignatureID=Squid cache code, Name="Proxy Request", Severity derived
from LogTags and error category (preferring proxy signals over upstream
HTTP status). Extensions cover client/server addressing, request/
response sizing, timing, user, URL, hierarchy code, content-type, and
error reason; HTTP status is exposed via cn2/cn2Label=HttpStatus.
Header pipe/backslash and extension =/CR/LF are escaped per the CEF
Implementation Standard.
Also add %squid::hostname and %squid::version logformat tokens so
administrators can replicate the built-in shape via a custom logformat,
in consideration of the above-mentioned restrictions, when their SIEM
schema needs adjustments.
Header and extension values follow the standard specification, prior
work from other products, and best-effort mapping of Squid-specific
values to CEF fields. Field and value choices remain open for
discussion.