They got in via the logging! Remote exploits and DDoS using the security logs

So the other day I posted my pride and joy regex. You know, this one?

'^(?<host>[^ ]*) - \[(?<real_ip>[^ ]*)\] - 
(?<user>[^ ]*) \[(?<time>[^\]]*)\] 
"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" 
(?<code>[^ ]*) (?<size>[^ ]*) 
"(?<referer>[^\"]*)" "(?<agent>[^\"]*)" 
(?<request_length>[^ ]*) 
(?<request_time>[^ ]*) 
\[(?<proxy_upstream_name>[^ ]*)\] 
(?<upstream_addr>[^ ]*) 
(?<upstream_response_length>[^ ]*) 
(?<upstream_response_time>[^ ]*) 
(?<upstream_status>[^ ]*) (?<last>[^$]*)'

Seems simple, right? But it leads to a set of concerns:

  1. If we can get a " into the path, we can do a quoting-style escape to avoid getting logged (or at least to get logged wrongly).
  2. The regex engine used in fluent-bit is Onigmo, and it has had some CVEs. This means it's conceivable that a pattern a user can put on the wire could escape into our trusted, most-privileged logging container (running privileged, node filesystem mounted, etc.).
  3. DDoS. We log a lot, and the logs are often bigger than the thing they are logging.
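
To make #1 concrete, here is a minimal sketch in Python. It is not the fluent-bit parser: the pattern is a cut-down stand-in for the regex above, rewritten in Python's `(?P<name>...)` syntax, and the log lines are made up. It also assumes the proxy writes a `"` from the request line into the log unescaped, which your proxy may or may not do.

```python
import re

# Simplified stand-in for the pattern above (assumption: most fields dropped,
# and "last" soaks up whatever follows the user agent).
LOG_RE = re.compile(
    r'^(?P<host>[^ ]*) '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
    r'(?P<last>.*)$'
)

def parse(line):
    """Return the named fields as a dict, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

# An ordinary request, as it might appear in the log (made-up example).
normal = '10.0.0.1 "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0"'

# Hypothetical hostile request line:  GET /a" 200 0 "-" "x HTTP/1.1
# The server really answered 404, but the attacker's quote closes the
# request field early and supplies fake status, size, referer and agent.
hostile = '10.0.0.1 "GET /a" 200 0 "-" "x HTTP/1.1" 404 0 "-" "curl/8.0"'

good = parse(normal)
forged = parse(hostile)

print(good["path"], good["code"])      # fields from the honest line
print(forged["path"], forged["code"])  # attacker-chosen status code
print(repr(forged["last"]))            # the real values end up in the leftovers
```

In this sketch the real status lands in the catch-all field while the parsed `code` is whatever the sender typed. The same shift would apply to any field that follows the quoted request.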

For #3, consider this. It's a connection log from Istio. Yes, you read that right: a TCP SYN (~64 bytes) creates this, 816 bytes of JSON:


Hmm, so you see where I am going. Remember a few years ago, when we found that NTP could be asked for its upstream list? A small packet would create a large response. And, being UDP, the source could be spoofed, so the response would go to someone else, making it a great DDoS source.

Well, my log is the same. Your SYN costs me a lot more to receive than it costs you to send. Think of all the mechanisms below that (Elasticsearch, fluent-bit, Kibana, storage, network, CPU, RAM, …).
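
A back-of-the-envelope sketch of that asymmetry, using the two sizes quoted above. The SYN rate is a made-up number for illustration, and this only counts the first copy of each log record, not what indexing, replication and storage add on top.

```python
SYN_BYTES = 64    # approximate size of the TCP SYN, from the text
LOG_BYTES = 816   # size of the resulting JSON log record, from the text

def amplification(request_bytes, logged_bytes):
    """How many bytes of log are produced per byte the sender puts on the wire."""
    return logged_bytes / request_bytes

def log_bytes_per_second(syn_per_second, logged_bytes=LOG_BYTES):
    """Log volume generated by a given rate of SYNs."""
    return syn_per_second * logged_bytes

factor = amplification(SYN_BYTES, LOG_BYTES)

RATE = 10_000                        # hypothetical SYNs per second
inbound = RATE * SYN_BYTES           # bytes/s the sender pays for
logged = log_bytes_per_second(RATE)  # bytes/s the logging pipeline must absorb

print(f"amplification: {factor:.2f}x")
print(f"inbound: {inbound} B/s, logged: {logged} B/s")
```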


Now about #2. That is a bit of a trouble point. Who wants to find that the regex parsing a field any user can send you via netcat is itself prone to a crash, or to a remote escape? Not me.