So the other day I posted my pride and joy regex. You know, this one?

'^(?<host>[^ ]*) - \[(?<real_ip>[^ ]*)\] - 
(?<user>[^ ]*) \[(?<time>[^\]]*)\] 
"(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" 
(?<code>[^ ]*) (?<size>[^ ]*) 
"(?<referer>[^\"]*)" "(?<agent>[^\"]*)" 
(?<request_length>[^ ]*) 
(?<request_time>[^ ]*) 
\[(?<proxy_upstream_name>[^ ]*)\] 
(?<upstream_addr>[^ ]*) 
(?<upstream_response_length>[^ ]*) 
(?<upstream_response_time>[^ ]*) 
(?<upstream_status>[^ ]*) (?<last>[^$]*)'

Seems simple, right? But it raises a few questions:

  1. If an attacker can get a " into the path, they can use a quoting-style escape to keep the request from being logged.
  2. The regex engine used in fluent-bit is Onigmo, and it has had some CVEs. This means it's conceivable that a pattern a user can put on the wire could escape into our trusted, most privileged logging container (running privileged, node filesystem mounted, etc.).
  3. DDoS. We log a lot, and the logs are often bigger than the thing they are logging.
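To make #1 concrete, here is a rough sketch using only the request portion of the regex, translated to Python's re syntax (Python spells named groups `(?P<name>...)`). The two log fragments are made up for illustration, and the crafted one assumes a raw " reaches the log line unescaped, which depends on how the proxy escapes its access log and is not something this sketch verifies.

```python
import re

# Request portion of the post's regex, rewritten for Python's re module:
# named groups are (?P<name>...) here instead of Onigmo's (?<name>...).
REQUEST = re.compile(
    r'^"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
)

def parse(fragment):
    """Return the captured fields, or None when the line does not match."""
    m = REQUEST.match(fragment)
    return m.groupdict() if m else None

# Hypothetical log fragments, made up for illustration.
normal = '"GET /index.html HTTP/1.1" 200 612'
crafted = '"GET /a"b HTTP/1.1" 200 612'  # assumes the quote is logged unescaped

print(parse(normal))
print(parse(crafted))
```

The first fragment parses into its fields. The second does not match at all, because the path group refuses to cross the stray quote, so no structured fields come out of that request. What fluent-bit then does with a line its parser rejects is outside this sketch.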

For #3, consider this. It's a connection log from Istio. Yes, you read that right: a TCP SYN (~64 bytes) produces this 816-byte JSON entry:

{"level":"info","time":"2018-09-17T20:12:59.912982Z","instance":"tcpaccesslog.logentry.istio-system","connectionDuration":"12.740646ms","connectionEvent":"close","connection_security_policy":"none","destinationApp":"","destinationIp":"10.244.1.57","destinationName":"payment-6cdc5b656-fkhxh","destinationNamespace":"socks","destinationOwner":"kubernetes://apis/extensions/v1beta1/namespaces/socks/deployments/payment","destinationPrincipal":"","destinationServiceHost":"","destinationWorkload":"payment","protocol":"tcp","receivedBytes":117,"reporter":"destination","requestedServerName":"","sentBytes":240,"sourceApp":"","sourceIp":"10.244.1.1","sourceName":"unknown","sourceNamespace":"default","sourceOwner":"unknown","sourcePrincipal":"","sourceWorkload":"unknown","totalReceivedBytes":117,"totalSentBytes":240}

Hmm, so you see where I am going. Remember a few years ago, when we found that NTP could be asked for its list of recent clients (the monlist query)? A small packet would create a large response. And, being UDP, the request could be spoofed, so the response would go to someone else. That made it a great DDoS source.

Well, same with my log. Your SYN costs me a lot more to receive than it costs you to send. Think of all the machinery below that (Elasticsearch, fluent-bit, Kibana, storage, network, CPU, RAM, …).
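As a back-of-the-envelope check, using only the two sizes quoted above (the ~64-byte SYN is an approximation, and this counts just the raw JSON entry, not whatever indexing and storage add on top):

```python
# Sizes quoted in the post; the 64-byte SYN is the post's approximation.
syn_bytes = 64
log_entry_bytes = 816

# Bytes of log produced per byte the sender puts on the wire.
amplification = log_entry_bytes / syn_bytes
print(f"each SYN byte becomes about {amplification:.2f} bytes of log")
```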

Hmm.

Now about #2. That is a bit of a trouble point. Who wants to find out that the regex parsing a field that any user can send via netcat is itself prone to a crash, or a remote escape? Not me.
