bc091364 cloud logs

Laughably Loquacious Logging


So you are pretty proud of yourself. You have a full micro-services running in Kubernetes with a service mesh (courtesy of Istio). You have configured your liveness probes to once per second. You are using an EFK stack (Elasticsearch / Fluent-Bit/  Kubernetes). Live is good. You are evaluating turning on either Jaeger or Zipkin. You have Prometheus, Grafana going and regularly go swimming in charts of the highest beauty and speed.

Only one problem. Every month your cloud bill goes up, mostly due to storage, and you have to lay off another person on the team to pay for Jeff Bezo’s rocket fetish. What’s up with this?

So you finally roll-up your sleeves and dig in. You take a look at one of those kube-probe liveness checks. Its got a unique requestID so tracing it is not hard. Lets just take a snoop in Elasticsearch and see how much ‘raw’ size (ignoring the index) it uses.

curl -XGET https://elastic/logstash-2018.11.13/_search -d '{ "query": { 
  "query_string": { 
    "query": "5159e6fa-07e7-90df-aa69-cef9dd6cb606" } } }' | wc -c

OK that was easy. But… the answer might shock you. It turns out for me it was ~32KiB. Yes, more memory than your first computer had, for 1 message. For your viewing pleasure this is below, I wouldn’t read all of it 🙂

And it starts to sink in. All those pods, each with sidecars and services and so on, and all the proxy servers… Each probe goes a lot of path, with a lot of logging. And you have a lot of pods. As you scale up, it gets bigger and bigger. And its mostly duplicated data.

Maybe you should look @ Druid.io ? It handles the duplicates differently, not costing you in cloud storage IOPS.

Maybe you should remove the liveness checks? Filter them out on the ingress of the logging?

Its a lot of data being written ‘just in case’ you read it, most is never viewed. Hmm.


{
"_shards": {
"failed": 0,
"skipped": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_id": "_Y6nDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 43.775322,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.ist …
"kubernetes": {
"annotations": {
"scheduler_a …
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"app": "tele …
"istio": "mi …
"istio-mixer …
"pod-templat …
},
"namespace_name" …
"pod_id": "39efc …
"pod_name": "ist …
},
"log": "{\"level\":\ …
"stream": "stdout",
"values": {
"apiClaims": "",
"apiKey": "",
"clientTraceId": …
"connection_secu …
"destinationApp" …
"destinationIp": …
"destinationName …
"destinationName …
"destinationOwne …
"destinationPrin …
"destinationServ …
"destinationWork …
"httpAuthority": …
"instance": "acc …
"latency": "5.51 …
"level": "info",
"method": "GET",
"protocol": "htt …
"receivedBytes": …
"referer": "",
"reporter": "des …
"requestId": "51 …
"requestSize": 0 …
"requestedServer …
"responseCode": …
"responseSize": …
"responseTimesta …
"sentBytes": 435 …
"sourceApp": "",
"sourceIp": "0.0 …
"sourceName": "u …
"sourceNamespace …
"sourceOwner": " …
"sourcePrincipal …
"sourceWorkload" …
"time": "2018-11 …
"url": "/health" …
"userAgent": "ku …
"xForwardedFor": …
}
},
"_type": "_doc"
},
{
"_id": "Y46nDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 38.377754,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.soc …
"kubernetes": {
"annotations": {
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"name": "car …
"pod-templat …
},
"namespace_name" …
"pod_id": "1689e …
"pod_name": "car …
},
"values": {
"agent": "kube-p …
"authority": "10 …
"bytes_received" …
"bytes_sent": "1 …
"code": "200",
"duration": "5",
"flags": "-",
"method": "GET",
"path": "/health …
"protocol": "HTT …
"real_ip": "10.2 …
"remainder_ip": …
"request_id": "5 …
"stream": "stdou …
"upstream": "127 …
"upstream_servic …
}
},
"_type": "_doc"
},
{
"_id": "2o6bDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.98986,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.ist …
"kubernetes": {
"annotations": {
"scheduler_a …
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"app": "tele …
"istio": "mi …
"istio-mixer …
"pod-templat …
},
"namespace_name" …
"pod_id": "39efc …
"pod_name": "ist …
},
"log": "{\"level\":\ …
"stream": "stdout",
"values": {
"apiClaims": "",
"apiKey": "",
"clientTraceId": …
"connection_secu …
"destinationApp" …
"destinationIp": …
"destinationName …
"destinationName …
"destinationOwne …
"destinationPrin …
"destinationServ …
"destinationWork …
"httpAuthority": …
"instance": "acc …
"latency": "2.85 …
"level": "info",
"method": "GET",
"protocol": "htt …
"receivedBytes": …
"referer": "",
"reporter": "des …
"requestId": "81 …
"requestSize": 0 …
"requestedServer …
"responseCode": …
"responseSize": …
"responseTimesta …
"sentBytes": 103 …
"sourceApp": "",
"sourceIp": "0.0 …
"sourceName": "u …
"sourceNamespace …
"sourceOwner": " …
"sourcePrincipal …
"sourceWorkload" …
"time": "2018-11 …
"url": "/",
"userAgent": "ku …
"xForwardedFor": …
}
},
"_type": "_doc"
},
{
"_id": "BZDMDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.986363,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.ist …
"kubernetes": {
"annotations": {
"scheduler_a …
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"app": "tele …
"istio": "mi …
"istio-mixer …
"pod-templat …
},
"namespace_name" …
"pod_id": "39efc …
"pod_name": "ist …
},
"log": "{\"level\":\ …
"stream": "stdout",
"values": {
"apiClaims": "",
"apiKey": "",
"clientTraceId": …
"connection_secu …
"destinationApp" …
"destinationIp": …
"destinationName …
"destinationName …
"destinationOwne …
"destinationPrin …
"destinationServ …
"destinationWork …
"httpAuthority": …
"instance": "acc …
"latency": "1.27 …
"level": "info",
"method": "POST" …
"protocol": "htt …
"receivedBytes": …
"referer": "",
"reporter": "des …
"requestId": "89 …
"requestSize": 4 …
"requestedServer …
"responseCode": …
"responseSize": …
"responseTimesta …
"sentBytes": 174 …
"sourceApp": "",
"sourceIp": "10. …
"sourceName": "c …
"sourceNamespace …
"sourceOwner": " …
"sourcePrincipal …
"sourceWorkload" …
"time": "2018-11 …
"url": "/istio.m …
"userAgent": "",
"xForwardedFor": …
}
},
"_type": "_doc"
},
{
"_id": "uI6dDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.148528,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.ist …
"kubernetes": {
"annotations": {
"scheduler_a …
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"app": "tele …
"istio": "mi …
"istio-mixer …
"pod-templat …
},
"namespace_name" …
"pod_id": "39efc …
"pod_name": "ist …
},
"log": "{\"level\":\ …
"stream": "stdout",
"values": {
"apiClaims": "",
"apiKey": "",
"clientTraceId": …
"connection_secu …
"destinationApp" …
"destinationIp": …
"destinationName …
"destinationName …
"destinationOwne …
"destinationPrin …
"destinationServ …
"destinationWork …
"httpAuthority": …
"instance": "acc …
"latency": "1.14 …
"level": "info",
"method": "POST" …
"protocol": "htt …
"receivedBytes": …
"referer": "",
"reporter": "des …
"requestId": "6e …
"requestSize": 4 …
"requestedServer …
"responseCode": …
"responseSize": …
"responseTimesta …
"sentBytes": 174 …
"sourceApp": "",
"sourceIp": "10. …
"sourceName": "o …
"sourceNamespace …
"sourceOwner": " …
"sourcePrincipal …
"sourceWorkload" …
"time": "2018-11 …
"url": "/istio.m …
"userAgent": "",
"xForwardedFor": …
}
},
"_type": "_doc"
},
{
"_id": "tY-3DmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.148528,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.ist …
"kubernetes": {
"annotations": {
"scheduler_a …
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"app": "tele …
"istio": "mi …
"istio-mixer …
"pod-templat …
},
"namespace_name" …
"pod_id": "39efc …
"pod_name": "ist …
},
"log": "{\"level\":\ …
"stream": "stdout",
"values": {
"apiClaims": "",
"apiKey": "",
"clientTraceId": …
"connection_secu …
"destinationApp" …
"destinationIp": …
"destinationName …
"destinationName …
"destinationOwne …
"destinationPrin …
"destinationServ …
"destinationWork …
"httpAuthority": …
"instance": "acc …
"latency": "2.83 …
"level": "info",
"method": "GET",
"protocol": "htt …
"receivedBytes": …
"referer": "",
"reporter": "des …
"requestId": "58 …
"requestSize": 0 …
"requestedServer …
"responseCode": …
"responseSize": …
"responseTimesta …
"sentBytes": 103 …
"sourceApp": "",
"sourceIp": "0.0 …
"sourceName": "u …
"sourceNamespace …
"sourceOwner": " …
"sourcePrincipal …
"sourceWorkload" …
"time": "2018-11 …
"url": "/",
"userAgent": "ku …
"xForwardedFor": …
}
},
"_type": "_doc"
},
{
"_id": "tJDUDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.148528,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.ist …
"kubernetes": {
"annotations": {
"scheduler_a …
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"app": "tele …
"istio": "mi …
"istio-mixer …
"pod-templat …
},
"namespace_name" …
"pod_id": "39efc …
"pod_name": "ist …
},
"log": "{\"level\":\ …
"stream": "stdout",
"values": {
"apiClaims": "",
"apiKey": "",
"clientTraceId": …
"connection_secu …
"destinationApp" …
"destinationIp": …
"destinationName …
"destinationName …
"destinationOwne …
"destinationPrin …
"destinationServ …
"destinationWork …
"httpAuthority": …
"instance": "acc …
"latency": "1.12 …
"level": "info",
"method": "POST" …
"protocol": "htt …
"receivedBytes": …
"referer": "",
"reporter": "des …
"requestId": "8a …
"requestSize": 4 …
"requestedServer …
"responseCode": …
"responseSize": …
"responseTimesta …
"sentBytes": 174 …
"sourceApp": "",
"sourceIp": "10. …
"sourceName": "o …
"sourceNamespace …
"sourceOwner": " …
"sourcePrincipal …
"sourceWorkload" …
"time": "2018-11 …
"url": "/istio.m …
"userAgent": "",
"xForwardedFor": …
}
},
"_type": "_doc"
},
{
"_id": "hZDHDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.0215845,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.soc …
"kubernetes": {
"annotations": {
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"name": "use …
"pod-templat …
},
"namespace_name" …
"pod_id": "6e247 …
"pod_name": "use …
},
"values": {
"agent": "kube-p …
"authority": "10 …
"bytes_received" …
"bytes_sent": "1 …
"code": "200",
"duration": "3",
"flags": "-",
"method": "GET",
"path": "/health …
"protocol": "HTT …
"real_ip": "10.2 …
"remainder_ip": …
"request_id": "0 …
"stream": "stdou …
"upstream": "127 …
"upstream_servic …
}
},
"_type": "_doc"
},
{
"_id": "nI6dDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.014115,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.soc …
"kubernetes": {
"annotations": {
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"name": "use …
"pod-templat …
},
"namespace_name" …
"pod_id": "6e247 …
"pod_name": "use …
},
"values": {
"agent": "kube-p …
"authority": "10 …
"bytes_received" …
"bytes_sent": "1 …
"code": "200",
"duration": "2",
"flags": "-",
"method": "GET",
"path": "/health …
"protocol": "HTT …
"real_ip": "10.2 …
"remainder_ip": …
"request_id": "6 …
"stream": "stdou …
"upstream": "127 …
"upstream_servic …
}
},
"_type": "_doc"
},
{
"_id": "XI6bDmcBGvwBJNsJ …
"_index": "logstash-2018 …
"_score": 8.013674,
"_source": {
"@timestamp": "2018- …
"flb-key": "kube.soc …
"kubernetes": {
"annotations": {
"fluentbit_i …
"sidecar_ist …
},
"container_name" …
"host": "aks-nod …
"labels": {
"name": "fro …
"pod-templat …
},
"namespace_name" …
"pod_id": "6e798 …
"pod_name": "fro …
},
"values": {
"agent": "kube-p …
"authority": "10 …
"bytes_received" …
"bytes_sent": "9 …
"code": "200",
"duration": "2",
"flags": "-",
"method": "GET",
"path": "/",
"protocol": "HTT …
"real_ip": "10.2 …
"remainder_ip": …
"request_id": "8 …
"stream": "stdou …
"upstream": "127 …
"upstream_servic …
}
},
"_type": "_doc"
}
],
"max_score": 43.775322,
"total": 35
},
"timed_out": false,
"took": 376
}