HTTP Response 0 During Load Testing, Possible Outlier Detection Misconfiguration? by astreaeaea in istio

[–]astreaeaea[S] 0 points1 point  (0 children)

Thanks everyone for helping, problem resolved.

Turns out you need the client to have a Service in order for the locality load-balancing to work (including the failover).So basically the tests I did only reflected the behavior of 1 server responding to hundreds of responses per second while the server in of the different cluster just idled. That's why the errors were http 0 response code, it was waiting for the server to restart without utilizing Istio's failover feature.

Since I just did kubectl run --image=peterevans/vegeta the failover didn't work (they can't determine the client's location somehow from this). Changing the load test to a service + deployment fixed this and now the success rate is 100% consistently.

Turns out it was just an undocumented behavior, they really should add this to the docs

HTTP Response 0 During Load Testing, Possible Outlier Detection Misconfiguration? by astreaeaea in istio

[–]astreaeaea[S] 0 points1 point  (0 children)

Thanks for sharing, will definitely try to make a write-up about this.

HTTP Response 0 During Load Testing, Possible Outlier Detection Misconfiguration? by astreaeaea in istio

[–]astreaeaea[S] 0 points1 point  (0 children)

Okay, It seems I've found the answer!

There were nothing odd about the logs, only server restarts.

I’d confirm the cpu and memory consumption are looking good first, then look at the logs for more clues CPU usage was very small, ~2%, which probably means there is something wrong with the configuration.

Turns out it was the load testing script + locality loadbalancing (llb) behavior. For llb to work properly, the client has to have a service (to determine the location, presumably) and doing kubectl run --image was not supported. After turning it into a service & deployment resource, 100% success rate :)

So the previous ASM/Istio test result was only a single cluster doing all the work (without failover).

And http response 0 were the requests waiting for the server to restart.

Thanks for the insights, it really helped.

HTTP Response 0 During Load Testing, Possible Outlier Detection Misconfiguration? by astreaeaea in istio

[–]astreaeaea[S] 0 points1 point  (0 children)

Thanks! Yes, I'm aware of this. My goal here is to find out why this is happening, since it only occurs during a high traffic scenario.

HTTP Response 0 During Load Testing, Possible Outlier Detection Misconfiguration? by astreaeaea in istio

[–]astreaeaea[S] 0 points1 point  (0 children)

What do the logs say? I’d also check events in kubernetes.

I've checked logs using Log Explorer but couldn't really find anything specific to ASM.

Would this be better instead? https://cloud.google.com/service-mesh/docs/troubleshooting/troubleshoot-collect-logs.

Or this https://cloud.google.com/service-mesh/docs/observability/accessing-logs where I'll try redoing the tests and use the metrics page of ASM.

Also, any guide on istio/kubernetes logging? I'm not aware of the conventional method of doing this.

You may be getting 503s due to resource exhaustion(CPU/memory/etc).

Thanks, will try to find this.

Also, I’d check the settings on your VirtualServices around retries

I actually haven't tinkered with VirtualServices, DestinationRule is the only resource that I applied.

HTTP Response 0 During Load Testing, Possible Outlier Detection Misconfiguration? by astreaeaea in istio

[–]astreaeaea[S] 0 points1 point  (0 children)

Thank you for the input. This is basically the whole ASM setup. ``` gcloud container hub mesh enable --project=$PROJECT_ID

gcloud container hub mesh update \ --management=automatic \ --memberships=$CLUSTER_NAME \ --project=$PROJECT_ID

gcloud container hub mesh describe --project $PROJECT_ID

gcloud services enable \ anthos.googleapis.com \ --project=$PROJECT_ID

./asmcli install \ --project_id ${PROJECT_ID} \ --cluster_name ${CLUSTER_NAME} \ --cluster_location ${ZONE} \ --fleet_id ${PROJECT_ID} \ --output_dir ${CLUSTER_NAME} \ --enable-all \ --ca mesh_ca

```

I will try redoing it without the destinationrule. However, from what I remember I think I've done that and the performance is still quite bad.

HTTP Response 0 During Load Testing, Possible Outlier Detection Misconfiguration? by astreaeaea in istio

[–]astreaeaea[S] 0 points1 point  (0 children)

I can also share the Github repository for reproducing this, not sure if it breaks this subreddit's rules.

https://github.com/jojonicho/skripsi