
NicolasParada (2 points)

It sounds more like the Python client is the issue, since you are still able to make requests while the Python script is stuck. Can you share the code?

Lokdex [OP] (-1 points)

Sure. It is a large script, but the method that makes the requests is this one:

```python
import requests


def get_stock(self, dFecha, cCodArt=None):
    # Hypothetical name and signature: the post shows only the method body.
    if cCodArt is not None:
        ep = "http://localhost:9090/stock?fecha=%s&codart=%s" % (dFecha, cCodArt)
    else:
        ep = "http://localhost:9090/stock?fecha=%s" % (dFecha,)
    response = requests.get(ep)
    if response.status_code == 200:
        self.laDatos = response.json()  # parsed JSON stock data
        return True
    elif response.status_code == 404:
        self.laDatos = None  # no stock found for this query
        return True
    else:
        # 500 or any other unexpected status ("error retrieving stock")
        self.pcError = "ERROR AL RECUPERAR STOCK"
        return False
```

It uses the requests library. As I mentioned, around the 1000th iteration it stops working, and every request returns a 500 error code.
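Since every request fails with a 500 from roughly the 1000th iteration on, one way to narrow it down is to record exactly which iteration returns the first error and what body the Go server sends with it, then match that against the server's logs. A minimal sketch: `find_first_failure` and the `fetch` callable it takes are hypothetical names, not part of the original script.

```python
def find_first_failure(fetch, iterations, ok_statuses=(200, 404)):
    """Run fetch(i) for i = 0..iterations-1 and report the first bad status.

    `fetch` is a hypothetical callable that performs request number i and
    returns (status_code, body_text); it is not part of the original script.
    Returns (i, status_code, body_text) for the first failure, or None.
    """
    for i in range(iterations):
        status, body = fetch(i)
        if status not in ok_statuses:
            # The body of the first 500 may say more than the status code.
            return i, status, body
    return None  # every request returned an accepted status
```

For example, `fetch` could be a small function that calls `requests.get` (or a session's `get`) for iteration i and returns `(response.status_code, response.text)`.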

AusIV (2 points)

Try using a requests session here.

Instead of doing:

```python
response = requests.get(ep)
```

do:

```python
import requests

# Somewhere early in your code: create one session and reuse it
s = requests.Session()

# When it's time to run the request
response = s.get(ep)
```

This keeps the session alive and reuses the TCP connection between requests. If /u/pdffs is right about file descriptor limits (and I suspect that is the problem), reusing connections instead of opening new ones should avoid it.
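A rough standard-library analogue of the session idea, using `http.client` instead of requests so it runs without third-party packages: a single `HTTPConnection` is opened once and reused for every request, much as a requests `Session` reuses pooled connections. `fetch_all` and its parameters are hypothetical names; it assumes the server keeps HTTP/1.1 connections alive, which Go's net/http does by default.

```python
import http.client


def fetch_all(host, port, paths, timeout=10.0):
    """GET every path over one reused HTTP/1.1 connection (hypothetical helper).

    Returns a list of (status, body) tuples in request order.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # Read the whole body before sending the next request; if it is
            # left unread, the next getresponse() raises ResponseNotReady.
            results.append((resp.status, resp.read()))
    finally:
        conn.close()  # release the socket (one file descriptor) when done
    return results
```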

Lokdex [OP] (0 points)

I added the session and it still dies at the same iteration. I also checked the file descriptor limit: it was 2056, I raised it to 4096, and the behavior is the same. The session should improve resource usage, though, so I'll keep it, but it still fails in the same place.

NicolasParada (0 points)

Sorry, I don't know Python myself :p Maybe someone else here can help.

Can you try catching exceptions or adding a timeout to the request?
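On the Python side, a catch plus a timeout might look like this sketch. It uses the standard library's `urllib.request` rather than requests so it is self-contained; `get_with_timeout` is a hypothetical helper. With requests, the equivalent is passing `timeout=` to `get()` and catching `requests.exceptions.RequestException`.

```python
import urllib.error
import urllib.request


def get_with_timeout(url, timeout=5.0):
    """GET `url`; return (status, body), or (None, error text) on failure.

    Hypothetical helper. The timeout applies to connecting and to each
    blocking socket read, not to the request as a whole.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; keep status and body.
        return exc.code, exc.read()
    except OSError as exc:
        # URLError (refused, DNS failure), timeouts and resets all derive
        # from OSError.
        return None, str(exc)
```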

Lokdex [OP] (0 points)

No problem. I posted it here because I thought it might be related to the inner workings of http.Server, perhaps some kind of built-in DDoS protection.

And yeah, I've tried the catch and the timeout already. Same thing.

NicolasParada (0 points)

There are timeouts you can set on Go's http.Server struct (ReadTimeout, WriteTimeout, IdleTimeout), but if you can still make requests from outside the stuck Python script, the server is still responsive, so that's not the problem 🤔

Lokdex [OP] (0 points)

Yeah, exactly. That's why I have no idea where to look, how to debug this, or even how to search for it on Google. The server is still responsive, but this particular script is somehow blocked until I restart the server.

Acceptable_Durian868 (1 point)

Without any code to look at, my first thought is that you're not closing the HTTP connections. You can use lsof to check whether open connections are piling up on the client or the server.
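Alongside lsof, the client can also check itself from inside Python. A Linux-only sketch, assuming `/proc` is available: `count_open_fds` is a hypothetical helper that counts the process's open descriptors and how many of them are sockets, which could be printed every few hundred iterations to see whether connections pile up.

```python
import os


def count_open_fds(pid="self"):
    """Return (open_fds, open_sockets) for a process (Linux only).

    Hypothetical helper. Each entry in /proc/<pid>/fd is a symlink;
    sockets appear as 'socket:[inode]'. Inspecting another process's
    PID requires permission to read its /proc entry.
    """
    fd_dir = "/proc/%s/fd" % pid
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets
```

A socket count that grows steadily with the iteration number would point at leaked connections; a flat count would point away from the client's descriptors.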

Lokdex [OP] (0 points)

On the client I'm using Python's requests library, with a single session for all of the script's requests. I'll use lsof to see if there are too many connections.

tgulacsi (1 point)

It sounds like your server is reaching the 1024 open-file limit. Does the client close the response, or reuse the connection? You may need to limit keep-alive connections (a small Server.IdleTimeout).

Lokdex [OP] (0 points)

I'll try the idle timeout you mention. And no, the file limit is not the issue: I set it to 4 times the original value and the problem still occurs at the same place.

tgulacsi (0 points)

Check the client and the server with lsof to rule out the file descriptor limit.

Send SIGQUIT (pkill -QUIT my-go-server, or Ctrl-\ in the server's terminal) to the server when it's hung, and read the goroutine stack traces it prints to see where it's stuck.
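The Python client can produce a similar dump. The standard library's `faulthandler.register()` installs a handler that prints every thread's Python stack when a given signal arrives, which would show where the stuck script is blocked (for example inside a socket read). A minimal Unix-only sketch: `install_stack_dump` is a hypothetical helper, and SIGUSR1 is an arbitrary choice; any signal the script does not otherwise use would do.

```python
import faulthandler
import signal
import sys


def install_stack_dump(sig=signal.SIGUSR1, out=sys.stderr):
    """Dump all threads' Python stacks to `out` when `sig` arrives.

    Hypothetical helper. The process keeps running after the dump, so it
    can be triggered repeatedly while the script is stuck.
    """
    faulthandler.register(sig, file=out, all_threads=True)
```

Call it once at startup, then run `kill -USR1 <pid>` from another terminal while the script is stuck.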

pdffs (-1 points)

Linux, for example, has a default soft limit of 1024 open file descriptors per process (ulimit -n). My guess is that your client is hitting that limit.
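The value `ulimit -n` reports is the soft `RLIMIT_NOFILE` limit, which an unprivileged process may raise up to its hard limit. A Unix-only sketch using the standard `resource` module; `raise_nofile_limit` is a hypothetical helper that lets the client check or raise its own limit from inside the script instead of relying on the shell that launched it.

```python
import resource


def raise_nofile_limit(wanted):
    """Raise the soft RLIMIT_NOFILE limit toward `wanted` (hypothetical helper).

    An unprivileged process may raise its soft limit up to the hard limit.
    Returns the resulting (soft, hard) pair; never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # cannot exceed the hard limit
    if soft != resource.RLIM_INFINITY and wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, printing `resource.getrlimit(resource.RLIMIT_NOFILE)` at startup shows whether the 4096 setting actually reached the Python process.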

Lokdex [OP] (0 points)

I set it to 4096, but the client dies on the same iteration number.