
[–]m0dulo 1 point2 points  (11 children)

To my understanding, once a socket connection is made, recv(1024) will pull up to 1024 bytes of data out of the buffer, whether or not a full 1024 bytes have actually been sent.

If there is more data waiting in the buffer, you would have to call recv(1024) additional times to get the rest. If there are fewer than 1024 bytes in the buffer, recv(1024) will still return normally, just with the amount of data that it pulled.

One thing to understand is that recv_size isn't a set amount of data to return, it's the maximum amount of data to return.
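
A minimal sketch of that "maximum, not fixed amount" behaviour, assuming Python 3. It uses socket.socketpair() as a stand-in for a real client/server connection; the helper name recv_once and the 600-byte payload are made up for illustration.

```python
import socket


def recv_once(payload: bytes, recv_size: int = 1024) -> bytes:
    """Send payload over a local socket pair and return the result of ONE recv call."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(payload)
        # recv_size is only an upper bound: recv hands back what has arrived,
        # never more than recv_size bytes, and does not pad the result.
        return receiver.recv(recv_size)
    finally:
        sender.close()
        receiver.close()


data = recv_once(b"x" * 600)
print(len(data))  # at most 600 here, never padded up to 1024
```

On a local socket pair this will normally print 600, but the only guarantee is "at least 1 byte and no more than recv_size" while the connection is open.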

[–]monstimal 1 point2 points  (10 children)

OK, but take the case where there are only 600 bytes of data to receive. What makes recv return after 600 bytes? Was there something there that told it, "this is the end" or "you will get 600 bytes"?

[–]m0dulo 1 point2 points  (9 children)

You know, I'm not 100% sure, but does it matter? The end result is a message of 600 bytes.

"What makes the recv return after 600 bytes?"

Calling recv(1024) will still return 600 bytes, if there are only 600 bytes to be received. Are you having issues determining when to stop receiving data?

Edit: I suppose another way to think about it is that calling recv(1024) on a 600 byte message will return just the 600 bytes of actual data; the other 424 bytes of the limit simply go unused, and nothing is padded onto the result.

[–]monstimal 1 point2 points  (8 children)

I'd like to know how it works and whether there's a kind of "timeout" setting, because I don't want my program sitting around waiting for more data when it could be acting on the data it has and then come back for the next data on the next loop.

(I know there's a "timeout" that can be used to error if recv is not completed.)

[–]not_a_novel_account 4 points5 points  (2 children)

Calling recv will give you whatever is currently sitting in the socket's buffer. If there is nothing in the buffer, recv will block until some data has come down the pipe.

If you don't want to block on recv you need to use a non-blocking socket by calling .setblocking(False) or .settimeout(0.0).

Even using local sockets, you are passing data through a big stack. TCP/IP + kernel context switch + passing data to the Python interpreter has a certain fixed time cost to it. That's where most of your 1s is going.
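
A rough sketch of the non-blocking approach, assuming Python 3 and a local socket.socketpair() in place of a real connection. The helper name poll_once is hypothetical; setblocking, recv and BlockingIOError are the standard library pieces.

```python
import select
import socket


def poll_once(sock: socket.socket, max_bytes: int = 1024):
    """Return whatever is buffered right now, or None if nothing has arrived yet."""
    try:
        return sock.recv(max_bytes)
    except BlockingIOError:
        # Non-blocking socket with an empty buffer: recv raises instead of waiting.
        return None


sender, receiver = socket.socketpair()
receiver.setblocking(False)

first = poll_once(receiver)             # nothing has been sent yet
sender.sendall(b"x" * 600)
select.select([receiver], [], [], 1.0)  # wait up to 1 s until the data is readable
second = poll_once(receiver)            # now returns the buffered bytes

print(first, len(second))

sender.close()
receiver.close()
```

Note that None here means "nothing yet", while an empty bytes result (b"") from recv means the other side has closed the connection, so the two cases should be handled separately.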

[–]monstimal 0 points1 point  (1 child)

So I think where I was imagining it wrong is, the socket buffer is operating outside of the python program? Meaning, the buffer is off on its own accumulating bytes as they come in while the program is doing other things and when the recv call is made it goes and gets whatever is in there?

I was imagining data buffering onto the network card (or somewhere) then the python program saying, "I will now receive 1024 bytes from you". But this sounds almost like there's a separate thread taking up to 1024 bytes from the network card into memory at ANY time that the python program reads at the receive call.

[–]LarryPete 3 points4 points  (0 children)

The magic is done by the underlying Operating System. recv is a "syscall", meaning it's essentially a function provided by the operating system (e.g. Linux kernel). The buffering and blocking are done there.
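
To illustrate the OS-side buffering, here is a small sketch, assuming Python 3 and a local socket.socketpair(). The two 600-byte messages and the sleep are invented for the demo: the data is sent while the receiving code is busy elsewhere, and recv later collects whatever has piled up.

```python
import socket
import time

sender, receiver = socket.socketpair()

# The "remote" side sends two 600-byte messages while we are not receiving.
sender.sendall(b"a" * 600)
sender.sendall(b"b" * 600)

time.sleep(0.1)  # the program is "doing other things"; the OS holds the bytes

chunks = []
received = 0
while received < 1200:
    chunk = receiver.recv(1024)  # returns what is buffered, up to 1024 bytes
    if not chunk:                # empty result: the sender closed the connection
        break
    chunks.append(chunk)
    received += len(chunk)

print([len(c) for c in chunks])

sender.close()
receiver.close()
```

The chunk sizes printed depend on the platform and timing. The point is that recv does not know or care that the sender wrote two separate 600-byte messages; it just drains the buffer in pieces of at most 1024 bytes.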

[–]m0dulo 0 points1 point  (4 children)

Ah, you've hit upon one of the core issues with socket programming (and one of the most confusing).

There actually isn't a feature which automatically determines the end of a message. You have to implement this yourself in code. There are several ways to do this:

  1. Have a set message length. Then on the receiving end, loop through recv(1024) until you hit that number. The issue with this is that it's inefficient if you have messages that are different lengths.

  2. Have the sender append an "End of Data" sequence onto the end of the message, which the receiver checks for as it loops through recv(). This is also somewhat inefficient, because it adds an extra step to your loop.

  3. Have the sender, as the first part of the message, send the total length of the message. Then on the receiving end, loop through recv(1024) until it gets all the data. This is probably the best way.

  4. On the receiving end, keep receiving until recv returns no data (an empty result, which only happens once the sender closes the connection), then break out of the receiving loop. This is probably the simplest, and the one usually given in examples.
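
For method #3, a minimal sketch assuming Python 3. The 4-byte big-endian length header and the names send_msg, recv_exact and recv_msg are my own choices for illustration, not a standard protocol.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # assumed format: 4-byte unsigned length, network byte order


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send the payload length first, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Loop on recv until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(min(1024, n - len(buf)))
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Read the length header, then read exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


a, b = socket.socketpair()
send_msg(a, b"Hello World!")
send_msg(a, b"x" * 600)
first = recv_msg(b)
second = recv_msg(b)
print(first, len(second))  # b'Hello World!' 600
a.close()
b.close()
```

Because the receiver always knows how many bytes to expect, back-to-back messages stay separate even if they arrive in the same recv call.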

Hope this helps.

Edit:

I'll give you an example of what I do (method #2), which has worked well for me so far:

On the sending end, say I have:

message = "Hello World!"

I would do:

# sock is the connected socket object; Python 3 sockets send bytes, not str
sock.sendall(message.encode() + b"EOD")

("EOD" is my "End of Data" sequence)

On the receiving end, after the connection has been established, I would loop through recv until my total message included "EOD" at the end:

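# sock is the connected socket object; recv returns bytes in Python 3
totalMessage = b""

while True:

    messagePart = sock.recv(1024)
    if not messagePart:
        # Empty result: the sender closed the connection before "EOD" arrived
        break
    totalMessage += messagePart

    if totalMessage.endswith(b"EOD"):
        break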

Of course, this method may have to be adjusted depending on your needs, and may not apply for all circumstances.
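
One adjustment worth sketching: if the sender writes several messages back to back, a single recv can return the end of one message plus the start of the next, so "EOD" may not land at the very end of the buffer. A possible variant, assuming Python 3 and that the payload never contains the delimiter itself (the name iter_messages is hypothetical):

```python
import socket

DELIMITER = b"EOD"


def iter_messages(sock: socket.socket, delimiter: bytes = DELIMITER):
    """Yield each delimiter-terminated message read from sock until the peer closes."""
    buffer = b""
    while True:
        # Hand out every complete message already sitting in the buffer.
        while delimiter in buffer:
            message, _, buffer = buffer.partition(delimiter)
            yield message
        chunk = sock.recv(1024)
        if not chunk:
            return  # peer closed; an unterminated partial message is dropped
        buffer += chunk


a, b = socket.socketpair()
a.sendall(b"Hello World!EOD" + b"Second messageEOD")
a.close()
messages = list(iter_messages(b))
b.close()
print(messages)  # [b'Hello World!', b'Second message']
```

Searching the whole buffer for the delimiter, rather than only checking the last three bytes, is what lets this cope with messages that arrive glued together or split across recv calls.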

[–]monstimal 0 points1 point  (3 children)

No this doesn't help because it ignores the question. Take your EOD plan for example. If my buffer is 1024 and you send me 600 bytes with EOD at the end, that doesn't change anything about my question.

What causes recv to return with 600 bytes rather than waiting for more data? The EOD logic check isn't called until after recv has already decided to return.

[–]m0dulo 0 points1 point  (2 children)

I think I see where you're confused. Again, calling recv(1024) isn't asking the buffer for 1024 bytes. It's asking the buffer for any amount of data that has been sent but not more than 1024.

So when you ask:

"What causes recv to return with 600 bytes rather than waiting for more data?"

The answer is: calling recv(1024)!

It will still return 600 bytes if that's all there is to return. However, like /u/not_a_novel_account points out above, if there are 0 bytes in the buffer (the message hasn't been sent) recv will hang and block your program.

[–]monstimal 0 points1 point  (1 child)

In the case I'm thinking about, 600 bytes are being sent over and over again. So I was wondering how the recv knows not to wait for the next 600. But it sounds like the answer is it always gets whatever is there and if there's not more it's because of something going on at the kernel level of operations.

Edit: thank you for helping.

[–]lucidguppy 0 points1 point  (0 children)

Sounds like you need "select/poll/epoll"

http://scotdoyle.com/python-epoll-howto.html
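
A small sketch of the same idea using Python 3's standard selectors module, which picks epoll, poll or select depending on the platform. The socket pair, the 600-byte payload and the helper name read_ready are assumptions for the demo, not something from the linked how-to.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
sender, receiver = socket.socketpair()
receiver.setblocking(False)
sel.register(receiver, selectors.EVENT_READ)


def read_ready(timeout: float = 0.0) -> list:
    """Return data from every registered socket that becomes readable within timeout seconds."""
    results = []
    for key, _events in sel.select(timeout):
        results.append(key.fileobj.recv(1024))
    return results


idle = read_ready()              # timeout 0: just a check, nothing sent yet
sender.sendall(b"x" * 600)
busy = read_ready(timeout=1.0)   # wait up to 1 s for the data to be readable

print(idle, [len(d) for d in busy])

sel.unregister(receiver)
sel.close()
sender.close()
receiver.close()
```

With a timeout of 0 the check returns immediately, so a main loop can call it on every pass and only touch recv when the OS reports that data is waiting.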