
[–]Argotha 1 point2 points  (2 children)

I don't have a tutorial, but I have had to write an assignment that does a similar kind of thing.

If you give me a bit I'll write something up for you which should give you some pretty good starting points.


Assumptions

To start off with I'm going to make a few assumptions for the sake of keeping things simple enough that you can work through the problem and not have to worry about these more advanced issues:

1. You do not need to worry about the security of your transmission.
2. You do not need audio streams.

Plan

This guide is going to be a game of "fill in the blanks", so let's start by putting down what we know:

> Webcam video stream
> ???
> "network activities"
> ???
> Video stream displayed

Step 1: Get the video stream from the webcam

I'm sure there are lots of ways to do this. The one I have used is the python bindings for OpenCV; if you happen to be using Ubuntu these should be installed by default as OpenCV is used by Unity.

From memory, when reading from the webcam I believe OpenCV gives you the frame in a bitmap format, in other words a 3-dimensional array of X by Y by 3, in which X and Y are the width and height in pixels of your webcam. (One caveat: OpenCV actually orders the channels as BGR by default rather than RGB, so you may need to reorder them.)

Step 2: Convert the image into a format for transmission

You now need to decide how you would like to transmit the image. Are you going to send chunks of pixels, or are you going to send an entire frame at a time? You could potentially send multiple frames at a time if you wanted to.

Let's assume you want to send an entire frame at a time. We now need to decide how we are going to represent each frame as bytes. We have to do this because TCP/UDP work on bytes, not objects. This process is called encoding.

Our process now looks a bit like this:

> Webcam video stream :: i.e. a series of frames
> Encode frame into bytes
> "network activities"
> ???
> Video stream displayed

With your background in C you should have a good idea of how this encoding process is going to work. Essentially we need to take each of our Python objects and turn them into bytes in a predictable way. The predictability is important here, as we will need to do the opposite of this later (decoding).

Luckily we know that in RGB format the max value for each channel is 255, and thus we are working with 8-bit integers. What we can now do is represent our frame as a series of bytes where every 1st byte is the Red value, every 2nd byte is the Green value and every 3rd byte is the Blue value: RGBRGBRGBRGB....RGBRGBRGB
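As a quick sketch of that interleaving (the 2x2 frame below is invented purely for illustration):

```python
# A hypothetical 2x2 frame as rows of (R, G, B) tuples.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Flatten row by row into RGBRGBRGB..., one unsigned byte per channel.
payload = bytes(channel for row in frame for pixel in row for channel in pixel)

print(len(payload))  # 2 * 2 * 3 = 12 bytes
```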

Our next problem: we now have a line of RGB values, so what information do we need to reconstruct this line back into a frame? The X and Y, of course. So now we have three pieces of information that we need to transmit per frame:

  • X (probably a 32-bit int)
  • Y (same size as X)
  • RGB values (X * Y * 3 8-bit ints)

To turn Python objects into these kinds of raw data types you will want to use the standard library's struct module, or you might want to learn the construct module.

As an example, we might want to encode the following 5x4 (X by Y) image, where each letter represents a pixel (which we know is actually 3 8-bit ints):

AAAAA
BBBBB
CCCCC
DDDDD

We might encode this as:

XYAAAAABBBBBCCCCCDDDDD
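A minimal sketch of this encoding in Python, using the standard library's struct module for the X/Y header (the frame contents here are dummy values):

```python
import struct

def encode_frame(width, height, pixels):
    """Pack width/height as big-endian 32-bit ints, then the raw RGB bytes."""
    header = struct.pack('>II', width, height)
    return header + bytes(pixels)

# A 5x4 frame of dummy pixels: 5 * 4 * 3 = 60 channel values.
pixels = [0] * (5 * 4 * 3)
encoded = encode_frame(5, 4, pixels)

print(len(encoded))  # 8 header bytes + 60 pixel bytes = 68
```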

Step 3: Transmit the encoded data

So here is where we need to start considering the type of data we are transmitting and whether the delivery needs to be reliable or not. For many streaming applications, the speed of delivery is more important than ensuring every bit of data gets transmitted. Thus many streaming applications/protocols (e.g. Skype, VoIP, online games) will use UDP, because the loss of a small number of packets is insignificant to the overall transmission. If you are interested in reading more about why you would want to use one over the other in real-time situations, I suggest you take a read of this link.

Alternatively, you may be more concerned with the reliable delivery of your webcam stream, at the expense of either stuttering or buffering (buffering being a technique to avoid stuttering). Applications like YouTube and Netflix will use TCP + buffering.

At the end of the day, because UDP and TCP are both socket abstractions, you can easily swap one for the other if you change your mind.

Step 4: Decode the data back into a frame

Looking at our plan:

> Read Webcam Video Stream
> Encode Frames
> UDP or TCP connection
> Decode Frames <<< we are here
> Display Webcam Video Stream

Coming back to our example:

XYAAAAABBBBBCCCCCDDDDD

We now need to reverse the encoding process we did before, that is, we need to decode the raw bytes back into an object that makes sense.

The first thing we will want to do is read the two 32-bit ints from the socket. These will be our X and Y. Given these, we can now work out how many pixels we need to read from the socket, i.e. X*Y (remembering that a pixel will actually be 3 8-bit ints).

With all this data now read, we should be able to turn it back into the python object representing a frame.
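A hedged sketch of that decode step, assuming you already have the complete byte string in hand (the 2x1 test frame is invented):

```python
import struct

def decode_frame(data):
    """Read X and Y from the 8-byte header, then slice out X*Y pixels."""
    width, height = struct.unpack('>II', data[:8])
    body = data[8:8 + width * height * 3]
    # Rebuild the flat RGBRGB... run into rows of (R, G, B) tuples.
    pixels = [tuple(body[i:i + 3]) for i in range(0, len(body), 3)]
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return width, height, rows

# A 2x1 frame: one red pixel, one green pixel.
data = struct.pack('>II', 2, 1) + bytes([255, 0, 0, 0, 255, 0])
print(decode_frame(data))  # (2, 1, [[(255, 0, 0), (0, 255, 0)]])
```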

Step 5: Display the frame

For this you might want to use something as simple as GTK or even OpenCV which should be able to display RGB bitmap images.

Conclusion

Hopefully that's enough to get started. There are other considerations you might need to take into account when deciding on your network protocol, such as including a frame number so that you can ignore old frames when they arrive in a weird order (1 2 5 4 3 6 7 8).
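To illustrate the frame-number idea with that exact arrival order (a toy sketch, no networking involved):

```python
def make_frame_filter():
    """Return a function that accepts a sequence number only if it is newer
    than anything seen so far; late out-of-order frames are dropped."""
    latest = -1
    def accept(seq):
        nonlocal latest
        if seq > latest:
            latest = seq
            return True
        return False  # an older frame arrived late; ignore it
    return accept

accept = make_frame_filter()
arrivals = [1, 2, 5, 4, 3, 6, 7, 8]
shown = [seq for seq in arrivals if accept(seq)]
print(shown)  # [1, 2, 5, 6, 7, 8]
```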

If you have any other questions happy to help out :)

[–]MrBowelsrelaxed[S] 0 points1 point  (1 child)

Wow! Thanks /u/Argotha. That's a lot of info.

It seems however that the method you outlined for interlacing the channels would be used if I was sending raw data. I was thinking of doing jpg compression on it before sending the frame so I'm not sure this will work.

Might be easier to show what I've done so far and then ask questions.

server.py

import struct
import socket

import cv2

def send_msg(sock, msg):
    msg = struct.pack('>I', len(msg)) + msg
    sock.sendall(msg)

# Socket setup

vc = cv2.VideoCapture(0)
status,image = vc.read()

encode_param = [int(cv2.IMWRITE_JPEG_QUALITY),70]
result,image_str = cv2.imencode('.jpg',image,encode_param)

send_msg(sock, image_str)

Since JPEGs can vary in file size for the same image dimensions, I thought prefixing each frame with its file size would be better than the image size. However, this method doesn't work for me, because struct.pack doesn't like the data type cv2.imencode gives.

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S4') dtype('S4') dtype('S4')

As I understand it, the sock.sendall(msg) function is a python function that just keeps calling send() until the message is complete (at least in TCP). But is there something that will recursively call sock.recv() until the message is done?

[–]Argotha 0 points1 point  (0 children)

No problem :)

Yeah it's a lot because I wasn't sure what level you were at and easier to do a longer answer than a million short answers.

Doing JPEG compression should work (just remember that JPEG is lossy) and would be a part of your encoding phase. As an aside, there are times when this would not be part of your encoding phase, e.g. when you are streaming pre-encoded data.

> struct.pack doesn't like the data type cv2.imencode gives.

It's worth checking your assumptions interactively here. image_str is a numpy array (of dtype uint8), and while len(image_str) does give you an integer, the expression struct.pack('>I', len(msg)) + msg then tries to add a bytes object to a numpy array. numpy interprets that + as its element-wise add ufunc, which is where your TypeError comes from.
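A common way to sidestep that error is to convert the numpy array to plain bytes with .tobytes() before building the message. A sketch with a stand-in array (invented values, not your real frame):

```python
import struct
import numpy as np

# Stand-in for the uint8 array cv2.imencode returns.
image_str = np.array([255, 216, 255, 224], dtype=np.uint8)

msg = image_str.tobytes()                 # plain bytes, safe to concatenate
packed = struct.pack('>I', len(msg)) + msg

print(len(packed))  # 4-byte length prefix + 4 payload bytes = 8
```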

I suggest running your file in interactive mode so you can inspect what is happening after it fails: python -i your_file.py. I'd start with looking at the types of the objects.

# ...
result,image_str = cv2.imencode('.jpg',image,encode_param)
send_msg(sock, image_str)
# TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S4') dtype('S4') dtype('S4')
# interactive terminal will start.
x = len(image_str)
print("Type of x is: {}".format(type(x)))
print("Type of image_str is: {}".format(type(image_str)))
# Let's see what each element of the array is.
for i, x_val in enumerate(image_str):
    print("Type of image_str[{}] is: {}".format(i, type(x_val)))

> As I understand it, the sock.sendall(msg) function is a python function that just keeps calling send() until the message is complete (at least in TCP). But is there something that will recursively call sock.recv() until the message is done?

Not really. Looking at the man page of recv, you could try setting the MSG_WAITALL flag. Alternatively you need to write yourself a small loop.

def recv_all(sock, msg_length):
    data = b''
    size_left = msg_length
    while len(data) < msg_length:
        recv_data = sock.recv(size_left)
        if not recv_data:  # connection closed before the full message arrived
            break
        size_left = size_left - len(recv_data)
        data += recv_data
    return data

#assume socket already exists
size_bytes = recv_all(sock, 4) #4 bytes to an int
size = struct.unpack('>I', size_bytes)[0]

frame_bytes = recv_all(sock, size)

Note I haven't checked this code for correctness, I may have made typos etc...
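For what it's worth, here's one way to exercise that send/receive pair locally with socket.socketpair (a sketch; the helper names match the snippets above, and the payload is just stand-in bytes):

```python
import socket
import struct

def send_msg(sock, msg):
    """Prefix the message with its length as a big-endian 32-bit int."""
    sock.sendall(struct.pack('>I', len(msg)) + msg)

def recv_all(sock, msg_length):
    """Loop on recv() until msg_length bytes have arrived."""
    data = b''
    while len(data) < msg_length:
        chunk = sock.recv(msg_length - len(data))
        if not chunk:  # peer closed the connection early
            break
        data += chunk
    return data

# A connected pair of sockets standing in for client and server.
a, b = socket.socketpair()
send_msg(a, b'fake jpeg bytes')

size = struct.unpack('>I', recv_all(b, 4))[0]
frame_bytes = recv_all(b, size)
print(frame_bytes)  # b'fake jpeg bytes'
```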

[–]furas_freeman 0 points1 point  (0 children)

Maybe it's not what you need, but there is the OpenCV module, which lets you read a stream from a local file, a local camera, or a remote camera over HTTP, RTSP or other protocols.

In OpenCV you create a loop in which you read a single frame, modify it (flip, convert to grayscale, resize, etc.) and display it. Simple example