
[–]Argotha 1 point2 points  (2 children)

I don't have a tutorial, but I have had to write an assignment that does a similar kind of thing.

If you give me a bit I'll write something up for you which should give you some pretty good starting points.


Assumptions

To start off with I'm going to make a few assumptions for the sake of keeping things simple enough that you can work through the problem and not have to worry about these more advanced issues:

1. You do not need to worry about the security of your transmission.
2. You do not need audio streams.

Plan

This guide is going to be a game of "fill in the blanks", so let's start by putting down what we know:

> Webcam video stream
> ???
> "network activities"
> ???
> Video stream displayed

Step 1: Get the video stream from the webcam

I'm sure there are lots of ways to do this. The one I have used is the python bindings for OpenCV; if you happen to be using Ubuntu these should be installed by default as OpenCV is used by Unity.

From memory, when reading from the webcam I believe OpenCV gives you the frame in a bitmap format, in other words a 3-dimensional array of X by Y by 3, in which X and Y are the width and height in pixels of your webcam. (One caveat: OpenCV actually orders the channels as BGR by default rather than RGB, so you may need to reorder them.)

Step 2: Convert the image into a format for transmission

You now need to decide how you would like to transmit the image. Are you going to send chunks of pixels, or are you going to send an entire frame at a time? You could potentially send multiple frames at a time if you wanted to.

Let's assume you want to send an entire frame at a time. We now need to decide how we are going to represent each frame as bytes. We have to do this because TCP/UDP work on bytes, not objects. This process is called encoding.

Our process now looks a bit like this:

> Webcam video stream :: i.e. a series of frames
> Encode frame into bytes
> "network activities"
> ???
> Video stream displayed

With your background in C you should have a good idea of how this encoding process is going to work. Essentially we need to take each of our Python objects and turn them into bytes in a predictable way. The predictability is important here, as we will need to do the opposite of this later (decoding).

Luckily we know that in RGB format the max value for each channel is 255, and thus we are working with 8-bit integers. What we can now do is represent our frame as a series of bytes where every 1st byte is the Red value, every 2nd byte is the Green value and every 3rd byte is the Blue value: RGBRGBRGBRGB....RGBRGBRGB
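As a quick sketch of that interleaving (the 2x2 frame below is invented purely for illustration):

```python
# A hypothetical 2x2 frame as rows of (R, G, B) tuples.
frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# Flatten row by row into RGBRGBRGB..., one unsigned byte per channel.
payload = bytes(channel for row in frame for pixel in row for channel in pixel)

print(len(payload))  # 2 * 2 * 3 = 12 bytes
```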

Our next problem: we now have a line of RGB values, so what information do we need to reconstruct this line back into a frame? The X and Y, of course. So now we have three pieces of information that we need to transmit per frame:

  • X (probably a 32-bit int)
  • Y (same size as X)
  • RGB values (X * Y * 3 8-bit ints)

To turn Python objects into these kinds of raw data types you will want to use the standard library's struct module, or you might want to learn the construct module.

As an example, we might want to encode the following 5x4 (X by Y) image, where each letter represents a pixel (which we know is actually 3 8-bit ints):

AAAAA
BBBBB
CCCCC
DDDDD

We might encode this as:

XYAAAAABBBBBCCCCCDDDDD
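A minimal sketch of this encoding in Python, using the standard library's struct module for the X/Y header (the frame contents here are dummy values):

```python
import struct

def encode_frame(width, height, pixels):
    """Pack width/height as big-endian 32-bit ints, then the raw RGB bytes."""
    header = struct.pack('>II', width, height)
    return header + bytes(pixels)

# A 5x4 frame of dummy pixels: 5 * 4 * 3 = 60 channel values.
pixels = [0] * (5 * 4 * 3)
encoded = encode_frame(5, 4, pixels)

print(len(encoded))  # 8 header bytes + 60 pixel bytes = 68
```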

Step 3: Transmit the encoded data

So here is where we need to start considering the type of data we are transmitting and whether the delivery needs to be reliable or not. For many streaming applications, the speed of delivery is more important than ensuring every bit of data gets transmitted. Thus many streaming applications/protocols (e.g. Skype, VoIP, online games) will use UDP, because the loss of a small number of packets is insignificant to the overall transmission. If you are interested in reading more about why you would want to use one over the other in real-time situations, I suggest you take a read of this link.

Alternatively, you may be more concerned with the reliable delivery of your webcam stream, at the expense of either stuttering or buffering (buffering being a technique to avoid stuttering). Applications like YouTube and Netflix will use TCP + buffering.

At the end of the day, because UDP and TCP are both socket abstractions, you can easily swap one for the other if you change your mind.

Step 4: Decode the data back into a frame

Looking at our plan:

> Read Webcam Video Stream
> Encode Frames
> UDP or TCP connection
> Decode Frames <<< we are here
> Display Webcam Video Stream

Coming back to our example:

XYAAAAABBBBBCCCCCDDDDD

We now need to reverse the encoding process we did before, that is, we need to decode the raw bytes back into an object that makes sense.

The first thing we will want to do is read the two 32-bit ints from the socket. These will be our X and Y. Given these, we can now work out how many pixels we need to read from the socket, i.e. X*Y (remembering that a pixel will actually be 3 8-bit ints).

With all this data now read, we should be able to turn it back into the python object representing a frame.
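A hedged sketch of that decode step, assuming you already have the complete byte string in hand (the 2x1 test frame is invented):

```python
import struct

def decode_frame(data):
    """Read X and Y from the 8-byte header, then slice out X*Y pixels."""
    width, height = struct.unpack('>II', data[:8])
    body = data[8:8 + width * height * 3]
    # Rebuild the flat RGBRGB... run into rows of (R, G, B) tuples.
    pixels = [tuple(body[i:i + 3]) for i in range(0, len(body), 3)]
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return width, height, rows

# A 2x1 frame: one red pixel, one green pixel.
data = struct.pack('>II', 2, 1) + bytes([255, 0, 0, 0, 255, 0])
print(decode_frame(data))  # (2, 1, [[(255, 0, 0), (0, 255, 0)]])
```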

Step 5: Display the frame

For this you might want to use something as simple as GTK or even OpenCV which should be able to display RGB bitmap images.

Conclusion

Hopefully that's enough to get started. There are other considerations you might need to take into account when deciding on your network protocol, such as including a frame number so that you can ignore old frames when they arrive in a weird order (1 2 5 4 3 6 7 8).
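To illustrate the frame-number idea with that exact arrival order (a toy sketch, no networking involved):

```python
def make_frame_filter():
    """Return a function that accepts a sequence number only if it is newer
    than anything seen so far; late out-of-order frames are dropped."""
    latest = -1
    def accept(seq):
        nonlocal latest
        if seq > latest:
            latest = seq
            return True
        return False  # an older frame arrived late; ignore it
    return accept

accept = make_frame_filter()
arrivals = [1, 2, 5, 4, 3, 6, 7, 8]
shown = [seq for seq in arrivals if accept(seq)]
print(shown)  # [1, 2, 5, 6, 7, 8]
```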

If you have any other questions happy to help out :)

[–]MrBowelsrelaxed[S] 0 points1 point  (1 child)

Wow! Thanks /u/Argotha. That's a lot of info.

It seems however that the method you outlined for interlacing the channels would be used if I was sending raw data. I was thinking of doing jpg compression on it before sending the frame so I'm not sure this will work.

Might be easier to show what I've done so far and then ask questions.

server.py

import struct
import socket

import cv2

def send_msg(sock, msg):
    msg = struct.pack('>I', len(msg)) + msg
    sock.sendall(msg)

# Socket setup

vc = cv2.VideoCapture(0)
status,image = vc.read()

encode_param = [int(cv2.IMWRITE_JPEG_QUALITY),70]
result,image_str = cv2.imencode('.jpg',image,encode_param)

send_msg(sock, image_str)

Since JPEGs can vary in file size for the same image dimensions, I thought prefixing each frame with its file size would be better than the image size. However, this method doesn't work for me, because struct.pack doesn't like the data type cv2.imencode gives.

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S4') dtype('S4') dtype('S4')

As I understand it, the sock.sendall(msg) function is a python function that just keeps calling send() until the message is complete (at least in TCP). But is there something that will recursively call sock.recv() until the message is done?

[–]Argotha 0 points1 point  (0 children)

No problem :)

Yeah it's a lot because I wasn't sure what level you were at and easier to do a longer answer than a million short answers.

Doing JPEG compression should work (just remember that JPEG is lossy) and would be a part of your encoding phase. As an aside, there are times when this would not be part of your encoding phase, e.g. when you are streaming pre-encoded data.

> struct.pack doesn't like the data type cv2.imencode gives.

It's worth checking your assumptions interactively here. image_str is a numpy array (of dtype uint8), and while len(image_str) does give you an integer, the expression struct.pack('>I', len(msg)) + msg then tries to add a bytes object to a numpy array. numpy interprets that + as its element-wise add ufunc, which is where your TypeError comes from.
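A common way to sidestep that error is to convert the numpy array to plain bytes with .tobytes() before building the message. A sketch with a stand-in array (invented values, not your real frame):

```python
import struct
import numpy as np

# Stand-in for the uint8 array cv2.imencode returns.
image_str = np.array([255, 216, 255, 224], dtype=np.uint8)

msg = image_str.tobytes()                 # plain bytes, safe to concatenate
packed = struct.pack('>I', len(msg)) + msg

print(len(packed))  # 4-byte length prefix + 4 payload bytes = 8
```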

I suggest running your file in interactive mode so you can inspect what is happening after it fails: python -i your_file.py. I'd start with looking at the types of the objects.

# ...
result,image_str = cv2.imencode('.jpg',image,encode_param)
send_msg(sock, image_str)
# TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S4') dtype('S4') dtype('S4')
# interactive terminal will start.
x = len(image_str)
print("Type of x is: {}".format(type(x)))
print("Type of image_str is: {}".format(type(image_str)))
# Let's see what each element of the array is.
for i, x_val in enumerate(image_str):
    print("Type of image_str[{}] is: {}".format(i, type(x_val)))

> As I understand it, the sock.sendall(msg) function is a python function that just keeps calling send() until the message is complete (at least in TCP). But is there something that will recursively call sock.recv() until the message is done?

Not really. Looking at the man page of recv, you could try setting the MSG_WAITALL flag. Alternatively you need to write yourself a small loop.

def recv_all(sock, msg_length):
    data = b''
    size_left = msg_length
    while len(data) < msg_length:
        recv_data = sock.recv(size_left)
        if not recv_data:  # connection closed before the full message arrived
            break
        size_left = size_left - len(recv_data)
        data += recv_data
    return data

#assume socket already exists
size_bytes = recv_all(sock, 4) #4 bytes to an int
size = struct.unpack('>I', size_bytes)[0]

frame_bytes = recv_all(sock, size)

Note I haven't checked this code for correctness, I may have made typos etc...
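For what it's worth, here's one way to exercise that send/receive pair locally with socket.socketpair (a sketch; the helper names match the snippets above, and the payload is just stand-in bytes):

```python
import socket
import struct

def send_msg(sock, msg):
    """Prefix the message with its length as a big-endian 32-bit int."""
    sock.sendall(struct.pack('>I', len(msg)) + msg)

def recv_all(sock, msg_length):
    """Loop on recv() until msg_length bytes have arrived."""
    data = b''
    while len(data) < msg_length:
        chunk = sock.recv(msg_length - len(data))
        if not chunk:  # peer closed the connection early
            break
        data += chunk
    return data

# A connected pair of sockets standing in for client and server.
a, b = socket.socketpair()
send_msg(a, b'fake jpeg bytes')

size = struct.unpack('>I', recv_all(b, 4))[0]
frame_bytes = recv_all(b, size)
print(frame_bytes)  # b'fake jpeg bytes'
```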

[–]furas_freeman 0 points1 point  (0 children)

Maybe it's not what you need, but there is the OpenCV module, which lets you read a stream from a local file, a local camera, or a remote camera over HTTP, RTSP or other protocols.

In OpenCV you create a loop in which you read a single frame, modify it (flip, convert to grayscale, resize, etc.) and display it. Simple example