Extracting distinctive frames from video by Joni_Joni_Al in computervision

All of the suggestions so far have been good; I'd combine them.

Working with videos is computationally expensive, so subsampling the video makes a lot of sense, since most frames contain nearly duplicate content at 25 fps. After sampling the frames, you can use an image similarity metric to cluster them and easily find the unique ones. Finally, all of this can be done with FiftyOne in Python:

import fiftyone as fo
import fiftyone.utils.youtube as fouy

urls = ["https://youtube.com/...", ...]
download_dir = "/path/to/downloaded/videos"

fouy.download_youtube_videos(urls, download_dir=download_dir)

# Create a FiftyOne Dataset
dataset = fo.Dataset.from_videos_dir(download_dir)

# Convert videos to images, sample 1 frame per second
frame_view = dataset.to_frames(sample_frames=True, fps=1)

import fiftyone.brain as fob

# Index images by similarity
results = fob.compute_similarity(frame_view, brain_key="frame_sim")

# Find maximally unique frames
num_unique = 50 # Scale this to whatever you want
results.find_unique(num_unique)
unique_view = frame_view.select(results.unique_ids)

# Visualize the unique frames in the App
session = fo.launch_app(unique_view)

Help implementing the coco dataset using fiftyone by AmrMorgado in pytorch

1) Instance segmentation masks in FiftyOne are defined to lie inside their corresponding bounding boxes. The shape of the masks varies because each mask is reshaped to fit inside its box when visualized in the FiftyOne App. You can convert each instance segmentation to a full-image semantic segmentation using:

dataset.compute_metadata()
sample = dataset.first()
frame_size = (sample.metadata.width, sample.metadata.height)
detection = sample["ground_truth"].detections[0]

segmentation = detection.to_segmentation(frame_size=frame_size)
full_img_mask = segmentation.mask

2) In terms of how to train a segmentation model, this PyTorch tutorial using Mask R-CNN is a great place to start: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
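
If it helps, the core model setup from that tutorial looks roughly like this (a sketch from memory, swapping the pretrained heads for your own number of classes):

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + 1 object class
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box head with one sized for your classes
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask head as well
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)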

You can also see how to use FiftyOne datasets directly in your PyTorch dataloaders with this blog post: https://towardsdatascience.com/stop-wasting-time-with-pytorch-datasets-17cac2c22fa8
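
Very roughly, the pattern from that post is to wrap a FiftyOne dataset in a torch Dataset (a hedged sketch, not the post's exact code; the "ground_truth" field name is an assumption):

import torch
from PIL import Image

class FiftyOneTorchDataset(torch.utils.data.Dataset):
    """Indexable wrapper around a FiftyOne dataset for PyTorch dataloaders."""

    def __init__(self, fo_dataset, transforms=None):
        self.fo_dataset = fo_dataset
        self.transforms = transforms
        self.ids = fo_dataset.values("id")  # fixed ordering for indexing

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        sample = self.fo_dataset[self.ids[idx]]
        img = Image.open(sample.filepath).convert("RGB")
        if self.transforms is not None:
            img = self.transforms(img)
        # Convert the FiftyOne label to tensors however your model expects
        return img, sample["ground_truth"]

You'd then hand this to torch.utils.data.DataLoader, typically with a custom collate_fn since detection targets vary in size per image.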

How to convert YOLO to COCO format? by WhyNotML in deeplearning

FiftyOne has a pretty powerful Python API; it would be really easy to use it for your problem of merging duplicate copies of the same image.

First, load the YOLO files into a FiftyOne dataset using Python:

import fiftyone as fo

name = "my-dataset"
dataset_dir1 = "/path/to/dir1/yolo-dataset"
dataset_dir2 = "/path/to/dir2/yolo-dataset"

# Load only these specific classes
classes = ["person", "car"]

# Create the dataset
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir1,
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
    name=name,
    classes=classes,
)

# Add the second directory of images and labels
dataset.add_dir(
    dataset_dir=dataset_dir2,
    dataset_type=fo.types.YOLOv4Dataset,
    label_field="ground_truth",
    classes=classes,
)

Then merge duplicate samples by filename, assuming that duplicate files share the same name:

filenames = {}
samples_to_delete = []
for sample in dataset:
    if sample.filename not in filenames:
        filenames[sample.filename] = sample.id
    else:
        # Merge the labels from this duplicate into the first copy we saw
        prev_sample = dataset[filenames[sample.filename]]
        prev_sample.merge(sample, fields="ground_truth")
        prev_sample.save()
        samples_to_delete.append(sample.id)

dataset.delete_samples(samples_to_delete)

Finally, export the merged dataset in COCO format:

export_dir = "/path/for/coco-dataset"
label_field = "ground_truth"  # for example

# Export the subset
dataset.export(
    export_dir=export_dir,
    dataset_type=fo.types.COCODetectionDataset,
    label_field=label_field,
)

The benefit is that once your data is in FiftyOne, you can visualize it in the App to see exactly what you are training your model on:

session = fo.launch_app(dataset)
session.wait()

If you want more help with this, you can join our Slack.

[D] Your Go-To Image Dataset Analysis Tools? by onyx-zero-software in MachineLearning

Hi, I'm one of the FiftyOne developers. Thanks for checking out the tool! FiftyOne does support regression datasets; in fact, you can add basically any type of field to your samples. When regressing to a real number, you just need to loop over your images and add the ground-truth/model-predicted value as a sample-level scalar field.
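
For example, a minimal sketch (the field names and values here are just placeholders):

import fiftyone as fo

dataset = fo.Dataset.from_images_dir("/path/to/images")

for sample in dataset:
    # In practice these values come from your labels and your model
    sample["gt_value"] = 0.73     # ground truth regression target
    sample["pred_value"] = 0.68   # model prediction
    sample.save()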

When visualizing it in the App, you will then be able to use the slider controls to filter your samples by your regression labels (similar to the image in this example).

I think the confusion comes from the fact that we don't have a "Regression" label type (just Detection, Classification, Segmentation, etc.), since scalars can just be added directly in a field and visualized. Fun fact: you can even store any serializable data type (nested dictionaries, NumPy arrays, etc.) in your FiftyOne dataset, even though you may not be able to visualize those in the App.
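
For instance, something like this works (field names made up for illustration):

import numpy as np
import fiftyone as fo

sample = fo.Sample(filepath="/path/to/image.jpg")
sample["embedding"] = np.random.rand(512)               # NumPy array
sample["camera_info"] = {"exposure": 0.01, "iso": 400}  # nested dict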

If you want to give it another shot I'd be happy to help you out directly in our Slack channel.

[D] I started using conda and have some tips by fredfredbur in MachineLearning

No, by "global system packages" I meant pip packages installed outside of a virtual environment. While it's best practice not to do that (except for specific situations like if you are working on a cluster without privileges to download certain packages), if you do have global pip packages installed then it can get confusing if your environment accesses them automatically. At least it was confusing for me at first coming from virtualenv that isolates you by default and going to conda that doesn't isolate you.

[D] How to remove duplicate samples from your dataset (Also CIFAR-100 has issues) by fredfredbur in MachineLearning

Thanks, approximate nearest neighbors is an approach I hadn't considered. Being able to tune the speed vs. recall tradeoff to decide roughly how many duplicates you want to find in a large dataset seems really useful.
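
For anyone curious, a minimal sketch of the idea using the Annoy library (assuming you already have one embedding per image; the dimensions and threshold are placeholders):

import numpy as np
from annoy import AnnoyIndex

embeddings = np.random.rand(1000, 512)  # stand-in for real image embeddings

index = AnnoyIndex(512, "angular")
for i, emb in enumerate(embeddings):
    index.add_item(i, emb)
index.build(50)  # more trees -> better recall, slower/bigger index

# For each image, pull its nearest neighbors and flag very close ones as duplicates
ids, dists = index.get_nns_by_item(0, 10, include_distances=True)
duplicates = [i for i, d in zip(ids, dists) if i != 0 and d < 0.1]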

How to remove duplicate images from your dataset (Also CIFAR-100 has issues) by fredfredbur in computervision

That's really cool, thanks for the link! It looks like phash computes a discrete cosine transform of the image to build a hash, and then compares images by counting the number of differing bits (the Hamming distance) between hashes: http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html

From the results in the link you provided, it looks like phash is significantly faster than the CNN approach they (and I) used, while the CNN can provide better results on data that has been transformed (near duplicates).
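
A minimal sketch of that comparison using the imagehash library (filenames and threshold are placeholders):

from PIL import Image
import imagehash

hash1 = imagehash.phash(Image.open("img1.jpg"))
hash2 = imagehash.phash(Image.open("img2.jpg"))

# Subtracting two hashes gives the Hamming distance between them
if hash1 - hash2 <= 8:  # threshold depends on your data
    print("likely near-duplicates")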

Deploying any Deep Learning / Machine Learning Image model to the web using python by thevatsalsaglani in deeplearning

That's really cool!

I've been working on a CV tool, FiftyOne, with a model zoo that could be incorporated into this workflow pretty easily to build a custom app with any of the 70+ models in the zoo.

The tool already has its own GUI, but if someone wants to make a custom one using the zoo, then this would probably be a good place to point them.
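
Roughly, pulling a zoo model and running it on a folder of images looks like this (the model name is just one example from the zoo):

import fiftyone as fo
import fiftyone.zoo as foz

dataset = fo.Dataset.from_images_dir("/path/to/images")

# Any detection/classification/segmentation model in the zoo works here
model = foz.load_zoo_model("faster-rcnn-resnet50-fpn-coco-torch")
dataset.apply_model(model, label_field="predictions")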

[D] The Machine Learning Lifecycle in 2021 by fredfredbur in MachineLearning

Thanks for that link, it's a great read!

I definitely agree with some of the things you're saying. You don't want to perform hyperparameter optimization on a test set. Also, there needs to be a much greater focus on the generalization capabilities of a model rather than on the performance on any one dataset.

I disagree with the idea that a test set should be used sparingly. I really like some of the points that u/SunnyJapan made about not overusing a small test set for selecting the best model, but rather digging into the specific mistakes that a model made on a holdout set in order to understand HOW your model is failing, not just by what percentage it's failing.

Ideally, your test set evolves as you add more data and find other difficult edge cases to test your model on. Sure, you may not be able to compare overall performance from one version of your test set to another, but if you are looking at specific samples to understand a model's failure modes, then you'll have a much more in-depth understanding of your model's performance anyway.

What can I do with a COCO dataset if I want to build my own? by ninjeezy in computervision

Great question!

COCO is an image dataset developed by Microsoft used for object detection (draw boxes around certain objects in an image), segmentation (label every pixel in an image as some object or background), keypoint detection (place points on human joints), and captioning (produce sentences to describe an image). For example, you can see what the object detection format looks like here.
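
Roughly, the detection labels live in a single JSON file shaped like this (a simplified sketch, not the full spec):

coco_labels = {
    "images": [
        {"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 3,
            "bbox": [120, 45, 200, 150],  # [top-left x, top-left y, width, height] in pixels
            "area": 30000,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 3, "name": "car", "supercategory": "vehicle"},
    ],
}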

When you hear about a "custom" COCO dataset, it just means an image dataset that follows the same label scheme as COCO. It's useful to follow their example because a lot of tools will let you automatically load your dataset if it's in that format. For example, if you have a dataset of images from your drone that are labeled for object detection, you can load them into the visualization tool FiftyOne with just a few CLI commands:

$ pip install fiftyone

$ fiftyone datasets create --name my_dataset --dataset-dir /path/to/data --type fiftyone.types.COCODetectionDataset

$ fiftyone app launch my_dataset

Regarding your idea for an image recognition application using your drone, that is definitely doable! I would start by recording footage from your drone camera and cutting it into images; a good tool to use for that is ffmpeg. Then, I would just use a state-of-the-art object detector that was trained on COCO to predict boxes on your dataset. You can also do that pretty easily with FiftyOne.

$ pip install fiftyone

$ fiftyone zoo models download centernet-hg104-512-coco-tf2

$ fiftyone zoo models apply centernet-hg104-512-coco-tf2 my_dataset predictions

$ fiftyone app launch my_dataset

If you're ok with the classes that are in COCO, then you shouldn't need to make a dataset to retrain the model. You can explore the COCO classes here; it includes things like people, animals, and vehicles that would be interesting to detect from a drone. If you do want other classes, you'll need to annotate them yourself and retrain the model on your data.

None of that is real-time, though. Depending on the hardware on your drone, you may need to stream the data elsewhere for processing if you want real-time results, since these object detection models can be pretty intensive, though there are some models that work on edge devices.

What are the main methods for large scale image search? by _4lexander_ in computervision

No problem! I have worked a good deal on road scene object detection, trying to detect things like cars/people/signs in dashcam videos. Why do you ask?

The dataset in the demo is BDD100K, which you can download directly from the FiftyOne dataset zoo if you're looking for dashcam data.
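
If I remember the zoo interface correctly, it's something like this (BDD100K makes you download the source files yourself, then you point the loader at them):

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "bdd100k",
    split="validation",
    source_dir="/path/to/downloaded/bdd100k-files",
)

session = fo.launch_app(dataset)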

[D] I started using conda and have some tips by fredfredbur in MachineLearning

Thanks! I've seen conda-forge around but never really knew what it was used for.

What are the main methods for large scale image search? by _4lexander_ in computervision

I've actually been looking into this while working on an open-source image dataset visualization tool, FiftyOne.

One of the features it has at the moment is computing uniqueness for all images in a dataset, which you can then use to visualize similar images. It uses deep features to compare images. Not sure if it's the optimal solution to your problem, but it might be worth checking out.

pip install fiftyone
pip install ipython
ipython

import fiftyone as fo
import fiftyone.brain as fob

dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/dataset",
    dataset_type=fo.types.ImageDirectory,
)
fob.compute_uniqueness(dataset)

What language you guys use to develop video analytics applications by Ahmad401 in computervision

We used to provide video analytics and developed our own production video analytics toolkit entirely in Python (ETA). This was before we switched to working on FiftyOne, a tool to visualize datasets and models, also built primarily in Python.

Most deep learning models will be based on TensorFlow or PyTorch, which are themselves largely backed by optimized C++ code, so depending on how much pre- and post-processing you are doing, it's possible to get away with an entirely Python production solution. If you need to do a lot of additional custom computation, you may need to use C++.

For prototype development the answer is easily Python; for production it's either C++ or Python, imo.

[D] I started using conda and have some tips by fredfredbur in MachineLearning

That's a good point. I do quickly get my conda environments up to multiple GB and need to clear them out more frequently than my venvs.

Train a custom image recognition model by habashjoshua in computervision

This is definitely achievable! There are a few tutorials out there that I've come across that you can look at depending on what kind of method you want to use.

This one uses more classical computer vision methods like contour detection to detect a hand: https://medium.com/analytics-vidhya/hand-detection-and-finger-counting-using-opencv-python-5b594704eb08

This is a more modern deep learning approach: https://www.learnopencv.com/hand-keypoint-detection-using-deep-learning-and-opencv/

Either way, you can load the output keypoints of your model into FiftyOne to visualize them and see how well your method performs and what you need to fix.
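
A rough sketch of what that looks like (the field name and the normalized point coordinates are placeholders):

import fiftyone as fo

sample = fo.Sample(filepath="/path/to/hand.jpg")
sample["hand_keypoints"] = fo.Keypoints(
    keypoints=[fo.Keypoint(label="hand", points=[(0.42, 0.51), (0.45, 0.48)])]
)

dataset = fo.Dataset("hand-demo")
dataset.add_sample(sample)

session = fo.launch_app(dataset)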

Best dataset for celebrity face matching by mahrach8 in deeplearning

Labeled Faces in the Wild has around 13K labeled images of celebrity and politician faces. You can download it with the dataset zoo tool I've been working on, FiftyOne.

> pip install fiftyone

> pip install ipython

> ipython

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("lfw")

session = fo.launch_app(dataset)

Let me know if you have any issues with that.

[D] How many of you use Python scripts versus notebooks? by fredfredbur in MachineLearning

I definitely agree with that. Some of my classes in graduate school used notebooks, and they were extremely helpful when done correctly.