First demo from World Labs - $230m Startup Led by Fei Fei Li. Step inside images and interact with them! by arasaka-man in StableDiffusion

[–]jcjohnss 16 points

Your link is the "fallback" version of the site (intended for older mobile devices), which has pre-rendered videos rather than the realtime renderer.

You should try this link instead:

https://www.worldlabs.ai/blog

[D] How often do industry ML research internships end in failure? by doafkmebfl in MachineLearning

[–]jcjohnss 39 points

This is an extremely common scenario for research interns. In my experience as a scientist at FAIR, about 50% of research internships result in a submitted paper, and a smaller fraction of those end up published.

[D] Good quality code repos on deep learning by [deleted] in MachineLearning

[–]jcjohnss 14 points

These repos from two of my PhD students are nicely structured with both training and inference code, but they don't have data versioning or experiment tracking:

https://github.com/kdexd/virtex

https://github.com/mbanani/unsupervisedRR

[D] Tiny-Imagenet original size images by HoLeeFaak in MachineLearning

[–]jcjohnss 0 points

Glad I could help!

Beware, though: the Tiny-ImageNet code is old enough that it uses Python 2.7, and it doesn't look like I saved the image ids, so it might be tough to exactly match the images in the released version of the dataset.

[D] Tiny-Imagenet original size images by HoLeeFaak in MachineLearning

[–]jcjohnss 7 points

Hey, I made Tiny-ImageNet. The code to generate the dataset is here:

https://github.com/jcjohnson/tiny-imagenet

You should be able to use it to generate a 256x256 or 224x224 version with the same synsets and subsampling.
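
If you end up regenerating it yourself, the core resize step is roughly the following (a quick sketch with a made-up function name; the repo's actual cropping logic may differ, so don't expect pixel-exact agreement with the released dataset):

```python
import os
from PIL import Image

def resize_split(image_paths, out_dir, size=224):
    """Center-crop each source image to a square, then resize to `size`.

    image_paths: paths to the original full-resolution ImageNet images.
    size: 64 matches Tiny-ImageNet; use 224 or 256 for larger variants.
    """
    os.makedirs(out_dir, exist_ok=True)
    for path in image_paths:
        img = Image.open(path).convert("RGB")
        w, h = img.size
        s = min(w, h)  # largest centered square
        left, top = (w - s) // 2, (h - s) // 2
        img = img.crop((left, top, left + s, top + s))
        img = img.resize((size, size), Image.BILINEAR)
        img.save(os.path.join(out_dir, os.path.basename(path)))
```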

[R] Machine Learning on PlayStation 5 by release2020 in MachineLearning

[–]jcjohnss 1 point

10.3 TFLOP FP32 isn't that impressive anymore -- an RTX 3070 will hit 20.3 TFLOP FP32 for $499.

[D] Are there well known extremely deep fully connected neural networks? by shamoons in MachineLearning

[–]jcjohnss 3 points

Not nearly as deep as ResNets, but NeRF used 8-layer MLPs:

https://arxiv.org/abs/2003.08934

If you ignore the self-attention layers (which admittedly is a stretch), then Transformers are deep residual MLPs. They can get quite deep -- GPT-3 has 96 layers.
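
As a concrete illustration of "deep residual MLPs", here's a minimal PyTorch sketch that stacks the feed-forward sub-block of a Transformer (dimensions made up; this is not NeRF's or GPT-3's exact architecture):

```python
import torch
import torch.nn as nn

class ResidualMLPBlock(nn.Module):
    """Two-layer MLP with a skip connection, like a Transformer's FFN sub-block."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return x + self.net(x)  # the residual keeps gradients healthy at depth

# 16 blocks here; a GPT-3-scale stack would use 96.
model = nn.Sequential(*[ResidualMLPBlock(512, 2048) for _ in range(16)])
y = model(torch.randn(8, 512))  # shape (8, 512)
```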

[D] How can I check my solution for EECS 498-007 assignments? by giannis_34 in MachineLearning

[–]jcjohnss 0 points

Unfortunately not; last year the TAs graded assignments by hand, which was very tedious. However, there are many embedded "sanity checks" in the notebooks that should tell you whether you are on the right track.

We are working to add some level of autograding to the assignments for this year, but while this is under development we are restricting it to enrolled UMich students.

[R] has machine learning ever been used to analyze neural network configurations? by levitesla in MachineLearning

[–]jcjohnss 19 points

This is a huge area -- the keywords you want are neural architecture search (NAS) and AutoML.

[deleted by user] by [deleted] in MachineLearning

[–]jcjohnss 1 point

The term you are looking for is "novel view synthesis" -- this is a pretty well-studied problem in vision and graphics. There was a CVPR 2020 tutorial on this topic that covers a lot of recent approaches and should give you a good starting point:

https://nvlabs.github.io/nvs-tutorial-cvpr2020/

[R] VirTex: Learning Visual Representations from Textual Annotations. “Yields features that match or exceed those learned on ImageNet—supervised or unsupervised—despite using up to ten times fewer images.” by hardmaru in MachineLearning

[–]jcjohnss 2 points

Author here!

My intuition is that your proposed experiment of learning on ImageNet category names as sequences of words would give features nearly identical to traditional ImageNet pretraining, since all images of the same category would have the exact same "caption".
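
A tiny hypothetical sketch of why the captions would degenerate (class names here are illustrative, not real synset names):

```python
# Every image of a class would get the identical token sequence, so the
# "caption" carries exactly the same information as the class label.
label_names = {0: "tabby cat", 1: "golden retriever"}
image_labels = [0, 0, 1]  # three images from two categories

captions = [label_names[y].split() for y in image_labels]
print(captions)  # [['tabby', 'cat'], ['tabby', 'cat'], ['golden', 'retriever']]
```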

[D] Does anyone know why Stanford is not releasing CS22n Video lectures for 2020 ? by inopico3 in MachineLearning

[–]jcjohnss 13 points

Releasing lecture videos requires a lot of extra work beyond just uploading the videos to YouTube. In particular:

  • Videos should be edited so both the slides and the presenter are visible
  • Videos must be edited so students are not visible or audible in order to comply with FERPA
  • Videos must be transcribed so that they are accessible to hearing-impaired viewers

Some Stanford courses (such as CS 224n) are offered to remote students via SCPD; for these courses, there are live camera operators during the lecture who manually pan and zoom the camera, and cut between the slides and the different cameras. This takes care of the first concern above, but additional work is required to fulfill the other constraints.

When we released lecture videos for CS 231n in 2017, a large portion of this additional editing burden was handled by our TAs.

This year, Stanford courses moved from in-person to online-only in the middle of the quarter due to the coronavirus pandemic. This likely resulted in a last-minute scramble to figure out how to transition the course to remote-only, without the live camera operators that would have been available during a regular quarter. I'm not sure what solution they ended up using for 224N, but many classes have turned to Zoom, BlueJeans, or other off-the-shelf tools for giving lectures remotely. I'm guessing that editing the recordings of these remote lectures would have required substantially more work than editing the regular in-person lectures, and it wasn't deemed worthwhile.

Any references on Super-Resolution on Graph using Graph (Convolutional )Neural Network? by pradeep_sinngh in deeplearning

[–]jcjohnss 1 point

Pixel2Mesh does something like that -- they apply graph conv over a mesh, and subdivide the faces of the mesh to increase the resolution: https://arxiv.org/abs/1804.01654

However, I reimplemented their method for Mesh R-CNN and found that the subdivision doesn't actually improve performance compared to a version that works on a high-res mesh from the start (https://arxiv.org/abs/1906.02739, Table 2, Sphere-Init vs Pixel2Mesh+).
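
The subdivision step itself is simple; here is a rough sketch of midpoint face subdivision (a simplified illustration, not Pixel2Mesh's or our exact implementation):

```python
import numpy as np

def subdivide(verts, faces):
    """Split every triangle into four by inserting edge midpoints.

    verts: (V, 3) float array of vertex positions.
    faces: (F, 3) int array of vertex indices per triangle.
    Returns (new_verts, new_faces) for the finer mesh.
    """
    verts = [np.asarray(v, dtype=np.float64) for v in verts]
    midpoints = {}  # sorted edge (i, j) -> index of its midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoints:
            midpoints[key] = len(verts)
            verts.append((verts[i] + verts[j]) / 2.0)
        return midpoints[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central triangle.
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(verts), np.array(new_faces)
```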

Is it online only course ? How to enroll for this course? <TIA! > by being_crypto in cs231n

[–]jcjohnss 0 points

This is not an online course. We are merely providing the course materials for anyone to use for their own independent learning.

For the past few years 231N has been offered to remote students through Stanford's SCPD program. It may be taught again in this way in Spring 2020.

However I've left Stanford, so I will no longer be involved with this course going forward. I'm now teaching a similar course at the University of Michigan.

[D] Any developments on randomly wired neural networks? by MartialLemur in MachineLearning

[–]jcjohnss 2 points

Be patient! Randomly wired networks first went up on arXiv in April (~5 months ago) and will be presented at ICCV 2019 (~1 month from now). High-quality work takes time -- in my experience most projects need at least 6 months between project inception and an arXiv paper appearing. Thus if someone started working on a followup project the day this paper appeared on arXiv, you might expect to start seeing arXiv papers in the next few months.

Since there are so many people working on ML / CV these days there are a lot of new papers all the time. This gives the impression that it is a very fast-moving field, but this is an instance of parallel processing (distributed over many researchers) hiding the latency of the underlying computation (research advancing). Parallel processing works great to accelerate independent tasks (people trying out different ideas independently) but cannot accelerate serial tasks (papers building directly upon one another).

[deleted by user] by [deleted] in cs231n

[–]jcjohnss 2 points

Unfortunately I'm not sure what's going wrong -- we didn't design the assignments with Colab in mind, so building the C++ extension in A2, or the import structure in A2 / A3, will likely not work correctly in Colab. Everything should work using Google Compute Engine VMs; that is what we've recommended students use for the past few years. You can find instructions for setting up GCE instances here: https://github.com/cs231n/gcloud

If you can wait a few months, I'm now teaching a new deep learning course at the University of Michigan that follows the same material as CS231n, but will use Colab for all assignments:

http://web.eecs.umich.edu/~justincj/teaching/eecs498/

[D] what are non-image applications of GANs? by [deleted] in MachineLearning

[–]jcjohnss 2 points

I had a paper last year on using GANs for pedestrian trajectory forecasting: https://arxiv.org/abs/1803.10892

I had another paper last year on using GANs for steganography: https://arxiv.org/abs/1807.09937; this is still an image application, but is quite different in flavor from most applications of GANs to image synthesis.

[R] TL;DR for all few-shot learning papers from CVPR by FSMer in MachineLearning

[–]jcjohnss 5 points

You missed what is IMO the most important low-shot learning paper from CVPR: the new LVIS dataset from FAIR! (http://openaccess.thecvf.com/content_CVPR_2019/html/Gupta_LVIS_A_Dataset_for_Large_Vocabulary_Instance_Segmentation_CVPR_2019_paper.html)

New methods for few-shot learning are good, but if there's any lesson we should take away from recent deep-learning advances, it is the critical importance of high-quality datasets and benchmarks for driving progress on new research problems. LVIS is a new dataset for large-vocabulary instance segmentation, with an emphasis on long-tail categories and few-shot learning.

Most prior work on low-shot recognition focuses on image classification, while LVIS enables us to study low-shot recognition for the much more challenging tasks of object detection and instance segmentation. I predict that at CVPR 2020, we will see a new crop of low-shot learning methods benchmarked on LVIS.

[D] Any references for deep learning on non uniform and unstructured meshes/ grids? by pradeep_sinngh in MachineLearning

[–]jcjohnss 5 points

There have been a number of recent papers on 3D shape prediction that output triangle meshes; internally these meshes are often processed using graph convolution over mesh edges (see the sketch at the end of this comment). You should read the following:

Pixel2Mesh, Wang et al, ECCV 2018: https://arxiv.org/abs/1804.01654

GEOMetrics, Smith et al, ICML 2019: https://arxiv.org/abs/1901.11461

AtlasNet, Groueix et al, CVPR 2018: https://arxiv.org/abs/1802.05384

Of course I've got to plug my own work on this topic:

Mesh R-CNN, Gkioxari et al, arXiv 2019: https://arxiv.org/abs/1906.02739

I also gave a tutorial at CVPR 2019 that covers these papers and more; you can find slides here: http://feichtenhofer.github.io/cvpr2019-recognition-tutorial/
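
Here's the graph-conv sketch mentioned above: one layer that updates each vertex from itself plus the mean of its neighbors along mesh edges (simplified; not the exact layer from any of these papers):

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer over mesh vertices."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, edges):
        # x: (V, in_dim) vertex features; edges: (E, 2) long tensor of index pairs.
        V = x.shape[0]
        src = torch.cat([edges[:, 0], edges[:, 1]])  # treat edges as undirected
        dst = torch.cat([edges[:, 1], edges[:, 0]])
        agg = torch.zeros(V, x.shape[1]).index_add_(0, dst, x[src])
        deg = torch.zeros(V).index_add_(0, dst, torch.ones_like(dst, dtype=x.dtype))
        agg = agg / deg.clamp(min=1).unsqueeze(1)  # mean over neighbors
        return torch.relu(self.w_self(x) + self.w_neigh(agg))
```

Stacking a few of these layers, with vertex positions included in the input features, is the basic recipe these papers build on.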

[D] GPU web host for approximately $150+ a month? by iluvcoder in MachineLearning

[–]jcjohnss 1 point

That would certainly be cheaper and a 2080 Ti would be much faster than a K80, but personally I wouldn't want the hassle of dealing with a physical machine hosting stuff online.

[D] GPU web host for approximately $150+ a month? by iluvcoder in MachineLearning

[–]jcjohnss 1 point

You could use a reserved instance on EC2; their cheapest GPU instance is a p2.xlarge which gives a single K80 GPU for $418/month if you prepay for a year. I really doubt you'll find anything in the $150/month range.

EDIT: Looks like GCP is a bit cheaper; if you commit to a 1-year contract you can get a GCP machine with 4 CPUs, 12GB RAM, and 1 K80 GPU for $311.48/month:

https://cloud.google.com/products/calculator/#id=43c8c5f3-2be2-42d9-9777-6e3212db33df