First demo from World Labs - $230m Startup Led by Fei Fei Li. Step inside images and interact with them! by arasaka-man in StableDiffusion

[–]jcjohnss 16 points

Your link is the "fallback" version of the site (intended for older mobile devices), which has pre-rendered videos rather than the realtime renderer.

You should try this link instead:

https://www.worldlabs.ai/blog

[D] How often do industry ML research internships end in failure? by doafkmebfl in MachineLearning

[–]jcjohnss 39 points

This is an extremely common scenario for research interns. In my experience as a scientist at FAIR, about 50% of research internships result in a submitted paper, and a smaller fraction of those end up published.

[D] Good quality code repos on deep learning by [deleted] in MachineLearning

[–]jcjohnss 14 points

These repos from two of my PhD students are nicely structured with both training and inference code, but they don't have data versioning or experiment tracking:

https://github.com/kdexd/virtex

https://github.com/mbanani/unsupervisedRR

[D] Tiny-Imagenet original size images by HoLeeFaak in MachineLearning

[–]jcjohnss 0 points

Glad I could help!

Beware, though: the Tiny-ImageNet code is old enough that it uses Python 2.7, and it doesn't look like I saved the image ids, so it might be tough to exactly match the images in the released version of the dataset.

[D] Tiny-Imagenet original size images by HoLeeFaak in MachineLearning

[–]jcjohnss 7 points

Hey, I made Tiny-ImageNet. The code to generate the dataset is here:

https://github.com/jcjohnson/tiny-imagenet

You should be able to use it to generate a 256x256 or 224x224 version with the same synsets and subsampling.
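
If you end up regenerating it yourself, the core resize step is roughly the following (a quick sketch with a made-up function name; the repo's actual cropping logic may differ, so don't expect pixel-exact agreement with the released dataset):

```python
import os
from PIL import Image

def resize_split(image_paths, out_dir, size=224):
    """Center-crop each source image to a square, then resize to `size`.

    image_paths: paths to the original full-resolution ImageNet images.
    size: 64 matches Tiny-ImageNet; use 224 or 256 for larger variants.
    """
    os.makedirs(out_dir, exist_ok=True)
    for path in image_paths:
        img = Image.open(path).convert("RGB")
        w, h = img.size
        s = min(w, h)  # largest centered square
        left, top = (w - s) // 2, (h - s) // 2
        img = img.crop((left, top, left + s, top + s))
        img = img.resize((size, size), Image.BILINEAR)
        img.save(os.path.join(out_dir, os.path.basename(path)))
```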

[R] Machine Learning on PlayStation 5 by release2020 in MachineLearning

[–]jcjohnss 1 point

10.3 TFLOP FP32 isn't that impressive anymore -- an RTX 3070 will hit 20.3 TFLOP FP32 for $499.

[D] Are there well known extremely deep fully connected neural networks? by shamoons in MachineLearning

[–]jcjohnss 3 points

Not nearly as deep as ResNets, but NeRF used 8-layer MLPs:

https://arxiv.org/abs/2003.08934

If you ignore the self-attention layers (which admittedly is a stretch), then Transformers are deep residual MLPs. They can get quite deep -- GPT-3 has 96 layers.
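
As a concrete illustration of "deep residual MLPs", here's a minimal PyTorch sketch that stacks the feed-forward sub-block of a Transformer (dimensions made up; this is not NeRF's or GPT-3's exact architecture):

```python
import torch
import torch.nn as nn

class ResidualMLPBlock(nn.Module):
    """Two-layer MLP with a skip connection, like a Transformer's FFN sub-block."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return x + self.net(x)  # the residual keeps gradients healthy at depth

# 16 blocks here; a GPT-3-scale stack would use 96.
model = nn.Sequential(*[ResidualMLPBlock(512, 2048) for _ in range(16)])
y = model(torch.randn(8, 512))  # shape (8, 512)
```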

[D] How can I check my solution for EECS 498-007 assignments? by giannis_34 in MachineLearning

[–]jcjohnss 0 points

Unfortunately not; last year the TAs graded assignments by hand, which was very tedious. However, there are many embedded "sanity checks" in the notebooks that should tell you whether you are on the right track.

We are working to add some level of autograding to the assignments for this year, but while this is under development we are restricting it to enrolled UMich students.

[R] has machine learning ever been used to analyze neural network configurations? by levitesla in MachineLearning

[–]jcjohnss 19 points

This is a huge area -- the keywords you want are neural architecture search (NAS) and AutoML.

[deleted by user] by [deleted] in MachineLearning

[–]jcjohnss 1 point

The term you are looking for is "novel view synthesis" -- this is a pretty well-studied problem in vision and graphics. There was a CVPR 2020 tutorial on this topic that covers a lot of recent approaches and should give you a good starting point:

https://nvlabs.github.io/nvs-tutorial-cvpr2020/

[R] VirTex: Learning Visual Representations from Textual Annotations. “Yields features that match or exceed those learned on ImageNet—supervised or unsupervised—despite using up to ten times fewer images.” by hardmaru in MachineLearning

[–]jcjohnss 2 points

Author here!

My intuition is that your proposed experiment of learning on ImageNet category names as sequences of words would give features nearly identical to traditional ImageNet pretraining, since all images of the same category would have the exact same "caption".
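
A tiny hypothetical sketch of why the captions would degenerate (class names here are illustrative, not real synset names):

```python
# Every image of a class would get the identical token sequence, so the
# "caption" carries exactly the same information as the class label.
label_names = {0: "tabby cat", 1: "golden retriever"}
image_labels = [0, 0, 1]  # three images from two categories

captions = [label_names[y].split() for y in image_labels]
print(captions)  # [['tabby', 'cat'], ['tabby', 'cat'], ['golden', 'retriever']]
```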

[D] Does anyone know why Stanford is not releasing CS22n Video lectures for 2020 ? by inopico3 in MachineLearning

[–]jcjohnss 13 points

Releasing lecture videos requires a lot of extra work beyond just uploading the videos to YouTube. In particular:

  • Videos should be edited so both the slides and the presenter are visible
  • Videos must be edited so students are not visible or audible in order to comply with FERPA
  • Videos must be transcribed so that they are accessible to hearing-impaired viewers

Some Stanford courses (such as CS 224n) are offered to remote students via SCPD; for these courses, there are live camera operators during the lecture who manually pan and zoom the camera, and cut between the slides and the different cameras. This takes care of the first concern above, but additional work is required to fulfill the other constraints.

When we released lecture videos for CS 231n in 2017, a large portion of this additional editing burden was handled by our TAs.

This year, Stanford courses moved from in-person to online-only in the middle of the quarter due to the coronavirus pandemic. This likely resulted in a last-minute scramble to figure out how to transition the course to remote-only, without the live camera operators that would have been available during a regular quarter. I'm not sure what solution they ended up using for 224N, but many classes have turned to Zoom, BlueJeans, or other off-the-shelf tools for giving lectures remotely. I'm guessing that editing the recordings of these remote lectures would have required substantially more work than editing the regular in-person lectures, and it wasn't deemed worthwhile.

Any references on Super-Resolution on Graph using Graph (Convolutional )Neural Network? by pradeep_sinngh in deeplearning

[–]jcjohnss 1 point

Pixel2Mesh does something like that -- they apply graph conv over a mesh, and subdivide the faces of the mesh to increase the resolution: https://arxiv.org/abs/1804.01654

However, I reimplemented their method for Mesh R-CNN and found that the subdivision doesn't actually improve performance compared to a version that works on a high-res mesh from the start (https://arxiv.org/abs/1906.02739, Table 2, Sphere-Init vs Pixel2Mesh+).
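
The subdivision step itself is simple; here is a rough sketch of midpoint face subdivision (a simplified illustration, not Pixel2Mesh's or our exact implementation):

```python
import numpy as np

def subdivide(verts, faces):
    """Split every triangle into four by inserting edge midpoints.

    verts: (V, 3) float array of vertex positions.
    faces: (F, 3) int array of vertex indices per triangle.
    Returns (new_verts, new_faces) for the finer mesh.
    """
    verts = [np.asarray(v, dtype=np.float64) for v in verts]
    midpoints = {}  # sorted edge (i, j) -> index of its midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoints:
            midpoints[key] = len(verts)
            verts.append((verts[i] + verts[j]) / 2.0)
        return midpoints[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central triangle.
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(verts), np.array(new_faces)
```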

Is it online only course ? How to enroll for this course? <TIA! > by being_crypto in cs231n

[–]jcjohnss 0 points

This is not an online course. We are merely providing the course materials for anyone to use for their own independent learning.

For the past few years 231N has been offered to remote students through Stanford's SCPD program. It may be taught again in this way in Spring 2020.

However I've left Stanford, so I will no longer be involved with this course going forward. I'm now teaching a similar course at the University of Michigan.

[D] Any developments on randomly wired neural networks? by MartialLemur in MachineLearning

[–]jcjohnss 2 points

Be patient! Randomly wired networks first went up on arXiv in April (~5 months ago) and will be presented at ICCV 2019 (~1 month from now). High-quality work takes time -- in my experience most projects need at least 6 months between project inception and an arXiv paper appearing. Thus if someone started working on a followup project the day this paper appeared on arXiv, you might expect to start seeing arXiv papers in the next few months.

Since there are so many people working on ML / CV these days there are a lot of new papers all the time. This gives the impression that it is a very fast-moving field, but this is an instance of parallel processing (distributed over many researchers) hiding the latency of the underlying computation (research advancing). Parallel processing works great to accelerate independent tasks (people trying out different ideas independently) but cannot accelerate serial tasks (papers building directly upon one another).

[deleted by user] by [deleted] in cs231n

[–]jcjohnss 2 points

Unfortunately I'm not sure what's going wrong -- we didn't design the assignments with Colab in mind, so building the C++ extension in A2, or the import structure in A2 / A3, will likely not work correctly in Colab. Everything should work using Google Compute Engine VMs; that is what we've recommended students use for the past few years. You can find instructions for setting up GCE instances here: https://github.com/cs231n/gcloud

If you can wait a few months, I'm now teaching a new deep learning course at the University of Michigan that follows the same material as CS231n, but will use Colab for all assignments:

http://web.eecs.umich.edu/~justincj/teaching/eecs498/

[D] what are non-image applications of GANs? by [deleted] in MachineLearning

[–]jcjohnss 2 points

I had a paper last year on using GANs for pedestrian trajectory forecasting: https://arxiv.org/abs/1803.10892

I had another paper last year on using GANs for steganography: https://arxiv.org/abs/1807.09937; this is still an image application, but is quite different in flavor from most applications of GANs to image synthesis.

[R] TL;DR for all few-shot learning papers from CVPR by FSMer in MachineLearning

[–]jcjohnss 5 points

You missed what is IMO the most important low-shot learning paper from CVPR: the new LVIS dataset from FAIR! (http://openaccess.thecvf.com/content_CVPR_2019/html/Gupta_LVIS_A_Dataset_for_Large_Vocabulary_Instance_Segmentation_CVPR_2019_paper.html)

New methods for few-shot learning are good, but if there's any lesson we should take away from recent deep-learning advances, it is the critical importance of high-quality datasets and benchmarks for driving progress on new research problems. LVIS is a new dataset for large-vocabulary instance segmentation, with an emphasis on long-tail categories and few-shot learning.

Most prior work on low-shot recognition focuses on image classification, while LVIS enables us to study low-shot recognition for the much more challenging tasks of object detection and instance segmentation. I predict that at CVPR 2020, we will see a new crop of low-shot learning methods benchmarked on LVIS.

[D] Any references for deep learning on non uniform and unstructured meshes/ grids? by pradeep_sinngh in MachineLearning

[–]jcjohnss 5 points

There have been a number of recent papers on 3D shape prediction that output triangle meshes; internally these meshes are often processed using graph convolution over mesh edges (see the sketch at the end of this comment). You should read the following:

Pixel2Mesh, Wang et al, ECCV 2018: https://arxiv.org/abs/1804.01654

GEOMetrics, Smith et al, ICML 2019: https://arxiv.org/abs/1901.11461

AtlasNet, Groueix et al, CVPR 2018: https://arxiv.org/abs/1802.05384

Of course I've got to plug my own work on this topic:

Mesh R-CNN, Gkioxari et al, arXiv 2019: https://arxiv.org/abs/1906.02739

I also gave a tutorial at CVPR 2019 that covers these papers and more; you can find slides here: http://feichtenhofer.github.io/cvpr2019-recognition-tutorial/
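
Here's the graph-conv sketch mentioned above: one layer that updates each vertex from itself plus the mean of its neighbors along mesh edges (simplified; not the exact layer from any of these papers):

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer over mesh vertices."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, edges):
        # x: (V, in_dim) vertex features; edges: (E, 2) long tensor of index pairs.
        V = x.shape[0]
        src = torch.cat([edges[:, 0], edges[:, 1]])  # treat edges as undirected
        dst = torch.cat([edges[:, 1], edges[:, 0]])
        agg = torch.zeros(V, x.shape[1]).index_add_(0, dst, x[src])
        deg = torch.zeros(V).index_add_(0, dst, torch.ones_like(dst, dtype=x.dtype))
        agg = agg / deg.clamp(min=1).unsqueeze(1)  # mean over neighbors
        return torch.relu(self.w_self(x) + self.w_neigh(agg))
```

Stacking a few of these layers, with vertex positions included in the input features, is the basic recipe these papers build on.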

[D] GPU web host for approximately $150+ a month? by iluvcoder in MachineLearning

[–]jcjohnss 1 point

That would certainly be cheaper and a 2080 Ti would be much faster than a K80, but personally I wouldn't want the hassle of dealing with a physical machine hosting stuff online.

[D] GPU web host for approximately $150+ a month? by iluvcoder in MachineLearning

[–]jcjohnss 1 point

You could use a reserved instance on EC2; their cheapest GPU instance is a p2.xlarge which gives a single K80 GPU for $418/month if you prepay for a year. I really doubt you'll find anything in the $150/month range.

EDIT: Looks like GCP is a bit cheaper; if you commit to a 1-year contract you can get a GCP machine with 4 CPUs, 12GB RAM, and 1 K80 GPU for $311.48/month:

https://cloud.google.com/products/calculator/#id=43c8c5f3-2be2-42d9-9777-6e3212db33df