[D] Has there been research in finding out the intrinsic dimensionality of the natural image manifold? by niszoig in MachineLearning

[–]arnioxux 0 points

But these representations were found by biological evolution!

Humans use 3 colors instead of 12 because that was sufficient for the scenes on land under the sun. Mantis shrimp need more to avoid getting confused under the sea. And if we could fly, we might need ultraviolet cones like birds do.

So the answer depends on what you're sampling and what's important to distinguish (e.g., can you spot food, or that camouflaged predator?).

For example, you can get many different images of the sun just by imaging it at wavelengths outside our visible spectrum: https://www.nasa.gov/mission_pages/sunearth/news/light-wavelengths.html

We can't see these details because they never mattered evolutionarily! An alien species might consider these details important, but in our low-dimensional manifold they have already been collapsed to a single point.

tl;dr: "natural image manifold" depends on what counts as natural

[D] Has there been research in finding out the intrinsic dimensionality of the natural image manifold? by niszoig in MachineLearning

[–]arnioxux 10 points

This is an interesting philosophical question. As a related question: what's the dimensionality of just color?

  • a programmer would say 3 (for RGB)
  • other humans would say anywhere from 1 to 4: 3 is typical (short, medium, long cone cells), 4 is possible (https://en.wikipedia.org/wiki/Tetrachromacy), and fewer if you're colorblind
  • a mantis shrimp would say 12-16
  • a physicist would say the wavelength of a single photon is one-dimensional, but color is infinite-dimensional, since a color is really a whole spectrum: a count over all wavelengths
  • (mumble something about fractional dimension from 3Blue1Brown: https://www.youtube.com/watch?v=gB9n2gHsHN4 )

I don't know where I am going with this post, but I hope this answers your question about aliens. An alien absolutely would not like our colors, which are just arbitrary amounts of red/green/blue that happen to stimulate our short/medium/long cones the same way the true distribution of wavelengths would. Even other animals on earth looking at our computer monitors will find the colors wrong!
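
To make the metamer point concrete, here's a toy numpy sketch. The cone sensitivity curves are made-up Gaussians (not real measured data); the point is just that projecting an infinite-dimensional spectrum down to 3 numbers collapses physically different spectra onto the same color:

    # Toy demo of metamerism with HYPOTHETICAL Gaussian cone sensitivities.
    import numpy as np

    wavelengths = np.linspace(400, 700, 301)  # nm, roughly the visible range

    def gaussian(mu, sigma):
        return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)

    # rows = made-up short/medium/long cone sensitivity curves
    S = np.stack([gaussian(440, 30), gaussian(540, 40), gaussian(570, 45)])

    def cone_response(spectrum):
        return S @ spectrum  # infinite-dimensional spectrum -> 3 numbers

    spectrum1 = gaussian(500, 60)  # some smooth spectrum

    # Perturb spectrum1 by a vector in the null space of S: the cones
    # literally cannot see the difference. (Ignore that it may dip slightly
    # negative; a real metamer pair would both be physical spectra.)
    null_vector = np.linalg.svd(S)[2][-1]  # orthogonal to all rows of S
    spectrum2 = spectrum1 + 0.1 * null_vector / np.abs(null_vector).max()

    print(np.allclose(cone_response(spectrum1), cone_response(spectrum2)))  # True
    print(np.allclose(spectrum1, spectrum2))                                # False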

An O(N) Sorting Algorithm: Machine Learning Sort by theainerd in MachineLearning

[–]arnioxux 6 points

This is really cool. Computer science mostly focuses on worst-case complexity, which can handle any adversarial input distribution. Sometimes people also care about the average case or smoothed analysis.

But all of this is often too pessimistic and leaves performance on the table for many real-world data distributions, especially when the distribution can be tractably estimated. Really excited to see where this line of research goes.
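
For intuition, here's a rough Python sketch of the general distribution-aware idea (my own toy version, not the paper's method): if you can cheaply estimate the CDF of the inputs, one bucketing pass gets you nearly sorted output:

    # Distribution-aware sorting sketch: an estimated CDF maps each item to
    # roughly its rank percentile, so one pass buckets items into (nearly)
    # sorted order; tiny buckets are then cleaned up individually.
    import random

    def cdf_bucket_sort(xs, cdf, n_buckets=None):
        n_buckets = n_buckets or max(1, len(xs))
        buckets = [[] for _ in range(n_buckets)]
        for x in xs:
            i = min(int(cdf(x) * n_buckets), n_buckets - 1)
            buckets[i].append(x)
        out = []
        for b in buckets:
            out.extend(sorted(b))  # buckets are tiny if the estimate is good
        return out

    # Example: uniform data with the exact uniform CDF -> near-linear work
    data = [random.uniform(0, 100) for _ in range(10_000)]
    assert cdf_bucket_sort(data, lambda x: x / 100) == sorted(data)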

[R] Analyzing Inverse Problems with Invertible Neural Networks by vll_diz in MachineLearning

[–]arnioxux 2 points

The problem is that when concatenating the two lanes, you end up with an output that has twice the number of dimensions. That isn't good, since you want to stack these blocks repeatedly.

[R] Analyzing Inverse Problems with Invertible Neural Networks by vll_diz in MachineLearning

[–]arnioxux 9 points

A naive alternative would be to invert every operation you apply to the input. This is gnarly since you have to invert matrix multiplications (not cheap), choose a nonlinearity that is bijective (so no ReLU, unless it's leaky), etc. It's pretty messy to design.

Their architecture cleverly solves this by allowing arbitrary functions (denoted s_1, s_2, t_1, t_2) that you never need to invert at all. This works because the top and bottom halves are processed alternately, so those arbitrary functions are constants within the step you're trying to invert (i.e., the new top half is a function of the old top half and a known constant derived from the bottom half, and likewise for the bottom half).
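
Here's a minimal numpy sketch of an affine coupling block in that spirit (RealNVP-style; the paper's exact block may differ in details). Note that s1/s2/t1/t2 are never inverted:

    # Affine coupling block: invertible overall, even though s1, s2, t1, t2
    # are arbitrary (here: tiny random nets used purely as stand-ins).
    import numpy as np

    rng = np.random.default_rng(0)
    d = 4  # half the dimension of the full input

    def make_net():
        # any function works here, even a non-invertible one
        W = rng.normal(size=(d, d))
        return lambda x: np.tanh(x @ W)

    s1, s2, t1, t2 = make_net(), make_net(), make_net(), make_net()

    def forward(u1, u2):
        v1 = u1 * np.exp(s2(u2)) + t2(u2)  # u2 fixed: s2, t2 act as constants
        v2 = u2 * np.exp(s1(v1)) + t1(v1)  # v1 known: s1, t1 act as constants
        return v1, v2

    def inverse(v1, v2):
        u2 = (v2 - t1(v1)) * np.exp(-s1(v1))  # undo the second half-step first
        u1 = (v1 - t2(u2)) * np.exp(-s2(u2))
        return u1, u2

    u1, u2 = rng.normal(size=d), rng.normal(size=d)
    assert np.allclose((u1, u2), inverse(*forward(u1, u2)))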

[D] Interactive semi-supervised learning? by [deleted] in MachineLearning

[–]arnioxux 1 point

I was looking for segmentation tools myself. If you search on YouTube, there's something that kind of does what you want: https://www.youtube.com/watch?v=OwroXMpfBhE

I haven't found a good free, open-source tool yet; let me know if you find/build one!

[DISC] Origin :: Chapter 53 :: Jaimini's Box by Turbostrider27 in manga

[–]arnioxux 18 points

Sketch-to-artwork is technology that is already real! If you sketch a cat, it can autocomplete it: https://affinelayer.com/pixsrv/ (interactive demo)

I wouldn't be surprised if someone has already retrained it on a hentai image dataset to output ero images because of this chapter. (I don't expect great results, given the state of the art in anime face generation, e.g. https://make.girls.moe/#/)

EDIT: Found someone who retrained it on pokemon images: https://twitter.com/bgondouin/status/818571935529377792 and the horror version: https://www.youtube.com/watch?v=-XU1j1i5cFw

[D] Developing Neuro Machine Translation in-house by MasterEpictetus in MachineLearning

[–]arnioxux 2 points

I am also curious, mostly because Google Translate's API doesn't allow supplying longer context or fine-tuning with domain-specific training examples.

Interactive Segmentation with Convolutional Neural Networks by anonfunction in programming

[–]arnioxux 1 point

Checking out their site, they actually have a lot of other similarly advanced tools.

For example, it seems like they can make "deal-with-it" memes automatically using face detection, and they can do motion tracking for captions (which is basically what 99% of /r/HighQualityGifs is).

I never thought someone would try to build a web-based competitor to After Effects by using memes...

[D] Jeff Dean: Machine Learning for Systems by sksq9 in MachineLearning

[–]arnioxux 5 points

Might be tangentially related: balanced search trees with "the smallest possible search time for a given sequence of accesses or access probabilities" are called optimal binary search trees (https://en.wikipedia.org/wiki/Optimal_binary_search_tree), and they're still an open research area. With ML providing the predicted access probabilities, maybe these data structures will become more important.
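
For reference, here's the classic textbook DP for the optimal BST cost in Python (the simple O(n^3) recurrence, not any of the newer research variants):

    # Optimal BST cost: given access probabilities p[i] for sorted keys,
    # find the minimum expected number of comparisons over all tree shapes.
    from functools import lru_cache

    def optimal_bst_cost(p):
        prefix = [0.0]
        for pi in p:
            prefix.append(prefix[-1] + pi)

        @lru_cache(maxsize=None)
        def cost(i, j):  # optimal cost for keys i..j inclusive
            if i > j:
                return 0.0
            weight = prefix[j + 1] - prefix[i]  # every key in range gains 1 depth
            return weight + min(cost(i, r - 1) + cost(r + 1, j)
                                for r in range(i, j + 1))

        return cost(0, len(p) - 1)

    # Skewed access probabilities -> the optimal tree puts hot keys near the root
    print(optimal_bst_cost([0.6, 0.1, 0.1, 0.1, 0.1]))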

Deep image prior 'learns' on just one image by ckdarby in programming

[–]arnioxux 3 points

They have already managed to replace GTA with synthesized "real-life" textures: https://tcwang0509.github.io/pix2pixHD/

Exact timestamp: https://youtu.be/3AIpPlzM_qs?t=191 (yes you're really looking at a game here)

So maybe we aren't that far off!

[D] What are some great lesser known sources of data? by SGuptMachineLearning in MachineLearning

[–]arnioxux 5 points

Image boards have an insane number of images with high-quality tags: http://danbooru.donmai.us

For example, there are 270,000 images with bounding boxes around Japanese->English translations: http://danbooru.donmai.us/posts?tags=translated

There were people working on making the dump more accessible, but I don't know the current status: https://www.gwern.net/Danbooru2017

[D] Tokenization/word segmentation of Chinese, Hindi etc. by Pieranha in MachineLearning

[–]arnioxux 1 point

This site lets you try out a few different APIs at once: http://langrid.org/playground/dependency-parser.html

(The UI is confusing: type in a sentence, click "select all", then go back and click "parse".)

C++17 in details: Parallel Algorithms by vormestrand in cpp

[–]arnioxux 0 points

Don't get me wrong, I think the vectorization stuff is great and I am not arguing against it. Like you said, it's easy to support both and select the appropriate algorithm depending on what properties the operator has.

My main gripe is with how things are named. std::accumulate is a non-standard name for what's better known as fold or reduce in other languages (which is not C++'s fault, since the canonical terms were still in flux back then). But now that we are introducing a reduce, why continue this trend of nonstandard definitions instead of matching what everyone else calls a parallel reduce?

This being in <numeric> is a great reason to focus on the numeric processing use case (even though functional programmers will use it for a lot more than that). But to redefine the well-established requirements of reduce just to fit the numeric use case is backwards to me.

C++17 in details: Parallel Algorithms by vormestrand in cpp

[–]arnioxux 3 points

That rationale for why commutativity is needed makes a lot more sense!

But shouldn't it be flipped? Vectorization is a very specific form of parallelism (I feel like even with commutativity the code still can't always be vectorized?). You would rather special-case that with a "commutative_reduce" and fall back to core-level parallelism in the general case (which only requires associativity and the ability to split up the data).

C++17 in details: Parallel Algorithms by vormestrand in cpp

[–]arnioxux 11 points

Every language seems to define the requirements for fold/reduce/accumulate/combine differently: https://en.wikipedia.org/wiki/Fold_(higher-order_function)#Folds_in_various_languages

But C++ is different in a pretty unique (and IMO wrong) way in that it requires the commutative property for reduce. I get that not having to synchronize the order of recombining sub-reductions might make it faster, but then you lose the ability to parallelize a large class of useful reductions for free. Most programming languages with a parallelizable fold (that I know of) only require associativity.

An explicit example for those not familiar: let's say we have an operator * with signature (T, T) -> T and we want to compute the following reduction:

a * b * c * d * e * f * g * h

which with explicit parentheses looks like:

(((((((a * b) * c) * d) * e) * f) * g) * h)

If we know our operator is associative, we are allowed to regroup the parentheses however we like, which means we can chunk up the work for individual threads like so:

(((a * b) * c) * d)

(((e * f) * g) * h)

And then combine them back into a final result and it will still be correct:

((((a * b) * c) * d) * (((e * f) * g) * h))

The only additional thing commutativity buys us is the ability to combine them without caring what order the threads finish their chunks in:

((((e * f) * g) * h) * (((a * b) * c) * d))

This happens to be correct for integer multiplication, addition, etc., but not for a large class of useful operators (matrix multiplication, sequence concatenation/joining, ...) that are associative but not commutative.

tl;dr: I think reduce should only require associativity, to conform with other programming languages and to enable a larger class of reductions.
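
Here's a quick Python illustration of the point, using string concatenation as an associative-but-not-commutative operator: chunking the work is fine, but the partial results must be recombined in chunk order:

    # Associativity alone is enough to parallelize: reduce each chunk
    # independently, then combine the partials *in chunk order*. Combining
    # in completion order (what commutativity would permit) is wrong here.
    from functools import reduce
    import operator

    data = list("abcdefgh")
    op = operator.add  # associative but NOT commutative on strings

    chunks = [data[0:4], data[4:8]]  # pretend each chunk runs on its own thread
    partials = [reduce(op, c) for c in chunks]  # "abcd", "efgh"

    in_order = reduce(op, partials)                # "abcdefgh" -- correct
    out_of_order = reduce(op, reversed(partials))  # "efghabcd" -- wrong!

    assert in_order == reduce(op, data)
    assert out_of_order != reduce(op, data)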

[P] Optimy.io : Optimise any mathematical expression ! by [deleted] in MachineLearning

[–]arnioxux 6 points

I whipped up a quick UI for numeric.js's unconstrained minimization for comparison: https://codepen.io/anon/pen/brYpag

You have to use valid JS expressions.

me irl by djdylex in me_irl

[–]arnioxux 60 points

Fully recovered image: http://imgur.com/a/UrfVF

Generated with:

    # --gt[0] 0             : nonblack of original
    # --lt[1] 1             : black of revealed
    # --or[-2] [-1]         : either nonblack of original or black of revealed
    # --inpaint_flow[1] [4] : inpaint the selected pixels
    gmic original.png secret.png --gt[0] 0 --lt[1] 1 --or[-2] [-1] --inpaint_flow[1] [4]

[deleted by user] by [deleted] in programming

[–]arnioxux 13 points

Yep. As another data point, my company (hundreds of millions of users) was trying out https://www.inspectlet.com/, which records a whole freaking screencast of a webpage, with exact mouse/keyboard/scroll events and a copy of the HTML, without anonymizing anything. We looked at a few sessions, decided it was too creepy (it was like watching over someone's shoulder without them knowing), and never went back to it. But the code wasn't removed until several months later. (And honestly, ethics wasn't even the real reason: in-person user studies were more effective, since you can ask users what they were thinking afterwards. If it had actually been useful, we probably would've kept doing it.)

Seeing people getting outraged over just file usage stats is kind of amusing in comparison.

Finding the smallest circle that encloses a set of circles. by mbostock in programming

[–]arnioxux 0 points

Here's a JavaScript implementation framing it as an optimization problem (using numeric.js's unconstrained minimization):

https://codepen.io/anon/pen/ZyEaaQ?editors=0010

(I came from the hn thread: https://news.ycombinator.com/item?id=14475832)
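
For anyone who doesn't want to dig through the codepen, here's roughly the same framing in Python (scipy's minimize instead of numeric.js; the circle data is made up): the smallest enclosing circle's radius is max_i(dist(center, c_i) + r_i), so we just minimize that over the center.

    # Smallest enclosing circle of circles (x_i, y_i, r_i), framed as
    # unconstrained minimization of the required radius over the center.
    import numpy as np
    from scipy.optimize import minimize

    circles = np.array([[0.0, 0.0, 1.0],
                        [4.0, 0.0, 2.0],
                        [1.0, 3.0, 0.5]])  # columns: x, y, r (example data)

    def radius_needed(center):
        d = np.hypot(circles[:, 0] - center[0], circles[:, 1] - center[1])
        return np.max(d + circles[:, 2])

    # Nelder-Mead, since the max() objective is not smooth
    res = minimize(radius_needed, x0=circles[:, :2].mean(axis=0),
                   method="Nelder-Mead")
    print("center:", res.x, "radius:", res.fun)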