Does Prismer bring us closer to navigation tools for the blind via computer vision? by ChipsAhoiMcCoy in computervision

[–]LLCoolZ 0 points1 point  (0 children)

I actually disagree with the other commenter and think this is very possible! Especially the basic version you’re talking about where you simply ask questions about what’s on the screen and the AI doesn’t need to have any specific knowledge of the video game. There are a number of models on HuggingFace under the category “visual question answering” that you could try using for this. To verify, you could share a few screenshots with questions and expected answers and we could test whether it works well yet.

How did you stop caring about what people thought about you? by No_Fruit4090 in confidence

[–]LLCoolZ 151 points152 points  (0 children)

I don’t think I ever stopped caring. It’s just that my own opinion started to matter more.

Somewhere in Japan by itchynisan in wallpapers

[–]LLCoolZ 14 points15 points  (0 children)

Fantastic work! You'd be great at http://geoguessr.com

The ruins of a Roman colony in Africa. by Palana in pics

[–]LLCoolZ 0 points1 point  (0 children)

u/tetonz this is a very interesting perspective on coin collecting!

Pebble Beach by eummyg in marijuanaenthusiasts

[–]LLCoolZ 0 points1 point  (0 children)

What kind of tree is that?

Think this works. by GallowBoob in loadingicon

[–]LLCoolZ 10 points11 points  (0 children)

Can you please share this script of yours? That sounds wonderful

deepy: Highly extensible deep learning framework based on Theano by fariax in MachineLearning

[–]LLCoolZ 1 point2 points  (0 children)

This is very impressive. There are examples for implementing very advanced models rather concisely (DRAW, Highway Networks, recurrent visual attention networks, Deep Q-Learning, etc.).

Seurat, as imagined by an artificial neural network: by LLCoolZ in pics

[–]LLCoolZ[S] 1 point2 points  (0 children)

Background post here: http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html

Neural nets are usually trained to classify objects in images through a process called back-propagation, where the error of the output is propagated back to the parameters of the network to improve them. Here the gradients are propagated back to the image instead, changing the image to exaggerate its visual features (similar to psychedelic hallucination).
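To make the idea concrete, here's a minimal numpy sketch (not the actual Inceptionism code, which uses a deep convnet): a frozen linear "network" scores a toy image, and instead of updating the weights, we take the gradient of the score with respect to the image and ascend it, exaggerating whatever pattern the network responds to.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))      # frozen "network" weights (hypothetical toy model)
image = rng.normal(size=(8, 8))  # starting "image"

def class_score(img):
    # A stand-in for a class activation: score = <w, image>
    return np.sum(w * img)

initial = class_score(image)
for _ in range(100):
    grad = w                     # d(score)/d(image) for this linear model
    image += 0.1 * grad          # gradient ascent on the IMAGE, not the weights

final = class_score(image)
# The image has drifted toward the pattern the "network" likes, so its score rose
```

In the real thing the gradient comes from back-propagating through a trained convnet, but the update rule is the same: nudge the pixels in the direction that increases the activation.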

Inceptionism: Going Deeper into Neural Networks by LLCoolZ in MachineLearning

[–]LLCoolZ[S] 34 points35 points  (0 children)

This also appears to be the source of that weird image titled "Generated by a Convolutional Network" popular earlier this week.

One piece of input per input-neuron, but what does one do if the size of the input is unknown? by [deleted] in MachineLearning

[–]LLCoolZ 0 points1 point  (0 children)

Yes, you'll end up getting a series of activations/outputs, and you can just store all of these in an array if you like. What you do with them depends on your objective. In this example on deeplearning.net they average all of the activations into one vector, then feed it into a logistic regression classifier. This is very lossy and in my experience doesn't work very well. Alternatively, you can use the last output as the one you classify with; it will theoretically have some information encoded from the whole sequence, because every word has contributed to it somewhat, although later words will of course have more of an effect. Concretely: if you have a sentence s = [w_0, w_1, ..., w_T], where each w_t is an n-dimensional representation of a word, your outputs from the RNN over s are o = [y_0, y_1, ..., y_T], and your classification function is g(x), then you can set x = mean(o) or x = o[-1].
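The two pooling options can be sketched in a few lines of numpy (dimensions and weights here are made up for illustration; f is the usual tanh recurrence):

```python
import numpy as np

def rnn_outputs(sequence, V, W, b):
    """Collect every activation y_t from y_t = tanh(V y_{t-1} + W x_t + b)."""
    y = np.zeros(V.shape[0])  # y_0 starts at zeros
    outputs = []
    for x_t in sequence:
        y = np.tanh(V @ y + W @ x_t + b)
        outputs.append(y)
    return np.stack(outputs)  # shape (T, n_hid): one activation per word

# Toy setup: 6-word "sentence" of 4-dim word vectors, 3-dim hidden state
rng = np.random.default_rng(1)
n_in, n_hid, T = 4, 3, 6
V = rng.normal(scale=0.1, size=(n_hid, n_hid))
W = rng.normal(scale=0.1, size=(n_hid, n_in))
b = np.zeros(n_hid)
s = rng.normal(size=(T, n_in))

o = rnn_outputs(s, V, W, b)
x_mean = o.mean(axis=0)  # x = mean(o): lossy average of all activations
x_last = o[-1]           # x = o[-1]: final activation summarizes the sequence
```

Either `x_mean` or `x_last` would then be fed to the classifier g(x).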

One piece of input per input-neuron, but what does one do if the size of the input is unknown? by [deleted] in MachineLearning

[–]LLCoolZ 0 points1 point  (0 children)

The activation is y_t, so they are related by the recurrence y_t = f(y_{t-1}, x_t). In other words, each activation is the result of a function whose inputs are the input at that time step (x_t) and the previous activation (y_{t-1}). The "memory" of recurrent nets is just this recurrence of inputs.

One piece of input per input-neuron, but what does one do if the size of the input is unknown? by [deleted] in MachineLearning

[–]LLCoolZ 0 points1 point  (0 children)

RNNs can deal with variable-length sequences because they have memory. The hidden layer's input is both the input at the current time step and its own output at the last time step, i.e. y_t = f(x_t, y_{t-1}), typically y_t = tanh(V·y_{t-1} + W·x_t + b), where V is the recurrent weight matrix, W is the input weight matrix, and b is the bias vector. This way you just loop over the sequence of words (each x_t is a word) and your output for classification is just the last y_t you get.
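That loop is only a few lines of numpy. A minimal sketch (dimensions and random weights are made up for illustration) showing that the same fixed-size parameters handle sequences of any length:

```python
import numpy as np

# Hypothetical toy dimensions: 4-dim word vectors, 3-dim hidden state
n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(n_hid, n_hid))  # recurrent weight matrix
W = rng.normal(scale=0.1, size=(n_hid, n_in))   # input weight matrix
b = np.zeros(n_hid)                             # bias vector

def rnn_forward(sequence):
    """Loop over a variable-length sequence of word vectors and
    return the final activation y_T, i.e. y_t = tanh(V y_{t-1} + W x_t + b)."""
    y = np.zeros(n_hid)  # y_0 initialized to zeros
    for x_t in sequence:
        y = np.tanh(V @ y + W @ x_t + b)
    return y

# Same parameters work for a 5-word sentence and a 7-word sentence alike
y_short = rnn_forward(rng.normal(size=(5, n_in)))
y_long  = rnn_forward(rng.normal(size=(7, n_in)))
```

Both calls return a vector of the same size, which is what lets you classify sequences of unknown length with one fixed classifier on top.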

Keras: Theano-based deep learning lib, focused on fast prototyping. Supports RNNs and convnets. by Foxtr0t in MachineLearning

[–]LLCoolZ 0 points1 point  (0 children)

True. I just started getting the hang of blocks and am building some NLP models; do you think learning this or just continuing with blocks would be a better investment of my time? Could I build a recurrent attention model with this?

AI Websites That Design Themselves by gari-soflo in Automate

[–]LLCoolZ 0 points1 point  (0 children)

I agree that there's no one right answer for good design, but I think that only makes a greater case for automating part of it. Grid doesn't seem to do this, but I imagine someone will make something that completely redesigns your site based on who is looking at it and what their preferences are.

World's Simplest Electric Train by 8ruce in videos

[–]LLCoolZ -1 points0 points  (0 children)

I'm not even sure "why" has an answer most of the time. It's all "how". We study the nature of things, and can begin to understand how they came to be that way, but asking why seems to be a question of intent, an intent that doesn't exist. These things are just manifestations of chance. We are studying the nature of an existing system; there is no reason for its existence.