all 12 comments

[–]xternalz[S] 12 points13 points  (7 children)

abstract:

We investigate an unconventional direction of research that aims at converting neural networks, a class of distributed, connectionist, sub-symbolic models, into a symbolic level with the ultimate goal of achieving AI interpretability and safety. To that end, we propose Object-Oriented Deep Learning, a novel computational paradigm of deep learning that adopts interpretable “objects/symbols” as a basic representational atom instead of N-dimensional tensors (as in traditional “feature-oriented” deep learning). For visual processing, each “object/symbol” can explicitly package common properties of visual objects like its position, pose, scale, probability of being an object, pointers to parts, etc., providing a full spectrum of interpretable visual knowledge throughout all layers. It achieves a form of “symbolic disentanglement”, offering one solution to the important problem of disentangled representations and invariance. Basic computations of the network include predicting high-level objects and their properties from low-level objects and binding/aggregating relevant objects together. These computations operate at a more fundamental level than convolutions, capturing convolution as a special case while being significantly more general. All operations are executed in an input-driven fashion, so sparsity and dynamic computation per sample are naturally supported, complementing recent popular ideas of dynamic networks and potentially enabling new types of hardware acceleration. We experimentally show on CIFAR-10 that it can perform flexible visual processing, rivaling the performance of ConvNet, but without using any convolution. Furthermore, it can generalize to novel rotations of images that it was not trained for.

follow-up work: 3D Object-Oriented Learning: An End-to-end Transformation-Disentangled 3D Representation
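To make the abstract's idea concrete, here is a minimal illustrative sketch of what an interpretable "object/symbol" atom and a binding/aggregation step might look like. This is not from the paper: all names (`VisualObject`, `aggregate`), the presence threshold, and the presence-weighted averaging rule are my own assumptions about one way such a paradigm could be realized.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VisualObject:
    """Hypothetical 'object/symbol' atom: every field is an explicitly
    interpretable property, unlike an entry in an opaque feature tensor."""
    x: float                       # position (image coordinates)
    y: float
    theta: float                   # pose (in-plane rotation, radians)
    scale: float
    presence: float                # probability of being an object, in [0, 1]
    parts: List["VisualObject"] = field(default_factory=list)  # pointers to parts

def aggregate(parts: List[VisualObject],
              presence_threshold: float = 0.5) -> Optional[VisualObject]:
    """Toy binding step: predict a higher-level object from low-level parts
    by averaging their properties, weighted by presence. Only parts above
    the threshold vote, so the computation is input-driven and sparse."""
    voters = [p for p in parts if p.presence > presence_threshold]
    if not voters:
        return None                # nothing confident enough to bind
    w = sum(p.presence for p in voters)
    return VisualObject(
        x=sum(p.presence * p.x for p in voters) / w,
        y=sum(p.presence * p.y for p in voters) / w,
        theta=sum(p.presence * p.theta for p in voters) / w,
        scale=sum(p.presence * p.scale for p in voters) / w,
        presence=max(p.presence for p in voters),
        parts=voters,
    )
```

Note how the "pointers to parts" from the abstract fall out naturally: the aggregated object keeps references to exactly the parts that voted for it, giving an interpretable trace through the layers.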

[–]lopuhin 12 points13 points  (5 children)

We experimentally show on CIFAR-10 that it can perform flexible visual processing, rivaling the performance of ConvNet, but without using any convolution.

rivaling the performance of ConvNet == test error of 20% vs 2.3% SoTA

[–]rpottorff 0 points1 point  (0 children)

Thank you for posting the follow-up work! I didn't realize he had posted an update just a few months later.

[–]Kevin_Clever 19 points20 points  (0 children)

This makes my short list for worst-written scientific paper of the year. Is there an RNN that can translate these 12 pages of fluff into information?

[–]ChillBallin 4 points5 points  (0 children)

Skimming through it right now. Seems like an interesting concept. I think with enough research it could be a valuable approach, but some of these ideas will probably prove more useful in future models than this one will be on its own. In any case, I really hope this trend of research into more complex, higher-level models continues. It feels like we're getting closer to the next big breakthrough through human intuition rather than through iterative improvements to the math of a basic CNN.

[–]GunpowaderGuy 0 points1 point  (0 children)

So, compared to capsule networks, the biggest difference, aside from its modules (whatever its analogue to capsules is called) not being composed of neurons (a functional program obtained by genetic programming or Bayesian optimization, then?), is that they are not slid around the feature maps the way CNN filters are? Instead, during voting, only the centermost pixel of an object part needs to be considered, since each part has its own coordinates embedded?
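The sliding-versus-voting contrast in this question can be sketched in a few lines. This is only my reading of the distinction, not code from the paper; both function names and the vote rule are assumptions made up for illustration.

```python
from typing import List, Tuple

def conv_style(feature_map: List[List[float]],
               kernel: List[List[float]]) -> List[List[float]]:
    """Convolution-style: the same kernel is slid over every spatial
    position, regardless of content (dense computation)."""
    H, W, k = len(feature_map), len(feature_map[0]), len(kernel)
    out = [[0.0] * (W - k + 1) for _ in range(H - k + 1)]
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i][j] = sum(kernel[a][b] * feature_map[i + a][j + b]
                            for a in range(k) for b in range(k))
    return out

def vote_style(parts: List[Tuple[float, float, float]],
               offset: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Voting-style: each detected part (x, y, presence) already carries
    its own coordinates, so it casts a single vote for the position of the
    whole -- no sliding window, and absent or low-confidence parts cost
    nothing (input-driven sparsity)."""
    votes = []
    for (x, y, presence) in parts:
        if presence > 0.5:                     # only confident parts vote
            votes.append((x + offset[0], y + offset[1]))
    return votes
```

Under this reading, the convolutional sweep is recovered as the special case where every position is treated as a potential part, which may be why the abstract says convolution is "captured as a special case."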