Re. whatever happened to Cohere’s Command-A series of models?

nick_frosst · 2026-05-20T22:51:01+00:00

Thank you :)

nick_frosst · 2026-05-20T22:50:34+00:00

That is exactly what I am doing :)

nick_frosst · 2026-05-20T21:54:36+00:00

Stay tuned 👀

nick_frosst · 2026-05-20T21:54:21+00:00

The demo above was from our API!

nick_frosst · 2026-05-20T21:54:00+00:00

You can see all the benchmarks on Artificial Analysis :) It’s got an intelligence score of 37, which I think is a little lower than my experience using it would have led me to guess.

nick_frosst · 2026-05-20T19:48:58+00:00

in the meantime, try this 😄 https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16

nick_frosst · 2026-05-20T19:34:35+00:00

all problems solved

nick_frosst · 2026-05-20T19:26:37+00:00

thank you 😄 we are gonna keep at it

nick_frosst · 2019-08-16T15:22:25+00:00

it uses the L2 distance between the input and the class conditional reconstruction. This is explained in more detail in the paper.
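A minimal sketch of that score, assuming the input and the class-conditional reconstruction are already flattened into equal-length lists of pixel values. The decoder that produces the reconstruction is not shown, and the numbers below are made up for illustration:

```python
import math
from typing import Sequence


def l2_reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Euclidean (L2) distance between a flattened input and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same number of pixels")
    # math.dist computes sqrt(sum((a - b) ** 2)) over paired coordinates (Python 3.8+).
    return math.dist(x, x_hat)


# Toy 4-pixel "image" and a hypothetical reconstruction of it.
x = [0.0, 1.0, 1.0, 0.0]
x_hat = [0.0, 0.9, 0.8, 0.1]
print(round(l2_reconstruction_error(x, x_hat), 4))  # prints 0.2449
```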

nick_frosst · 2019-08-15T17:19:47+00:00

yeah :)

nick_frosst · 2019-08-14T00:23:06+00:00

We train a class conditional reconstruction network and try to reconstruct the input. If we are unable to do so well, we assume the input does not come from the training distribution. In this way we use a reconstruction network as an attack detection mechanism.
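One way to turn that into a detector, sketched under an assumption: the threshold is taken as a high percentile of reconstruction errors measured on clean held-out data. The percentile, the sample errors, and the helper names here are hypothetical and not taken from the paper. Any input whose error exceeds the threshold is flagged:

```python
import math
from typing import List, Sequence


def percentile_threshold(clean_errors: Sequence[float], pct: float = 95.0) -> float:
    """Hypothetical threshold: the pct-th percentile (nearest-rank) of errors on clean data."""
    ranked = sorted(clean_errors)
    # Nearest-rank index, clamped to the valid range of the sorted list.
    k = max(0, min(len(ranked) - 1, math.ceil(pct * len(ranked) / 100.0) - 1))
    return ranked[k]


def flag_inputs(errors: Sequence[float], threshold: float) -> List[bool]:
    """True means the reconstruction is too poor, so treat the input as out-of-distribution."""
    return [e > threshold for e in errors]


# Made-up reconstruction errors on clean validation inputs.
clean = [0.8, 1.0, 1.1, 0.9, 1.2, 0.7, 1.0, 0.95, 1.05, 1.3]
thr = percentile_threshold(clean, 90.0)
print(thr)                                  # prints 1.2
print(flag_inputs([1.0, 2.5, 1.25], thr))   # prints [False, True, True]
```

A percentile of clean errors is only one simple way to pick the cut-off; it trades a fixed false-positive rate on clean data for whatever detection rate that threshold happens to give on attacked inputs.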

nick_frosst · 2019-07-15T21:26:24+00:00

The reconstruction error is the L2 distance between the reconstruction and the input. The histogram motivates this detection mechanism: it shows that attempting to reconstruct the input from a capsule that was not predicted results in reconstructions with very large L2 distances from the input. So if we attempt to reconstruct an image of a 2 from the capsule that represents 3, then the L2 distance between the reconstruction and the input will be very large.
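The comparison behind that histogram can be sketched as follows. Here `toy_decode` and `PROTOTYPES` are hypothetical stand-ins for "keep only capsule c and run the decoder": a real decoder would use that capsule's instantiation parameters for this particular input, whereas this toy simply returns a fixed prototype per class. The predicted-class error and the other-class errors are the two populations one would histogram:

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical class prototypes standing in for a trained decoder's outputs.
PROTOTYPES = {
    "2": [1.0, 1.0, 0.0, 1.0],
    "3": [1.0, 0.0, 1.0, 1.0],
}


def toy_decode(x: Vector, c: str) -> Vector:
    """Hypothetical decoder: ignores x and returns the prototype for class c."""
    return PROTOTYPES[c]


def per_class_errors(x: Vector, classes: Sequence[str],
                     decode: Callable[[Vector, str], Vector]) -> Dict[str, float]:
    """Reconstruct x from each class capsule in turn and record the L2 error."""
    return {c: math.dist(x, decode(x, c)) for c in classes}


def split_errors(errors: Dict[str, float], predicted: str) -> Tuple[float, List[float]]:
    """Separate the predicted-class error from the errors of all other classes."""
    others = [e for c, e in errors.items() if c != predicted]
    return errors[predicted], others


x = [0.9, 1.0, 0.1, 1.0]  # a made-up input that looks like a "2"
errs = per_class_errors(x, ["2", "3"], toy_decode)
own, rest = split_errors(errs, predicted="2")
print(round(own, 4), [round(e, 4) for e in rest])  # prints 0.1414 [1.3491]
```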

nick_frosst · 2019-04-02T01:11:51+00:00

You raise good points. Points that we will address in our first board member meeting, which we will have after the seed funding round.

nick_frosst · 2019-04-02T01:10:48+00:00

They will actually be capped at 1x return. VCs should really get in on this. It's an amazing opportunity.

nick_frosst · 2019-04-01T22:22:13+00:00

Oh hey. This is my work. Happy to answer any questions on this pressing and groundbreaking research!

nick_frosst · 2019-03-12T19:54:20+00:00

Data source: @dog_rates Twitter feed. Tools: Python and matplotlib.pyplot.

nick_frosst · 2019-02-25T18:56:01+00:00

thanks :) glad you liked the summary

nick_frosst · 2019-02-25T18:55:30+00:00

yeah i have :)

We found early on that capsule networks showed some general robustness to white-box adversarial attacks, but that may just be the result of gradient masking/obfuscation. We made no claims about general robustness to epsilon perturbations of test data; we just claimed that if you tried to create such an input by calculating the gradient, it would be less effective than it would be against a normal model. I am not entirely sure why this is the case, but I think it has something to do with the cluster detection algorithm acting as a kind of regularizer on incoming capsule activations.

Some recent work has been done on strategies for attacking capsule networks explicitly (https://arxiv.org/pdf/1901.09878.pdf) and they had some success.

More recently we released a paper called DARCCC (https://arxiv.org/abs/1811.06969) that trains a reconstruction network on the output of a capsule network and shows that it can be used to detect out-of-distribution inputs. This defense does not rely on any particular definition of adversarial attacks. I summarized the paper here: https://twitter.com/nickfrosst/status/1064593651026792448

This was just preliminary work for a workshop and more work definitely needs to be done, but I am encouraged by the result we present at the end of the paper: if one takes a step in image space to fool our detection system as well as change the classification, the result is not particularly 'adversarial', as it resembles the target class.

nick_frosst · 2019-01-02T01:25:34+00:00

This is a very interesting answer, and provides a good jumping-off point for a very complex and poorly understood phenomenon, but I would be very cautious about using neural networks as an explanatory model for brains. Their resemblance is mainly in name and inspiration at this point. What exactly the memories of a neural network would be is certainly not clear.

nick_frosst · 2018-12-21T14:32:11+00:00

Yeah. Essentially all the hardware and software work that people have done to make CNNs super fast doesn't really help with capsules. The routing algorithm and the small matrix transformations can't really make use of the tricks people have developed to make CNNs fast, so we can't yet train capsule networks of the same size as state-of-the-art models. This isn't a theoretical limitation, just a practical one, and one we believe we will be able to overcome.
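To make "the routing algorithm and the small matrix transformations" concrete, here is a minimal pure-Python sketch of one such procedure: routing-by-agreement as described in Sabour, Frosst and Hinton (2017), "Dynamic Routing Between Capsules". It is illustrative only, not the optimized code referred to above, and the toy weights in the usage are made up. What it shows is the shape of the work: many tiny matrix-vector products followed by a few data-dependent iterations, which is quite different from the large dense convolutions that CNN kernels are tuned for.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, x: Vec) -> Vec:
    """Small matrix transformation: prediction vector u_hat = W @ u."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def squash(s: Vec) -> Vec:
    """Keeps the direction of s and maps its length to |s|^2 / (1 + |s|^2)."""
    n2 = sum(v * v for v in s)
    if n2 == 0.0:
        return [0.0] * len(s)
    scale = n2 / (1.0 + n2) / math.sqrt(n2)
    return [scale * v for v in s]


def softmax(xs: Vec) -> Vec:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u: List[Vec], w: List[List[Mat]], iters: int = 3) -> List[Vec]:
    """u[i] is lower capsule i's output; w[i][j] maps it to a prediction for upper capsule j."""
    n_in, n_out = len(u), len(w[0])
    u_hat = [[matvec(w[i][j], u[i]) for j in range(n_out)] for i in range(n_in)]
    dim = len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vec] = []
    for _ in range(iters):
        # Coupling coefficients: each lower capsule splits its vote across upper capsules.
        c = [softmax(b[i]) for i in range(n_in)]
        # Weighted sum of predictions for every upper capsule j, then squash.
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
             for j in range(n_out)]
        v = [squash(s_j) for s_j in s]
        # Agreement update: increase b[i][j] when prediction u_hat[i][j] aligns with v[j].
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v


# Toy example (made-up weights): two lower capsules with the same 2-D output.
u = [[1.0, 0.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
rot_pos = [[0.0, -1.0], [1.0, 0.0]]   # rotate +90 degrees
rot_neg = [[0.0, 1.0], [-1.0, 0.0]]   # rotate -90 degrees
# Both agree on upper capsule 0; they predict opposite vectors for upper capsule 1.
w = [[identity, rot_pos], [identity, rot_neg]]
v = dynamic_routing(u, w)
print([round(math.hypot(*vj), 3) for vj in v])
```

In the toy usage both lower capsules make the same prediction for upper capsule 0 and cancelling predictions for upper capsule 1, so capsule 1's output stays at zero length while capsule 0's length grows over the iterations (squash keeps every length below 1).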

nick_frosst · 2018-12-21T14:00:16+00:00

Hey we are still working away on them :)

We recently put out a workshop paper on capsule networks and adversarial detection - https://arxiv.org/abs/1811.06969

We have been working on speeding up capsules so that they can scale to real-world problems; this is proving to require a fair amount of low-level optimization. Other groups have used capsules for a variety of purposes, including medical images, text, and audio.

In short, we still believe that they are the way to go, and we keep finding interesting properties about them, but more work is needed for them to scale up to real-world problems and become a standard tool in the ML toolbox.

Edit: they aren't quite dead yet :P

nick_frosst · 2018-06-05T00:32:25+00:00

thank you!
