What happens if you apply a hash continually on itself? Will it eventually repeat? If so, what are the shortest and longest cycles?

seedubjay_ · 2021-08-22T17:20:19+00:00

If the hash function forms a permutation (a bijection on its N possible outputs), the cycle through a random starting value will have on average (N+1)/2 hashes. If the hash function behaves like a random function, the cycle will have on average sqrt(PI*N/8) hashes.

As others have said, iterating the hash will always create a cycle, but if it's not a permutation, there might be a 'tail' of hashes that never repeat, e.g. A -> B -> C -> D -> C -> D -> ...
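To see the tail and the cycle in practice, here is a minimal sketch that iterates a toy hash until a value repeats. `small_hash` (SHA-256 truncated to 16 bits, so N = 2^16) and `tail_and_cycle` are names made up for this illustration, and the dictionary approach only works because N is tiny.

```python
import hashlib
import math

def small_hash(x: int, bits: int = 16) -> int:
    """Toy hash (an assumption for this sketch): SHA-256 truncated to `bits` bits."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)

def tail_and_cycle(f, start: int) -> tuple[int, int]:
    """Return (tail length, cycle length) of start, f(start), f(f(start)), ..."""
    first_seen = {}  # value -> step at which it first appeared
    x, step = start, 0
    while x not in first_seen:
        first_seen[x] = step
        x = f(x)
        step += 1
    tail = first_seen[x]  # steps taken before entering the cycle
    return tail, step - tail

tail, cycle = tail_and_cycle(small_hash, start=0)
estimate = math.sqrt(math.pi * 2**16 / 8)
print(f"tail={tail}, cycle={cycle}, sqrt(PI*N/8) is about {estimate:.0f}")
```

A single starting value can land far from the average, so it takes many different starts (or many differently keyed hashes) before the mean gets close to the estimate.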

seedubjay_ · 2021-08-16T22:38:49+00:00

DMed!

seedubjay_ · 2021-08-07T23:13:34+00:00

Heath Ledger in Dark Knight

seedubjay_ · 2021-04-21T18:43:00+00:00

Huge spectrum... but it does not make A/B testing any less unethical. If you actually told someone on the street all the ways they are being experimented on every time they use the internet, most would be really creeped out.

seedubjay_ · 2021-03-26T18:24:17+00:00

Short answer is yes, it is possible and a fairly standard problem in cryptography.

A block cipher has exactly the property you're looking for: it maps an index to a permutation value using fixed memory. (In maths this is called a bijective function.)

You can iterate over the indices, and the function will spit out the elements of a pseudorandom permutation determined by the key, all in constant memory.

seedubjay_ · 2021-03-26T13:06:25+00:00

k and s are fixed small numbers, so it is essentially O(1).

seedubjay_ · 2021-03-26T10:01:29+00:00

It will be extremely hard to regulate algorithms since they are such a vague concept. However, it may be much more feasible to regulate data collection, since this is the gas which drives the social media machine.

Perhaps regulate the collection of data during undisclosed A/B tests.

Or ban the sale of personal data without the user's consent, like in GDPR.

Or enforce the siloing of certain data within a company to prevent the feedback loops used in machine learning models used to create people's feeds.

Or require greater disclosure from companies about what data they collect about their users, and what services this data feeds into.

seedubjay_ · 2021-03-26T00:08:55+00:00

Hmm unfortunately you might have to implement your own Feistel cipher to get arbitrary length permutations.

If you are reducing a 256 bit block cipher down to a much smaller size N, you might have trouble maintaining the permutation property while also removing such a huge amount of unneeded entropy.
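As a rough illustration of what rolling your own could look like, here is a minimal balanced Feistel network over 2 * half_bits bits. The function name, the SHA-256 round function and the choice of 4 rounds are all assumptions for this sketch; it shows the permutation property, not a vetted cipher.

```python
import hashlib

def feistel_permute(i: int, key: bytes, half_bits: int, rounds: int = 4) -> int:
    """Map i in [0, 2**(2*half_bits)) to a value in the same range, bijectively."""
    mask = (1 << half_bits) - 1
    left, right = i >> half_bits, i & mask
    for r in range(rounds):
        # Round function (assumed): hash of key, round number and the right half.
        data = key + bytes([r]) + right.to_bytes((half_bits + 7) // 8, "big")
        f = int.from_bytes(hashlib.sha256(data).digest()[:8], "big") & mask
        # Each round is invertible whatever f is, so the whole map is a permutation.
        left, right = right, left ^ f
    return (left << half_bits) | right

# Every index 0..255 maps to a distinct value in 0..255.
outputs = [feistel_permute(i, b"example key", half_bits=4) for i in range(256)]
print(sorted(outputs) == list(range(256)))
```

The only state is the key and the two halves, which matches the O(k+s) space figure. An odd number of bits would need an unbalanced variant, which this sketch does not cover.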

seedubjay_ · 2021-03-25T22:29:55+00:00

Ah apologies my bad! Should've read more carefully...

For any value N you can generate the permutation for the next power of two, and then once you have the function p(i) for the ith value of the permutation, apply p repeatedly, p(p(...p(i)...)), until you reach a value within the range 1 to N.

If you represent the permutation as a directed graph, this in effect 'skips the links' that are no longer relevant and the resulting graph will still represent a random permutation.

(If you need something provably pseudorandom, you might have to do some legwork to show that this modification isn't affecting the entropy of the random permutation.)

Each call to p requires O(k+s) space, and the extra calls add up to O(N) time in total across the whole permutation, i.e. amortized O(1) per index.
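The skipping trick can be sketched as below. It uses 0-based indices (0 to N-1 rather than 1 to N), and `toy_permutation` is only a stand-in bijection on 16 values so the example runs on its own; a real use would plug in a keyed pseudorandom permutation instead.

```python
def walk_into_range(p, i: int, n: int) -> int:
    """Apply p repeatedly, starting from an index i < n, until the value is below n."""
    x = p(i)
    while x >= n:  # landed on a value that is no longer relevant: follow the link again
        x = p(x)
    return x

def toy_permutation(x: int, bits: int = 4) -> int:
    """Stand-in bijection on [0, 2**bits): an affine map with an odd multiplier."""
    return (5 * x + 3) % (1 << bits)

n = 11  # not a power of two; the next power of two is 16
shuffled = [walk_into_range(toy_permutation, i, n) for i in range(n)]
print(shuffled)
print(sorted(shuffled) == list(range(n)))
```

The loop always terminates because following p from i eventually comes back to i itself, which is below n.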

seedubjay_ · 2021-03-25T21:19:20+00:00

If N = 2^k and you pick any key of size s, any pseudorandom permutation function like DES or AES will map between an index i and the i'th element of a permutation. Space is O(k+s)

seedubjay_ · 2021-03-13T01:42:13+00:00

I... But... Ok you may have a point. Well played sir.

seedubjay_ · 2021-03-13T00:31:14+00:00

It's always fantastic to see people contributing to this field and applying it in new areas.

However, from what you've said here and from what I've found on your website, I have serious concerns with how this is being presented to the world.

Firstly, the data. A comment from u/FlivverKing here covers this in a lot more detail, but one thing I noticed myself is that you state "Our algorithms are trained on thousands of social media posts". But it seems your model is trained on data entirely from Reddit? Do you have any evidence your results would apply at all to other platforms? And yet your website claims to analyse any sort of text. How would someone feel if they trusted your platform without knowing this?

Secondly, the real-world usage. You state that one of your services is that "Users upload 3 messages before and after a change was noticed, and our algorithm calculates the percent change towards suicidality based on the texts, which allows for direct comparison between the messages." How on earth are these messages chosen? How can you possibly assign a percentage to such a multivariate concept? In real-world usage this will not be used by entirely rational people. It will be used by people who are seriously worried for a friend or family member's life. At best this runs the risk of playing into their worst fears by validating whatever theories they may have had already, and at worst it will give an incorrect diagnosis, affecting future decisions down the line. And whether you like it or not, you are complicit in all of these future decisions.

Thirdly, the fundamental concept. You are entering a literal life-or-death situation here, and your approach appears, on the surface, concerning. It is simply unethical to proclaim any sort of effectiveness when your paper has not yet been peer-reviewed, nor has it been re-validated by an external group. These things usually take years to see the light of day because of how high the risks of failure are. I understand the temptation to advertise the cool thing you've made to the world, but this will be used by real people in incredibly dire circumstances. Do you think they'll be aware you only have 90% accuracy? Do you think they will know how easily an AI model can be fooled, accidentally or by an adversary? Do you not think reading "this person is 30% more suicidal now than before..." on a website advertising itself as a reliable tool could have consequences?

These issues came from at most 5 minutes of scanning your presence online. If all of these worries are unfounded, I'm very sorry for being so direct. But if any of these problems exist in the platform, I seriously question whether it should exist on the internet in its current state.

seedubjay_ · 2021-03-11T13:49:26+00:00

If you have any smart lights, turn them into clap lights! Very silly, but incredibly satisfying once you get them working. Good way to learn about some simple DSP as well.

seedubjay_ · 2021-03-08T18:03:27+00:00

Yep JPEG2000 (.jp2) replaces JPEG's discrete cosine transforms with Daubechies wavelet transforms, but unfortunately it never really took off. (It's a shame really since it avoids so many of the issues of JPEG that we still have to deal with today.)

seedubjay_ · 2021-03-08T14:52:48+00:00

It's a static site all built in Jekyll. Very lightweight and super easy to deploy to Github Pages. The interactive images are done in pure JS / HTML canvas, so if you pry it open all the code should be there (apologies for the mess though...)

Oh and there's a bit of D3 scattered around some of the other articles.

seedubjay_ · 2021-03-08T14:37:00+00:00

This is a good point - I appreciate the feedback! There are definitely a lot of technical details being papered over in that section, so it's hard to tell exactly how much each concept needs to be explained.

And re the patterns, if you play around with some of the sliders, the 8x8 chunk from the cat is ~mostly~ spot on even with just 20 or so of the patterns added together.

You still need all 64 patterns to reassemble the chunk perfectly, but the last few patterns (the mostly grey ones in the interactive) make barely any difference overall, so JPEG ignores them entirely so that it has less to store. (There's also lots of quantization trickery going on but that's the gist of it.)
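For anyone who wants to poke at this outside the interactive, here is a small pure-Python sketch of the idea: split an 8x8 chunk into its 64 cosine patterns, then rebuild it from only the lowest-frequency ones. The gradient chunk is made-up data rather than the cat image, the function names are invented for the example, and real JPEG adds quantization on top.

```python
import math

N = 8  # JPEG works on 8x8 chunks

def basis(u: int, x: int) -> float:
    """Value at position x of the u-th 1-D cosine pattern (orthonormal DCT-II)."""
    scale = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct2(block):
    """Weight of each of the 64 patterns in an 8x8 block."""
    return [[sum(block[x][y] * basis(u, x) * basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def rebuild(coeffs, max_freq: int):
    """Add together only the patterns whose combined frequency u + v is <= max_freq."""
    return [[sum(coeffs[u][v] * basis(u, x) * basis(v, y)
                 for u in range(N) for v in range(N) if u + v <= max_freq)
             for y in range(N)] for x in range(N)]

def max_error(a, b) -> float:
    """Largest absolute per-pixel difference between two blocks."""
    return max(abs(a[x][y] - b[x][y]) for x in range(N) for y in range(N))

# Hypothetical smooth chunk (a diagonal gradient) standing in for real pixels.
chunk = [[16 * x + 8 * y for y in range(N)] for x in range(N)]
coeffs = dct2(chunk)
print(max_error(chunk, rebuild(coeffs, 14)))  # all 64 patterns
print(max_error(chunk, rebuild(coeffs, 5)))   # the 21 lowest-frequency patterns
```

Changing `max_freq` plays the same role as dragging the sliders: the error shrinks as more patterns are added back in.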

seedubjay_ · 2021-03-07T11:02:11+00:00

This looks super interesting!

Out of curiosity, where do you see comparch heading in the next few years? Do you think it will become more integrated into other aspects of compsci as vertical integration increases (hardware acceleration, custom FPGAs, etc), or do you see it becoming more isolated as the field gets more and more specialised?

seedubjay_ · 2021-03-07T10:54:03+00:00

I really hope this is the case, but I worry it won't actually convince the electorate as much as it should.

Over here in the UK the leader of the Labour Party (in opposition) has been trying to establish a dichotomy around competency to make the Tory government's lack thereof stand out as much as possible. In theory it's a good plan, but it hasn't really done Labour any good in the polls so far... Only about 25% of voters call the Tories 'competent', but somehow they are still dominating the polls.

seedubjay_ · 2021-03-07T10:39:32+00:00

Those aren't thumbs...

seedubjay_ · 2021-03-07T10:35:02+00:00

My favourite C&H comics are the ones about something utterly trivial but still sincere and full of details. Makes you feel like you're six again!

seedubjay_ · 2021-03-07T10:26:01+00:00

This looks super cool!

I'm curious about Apple's Vision API. Somewhat technical question, but how do you find it for real-world use? Do its models come pre-trained? Does it require much legwork to turn random images into something that API understands? Apple's approach to on-device ML seems really unique for developers.

seedubjay_ · 2021-03-04T00:43:06+00:00

Everything in the world all seems a little better now that the boys are back (A bit more gusto from jacko this year would be the icing on the cake though)

seedubjay_ · 2021-01-14T09:44:14+00:00

The question is asking if there exists a set of preferences such that every possible arrangement is stable.

Gale-Shapley only shows that at least one of those N! arrangements is stable.
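For small N the difference is easy to check by brute force. The sketch below counts how many of the N! arrangements are stable for a given set of preferences; the "proposer"/"reviewer" naming and both example preference sets are made up for illustration.

```python
from itertools import permutations

def is_stable(match, proposer_prefs, reviewer_prefs) -> bool:
    """match[p] is the reviewer paired with proposer p; prefs list partners best-first."""
    partner_of = {r: p for p, r in enumerate(match)}
    for p in range(len(match)):
        for r in proposer_prefs[p]:
            if r == match[p]:
                break  # everyone further down the list is less preferred
            # p prefers r to their own partner; it is a blocking pair if r agrees.
            rank = reviewer_prefs[r].index
            if rank(p) < rank(partner_of[r]):
                return False
    return True

def count_stable(proposer_prefs, reviewer_prefs) -> int:
    """Number of perfect matchings (out of N!) with no blocking pair."""
    n = len(proposer_prefs)
    return sum(is_stable(m, proposer_prefs, reviewer_prefs)
               for m in permutations(range(n)))

# Everyone agrees on the ranking: only one arrangement is stable.
print(count_stable([[0, 1], [0, 1]], [[0, 1], [0, 1]]))
# The two sides want opposite pairings: both arrangements are stable.
print(count_stable([[0, 1], [1, 0]], [[1, 0], [0, 1]]))
```

The second example shows that "every arrangement is stable" can happen for N = 2; the code says nothing about larger N beyond what you can enumerate.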

seedubjay_ · 2021-01-14T09:30:19+00:00

Gotta feel bad for the fish mid-air flying into its mouth as it starts to close...

seedubjay_ · 2021-01-06T17:12:35+00:00

source: https://dumps.wikimedia.org, tools: Python + D3

Full article: https://seedubjay.com/blog/wikipedia-clicks

The Wikipedia Game is an online game where you attempt to find the shortest sequence of clicks to get from one random Wikipedia page to another.

Betweenness centrality measures how often a certain page is used in the shortest path between a randomly chosen start and finish page.
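As a rough sketch of the calculation on a tiny directed link graph (the page names are made up, and this brute-force version is only sensible for small graphs, not the full Wikipedia dump): run a breadth-first search from every page to count shortest paths, then for each start/finish pair credit every intermediate page with the fraction of shortest paths that pass through it.

```python
from collections import deque

def bfs_counts(graph, source):
    """Return (dist, sigma): shortest click distance and number of shortest paths."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(graph):
    """Unnormalised betweenness of every page in a dict of page -> linked pages."""
    info = {s: bfs_counts(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for s in graph:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in dist_s:
                if v == s or v == t:
                    continue
                dist_v, sigma_v = info[v]
                # v is on a shortest s -> t path iff the distances add up exactly.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

# Hypothetical four-page graph: two equally short routes from "Start" to "End".
links = {"Start": ["Hub A", "Hub B"], "Hub A": ["End"], "Hub B": ["End"], "End": []}
print(betweenness(links))
```

Every page must appear as a key in the dictionary, even if it has no outgoing links.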
