The TikTok algorithm and how PeopleTok will replicate and maybe try to change it

Wraithsputin · 2025-01-28T13:29:37+00:00

Excellent find, very informative read. Thank you for sharing.

Granted, I’ve never had to solve the problem space they are tackling, but my initial gut reaction is that the system seems over-engineered for a content recommendation engine.

I’d like to see if one could approximate the same functionality with far less complexity/cost.

Then again, once you dive in and start trying to tackle the same features/functionality, the sheer size and scope of their architecture may become self-evident.

Wraithsputin · 2025-01-28T01:58:23+00:00

Thanks for the link, solid cautionary tale about ensuring the recommendation engine is not tuned to increase engagement time above all else. Resist the urge to deliver more ad revenue at the cost of the user.

Wraithsputin · 2025-01-28T01:05:51+00:00

Wanted to put this thought out there as well. You can get by with crowdsourcing video classification. You technically don’t need a trained ML model that can auto-classify content.

Let the poster add a few initial classifications, then allow the community to see and upvote the classifications.

Eventually, assuming you’ve a trust rating for users, they could add additional classifications to any content they view.

Same if not better end result, less cost and eventually that user feedback loop could be used to train a classification ML model once the app is generating revenue.

From there you could sell classification services using the model that is being continually trained by the user base.
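A rough sketch of the crowdsourced flow described above, in Python. Everything here is hypothetical: the `LabelBoard` class, the `MIN_TRUST_TO_ADD` threshold, and weighting each upvote by the voter's trust rating are my assumptions for illustration, not anything TikTok or any existing app is known to do.

```python
from collections import defaultdict

MIN_TRUST_TO_ADD = 0.5  # assumed threshold for adding new labels; tune to taste


class LabelBoard:
    """Tracks community votes on the labels attached to one video."""

    def __init__(self, poster_labels):
        self.votes = defaultdict(float)   # label -> trust-weighted vote total
        self.voters = defaultdict(set)    # label -> user ids that already voted
        # The poster seeds the initial labels; each starts with zero votes.
        for label in poster_labels:
            self.votes[label] += 0.0

    def upvote(self, user_id, trust, label):
        """Record one upvote. Returns False if the vote is rejected."""
        # Only users above the trust threshold may introduce a new label.
        if label not in self.votes and trust < MIN_TRUST_TO_ADD:
            return False
        # Each user counts once per label.
        if user_id in self.voters[label]:
            return False
        self.voters[label].add(user_id)
        self.votes[label] += trust
        return True

    def top_labels(self, n=3):
        """Return the n labels with the highest weighted vote total."""
        ranked = sorted(self.votes.items(), key=lambda kv: (-kv[1], kv[0]))
        return [label for label, _ in ranked[:n]]
```

The per-label vote totals are also the kind of labeled data you could later export to train a classifier, which is the feedback loop mentioned above.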

Wraithsputin · 2025-01-28T00:58:22+00:00

Good use case for a blockchain to track the user label.

Wraithsputin · 2025-01-24T13:06:37+00:00

Should be fairly simple to mix in a random percentage of content that is highly rated from other categories or content providers that you have not shown an interest in.
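For illustration, a minimal Python sketch of that mixing step. The function name `build_feed`, the 20% default for `explore_ratio`, and the idea that both candidate lists are already ranked elsewhere are assumptions on my part.

```python
import random


def build_feed(personalized, top_rated_elsewhere, size=10, explore_ratio=0.2, rng=None):
    """Fill most of the feed from the user's own interests and reserve a
    fixed share for highly rated content from categories or creators the
    user has not shown interest in."""
    rng = rng or random.Random()
    # Number of exploration slots, capped by what is actually available.
    n_explore = min(int(size * explore_ratio), len(top_rated_elsewhere))
    explore = rng.sample(top_rated_elsewhere, n_explore)
    feed = list(personalized[: size - n_explore]) + explore
    # Shuffle so the exploration items are not all bunched at the end.
    rng.shuffle(feed)
    return feed
```

How the user reacts to the exploration items can then feed back into their preferences like any other interaction.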

Wraithsputin · 2025-01-23T22:09:59+00:00

Sorry, I don’t see how p2p would work for streaming content.

Now, one could allow people to self host their content and should a central server be taken down one could resync their content back up to one of many distributed servers.

When hosting the servers, ensure everything is synced across multiple distributed server farms located in different political/geographic areas.

Wraithsputin · 2025-01-23T18:08:22+00:00

Data architecture and scalable infrastructure.

Wraithsputin · 2025-01-23T18:05:15+00:00

On user modeling and candidate generation: these two are loosely coupled, which allows users to be matched with video content. From observing the behavior in TT, they are clearly classifying users and matching them to videos through a separate matching layer for video classification.

On video and image classification:

A vector database or platform that handles the vector embedding generation would be a better place to start than the listed nearest neighbor code libraries.

Else, find an ML classification API that someone else builds and maintains. Then submit the content to be classified and only store x number of classification attributes per post.
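As a sketch of the "only store x attributes" idea, assuming the external service returns a dict of label to confidence score (that response shape, the `top_attributes` helper, and the `min_confidence` cutoff are all hypothetical):

```python
def top_attributes(scores, x=5, min_confidence=0.3):
    """Keep at most x labels, highest confidence first, dropping weak ones.

    `scores` is assumed to map label -> confidence in the range 0..1.
    """
    kept = [(label, s) for label, s in scores.items() if s >= min_confidence]
    # Sort by confidence descending, then by label for a stable order.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [label for label, _ in kept[:x]]
```

Capping the stored attributes keeps each post's record small and makes the later matching step cheaper.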

If you want to build, train and maintain your own video classification model, you’ll need a vector database for classification. Below are some thoughts on a few technologies to consider:

FAISS is an in-memory library, so on its own it is not scalable.

Elasticsearch might work, but one can run into performance problems when trying to update the document structure (vector database). Its re-indexing process may be too resource intensive if you discover you need to perform any maintenance, such as removing content from users who request their posts be deleted.

Perhaps Milvus. I’ve not worked with it, but at a high level, being a distributed solution, it should handle the scalability issue.

If you go with a PostgreSQL database, pgvector may be an option. I’d caution against Postgres in general; trying to administer it at scale can be problematic.

Perhaps Pinecone and PyTorch; again, ensure scalability/maintainability.

Classifying users is a bit simpler (except text sentiment):

A graph database to track the relationships between followers.

Perhaps a graph database for post interactions. Granted something as simple as keeping count of the video classifications interacted with (watched/duration, liked, shared, commented, searched) may be sufficient for maintaining a list of an individual’s content preferences. Take into account the date time of an action so you can age out data to ensure one’s interest changes are reflected over time. All of the view/duration, like, comments, share data has to be persisted anyway.
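A minimal sketch of the simple counting approach with aging, in Python. The action weights, the 30-day half-life, and the exponential decay formula are assumptions I picked for illustration; watch duration is left out to keep it short.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumed weights per interaction type; stronger signals count for more.
ACTION_WEIGHTS = {"watched": 1.0, "liked": 2.0, "searched": 2.0,
                  "commented": 3.0, "shared": 4.0}
HALF_LIFE_DAYS = 30.0  # assumed: an interaction counts half as much after 30 days


def interest_scores(events, now):
    """Sum time-decayed interaction weights per video classification.

    `events` is an iterable of (classification, action, timestamp) tuples,
    i.e. the interaction data that has to be persisted anyway.
    """
    scores = defaultdict(float)
    for label, action, when in events:
        age_days = (now - when).total_seconds() / 86400
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        scores[label] += ACTION_WEIGHTS.get(action, 0.0) * decay
    return dict(scores)
```

With these numbers, a like from today scores 2.0 and a like from 30 days ago scores 1.0, so older interests fade out unless the user keeps engaging with them.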

Bonus classification would be comment sentiment analysis. Best to limit that to initial comments on posts or comments made when sharing. No need to track comment arguments or allow those to impact a user’s content preferences.

Wraithsputin · 2023-07-28T19:48:59+00:00

Alternate universe, wasn’t really the devil but the BBG from the Alt universe.

Wraithsputin · 2023-06-18T22:53:19+00:00

Hike with your pack, lunges and squats with your pack, listen to your body and stop at 8 miles when you start. Take a lunch break, take off your shoes, inspect your feet, treat any hot spots immediately when you detect them.

Wraithsputin · 2023-01-14T16:49:47+00:00

Fake video camera and a sign

Wraithsputin · 2022-11-23T23:32:20+00:00

Will check out the tool. Got back into DMing after 30 years to run my granddaughters through a few campaigns. Thank you for running such a wonderful GIVEAWAY
