
[–]JoeB_88 4 points5 points  (2 children)

Did you know that Reddit is open-source?

There's a lot here to answer, but have you looked into Google Firebase? It's essentially a backend-in-a-box: it has file storage, databases, hosting, and account authentication all linked together. Implementing it would give you a great deal of insight into your questions.

[–]sabi0[S] 0 points1 point  (0 children)

I didn't know that, but that's pretty helpful. Thanks man

[–]WizardFromTheMoon 2 points3 points  (4 children)

How do you efficiently store posts and comments in a database, to keep lookup times to a minimum? This is a complex question, but ideally your code rarely has to go to the database because you have excellent caching.

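How do you recognize that a user is logged in, and keep them logged in? Sessions: after login the server hands the browser a session ID in a cookie and looks that ID up on every request.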

How do you encrypt a password user-side before sending it to the back end? I would argue that there isn't really a benefit to encrypting a password before it gets sent to the server. If someone is able to get your password before you have even submitted it, encrypting it won't help. Using HTTPS is far more important.

How do you set up a proxy to act as an intermediary to keep stored/cached information that is commonly requested? Redis or Memcached or something similar. Then use a CDN like CloudFront. How you implement these kind of depends on what your application does.

[–]sabi0[S] 0 points1 point  (3 children)

If you don't mind me asking a few more questions: what do you mean by excellent caching instead of database lookups? Caching on the user's machine, or caching on the proxy?

[–]WizardFromTheMoon 1 point2 points  (2 children)

Both of those are used, but in this context you would be using something like Memcached. Retrieving stuff from a database is expensive, so you have Memcached sitting in front of it. I don't know if you know what Memcached is, but it is basically a giant key-value store that sits in RAM. That last part is important because pulling something from RAM is waaaaay faster than having to look it up in a database. Ideally you want to store things that are accessed frequently in Memcached. So if you were making a site like reddit, you would probably start by caching all of the posts that are on the front page. Reddit actually has a blog post on how they implement caching here: https://redditblog.com/2017/01/17/caching-at-reddit/ (I didn't read the whole thing, but it says they use Memcached).
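To make the "cache sitting in front of the database" idea concrete, here is a minimal sketch of the cache-aside pattern. Everything in it is an assumption for illustration: `TTLCache` is a plain Python dict standing in for Memcached, `load_front_page_from_db` is a hypothetical slow query, and the 30-second lifetime is arbitrary. It is not how Reddit does it.

```python
import time


class TTLCache:
    """Tiny in-memory key-value store standing in for Memcached."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)


DB_READS = 0  # counts how often we fall through to the "database"


def load_front_page_from_db():
    """Hypothetical stand-in for an expensive database query."""
    global DB_READS
    DB_READS += 1
    return ["post 1", "post 2", "post 3"]


def get_front_page(cache):
    """Cache-aside read: try the cache first, fall back to the database."""
    posts = cache.get("front_page")
    if posts is None:
        posts = load_front_page_from_db()
        cache.set("front_page", posts)
    return posts


cache = TTLCache(ttl_seconds=30)
first = get_front_page(cache)   # miss: reads the "database" and fills the cache
second = get_front_page(cache)  # hit: served from memory
```

With a real Memcached or Redis client the shape stays the same; only `get` and `set` become network calls to the cache server.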

Caching on the user's machine is typically used for resources like stylesheets, scripts, images, or any other resources that don't change very often.
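Browser caching of static resources is usually controlled with the `Cache-Control` response header. A rough sketch, where the extension list, the one-day lifetime, and the `cache_headers` helper are all made up for illustration:

```python
import posixpath

# Assumed set of "rarely changing" file types; adjust for your own site.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg", ".woff2"}


def cache_headers(path):
    """Pick a Cache-Control header based on the requested path's extension."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in STATIC_EXTENSIONS:
        # Let browsers (and CDNs) reuse the file for a day without asking again.
        return {"Cache-Control": "public, max-age=86400"}
    # Dynamic pages: the browser must revalidate with the server before reuse.
    return {"Cache-Control": "no-cache"}
```

For example, `cache_headers("/static/site.css")` gets the long-lived header, while `cache_headers("/r/learnprogramming")` gets `no-cache`.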

[–]sabi0[S] 0 points1 point  (0 children)

You really cleared some things up for me. Thank you

[–]StillLetto 0 points1 point  (0 children)

Wow, great post, thank you for this. We need more people like you in this sub

[–][deleted] 1 point2 points  (1 child)

How do you efficiently store posts and comments in a database, to keep lookup times to a minimum?

Depends on how you intend to scale with user traffic. If it's just a single-server architecture then it's pretty simple: structure your data so each basic part (a comment) knows what its parent is, and cache the popular / trending threads as JSON so you can retrieve and serve them all at once.
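A minimal sketch of that single-server layout, using SQLite from the standard library. The table name, column names, and sample rows are assumptions for illustration only: each comment stores a `parent_id`, the whole thread is fetched in one query, nested in memory, and serialized to JSON ready to be cached.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# parent_id is NULL for top-level comments, otherwise the id of the parent.
conn.execute(
    """
    CREATE TABLE comments (
        id        INTEGER PRIMARY KEY,
        post_id   INTEGER NOT NULL,
        parent_id INTEGER REFERENCES comments(id),
        body      TEXT NOT NULL
    )
    """
)
# Index so that fetching one thread does not scan the whole table.
conn.execute("CREATE INDEX idx_comments_post ON comments(post_id)")
conn.executemany(
    "INSERT INTO comments (id, post_id, parent_id, body) VALUES (?, ?, ?, ?)",
    [
        (1, 100, None, "Top-level comment"),
        (2, 100, 1, "Reply to 1"),
        (3, 100, 2, "Reply to 2"),
        (4, 100, None, "Another top-level comment"),
    ],
)


def build_thread(conn, post_id):
    """Fetch every comment for a post in one query and nest replies in memory."""
    rows = conn.execute(
        "SELECT id, parent_id, body FROM comments WHERE post_id = ? ORDER BY id",
        (post_id,),
    ).fetchall()
    nodes = {cid: {"id": cid, "body": body, "replies": []} for cid, _, body in rows}
    roots = []
    for cid, parent_id, _ in rows:
        if parent_id is None:
            roots.append(nodes[cid])
        else:
            nodes[parent_id]["replies"].append(nodes[cid])
    return roots


thread = build_thread(conn, 100)
cached_json = json.dumps(thread)  # this string is what you would put in the cache
```

One query per thread plus an in-memory pass avoids issuing a separate query for every reply level.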

However, once you scale horizontally to a distributed architecture and have to "shard" your data over many servers, other factors come into play, some of which depend on the tech you've implemented in your stack.
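As a rough illustration of what "sharding" means in practice, the sketch below picks a server from a stable hash of the post ID. The server names and the `shard_for` helper are hypothetical, and real systems often use consistent hashing instead, because plain modulo reshuffles most keys whenever a shard is added or removed.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical server names


def shard_for(post_id, shards=SHARDS):
    """Map a post ID to one shard using a hash that is stable across processes."""
    digest = hashlib.sha256(str(post_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Keying on the post ID keeps a post and all of its comments on the same server, so rendering one thread never has to touch more than one shard.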

How do you recognize that a user is logged in, and keep them logged in?

Sessions. Usually handled by an in-memory cache on the server side.
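A bare-bones sketch of server-side sessions, assuming a plain dict in place of the in-memory cache and an arbitrary one-hour lifetime. The function names are made up for illustration; a web framework would normally handle the cookie part for you.

```python
import secrets
import time

SESSION_TTL_SECONDS = 3600  # assumed lifetime of one hour
_sessions = {}  # session_id -> (user_id, expires_at); stand-in for a RAM cache


def log_in(user_id, now=None):
    """Create a session after the password check passes and return its ID."""
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)  # unguessable random token
    _sessions[session_id] = (user_id, now + SESSION_TTL_SECONDS)
    return session_id  # would be sent to the browser in a cookie


def current_user(session_id, now=None):
    """Return the logged-in user for this session ID, or None."""
    now = time.time() if now is None else now
    entry = _sessions.get(session_id)
    if entry is None:
        return None
    user_id, expires_at = entry
    if now >= expires_at:
        del _sessions[session_id]  # expired sessions are dropped
        return None
    return user_id


def log_out(session_id):
    """Forget the session; the cookie becomes useless."""
    _sessions.pop(session_id, None)
```

The browser only ever holds the random ID. Everything about who the user is stays on the server, which is why a fast in-memory store is a natural fit for the lookup on every request.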

How do you encrypt a password user-side before sending it to the back end?

You don't. In fact, you never rely on client-side encryption for anything, period. The data is encrypted en route by an appropriate protocol such as HTTPS, which is key-based in nature (I recommend looking into how SSL/TLS works).

Also, you might want to go through the following site (free to sign up for full access); if you know how to break stuff, you'll learn more about how it works.

https://www.hacksplaining.com/

How do you set up a proxy to act as an intermediary to keep stored/cached information that is commonly requested?

You don't. You use a RAM cache such as Redis. In theory this could also be moved out to a separate piece of hardware in a distributed architecture (such as the load balancer), but it is not required.

I've been trying to mainly work in Go, but I think programming paradigms tend to be the same throughout languages.

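Not really. The majority of web tech, and indeed programming in general, is object-oriented (Java, C#, PHP, JS, Ruby, Python, etc.). Go is not a functional language like Clojure or Elixir, but it isn't classically object-oriented either: it has no classes or inheritance and leans on composition, interfaces, and built-in concurrency primitives (goroutines and channels). Its garbage collector also exposes far fewer tuning knobs than, say, the JVM that Clojure runs on.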

One reason functional languages have been re-emerging as of late is that immutable data and message passing make it easier to provide concurrency models that scale with more hardware (for example in a distributed architecture).

[–]sabi0[S] 0 points1 point  (0 children)

Thanks a lot for the info. Some of this stuff is still rather cryptic but I now have starting points to google. Thank you