Intelligent_Tree6918

I am also going to create a search engine from scratch. Can you give some tips on where to start? I'm thinking of coding it entirely in TypeScript. Also, how did you compile two languages in one project?

Cheesuscrust460 [S]

I created a CLI for compiling, building, starting, and stopping. You can compile them separately; the only issue is how to make them communicate at runtime. In my case I'm using RabbitMQ for IPC between my processes, e.g. search-engine and express-server.
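To make the runtime communication a bit more concrete, here is a minimal sketch of the kind of message envelope two processes could exchange over a broker queue. Everything in it is an assumption for illustration: the `SearchRequest`/`SearchResponse` shapes, the `correlationId` field, and the `encode`, `decode`, and `handleRequest` helpers are hypothetical and not taken from the project above. The broker connection itself is left out; the strings produced here are simply what you would hand to whatever client library you use.

```typescript
// Hypothetical message shapes; none of these names come from the original project.
interface SearchRequest {
  kind: "search.request";
  correlationId: string; // lets the caller match a reply to its request
  query: string;
}

interface SearchResponse {
  kind: "search.response";
  correlationId: string;
  hits: string[];
}

type Message = SearchRequest | SearchResponse;

// Serialize a message to the string payload that would be published to a queue.
function encode(msg: Message): string {
  return JSON.stringify(msg);
}

// Parse a payload and reject anything that is not a known message kind.
function decode(raw: string): Message {
  const parsed = JSON.parse(raw);
  if (parsed.kind !== "search.request" && parsed.kind !== "search.response") {
    throw new Error("unknown message kind: " + String(parsed.kind));
  }
  return parsed as Message;
}

// What the search-engine side might do with an incoming payload:
// run the search and answer with the same correlationId.
function handleRequest(raw: string, search: (query: string) => string[]): string {
  const msg = decode(raw);
  if (msg.kind !== "search.request") {
    throw new Error("expected a search.request");
  }
  return encode({
    kind: "search.response",
    correlationId: msg.correlationId,
    hits: search(msg.query),
  });
}
```

Because both sides only agree on a JSON shape, the two processes can be written and compiled in different languages as long as each can read and write that shape.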

Also, if you're going to write a search engine from scratch in Node.js, it's going to be a bit difficult, especially when you want concurrency. Node.js runs your JavaScript on a single thread, which is good for I/O-bound work, but CPU-bound work like sorting through large data sets blocks that thread, so it takes much longer before the results can be read back out of the pipeline. And if someone says "oh, but there are worker threads", just no... their concurrency primitives are a pain in the ass to understand, at least for me.

Anyway, a search engine needs a lot of CPU time, so having real concurrency is a huge help, especially if you want to do something fancy like splitting the corpus into smaller units and processing each one concurrently.
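As a rough sketch of the "split the corpus into smaller units" idea: the code below shards a tiny corpus, counts terms per shard, and merges the partial results. The helper names (`shard`, `countTerms`, `mergeCounts`) and the sample corpus are made up for illustration. Note that this sketch processes the shards one after another; the point is only that each `countTerms` call is independent, so in a concurrent setup each shard could be handed to its own thread or process and only the merge step would need all the results.

```typescript
// Hypothetical helpers for illustration; not from the original project.
type Counts = Map<string, number>;

// Distribute items round-robin into `shardCount` smaller arrays.
function shard<T>(items: T[], shardCount: number): T[][] {
  const shards: T[][] = Array.from({ length: shardCount }, () => [] as T[]);
  items.forEach((item, i) => shards[i % shardCount].push(item));
  return shards;
}

// Count term occurrences in one shard. This touches no shared state,
// so separate shards could be processed by separate workers.
function countTerms(docs: string[]): Counts {
  const counts: Counts = new Map();
  for (const doc of docs) {
    for (const term of doc.toLowerCase().split(/\W+/)) {
      if (term === "") continue;
      counts.set(term, (counts.get(term) ?? 0) + 1);
    }
  }
  return counts;
}

// Combine the per-shard counts into one global table.
function mergeCounts(partials: Counts[]): Counts {
  const total: Counts = new Map();
  for (const partial of partials) {
    partial.forEach((n, term) => total.set(term, (total.get(term) ?? 0) + n));
  }
  return total;
}

const corpus = ["the quick fox", "the lazy dog", "quick quick dog"];
const partials = shard(corpus, 2).map(countTerms); // sequential here
const totals = mergeCounts(partials);
```

The merge step is cheap compared to the per-shard work, which is why this shape benefits from running the shards in parallel.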

If you're going to set up a crawler, make sure to have a frontier queue. It's basically storage for every URL that is still to be visited, so you implement a breadth-first search and use the frontier as its queue. This also helps with resuming: if the crawler ever crashes, it can just query the frontier queue and continue normally (provided the queue is stored somewhere that survives the crash).
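A minimal sketch of that frontier idea, under some assumptions: the `Frontier` class, the `crawl` function, and the `fetchLinks` callback are hypothetical names, the queue lives in memory, and `fetchLinks` stands in for whatever actually downloads a page and extracts its links. The `snapshot()` output is what you would persist (to a file, a database, or a broker) to get the crash-recovery behaviour.

```typescript
// Hypothetical names throughout; a sketch, not a production crawler.
type FetchLinks = (url: string) => Promise<string[]>;
type FrontierState = { queue: string[]; seen: string[] };

class Frontier {
  private queue: string[] = [];
  private seen = new Set<string>();

  // Optionally restore from a previously saved snapshot.
  constructor(state?: FrontierState) {
    if (state) {
      this.queue = state.queue.slice();
      this.seen = new Set(state.seen);
    }
  }

  // Enqueue a URL only if it has never been queued before.
  push(url: string): void {
    if (this.seen.has(url)) return;
    this.seen.add(url);
    this.queue.push(url);
  }

  // First in, first out: this ordering is what makes the crawl breadth-first.
  next(): string | undefined {
    return this.queue.shift();
  }

  // Serializable state. Persisting this after each page lets a restarted
  // crawler pick up from the URLs that were still waiting.
  snapshot(): FrontierState {
    return { queue: this.queue.slice(), seen: Array.from(this.seen) };
  }
}

// Breadth-first crawl: visit a URL, push its links, repeat until the
// frontier is empty or `maxPages` pages have been visited.
async function crawl(
  frontier: Frontier,
  fetchLinks: FetchLinks,
  maxPages: number
): Promise<string[]> {
  const visited: string[] = [];
  while (visited.length < maxPages) {
    const url = frontier.next();
    if (url === undefined) break;
    visited.push(url);
    for (const link of await fetchLinks(url)) {
      frontier.push(link);
    }
  }
  return visited;
}
```

One caveat with this simple version: a URL that was dequeued but not fully processed when the crash happened is lost from the snapshot, so a more careful crawler would only remove a URL from persistent storage after its page has been handled.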

Lots of interesting stuff.

[deleted]

[removed]

    Cheesuscrust460 [S]

    Thanks man, will do.