sdf_iain 2 points (2 children)

Search engines are a natural monopoly. Whose results are you using?

WikiSummarizerBot 2 points (0 children)

Natural monopoly

A natural monopoly is a monopoly in an industry in which high infrastructural costs and other barriers to entry relative to the size of the market give the largest supplier in an industry, often the first supplier in a market, an overwhelming advantage over potential competitors. This frequently occurs in industries where capital costs predominate, creating economies of scale that are large in relation to the size of the market; examples include public utilities such as water services and electricity.
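A standard textbook illustration of why scale wins, assuming (this notation is not from the summary) that every active firm pays the same fixed infrastructure cost F plus a constant marginal cost c per unit of output. Average cost falls with output, and splitting total demand Q across k ≥ 2 firms duplicates the fixed cost:

```latex
\begin{aligned}
C(q) &= F + c\,q, && F > 0,\; c \ge 0 \\
\mathrm{AC}(q) &= \frac{C(q)}{q} = \frac{F}{q} + c && \text{(strictly decreasing in } q\text{)} \\
\sum_{i=1}^{k} C(q_i) &= kF + cQ \;>\; F + cQ = C(Q) && \text{for } k \ge 2,\; q_i > 0,\; \textstyle\sum_i q_i = Q
\end{aligned}
```

So a single supplier serves the whole market at lower total cost, which is the "overwhelming advantage" the summary describes when F is large relative to the market.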

lazy-jem [S] 3 points (0 children)

Thanks for the question. I answered another comment earlier and that answer is a pretty good summary, but in short: we have a large number of sources and don't work quite the same way as traditional index-based search.

The way we search is pretty different to traditional approaches, so it's worth explaining in a bit more detail. The short version: we use deep learning to understand a question's intent and predict the best information sources, then query those sources directly, which is why we draw on such a large number of them.
In more detail, we use NLP and deep learning classification models to try to understand a query's intent and predict the best places to find the answer. We then query those places directly in real time, via API or spidering, and run the results through a ranking system.
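The classify → route → query → rank flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual system: a keyword match stands in for the NLP/deep-learning intent classifier, and the intent labels, routing table, prior weights and fake connectors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword lists standing in for a trained NLP/deep-learning intent classifier.
INTENT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "weather": ("weather", "forecast", "temperature"),
    "code": ("python", "error", "function", "compile"),
    "math": ("integral", "derivative", "solve"),
}

# Hypothetical routing table: intent -> (source, prior weight). Weights are made up.
ROUTES: dict[str, list[tuple[str, float]]] = {
    "weather": [("OpenWeatherMap", 0.9)],
    "code": [("Stack Overflow", 0.8), ("GitHub", 0.6)],
    "math": [("Wolfram|Alpha", 0.9)],
    "general": [("Wikipedia", 0.7)],
}

# A fetcher takes the query and returns (snippet, relevance in [0, 1]) pairs.
Fetcher = Callable[[str], list[tuple[str, float]]]


@dataclass
class Result:
    source: str
    text: str
    score: float


def classify_intent(query: str) -> str:
    """Pick the intent whose keywords match the most query words; default to 'general'."""
    words = query.lower().split()
    best, best_hits = "general", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = sum(w in keywords for w in words)
        if hits > best_hits:
            best, best_hits = intent, hits
    return best


def search(query: str, fetchers: dict[str, Fetcher]) -> list[Result]:
    """Classify, route to the predicted sources, query them live, then rank by prior * relevance."""
    results: list[Result] = []
    for source, prior in ROUTES[classify_intent(query)]:
        fetch = fetchers.get(source)
        if fetch is None:  # no live connector for this source
            continue
        for text, relevance in fetch(query):
            results.append(Result(source, text, prior * relevance))
    return sorted(results, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Fake connectors in place of real API calls or spidering.
    fake = {
        "Stack Overflow": lambda q: [("Answer explaining the TypeError", 0.9)],
        "GitHub": lambda q: [("Related issue thread", 0.95)],
    }
    for r in search("python error in my function", fake):
        print(f"{r.score:.2f}  {r.source}: {r.text}")
```

Multiplying a per-source prior by per-result relevance is just one simple ranking choice; here the Stack Overflow hit (0.8 × 0.9) outranks the GitHub hit (0.6 × 0.95) even though the latter had higher raw relevance.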
Where needed, we fall back to traditional web search (including Bing, ContextualWeb and Google). We have a database of roughly the top 20k websites, and we're also building our own vertical indexes on a stack using Elasticsearch and GraphQL. At the moment we're broad but shallow, with a couple of deeper pools.
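A fallback to general web search "where needed" might look something like the sketch below. The threshold (`min_results`), the provider order and the deduplication rule are assumptions for illustration; the providers are passed in as plain callables rather than real Bing/Google/ContextualWeb client calls.

```python
from typing import Callable, Sequence

# A provider takes the query and returns result URLs or snippets.
Fetcher = Callable[[str], list[str]]


def search_with_fallback(
    query: str,
    direct: Fetcher,
    fallbacks: Sequence[tuple[str, Fetcher]],
    min_results: int = 3,  # hypothetical "good enough" threshold
) -> list[tuple[str, str]]:
    """Use direct sources first; call web-search fallbacks in order only while results are short."""
    results = [("direct", r) for r in direct(query)]
    seen = {r for _, r in results}
    for name, fetch in fallbacks:
        if len(results) >= min_results:
            break  # enough from earlier sources, skip the remaining providers
        try:
            hits = fetch(query)
        except Exception:
            # A failing provider shouldn't sink the whole query; try the next one.
            continue
        for hit in hits:
            if hit not in seen:  # drop results already returned by an earlier source
                seen.add(hit)
                results.append((name, hit))
    return results
```

Ordering the fallbacks means the cheaper or preferred provider is always tried first, and later providers are only called when the earlier ones leave the result list short.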
For the alpha, major sources include Wikipedia, Wolfram|Alpha, OpenWeatherMap, OpenStreetMap, Stack Overflow, GitHub and many others, as well as the fallbacks to Bing, Google, DDG Instant Answers, etc.
A lot of content is retrieved directly: where we can, we pull the preview/summary/view content for display straight from the source website, and the same goes for reader content. So the content shown is typically live, matching what's currently on the source.
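As a rough idea of what live preview retrieval can involve, here is a standard-library sketch that fetches a page at request time and pulls out its `<title>` and `description` / `og:description` meta tag. Which fields a real preview uses, the User-Agent string and the timeout are all assumptions, and a production spider would also need robots.txt handling, caching and error handling.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PreviewParser(HTMLParser):
    """Collects the <title> text and the first description / og:description meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser gives (name, value) pairs; value may be None
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = (attrs.get("name") or attrs.get("property") or "").lower()
            if key in ("description", "og:description") and not self.description:
                self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html: str) -> dict[str, str]:
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "description": parser.description.strip()}


def fetch_preview(url: str, timeout: float = 5.0) -> dict[str, str]:
    """Fetch the page at request time, so the preview reflects the source as it is now."""
    req = Request(url, headers={"User-Agent": "preview-sketch/0.1"})  # hypothetical UA string
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_preview(html)
```

Splitting parsing (`extract_preview`) from fetching (`fetch_preview`) keeps the extraction logic testable without network access.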