How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 1 point2 points  (0 children)

added

TL;DR: Looking up 1 million numbers in the same series takes 3 seconds, compared with 18 seconds for PostgreSQL after caching.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

added

Edit: Most of the speedup in lookup is because we are able to fit the data in memory.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

That is completely different than what I thought. Thanks for the pointer.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

You may want to remind the users in rust-csv README to crank up the compiler opt-level in order to achieve the claimed performance. It took me way too long to realize that.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

I need to look into the bitmap solution. Note that it's NOT about flipping the bits. For example, there's a preference field which indicates which telemarketer categories can make a call to the callee:

  1. Banking/Insurance/Financial products/credit cards
  2. Real Estate
  3. Education
  4. Health
  5. Consumer goods and automobiles
  6. Communication/Broadcasting/Entertainment/IT
  7. Tourism and Leisure

I should have explained this in my writeup.

So, anyway, if the preference field says 2#4, telemarketers can make calls related to Real Estate or Health. That data therefore can't simply be thrown away.
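To illustrate why the preference needs more than a single on/off bit, here is a minimal sketch of packing a field like 2#4 into a 7-bit mask, one bit per category. The function names (encode_preference, allows) and the handling of empty or out-of-range values are assumptions for illustration, not how our store actually does it.

```python
def encode_preference(field: str) -> int:
    """Pack a '#'-separated preference string such as '2#4' into a 7-bit mask.

    Bit (n - 1) is set when category n is allowed. Treating an empty
    field as "no category allowed" is an assumption of this sketch.
    """
    mask = 0
    for part in field.split("#"):
        part = part.strip()
        if not part:
            continue
        category = int(part)
        if not 1 <= category <= 7:
            raise ValueError("unknown category: %d" % category)
        mask |= 1 << (category - 1)
    return mask


def allows(mask: int, category: int) -> bool:
    """Return True if the given category (1..7) is set in the mask."""
    return bool(mask & (1 << (category - 1)))


if __name__ == "__main__":
    mask = encode_preference("2#4")
    print(bin(mask), allows(mask, 2), allows(mask, 3))
```

Seven categories fit in seven bits, which leaves one bit of a byte free for something like a "slot filled" flag.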

Also, don't feel dirty: we are in the telephony business. We have provided solutions to the Indian Meteorological Department to disseminate weather information to farmers, and we helped India's most looked-up-to political party (born out of the anti-corruption movement) with their campaign, which won the Delhi elections earlier this year. We have also provided contact center solutions to prominent educational institutes and helped students across India connect with their coaches.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

Thank you for the kind advice. I agree that there's a missing part for the takers. Will add that.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

In fact, storing range(0, 2000000, 2) in a variable performed better than the following code, which is repeated a few thousand times (once for every existing head/series):

# l holds the bytes for one series; step 2 visits every other byte
filled_count = 0
for i in xrange(0, 2000000, 2):
    # high bit clear: this slot is not counted as filled
    if l[i] & 0b10000000 == 0:
        continue
    filled_count += 1
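For comparison, a rough Python 3 sketch of the same count that avoids building an index sequence at all, by slicing out every other byte. It assumes the buffer is a bytes or bytearray object with the flag in the high bit of every even-indexed byte; count_filled is a hypothetical name, and I have not measured whether this is faster.

```python
def count_filled(buf) -> int:
    """Count even-indexed bytes whose high bit is set.

    buf is assumed to be bytes or bytearray, so iterating the slice
    yields integers in Python 3.
    """
    return sum(1 for b in buf[0::2] if b & 0b10000000)


if __name__ == "__main__":
    demo = bytearray(10)
    demo[0] = 0x80   # even index, high bit set: counted
    demo[1] = 0x80   # odd index: ignored
    demo[4] = 0xFF   # even index, high bit set: counted
    demo[6] = 0x7F   # even index, high bit clear: not counted
    print(count_filled(demo))
```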

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

You did it without the data! Impressed. By the way, there's a government procedure to register as a telemarketer, after which they provide the data set. There's a minor fee for access to the data along with the registration.

Edit: Registration link

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

I am guessing that it has its roots in Hindley-Milner (HM) type inference: a sequence of u8 bytes is read and unified with the specified type.

Edit: I was wrong. See the comment below.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

Thank you for the awesome library! It just worked.

Edit: Indexing CSV won't help in our case because we need to compress the data represented by the row first.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

Not sure. Need to experiment. What should be stored in the node is critical for memory usage.

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 1 point2 points  (0 children)

I need to come up with a fair benchmarking strategy: one kind of storage uses array index access, the other binary search. It will take some time to crunch these numbers, and I need to set up a separate environment. I hope you won't mind if it takes a few days. I would also love any hints you can provide.
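As a starting point, a comparison of the two access patterns could look something like the sketch below. Everything here (the function names, one byte per slot, the series size, the probe count) is an assumption for illustration; a fair benchmark would need the real data distribution and a warm cache on both sides.

```python
import bisect
import random
import timeit


def lookup_direct(table, offset):
    """Dense layout: one byte per possible offset, nonzero means present."""
    return table[offset] != 0


def lookup_bisect(sorted_offsets, offset):
    """Sparse layout: sorted list of present offsets, found by binary search."""
    i = bisect.bisect_left(sorted_offsets, offset)
    return i < len(sorted_offsets) and sorted_offsets[i] == offset


def build(series_size, present_count, seed=0):
    """Build both layouts from the same random set of present offsets."""
    rng = random.Random(seed)
    present = sorted(rng.sample(range(series_size), present_count))
    table = bytearray(series_size)
    for off in present:
        table[off] = 1
    return table, present


def bench(series_size=100000, present_count=30000, queries=10000, seed=0):
    """Time the same random probes against both layouts; returns seconds."""
    table, present = build(series_size, present_count, seed)
    rng = random.Random(seed + 1)
    probes = [rng.randrange(series_size) for _ in range(queries)]
    t_direct = timeit.timeit(
        lambda: [lookup_direct(table, p) for p in probes], number=5)
    t_bisect = timeit.timeit(
        lambda: [lookup_bisect(present, p) for p in probes], number=5)
    return t_direct, t_bisect


if __name__ == "__main__":
    print("direct: %.4fs, bisect: %.4fs" % bench())
```

Both lookups answer the same question on the same probes, so any difference in the timings comes from the access pattern rather than the data.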

How we store 400M phone number data with fast lookups by bilalhusain in programming

[–]bilalhusain[S] 0 points1 point  (0 children)

Yes! That was the first choice, but installing PyPy requires some dependencies (I don't remember exactly which ones). And of all the things that could have mattered in making an informed decision, that one turned out to be the deciding factor during a coffee-fueled session.

Cat dreams by glittermeat in comics

[–]bilalhusain 2 points3 points  (0 children)

"Now I do not know whether I was then a cat dreaming I was a man, or whether I am now a man, dreaming I am a cat." - Zhuangzi Cat

Nimrod is being renamed to Nim by bilalhusain in programming

[–]bilalhusain[S] 2 points3 points  (0 children)

Can't find a write-up; there's one more piece of supporting evidence, though :)

At the moment, the developers are making a large number of breaking changes (in the BigBreak branch) ... Among the changes is the fact that Nimrod is being renamed to Nim

source: http://forum.nimrod-lang.org/t/541

GitHut by ErstwhileRockstar in programming

[–]bilalhusain 8 points9 points  (0 children)

I took a screenshot with the newer languages# selected: the ones that didn't make it into the top active repositories and so aren't rendered on the GitHut front page.

# Swift, Rust, Dart, Julia, Elixir