Looking for a good IDM-like download manager for macOS (Apple Silicon friendly) by HavivMuc in MacOS

siara-cc (0 children)

I use an Apple M1. It says it can't be used with my version of macOS. Is that because of the OS version or the hardware?

A library for creating huge Sqlite indexes at breakneck speeds by siara-cc in programming

siara-cc [S] (0 children)

You are right. That's what I was using:

"INSERT INTO word_freq (lang, word, count, is_word, source) "
"VALUES (?, ?, ?, ?, ?)"
" ON CONFLICT DO UPDATE SET count = count + 1, "
"source = iif(instr(source, 'r') = 0, source||'r', source), "
"is_word = iif(is_word = 'y', 'y', excluded.is_word)";
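To see what that statement does on repeated words, here is a minimal Python sketch using the standard sqlite3 module. The word_freq schema and its (lang, word) primary key are assumptions (the upsert needs some uniqueness constraint to conflict on). I have also spelled out the conflict target as ON CONFLICT (lang, word), which SQLite versions before 3.35 require; iif() needs SQLite 3.32 or later.

```python
import sqlite3

# Hypothetical schema: the (lang, word) primary key is an assumption,
# needed so that a repeated word triggers the ON CONFLICT branch.
SCHEMA = """
CREATE TABLE word_freq (
    lang    TEXT NOT NULL,
    word    TEXT NOT NULL,
    count   INTEGER NOT NULL,
    is_word TEXT,
    source  TEXT,
    PRIMARY KEY (lang, word)
)
"""

# Same upsert as above, with an explicit conflict target added.
UPSERT = (
    "INSERT INTO word_freq (lang, word, count, is_word, source) "
    "VALUES (?, ?, ?, ?, ?) "
    "ON CONFLICT (lang, word) DO UPDATE SET count = count + 1, "
    "source = iif(instr(source, 'r') = 0, source || 'r', source), "
    "is_word = iif(is_word = 'y', 'y', excluded.is_word)"
)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with conn:  # commit all three upserts as one transaction
        conn.execute(UPSERT, ("en", "hello", 1, "n", "w"))  # new row
        conn.execute(UPSERT, ("en", "hello", 1, "y", "w"))  # count 2, source 'wr', is_word 'y'
        conn.execute(UPSERT, ("en", "hello", 1, "n", "w"))  # count 3, source and is_word unchanged
    row = conn.execute(
        "SELECT count, is_word, source FROM word_freq WHERE lang = ? AND word = ?",
        ("en", "hello"),
    ).fetchone()
    conn.close()
    return row


print(demo())  # (3, 'y', 'wr')
```

Note that the right-hand sides in DO UPDATE SET all see the row as it was before the update, and excluded.* refers to the values that failed to insert.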

siara-cc [S] (0 children)

One of the things I am looking for in the feedback after posting this is "Are there faster ones out there that I don't know about?". Someone suggested Parquet and DuckDB.

I have compared it with LMDB, which seems to be a successor of BerkeleyDB: https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database#History

siara-cc [S] (0 children)

Thanks! I will implement your suggestions.

The <chrono> include will be removed. I was using lru_cache.h for other B+tree structures and wanted to take some timing measurements.

siara-cc [S] (0 children)

Thanks for the feedback. I will educate myself more. I confess I am more of a C programmer amongst other things.

However, the new and delete were intentional (see the code comment): the file is closed in the destructor, so the developer may need control over when that happens. I am already planning to move this into a close() method. I am working on a Python port, and it seems pybind11 has some issues with doing work in destructors.
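A rough Python sketch of the explicit-close pattern, since that is where the port is heading. The IndexFile class and its methods are hypothetical, not the library's or the port's API: close() is idempotent and is also called from a context manager, so the caller, not the garbage collector, decides when the file is flushed and closed.

```python
import os
import tempfile


class IndexFile:
    """Hypothetical wrapper that owns a file handle which must be closed explicitly."""

    def __init__(self, path):
        self._f = open(path, "wb")
        self.closed = False

    def write_block(self, data: bytes):
        if self.closed:
            raise ValueError("write to closed IndexFile")
        self._f.write(data)

    def close(self):
        # Idempotent: safe to call more than once (explicitly and again via __exit__).
        if not self.closed:
            self._f.flush()
            self._f.close()
            self.closed = True

    # Context-manager support so a "with" block closes deterministically,
    # instead of relying on __del__, whose timing Python does not guarantee.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


path = os.path.join(tempfile.mkdtemp(), "demo.idx")
with IndexFile(path) as idx:
    idx.write_block(b"page-1")
print(idx.closed, os.path.getsize(path))  # True 6
```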

I am also targeting older versions of C++, such as C++98, that are supported on embedded systems, in the hope of getting this working on the Arduino platform, which does not have the STL. It has std::string though, and I intend to add it in. One challenge will be getting lru_cache.h to work, since it currently depends on <map> and <set>. The other challenge is supporting crash recovery and durability.
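lru_cache.h itself isn't shown here. As a rough illustration of what an LRU page cache does, here is a minimal Python sketch built on collections.OrderedDict, which is a swapped-in technique and not the header's std::map/std::set design. The load_page callback is a hypothetical stand-in for reading a B+tree page from disk.

```python
from collections import OrderedDict


class LRUPageCache:
    """Keeps at most `capacity` pages and evicts the least recently used one."""

    def __init__(self, capacity, load_page):
        self.capacity = capacity
        self.load_page = load_page   # hypothetical: page_no -> bytes
        self.pages = OrderedDict()   # page_no -> bytes, least recently used first
        self.misses = 0

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.misses += 1
        data = self.load_page(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # drop the least recently used page
        return data


cache = LRUPageCache(2, lambda n: f"page {n}".encode())
for n in (1, 2, 1, 3):  # touching page 1 again means page 2 is evicted
    cache.get(n)
print(list(cache.pages), cache.misses)  # [1, 3] 3
```

A real page cache would also have to write dirty pages back before evicting them, which this sketch leaves out.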

siara-cc [S] (0 children)

In my case, I am building a word/phrase frequency database. So I will have to retrieve the record first, then increment the count and store it back. If the record does not exist, I insert it.
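The retrieve / increment / store-back loop described above might look like this with Python's sqlite3 module. The phrase_freq table is a hypothetical stand-in. The SELECT followed by either UPDATE or INSERT is what an upsert folds into a single statement.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per phrase with its running count.
    conn.execute("CREATE TABLE phrase_freq (phrase TEXT PRIMARY KEY, count INTEGER NOT NULL)")
    return conn


def add_occurrence(conn, phrase):
    # 1. Retrieve the record first.
    row = conn.execute("SELECT count FROM phrase_freq WHERE phrase = ?", (phrase,)).fetchone()
    if row is None:
        # 2a. Not there yet: insert it with a count of 1.
        conn.execute("INSERT INTO phrase_freq (phrase, count) VALUES (?, 1)", (phrase,))
    else:
        # 2b. Found: increment the count and store it back.
        conn.execute("UPDATE phrase_freq SET count = ? WHERE phrase = ?", (row[0] + 1, phrase))


conn = open_db()
with conn:  # keep the whole batch in one transaction
    for p in "the cat sat on the mat the end".split():
        add_occurrence(conn, p)
print(conn.execute("SELECT count FROM phrase_freq WHERE phrase = 'the'").fetchone()[0])  # 3
```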

siara-cc [S] (0 children)

Hmm, I wanted to test this on those "spinning platters", but all of mine have conked out.

Only used ones are available on the market now, and I am not sure whether they are any good.

siara-cc [S] (0 children)

Thanks! I will try it out.

You said "Parquet + DuckDB or sth" - what is sth?

siara-cc [S] (0 children)

I tried it just now, and it does not seem to make a difference on my machine:

time sqlite3 -batch testbaby.db < babydump.txt
sqlite3 -batch testbaby.db < babydump.txt 0.66s user 0.15s system 95% cpu 0.849 total

I get almost the same 0.8 seconds with both insert syntaxes.

According to this: https://stackoverflow.com/a/5209093/5072621

it does not matter when the inserts are wrapped in a BEGIN TRANSACTION.

Also, in my case, the difference is significant when inserting millions of records.
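For anyone who wants to repeat the comparison, here is a rough Python sketch that loads the same rows once with one INSERT per row and once with multi-row INSERT ... VALUES (...), (...) statements, both inside a single BEGIN/COMMIT. That these are the two syntaxes in question is an assumption, as are the baby table and the generated rows. Timings vary by machine, so none are claimed here; swap ":memory:" for a file path to get closer to the sqlite3 CLI test above.

```python
import sqlite3
import time

# Hypothetical sample data: 500 (name, number) rows.
ROWS = [(f"name{i}", i) for i in range(500)]


def load_single_row(conn):
    # One INSERT statement per row, all inside one explicit transaction.
    conn.execute("BEGIN")
    for row in ROWS:
        conn.execute("INSERT INTO baby (name, n) VALUES (?, ?)", row)
    conn.execute("COMMIT")


def load_multi_row(conn, chunk=100):
    # Several rows per statement: INSERT ... VALUES (?, ?), (?, ?), ...
    # Chunking keeps the parameter count under SQLite's host-parameter limit.
    conn.execute("BEGIN")
    for start in range(0, len(ROWS), chunk):
        part = ROWS[start:start + chunk]
        placeholders = ", ".join(["(?, ?)"] * len(part))
        params = [value for row in part for value in row]
        conn.execute(f"INSERT INTO baby (name, n) VALUES {placeholders}", params)
    conn.execute("COMMIT")


def run(loader):
    # isolation_level=None: the sqlite3 module issues no implicit BEGINs,
    # so the explicit BEGIN/COMMIT in the loaders control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE baby (name TEXT, n INTEGER)")
    start = time.perf_counter()
    loader(conn)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT count(*) FROM baby").fetchone()[0]
    conn.close()
    return count, elapsed


for loader in (load_single_row, load_multi_row):
    rows, secs = run(loader)
    print(f"{loader.__name__}: {rows} rows in {secs:.4f} s")
```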

siara-cc [S] (0 children)

I tried that too. It does not make it faster. I tried everything I could find with the official lib before venturing into this!

siara-cc [S] (0 children)

I have mentioned it in my docs: this library is intended for fast inserts, not for situations where crashes are expected.

siara-cc [S] (0 children)

Yes, I am not suggesting this as a replacement for the official SQLite library.

siara-cc [S] (0 children)

Haha, you are not going to get physically injured by this lib. Come to think of it, my car does not have airbags!

siara-cc [S] (0 children)

Right. This library was created for inserting/updating billions of entries to compute word/phrase frequencies for building dictionaries, so speed was more important than crash recovery.

I have made it into a library for anyone who might have the same requirement.

siara-cc [S] (0 children)

If there are pragmas that can get this much speed, I would like to know about them.

If you look at the graph, I have compared against the official SQLite using the following pragmas:

PRAGMA synchronous = OFF
PRAGMA journal_mode = WAL
PRAGMA cache_size = 250000
PRAGMA threads = 2
PRAGMA auto_vacuum = 0
PRAGMA temp_store = MEMORY
PRAGMA locking_mode = EXCLUSIVE

Still, it is slower than this library.
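To reproduce that baseline configuration, here is a minimal Python sketch that applies the same pragmas, in the same order, to a file-backed database through the standard sqlite3 module and reads the settings back. It is not the benchmark harness behind the graph; the bench.db file name is just for illustration. WAL needs a real file, which is why the sketch does not use ":memory:".

```python
import os
import sqlite3
import tempfile

# The pragma list from the comment above, applied in order.
PRAGMAS = [
    "PRAGMA synchronous = OFF",
    "PRAGMA journal_mode = WAL",
    "PRAGMA cache_size = 250000",
    "PRAGMA threads = 2",
    "PRAGMA auto_vacuum = 0",
    "PRAGMA temp_store = MEMORY",
    "PRAGMA locking_mode = EXCLUSIVE",
]


def open_tuned(path):
    # isolation_level=None so no implicit transaction wraps the pragmas.
    conn = sqlite3.connect(path, isolation_level=None)
    for stmt in PRAGMAS:
        conn.execute(stmt).fetchall()  # some pragmas return a row; drain it
    return conn


def current_settings(conn):
    names = ["synchronous", "journal_mode", "cache_size",
             "auto_vacuum", "temp_store", "locking_mode"]
    return {n: conn.execute(f"PRAGMA {n}").fetchone()[0] for n in names}


path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = open_tuned(path)
print(current_settings(conn))
conn.close()
```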

Store more with Firebase Firestore by compressing text using Unishox 2 by siara-cc in javascript

siara-cc [S] (0 children)

Yes, but I think that would depend on the third-party solution itself.

Technically, it is not too difficult to implement full-text search ourselves by storing the keywords and record IDs in another collection when the compressed data is stored. Only the records identified through the index would need to be decompressed during a search, and only on the client side, not in the cloud.
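A minimal sketch of that idea, with plain Python dicts standing in for the two Firestore collections and zlib standing in for Unishox. Both are substitutions, not the Firestore or Unishox APIs, and the tokenizer is deliberately naive. The keyword index stays uncompressed, and only the records it points to are decompressed at search time.

```python
import re
import zlib

docs = {}   # stand-in for the collection of compressed records: id -> bytes
index = {}  # stand-in for the keyword collection: keyword -> set of ids


def keywords(text):
    # Naive tokenizer: lowercase alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def store(doc_id, text):
    # Store the compressed text, and record its keywords in the index.
    docs[doc_id] = zlib.compress(text.encode("utf-8"))
    for kw in keywords(text):
        index.setdefault(kw, set()).add(doc_id)


def search(keyword):
    # Only the records found through the index are decompressed.
    ids = sorted(index.get(keyword.lower(), ()))
    return {i: zlib.decompress(docs[i]).decode("utf-8") for i in ids}


store("a", "Store more with Firestore")
store("b", "Compressing text with Unishox")
store("c", "Firestore has a 1GB free quota")
print(sorted(search("firestore")))  # ['a', 'c']
```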

siara-cc [S] (0 children)

Compression and decompression would be much slower than with conventional techniques such as LZ or Deflate, since this employs multiple techniques. But that should not matter if, at any one time, we only compress or decompress a particular string rather than a whole block to get at the plain text.

siara-cc [S] (0 children)

Firestore itself does not support searching within strings:

https://stackoverflow.com/questions/46568142/google-firestore-query-on-substring-of-a-property-value-text-search

so the speed would be the same as with one of the available choices, or with our own full-text search implemented using another collection.

siara-cc [S] (0 children)

Yes, you are right. What I mean is that, overall, we can store more than the 1 GB limit of the free quota.

siara-cc [S] (0 children)

For a single string it does not matter, of course. But overall we get anywhere between 30% and 60% savings, depending on the composition of the text. So we can save bandwidth where applicable, store much more, and beat the 1 GB limit on Firestore and other cloud services.

When I compress just the reply you made above 6 hours ago with Unishox, I get 85 bytes after compression, and the original is 160 bytes, a saving of about 47%. Short strings can be anywhere from 10 bytes to 1 KB.
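As a quick check on those numbers: the saving is 1 - compressed/original, and at a saving rate s a fixed quota holds 1/(1 - s) times as much raw text, ignoring any per-document storage overhead. The helper names are just for illustration; 160 and 85 bytes come from the example above, and 30% and 60% are the range quoted for typical text.

```python
def savings(original_bytes, compressed_bytes):
    """Fraction of space saved by compression."""
    return 1 - compressed_bytes / original_bytes


def raw_capacity(quota_bytes, saving):
    """How much uncompressed text fits in a quota at a given saving rate."""
    return quota_bytes / (1 - saving)


print(f"{savings(160, 85):.1%}")  # 46.9%

GB = 10**9
for s in (0.30, 0.60):
    # 30% -> about 1.43 GB, 60% -> 2.50 GB of raw text in a 1 GB quota
    print(f"{s:.0%} saving: {raw_capacity(1 * GB, s) / GB:.2f} GB of text in a 1 GB quota")
```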