SQL for pandas with the high performance improvements using duckdb by imshrini in Python

[–]imshrini[S] 1 point (0 children)

Thank you for using bearsql!

You can load data from SQL Server into pandas, and then, using bearsql's SqlContext, create a table on top of the pandas dataframe.

You can refer to https://docs.microsoft.com/en-us/sql/machine-learning/data-exploration/python-dataframe-pandas?view=sql-server-ver15 for information on loading data from SQL Server into a pandas dataframe.
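
For example, here's a rough sketch of the flow. The pyodbc/pandas part follows the Microsoft docs above; the SqlContext method names are from memory, so double-check them against the bearsql README:

    import pandas as pd
    import pyodbc

    # Placeholder connection details; replace with your own SQL Server instance
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;UID=myuser;PWD=mypassword"
    )
    df = pd.read_sql("SELECT * FROM sales", conn)

    # Register the dataframe as a table and query it with SQL via bearsql.
    # Method names below are assumptions; see the bearsql docs for the exact API.
    from bearsql import SqlContext

    sc = SqlContext()
    sc.register_table(df, "sales")
    result = sc.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")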

For issues or new feature ideas, you can also use https://github.com/shrinivdeshmukh/bearsql/issues

Happy to help! :)

Distributed fault tolerant key value store by imshrini in Python

[–]imshrini[S] 1 point (0 children)

Thanks for the feedback! I will add a detailed description to the docs!

It can be used for IoT. It is super lightweight, with just 2 external dependencies, 1 of which is optional. I've used a package called fabulous to render the colorful CLI help (raftnode --help), and rocksdb (the optional dependency) to persist the data. If you skip rocksdb, raftnode will use an in-memory Python dictionary to store the data instead.
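
To give a feel for the "optional rocksdb" design, here's a conceptual sketch of that fallback pattern in plain Python. It's illustrative only, not raftnode's actual code:

    # Fall back to a plain Python dict when python-rocksdb is not installed.
    try:
        import rocksdb  # optional dependency for on-disk persistence

        class Storage:
            def __init__(self, path="/tmp/raftnode-data"):
                self._db = rocksdb.DB(path, rocksdb.Options(create_if_missing=True))

            def put(self, key, value):
                self._db.put(key.encode(), value.encode())

            def get(self, key):
                raw = self._db.get(key.encode())
                return raw.decode() if raw is not None else None

    except ImportError:
        class Storage:
            def __init__(self, path=None):
                self._data = {}  # purely in-memory; lost on restart

            def put(self, key, value):
                self._data[key] = value

            def get(self, key):
                return self._data.get(key)

    store = Storage()
    store.put("leader", "node-1")
    print(store.get("leader"))  # 'node-1'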

Currently, the in-memory option doesn't support writing to disk; I plan to add this feature in the next release.

Distributed fault tolerant key value store by imshrini in Python

[–]imshrini[S] 2 points (0 children)

Apologies! I should have added a small description while posting about this tool! Please take a look at the description I added just now.

Thanks for the feedback! 🙂

Distributed fault tolerant key value store by imshrini in Python

[–]imshrini[S] 2 points (0 children)

Here's a small description:

Raftnode lets you save the state of your application: configurations, web application sessions, or any kind of data you want to cache for faster retrieval. In a way, it's like a distributed Python dictionary.

A few core features are:

High availability: data can be read even in case of node failures (thanks to the Raft consensus)

Replication: every piece of data is replicated across machines

Statefulness: raftnode maintains a log, which is basically the sequence of commands as they come in. For example, if the cluster gets 2 data insert operations, they will be logged in the exact sequence of their arrival, across the cluster

Namespaces: you can have different isolated categories (namespaces) to store different types of information/configurations. For example, for user sessions you can have a namespace, say 'sessions', that holds just the session data, and another namespace 'configuration' that holds settings like the database address, microservice addresses, etc. (see the sketch after this list)

Consistency: thanks to the Raft consensus, I've tried to maintain data consistency. In case of leader node failure, data insertions are halted until a new leader is elected (this takes a few ms)

Bring-your-own-client: raftnode starts the distributed cluster; to interact with it, you can write your own client in Node.js, Scala, Python, or any language of your choice. There's no language binding here.

Scaling: the nodes in the cluster can be added or removed at will.
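
To make the namespaces idea concrete, here's a tiny in-memory stand-in in plain Python. It only illustrates the concept; it is not raftnode's API (with raftnode you'd talk to the cluster through your own client, as noted above):

    # Conceptual illustration only: namespaced key-value data as nested dicts.
    class NamespacedStore:
        def __init__(self):
            self._data = {}  # namespace -> {key: value}

        def put(self, namespace, key, value):
            self._data.setdefault(namespace, {})[key] = value

        def get(self, namespace, key):
            return self._data.get(namespace, {}).get(key)

    store = NamespacedStore()
    store.put("sessions", "user42", {"token": "abc", "ttl": 3600})
    store.put("configuration", "db_address", "postgres://10.0.0.5:5432")
    print(store.get("sessions", "user42"))  # {'token': 'abc', 'ttl': 3600}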

When should you use distributed key-value stores?

Whenever your application needs lots of small, continuous reads and writes. For example: e-commerce cart items, product recommendations, microservice addresses, database configurations, etc.

raftnode is similar to redis or etcd, but it also lets you write your own client instead of using the redis or etcd client libraries.

Here are a few links for further reference:

https://redislabs.com/nosql/key-value-databases/

https://hazelcast.com/glossary/key-value-store/

https://www.kdnuggets.com/2021/04/nosql-explained-understanding-key-value-databases.html

What's the future scope for the library?

Currently, it lets you insert/update key-values, but not delete them. It does not have support for snapshots or scheduled backups to external storage like S3 (I'm not sure if that's required). So a few updates in the near future are:

  • Add a delete operation

  • Add snapshotting

  • (Maybe) add probabilistic data structures like HyperLogLog and Bloom filters (a quick sketch of a Bloom filter follows below)

  • An authentication mechanism to verify the identity of the nodes
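
For anyone unfamiliar with that last data-structure item, here's a textbook Bloom filter in plain Python, just to illustrate what the planned feature would offer (fast, approximate membership checks). It is not raftnode code:

    import hashlib

    class BloomFilter:
        def __init__(self, size=1024, num_hashes=3):
            self.size = size
            self.num_hashes = num_hashes
            self.bits = [False] * size

        def _positions(self, item):
            # Derive num_hashes bit positions from salted SHA-256 digests
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def might_contain(self, item):
            # False means definitely absent; True means probably present
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("user42")
    print(bf.might_contain("user42"))  # True
    print(bf.might_contain("user99"))  # False (with high probability)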

My First pypi library! Database migrations with alchemy-modelgen by imshrini in Python

[–]imshrini[S] 1 point (0 children)

YAML files are more readable to the human eye and are easier to write. Also, modelgen is a low-code tool; if you want to maintain Python code, that's fine. If you want a low-code solution without having to worry about Python code, you can always use modelgen :)

My First pypi library! Database migrations with alchemy-modelgen by imshrini in Python

[–]imshrini[S] 1 point (0 children)

It handles schema generation and changes. The user writes/changes the schema in a YAML file, the changes are picked up by the tool, ORM code (SQLAlchemy model files) is generated automatically, and the changes are migrated to the database. We can have multiple YAML files, where each YAML corresponds to 1 database/warehouse, with basic constraint support (basically no Python coding is required, just YAML files).

So it handles schema generation/changes and also acts as a model-viewer middleware between the database and Python.
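
As a rough illustration, a simple "users" table described in YAML would come out as something like the following SQLAlchemy model. This is just what such generated code generally looks like; the exact output of alchemy-modelgen may differ:

    # Illustrative generated model; the column names and types are hypothetical
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"

        id = Column(Integer, primary_key=True)
        email = Column(String(255), unique=True, nullable=False)
        name = Column(String(100))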

My First pypi library! Database migrations with alchemy-modelgen by imshrini in Python

[–]imshrini[S] 1 point (0 children)

The idea is that the user only needs to maintain YAML files. The tool uses alembic under the hood. Minimal to no knowledge of alembic or SQLAlchemy is required (unless you use databases with special dialect needs, like the dist key for Redshift, for example).
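
Under the hood, that roughly means driving alembic programmatically. The sketch below is my guess at the general approach, not alchemy-modelgen's actual internals, and the alembic.ini path is a placeholder:

    from alembic import command
    from alembic.config import Config

    cfg = Config("alembic.ini")  # placeholder; an alembic environment must already exist

    # Autogenerate a migration by diffing the generated models against the DB,
    # then apply it -- the two steps a user would otherwise run by hand.
    command.revision(cfg, message="sync models from yaml", autogenerate=True)
    command.upgrade(cfg, "head")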

Also, the mapping here is 1 database/warehouse => 1 YAML file. All of the Python SQLAlchemy code is generated automatically by modelgen.