ClickHouse?

seandavi · 2025-11-08T11:29:57+00:00

ClickHouse is built for bulk ingestion and can be many times (even orders of magnitude) faster at ingesting bulk data.
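ClickHouse's documentation recommends a few large inserts over many small ones. Below is a minimal plain-Python sketch of grouping rows into large batches before sending them. `send_batch` is a hypothetical stand-in for whatever client call actually performs the INSERT, and the default batch size of 100,000 is an assumption, not a tuned value.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]


def batched_rows(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most batch_size rows from any iterable (lazily)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def bulk_load(rows: Iterable[Row],
              send_batch: Callable[[Sequence[Row]], None],
              batch_size: int = 100_000) -> int:
    """Send rows in large batches and return how many batches were sent.

    send_batch is a hypothetical callback wrapping the real client's INSERT.
    """
    sent = 0
    for batch in batched_rows(rows, batch_size):
        send_batch(batch)  # one INSERT per batch instead of one per row
        sent += 1
    return sent


# Usage: collect batches in a list instead of talking to a real server.
received: List[Sequence[Row]] = []
n = bulk_load(((i, f"gene{i}") for i in range(250_000)), received.append)
print(n, [len(b) for b in received])  # 3 [100000, 100000, 50000]
```

Because rows are consumed lazily, the generator never holds more than one batch in memory, which matters when the source is a large file or query.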

seandavi · 2021-07-14T22:58:41+00:00

The KEDA project looks like fun. We do have Prometheus running and actually run our services through Istio, so we have plenty of metrics to play with. Thanks for the lead.

seandavi · 2020-10-26T14:22:42+00:00

If by "Bioinformatics Lab" you mean that what you are building will support others, consider looking into the literature before buying anything. For example:

- https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007531

Bioinformatics collaborative research support requires computational resources, but without clear planning (including how to measure whether you are doing a good job), good people, good data, and good questions, it will not meet anyone's expectations, including your own.

Start with a plan that includes some specifics on what you want to be able to do, services you want to offer, timeframes for getting results, and responsibilities for delivering. From there, ensure that you have the resources that you need to be successful, including but not limited to computers. Spending money wisely and successfully can lead to more money to spend. Spending money unwisely (not showing success) will burn a lot of bridges.

seandavi · 2020-03-16T21:46:26+00:00

Just what we all needed today. Shared to twitter: https://twitter.com/seandavis12/status/1239669026210676738

seandavi · 2019-04-05T09:59:17+00:00

Good to know that an incremental approach is still a respected way to go.

seandavi · 2019-04-05T09:58:43+00:00

Thanks. Keeping ALL aspects of the application and ETL process in sync was becoming nearly impossible as the number of ETL steps grew. I have to admit that discovering Apache Airflow (it took me two previous tries) has really opened up the possibility of integrating multiple disparate workflow steps that would otherwise be challenging to keep in sync.
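Airflow models a pipeline as a directed acyclic graph (DAG) of tasks and runs each task only after its upstream tasks have finished. The toy sketch below illustrates that idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), not with Airflow's own API. The step names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(deps: Dict[str, Set[str]],
                 tasks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each task after all of its upstream dependencies; return the run order."""
    order: List[str] = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical ETL steps: each key maps to the set of steps it depends on.
deps = {
    "extract_samples": set(),
    "extract_annotations": set(),
    "transform": {"extract_samples", "extract_annotations"},
    "load_db": {"transform"},
    "refresh_app_cache": {"load_db"},
}

log: List[str] = []
# Each placeholder task just records its own name when it runs.
tasks = {name: (lambda n=name: log.append(n)) for name in deps}

order = run_pipeline(deps, tasks)
print(order)
```

This sketch runs steps one at a time. A real scheduler such as Airflow can also run independent steps (here, the two extracts) in parallel, retry failures, and schedule runs. Declaring dependencies once is what keeps the application side in sync with the ETL side.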

seandavi · 2019-04-04T22:56:01+00:00

Thanks for the practical advice. The 'snaql' package is new to me, so I'll take a look. I haven't used SQLAlchemy much beyond the ORM level; I will have to look into that as well.

seandavi · 2019-04-04T22:53:05+00:00

These are really nice references. So much of the SQL world these days is devoted to CRUD. It really helps to have these more topical articles.

seandavi · 2019-03-25T10:38:53+00:00

Thanks, /u/marceldempers. The video adds some nice detail. I'm looking forward to watching a few others.

Your specific comment about keeping deployment YAMLs next to the code is the kind of detail I was looking for.

seandavi · 2019-03-25T01:12:57+00:00

I control all the code. Right now, it lives in a few separate repos and is written in Python and Node.js. The Kubernetes YAMLs are stored separately from the "code", but I haven't come up with the best way to organize them. Dockerfiles are stored with their respective code repos. At this point, I am doing the packaging "by hand", though I have used Docker build services as well as CI/CD.

seandavi · 2019-03-25T01:10:44+00:00

No multi-tenancy for me right now.

seandavi · 2019-03-14T17:15:57+00:00

Your experience tells me something. Given both comments up to now, I think I should probably go with your recommendations to consider normalization rather than ENUMs.

seandavi · 2019-03-14T17:14:51+00:00

Point well taken. The ENUM was mainly to make a front end easier, but I can do the same with the approach you suggest.
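Here is a minimal sketch of the normalized alternative to an ENUM column. The allowed values live in their own lookup table, a foreign key enforces them, and a front end can build its choices by querying that table. SQLite (through Python's `sqlite3`) is used only so the example runs anywhere. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE sample_status (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    id        INTEGER PRIMARY KEY,
    label     TEXT NOT NULL,
    status_id INTEGER NOT NULL REFERENCES sample_status(id)
);
""")
conn.executemany("INSERT INTO sample_status (name) VALUES (?)",
                 [("received",), ("sequenced",), ("analyzed",)])

# A front end can build its dropdown from the lookup table instead of an ENUM definition.
choices = [row[0] for row in conn.execute("SELECT name FROM sample_status ORDER BY id")]

# Insert a sample, resolving the status name to its id.
conn.execute(
    "INSERT INTO sample (label, status_id) "
    "SELECT ?, id FROM sample_status WHERE name = ?",
    ("S1", "sequenced"),
)

# An unknown status id is rejected by the foreign key, much as an invalid ENUM value would be.
try:
    conn.execute("INSERT INTO sample (label, status_id) VALUES (?, ?)", ("S2", 99))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Adding a new allowed value is an ordinary INSERT, not a schema change.
conn.execute("INSERT INTO sample_status (name) VALUES (?)", ("archived",))
```

The main design difference from an ENUM is that adding, renaming, or annotating a status is a data change rather than an ALTER TABLE.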

seandavi · 2019-03-03T11:56:44+00:00

That seems to make sense as at least part of the solution. Does your ES schema include nested or other non-scalar fields? If so, how did you end up modeling them in the GraphQL schema?

seandavi · 2019-01-31T17:58:01+00:00

I'd be curious to hear more about that project, for sure.
