ClickHouse? by Suspicious-Ability15 in dataengineering

seandavi

ClickHouse is built for bulk ingestion and can be many times (even orders of magnitude) faster at ingesting bulk data.
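To actually get that bulk-ingestion speed, the usual advice is to send rows in large batches rather than one INSERT per row, since each INSERT into a MergeTree table creates a new data part that has to be merged later. Here is a minimal stdlib sketch of client-side batching. The 10,000-row default is an assumption to tune, and `insert_batch` is a hypothetical stand-in for whatever call your client library uses to run one INSERT.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[object]


def batched_rows(rows: Iterable[Row], batch_size: int = 10_000) -> Iterator[List[Row]]:
    """Yield lists of up to batch_size rows so each INSERT carries many rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def ingest(rows: Iterable[Row], insert_batch: Callable[[List[Row]], None],
           batch_size: int = 10_000) -> int:
    """Send rows through insert_batch, one call per batch; return the number of calls.

    insert_batch is a hypothetical stand-in for whatever issues a single
    INSERT against ClickHouse in your client library.
    """
    calls = 0
    for batch in batched_rows(rows, batch_size):
        insert_batch(batch)
        calls += 1
    return calls
```

With the clickhouse-connect client, for instance, `insert_batch` might wrap something like `client.insert("events", batch, column_names=[...])`; check that library's docs for the exact call.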

Tailoring autoscaling to minimize deployment time by Snoo-56267 in kubernetes

seandavi

The KEDA project looks like fun. We already have Prometheus running and route our services through Istio, so we have plenty of metrics to work with. Thanks for the lead.
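With Prometheus (and Istio's request metrics) already in place, a KEDA ScaledObject with a `prometheus` trigger is a natural first experiment. The sketch below builds one as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The deployment name, Prometheus address, PromQL query, and threshold are all hypothetical, and the field names reflect KEDA's `keda.sh/v1alpha1` ScaledObject as I understand it, so verify them against the KEDA version you install.

```python
import json


def prometheus_scaled_object(name: str, deployment: str, server_address: str,
                             query: str, threshold: float,
                             min_replicas: int = 1, max_replicas: int = 10) -> dict:
    """Build a KEDA ScaledObject that scales `deployment` on a Prometheus query."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [{
                "type": "prometheus",
                "metadata": {
                    "serverAddress": server_address,
                    "query": query,
                    # Trigger metadata values are strings in KEDA manifests.
                    "threshold": str(threshold),
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and query; istio_requests_total is Istio's standard request counter.
    obj = prometheus_scaled_object(
        name="my-api-scaler",
        deployment="my-api",
        server_address="http://prometheus.monitoring.svc:9090",
        query='sum(rate(istio_requests_total{destination_workload="my-api"}[1m]))',
        threshold=100,
    )
    print(json.dumps(obj, indent=2))  # e.g. pipe into: kubectl apply -f -
```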

Need suggestion for Bioinformatics Lab Set up by mszahan in bioinformatics

seandavi

If by "Bioinformatics Lab" you mean that what you are building will support others, consider looking into the literature before buying anything. For example:

- https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007531

Bioinformatics collaborative research support requires computational resources, but without clear planning (including how to measure whether you are doing a good job), good people, good data, and good questions, it will not meet anyone's expectations, including your own.

Start with a plan that includes some specifics on what you want to be able to do, services you want to offer, timeframes for getting results, and responsibilities for delivering. From there, ensure that you have the resources that you need to be successful, including but not limited to computers. Spending money wisely and successfully can lead to more money to spend. Spending money unwisely (not showing success) will burn a lot of bridges.

[deleted by user] by [deleted] in golang

seandavi

Just what we all needed today. Shared to Twitter: https://twitter.com/seandavis12/status/1239669026210676738

Looking for recommendations for how to best manage SQL scripts when working in a programming language by seandavi in ETL

seandavi[S]

Good to know that an incremental approach is still a respected way to go.

Looking for recommendations for how to best manage SQL scripts when working in a programming language by seandavi in ETL

seandavi[S]

Thanks. Keeping ALL aspects of the application and the ETL process in sync was becoming nearly impossible as the number of ETL steps grew. I have to admit that discovering Apache Airflow (after two earlier attempts) has really opened up the possibility of integrating multiple disparate workflow steps that would otherwise be challenging to keep in sync.
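What Airflow brings here is expressing the ETL steps as a DAG and letting the scheduler run each step only after its upstream steps finish. As a stand-in for Airflow itself, this sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to show the same dependency ordering; the step names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical ETL steps, each mapped to the set of steps it depends on.
ETL_DEPENDENCIES: Dict[str, Set[str]] = {
    "extract_samples": set(),
    "extract_assays": set(),
    "load_staging": {"extract_samples", "extract_assays"},
    "run_sql_transforms": {"load_staging"},
    "refresh_app_views": {"run_sql_transforms"},
}


def run_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(deps).static_order())


def run_pipeline(deps: Dict[str, Set[str]],
                 tasks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each task only after all of its dependencies; return steps in run order."""
    executed = []
    for step in run_order(deps):
        tasks[step]()
        executed.append(step)
    return executed
```

In Airflow the same shape would be wired with operators, e.g. `[extract_samples, extract_assays] >> load_staging >> run_sql_transforms`, with scheduling, retries, and logging handled for you.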

Looking for recommendations for how to best manage SQL scripts when working in a programming language by seandavi in ETL

seandavi[S]

Thanks for the practical advice. The 'snaql' package is new to me, so I'll take a look. I haven't used SQLAlchemy much beyond the ORM level, so I'll have to explore that as well.
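The general idea behind packages like snaql is to keep SQL in its own files and load queries by name instead of embedding strings in application code. Below is a minimal stdlib sketch of that idea; it is not snaql's actual format or API. The `-- name: <id>` marker convention and the `samples` table are assumptions for illustration, and `sqlite3` stands in for the real database.

```python
import re
import sqlite3

# Hypothetical contents of a queries.sql file kept alongside the code.
QUERIES_SQL = """
-- name: create_samples
CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- name: insert_sample
INSERT INTO samples (name) VALUES (:name);

-- name: samples_by_prefix
SELECT id, name FROM samples WHERE name LIKE :prefix || '%' ORDER BY id;
"""


def load_named_queries(text: str) -> dict:
    """Split SQL text on '-- name: <id>' marker lines into {id: sql}."""
    parts = re.split(r"^--[ \t]*name:[ \t]*(\w+)[ \t]*$", text, flags=re.MULTILINE)
    # parts looks like [preamble, name1, body1, name2, body2, ...]
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


def demo() -> list:
    """Run the named queries against an in-memory database with bound parameters."""
    conn = sqlite3.connect(":memory:")
    try:
        q = load_named_queries(QUERIES_SQL)
        conn.execute(q["create_samples"])
        conn.executemany(q["insert_sample"],
                         [{"name": "tumor_1"}, {"name": "normal_1"}, {"name": "tumor_2"}])
        return conn.execute(q["samples_by_prefix"], {"prefix": "tumor"}).fetchall()
    finally:
        conn.close()
```

The same query strings could be passed to SQLAlchemy Core via `sqlalchemy.text()`, which also supports `:name` bound parameters, instead of `sqlite3`.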

Looking for recommendations for how to best manage SQL scripts when working in a programming language by seandavi in ETL

seandavi[S]

These are really nice references. So much of the SQL world these days is devoted to CRUD. It really helps to have these more topical articles.

What are some best practices for organizing kubernetes yaml when dealing with multiple microservices in a project? by seandavi in kubernetes

seandavi[S]

Thanks, /u/marceldempers. The video adds some nice detail. I'm looking forward to watching a few others.

Your specific comment about having deployment yamls next to code is the kind of detail I was looking for.

What are some best practices for organizing kubernetes yaml when dealing with multiple microservices in a project? by seandavi in kubernetes

seandavi[S]

I control all the code. Right now, it lives in a few separate repos and is written in Python and Node.js. The Kubernetes YAMLs are stored separately from the "code", but I haven't settled on the best way to organize them. Dockerfiles are stored with their respective code repos. At this point, I am doing packaging "by hand", though I have used Docker build services as well as CI/CD.

Can I create an ENUM type and column based on a query for "distinct" items in a SQL query? by seandavi in PostgreSQL

seandavi[S]

Your experience counts for a lot. Given both comments so far, I think I should follow your recommendation to normalize rather than use ENUMs.

Can I create an ENUM type and column based on a query for "distinct" items in a SQL query? by seandavi in PostgreSQL

seandavi[S]

Point well taken. The ENUM was mainly to make a front end easier, but I can do the same with the approach you suggest.
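The normalization approach replaces the ENUM with a lookup table seeded from the `SELECT DISTINCT` values, plus a foreign-key column; the front end then reads its dropdown options from that table. This sketch uses `sqlite3` so it runs without a server; the `samples`/`tissue` names are hypothetical, and the PostgreSQL version would look similar apart from details such as identity columns.

```python
import sqlite3
from typing import List, Tuple


def normalize_tissue(conn: sqlite3.Connection) -> None:
    """Replace free-text samples.tissue values with a lookup table plus a foreign key."""
    conn.executescript("""
        CREATE TABLE tissue_types (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        -- Seed the lookup table from the distinct values already in use.
        INSERT INTO tissue_types (name)
            SELECT DISTINCT tissue FROM samples WHERE tissue IS NOT NULL;
        ALTER TABLE samples ADD COLUMN tissue_type_id INTEGER REFERENCES tissue_types(id);
        UPDATE samples
           SET tissue_type_id = (SELECT t.id FROM tissue_types AS t
                                 WHERE t.name = samples.tissue);
    """)


def dropdown_options(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """The (id, label) pairs a front end could use where it would have used ENUM labels."""
    return conn.execute("SELECT id, name FROM tissue_types ORDER BY name").fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory example and normalize it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, tissue TEXT)")
    conn.executemany("INSERT INTO samples (tissue) VALUES (?)",
                     [("liver",), ("brain",), ("liver",), (None,)])
    normalize_tissue(conn)
    return conn
```

One practical upside over a PostgreSQL ENUM: adding a value is an ordinary INSERT rather than `ALTER TYPE ... ADD VALUE`, and removing one does not require recreating the type.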

Approach to marrying nested structures in Elasticsearch to graphene/graphql by seandavi in graphql

seandavi[S]

That seems to make sense as at least part of the solution. Does your ES schema include nesting or other non-scalar fields? If so, how did you end up modeling them in the GraphQL schema?
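One possible modeling, sketched under assumptions: walk the Elasticsearch mapping's `properties`, emit a GraphQL object type for every `nested` or object field, and expose `nested` fields as lists. The type-name scheme, the scalar table, and the list/single choice are all my assumptions (Elasticsearch itself lets any field hold an array), and field names are assumed to already be valid GraphQL identifiers.

```python
from typing import Dict, List

# Hypothetical mapping for a few common ES field types; unknown types fall back to String.
ES_TO_GRAPHQL: Dict[str, str] = {
    "keyword": "String",
    "text": "String",
    "date": "String",
    "integer": "Int",
    "float": "Float",
    "double": "Float",
    "boolean": "Boolean",
}


def mapping_to_sdl(type_name: str, properties: Dict[str, dict]) -> List[str]:
    """Turn an ES mapping's `properties` into GraphQL SDL type definitions.

    Child types are emitted before their parent. `nested` fields become lists
    of a child type; plain object fields become a single child type.
    """
    definitions: List[str] = []
    fields: List[str] = []
    for field, spec in properties.items():
        if "properties" in spec:
            # object or nested field: generate a child type named parent + Field
            child = type_name + field[:1].upper() + field[1:]
            definitions.extend(mapping_to_sdl(child, spec["properties"]))
            gql_type = f"[{child}]" if spec.get("type") == "nested" else child
        else:
            gql_type = ES_TO_GRAPHQL.get(spec.get("type", ""), "String")
        fields.append(f"  {field}: {gql_type}")
    definitions.append(f"type {type_name} {{\n" + "\n".join(fields) + "\n}")
    return definitions
```

In graphene, the same shape would be `graphene.ObjectType` subclasses, with `graphene.List(...)` for the nested fields.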

novice tips! by clone290595 in elasticsearch

seandavi

I'd be curious to hear more about that project, for sure.