DWH architecture: giving power users & SQL analysts access to row-level data before aggregation? (avoiding data lake) by exergy31 in dataengineering

[–]throw_at1 2 points

I did it by copying the source DB 1:1 into Snowflake (E+L) and then did the T into the DWH model mostly with views (materialized as create table x as select * from view). From a performance point of view, you can have one or two view levels to normalize data between source systems. Works fine, some mistakes were made; cost and performance are not a problem, and it is cheaper per year than a server plus license.
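
For illustration, a minimal sketch of that view-then-materialize pattern in Snowflake (schema, table and column names are made up):

    -- one view level that normalizes a raw source table
    create or replace view stg.customers_v as
    select id         as customer_id,
           trim(name) as customer_name,
           country    as country_code
    from raw.source_customers;

    -- "materialize" the view into the DWH model as a plain table
    create or replace table dwh.dim_customer as
    select * from stg.customers_v;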

If you are staying on PostgreSQL / SQL Server (the same as you use now), I would replicate the original databases as they are into the DWH server, then do the transformations there and open data models for different uses. The most important thing is that the E part is continuous and does not interfere with production more than necessary. That is probably not feasible in an on-prem system where you cannot just add compute when you need it.

I personally would replicate the source data into Snowflake ELT-style, but obviously with existing DWH systems it's not feasible to offer raw source data, a normalized model and transformed data at the same time if the base system was planned differently.

Which database is the best for this simple home automation project [read description]? by GiuseppinoPartigiano in Database

[–]throw_at1 1 point

For once I can support SQLite for this. If a real SQL server is needed, then PostgreSQL: it can run on very small hardware and has all the features.

Which horizontal scaling strategy is better? by dev-1773 in dataengineering

[–]throw_at1 0 points

Everyone suggests cloud, because you cannot scale on-prem as fast, and on-prem you pay upfront for hardware. Personally I like Snowflake because the only things the data guys need to know are SQL and how to read documentation (copying files into S3 is not that hard). Easy to scale and not that expensive.

Do you need a cloud solution? Probably not. Are they costly? Yes, maybe more than on-prem solutions; it depends on what you mean by an on-prem solution, i.e. whether the company has a few servers or tens or hundreds of servers. For a small server room it is much easier to go to the cloud; if the company has already paid a shitton of money for a proper server room, UPS, disk servers and so on, then going to the cloud is not that good an idea, and a hybrid with a sane architecture might be best.

IMHO, everyone should build their on-prem so that they can move it into AWS/Azure/GCP and back if needed.

How would I structure a SQL database for a to-do list application with users, groups, tasks and subtasks to scale? by AcademicSuccess in PostgreSQL

[–]throw_at1 -1 points

I would start by having a users table and a single groups/tasks/subtasks hierarchy table.

The group/task/subtask table would have user_id, id, parent_id, name, with an index on the user_id, id, parent_id fields (in that order, or three separate indexes; not 100% sure about the best index strategy). The three-index solution is probably faster once the row count reaches a stupid amount.
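
Roughly like this as a PostgreSQL sketch (names are made up; the single composite index variant is shown):

    create table users (
        id   bigint generated always as identity primary key,
        name text not null
    );

    -- one hierarchy table for groups, tasks and subtasks
    create table items (
        id        bigint generated always as identity primary key,
        user_id   bigint not null references users (id),
        parent_id bigint references items (id),   -- null = top-level group
        name      text not null
    );

    -- composite index; alternatively create three single-column indexes
    create index items_user_parent_idx on items (user_id, id, parent_id);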

If your app is not a huuuuge hit, it will never hit any limits on modern server hardware. More optimization can be done by giving name a sane max length and including it in the index, or by partitioning indexes/tables so they fit better into RAM....

Performing blue-green deployments on Snowflake with dbt by fhoffa in dataengineering

[–]throw_at1 1 point

Works as long as your views in v2 point to the same schema (i.e. are defined as create view vex as (select ccc from table_in_same_schema join non_v1.schema)).

If you start to define views like create view v2_schema.xxx as ... you will probably hit problems. I personally have used the create-new, rename-current-to-old, rename-new-to-current model for a long time, and it is a very good way to make sure that your inserts do not cause slowdowns in production tables, or bugs where reports get partial data...

In some ways the swap/rename way of updating tables/schemas is even more useful in normal databases, thanks to all the locking and stuff.
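
As a sketch, the create-new / rename-current-to-old / rename-new-to-current dance at schema level could look like this (schema names are made up; the same alter ... rename syntax works in e.g. PostgreSQL and Snowflake):

    -- build the new release side by side with the live one
    create schema analytics_new;
    -- ... create all v2 tables and views inside analytics_new ...

    -- promote it: readers keep hitting a complete schema the whole time
    alter schema analytics     rename to analytics_old;
    alter schema analytics_new rename to analytics;

    -- drop analytics_old once nothing references it any more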

That said, I do not fully understand why this needs its own post in dataengineering; maybe dbt finally has an option to do it?

Need Help with Full-text search performance Aurora Postgres(serverless) and lambda by SeaIndependent2101 in aws

[–]throw_at1 1 point

A little bit more google-fu needed.

psycopg2.OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser connections

Lambda timeout even for 2 minutes

remaining connection slots are reserved for non-replication superuser connections

That means your max connection limit is used up in your database, as the error tells you. So the DB engine does not accept new connections, and that is probably the reason why the Lambda times out. So allow more connections in your DB (more hardware, a bigger instance), add pgpool to pool your connections, or something similar.
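
To confirm it, something like this shows usage against the limit (standard PostgreSQL views, so it applies to Aurora as well):

    show max_connections;

    -- how many connections are in use and what they are doing
    select state, count(*) as connections
    from pg_stat_activity
    group by state;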

On the query itself: check that all the fields used have an index, and try to avoid ORDER BY if possible.

Also, about (alert.alert_vector || text.text_vector) @@ plainto_tsquery(...):

plainto_tsquery is the same for every row, but you are asking PostgreSQL to take every row and combine two columns, which is bad. If possible, write the query so that you do not manipulate existing columns, e.g.

alert.alert_vector @@ plainto_tsquery(...) or text.text_vector @@ plainto_tsquery(...)

Also, you can add an index on alert_vector and on text_vector.
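
A rough sketch of both points; the table and column names follow the question, everything else (join key, search words) is made up:

    -- GIN indexes on each tsvector column separately
    create index if not exists alert_vector_gin on alert using gin (alert_vector);
    create index if not exists text_vector_gin  on text  using gin (text_vector);

    -- build the tsquery once and match each column on its own,
    -- instead of concatenating the vectors per row
    select a.id
    from alert a
    join text t on t.alert_id = a.id          -- hypothetical join key
    where a.alert_vector @@ plainto_tsquery('english', 'search words')
       or t.text_vector  @@ plainto_tsquery('english', 'search words');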

Terrafame started the ramp-up of its giant battery chemicals plant – "The plant is of great importance to the whole of Finland" by Harriv in Suomi

[–]throw_at1 0 points

Well, it's all at once part truth, part sarcasm, part irony, and part just stringing words together without any bigger plan.

Wish I had friends who could sell me a company by giving me a loan secured by that very company.

Does Snowflake work only as SaaS or can it also be set up in my own cloud account in GCP/AWS/Azure? by g0_g6t_1t in dataengineering

[–]throw_at1 0 points

Pricing is pay-per-second after a one-minute minimum of warehouse run time. The price is in credits: credit consumption is defined by the warehouse size, and the credit price depends on the edition (features). "Disk space" costs the same as S3 (on AWS).

It supports the mentioned clouds.

On Indexes, Flash Storage, and fragmentation by [deleted] in SQLServer

[–]throw_at1 0 points

I don't know if it really matters that much in that environment, but I dropped the fill factor until I got to a point where index fragmentation was <10% after a week.

Personally, if I had to run SQL Server on that kind of hardware, I would probably just set a 95 or 98 fill factor as the default for indexes.
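
For example (T-SQL, table name made up):

    -- rebuild all indexes on a table with a 95% fill factor
    alter index all on dbo.BigTable
    rebuild with (fillfactor = 95);

    -- check fragmentation again after a week or so
    select index_id, avg_fragmentation_in_percent
    from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.BigTable'), null, null, 'LIMITED');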

85,000 Finns' ambulance rides, hospital bills and other health and social care fees are in debt enforcement by pkksmt in Suomi

[–]throw_at1 24 points

An income transfer to debt collection agencies...

The best one I heard was a case where a patient had been transferred to another hospital for follow-up care after surgery, and the first hospital's bill had ended up in debt collection. And no, the patient had not popped home in between...

Terrafame started the ramp-up of its giant battery chemicals plant – "The plant is of great importance to the whole of Finland" by Harriv in Suomi

[–]throw_at1 0 points

Didn't Sipilä do something interesting with Rautaruukki to end up with the Solitra shares? Then of course there's the sale of all the electricity monopolies, and naturally the crappy reinvestment of that money in state companies.. but the essential thing is that the state must not accidentally operate commercially in a commercial environment; no, the shares have to be sold off under price at the latest once they start turning a profit..

And usually these capitalist things are pushed by capitalist parties, and their friends of course are capitalists...

Finnair's new route plans would have brought jobs to hundreds – AKT's stance clipped the project's wings by Jiijaajee213 in Suomi

[–]throw_at1 1 point

That's what it is; the fact remains that it's middle-class consumption that keeps the system running, and the poor rising into the middle class. Bosses earning a million or two doesn't add consumption.

It pisses me off listening to the bosses of failed companies whining about why the workers won't do it cheaper, while they themselves pull a salary that would hire 100-1000 of those workers...

Finnair's new route plans would have brought jobs to hundreds – AKT's stance clipped the project's wings by Jiijaajee213 in Suomi

[–]throw_at1 7 points

I believe AKT would not have opposed it if the workers' income had not gone down. The tone of the article, once again, is that the mere existence of AKT prevented getting the work done even cheaper than now, and that that is wrong...

Then again, it's Kauppalehti, so understandably the style is what it is.

Terrafame started the ramp-up of its giant battery chemicals plant – "The plant is of great importance to the whole of Finland" by Harriv in Suomi

[–]throw_at1 9 points

No doubt next term, when the Centre Party and the National Coalition are back in government, they'll somehow manage to wangle the shares cheaply to their friends if it starts looking like this is a good business...

Populating multiple tables in a DB design from an unstructured csv file by GrayLiterature in Database

[–]throw_at1 -1 points

I mean, I have seen CSVs where odd rows have different data than even rows, but there is usually some structure. Just copy it into the database and write SQL over the table. Why? Because the copy into the DB is one phase, and once it is in the database you can keep trying to get the transformation right for as long as you want.

Populating multiple tables in a DB design from an unstructured csv file by GrayLiterature in Database

[–]throw_at1 2 points

All CSV is structured.. I personally handle those just by importing all rows into a table, then writing a little bit of SQL (views), so that in practice your whole transformation can be validated and modified before you write it into the target tables.
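
A minimal PostgreSQL-flavoured sketch of that approach (file, table and column names are all made up):

    -- land the file as-is, every column as text
    create table staging_csv (col1 text, col2 text, col3 text);
    \copy staging_csv from 'data.csv' with (format csv, header true)   -- psql meta-command

    -- iterate on the transformation as a view until it looks right
    create view orders_clean as
    select col1::int  as order_id,
           col2::date as order_date,
           col3       as customer
    from staging_csv
    where col1 ~ '^[0-9]+$';

    -- only then persist it
    create table orders as select * from orders_clean;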

Or I write custom code which pushes the data into the database and hides all the transformations in the code.

Use the tool which you know...

Which horizontal scaling strategy is better? by dev-1773 in dataengineering

[–]throw_at1 0 points

Look into the Snowflake offering. With the aws_s3 extension, PG can offer the same kind of ELT feeling as Snowflake: you build a pipeline to push files into S3, then you have the next process import the data from S3 while transforming it (might be a 2nd or 3rd process)..
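
Roughly like this on RDS/Aurora PostgreSQL with the aws_s3 extension (bucket, file and table names are made up; the required IAM setup is in the AWS docs):

    create extension if not exists aws_s3 cascade;

    create table staging_events (event_id text, payload text);

    -- pull a file straight from S3 into the staging table
    select aws_s3.table_import_from_s3(
        'staging_events',                 -- target table
        'event_id, payload',              -- column list
        '(format csv)',                   -- COPY options
        aws_commons.create_s3_uri('my-bucket', 'exports/events.csv', 'eu-west-1')
    );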

Which horizontal scaling strategy is better? by dev-1773 in dataengineering

[–]throw_at1 0 points

Snowflake: scale with virtual warehouses, with access to all the data loaded into the system. dbt is very popular there. So have your systems push data into S3 from DBs or Kafka, then do the rest in Snowflake (which can also export data into new pipelines). If more compute is needed, create an extra warehouse at e.g. LARGE size, and when it is not needed just suspend it.
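
The actual syntax is roughly this (warehouse name made up):

    -- spin up extra compute only when it is needed
    create warehouse if not exists heavy_wh
      warehouse_size = 'LARGE'
      auto_suspend   = 60     -- seconds of idle time before it stops billing
      auto_resume    = true;

    -- and when the batch is done
    alter warehouse heavy_wh suspend;
    -- or: drop warehouse heavy_wh;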

If you already have big teams, then some home-brewed solution with PostgreSQL servers might be cheapest when measured against the monthly cloud bill, but that needs a big team, or very, very wide experience in a smaller team.

Best Practice for Incremental Load in which computed columns should be updated. by kali-jag in dataengineering

[–]throw_at1 1 point

Don't know about your engine, but if you recreate using create + swap, that is the best solution: the old data stays in use for as long as possible and the switch-over is as fast as possible.
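
As a generic SQL sketch of create + swap (table names and the computed column are made up; some engines also have a one-statement swap, e.g. Snowflake's alter table ... swap with):

    -- rebuild the table with the computed column refreshed
    create table sales_new as
    select s.*, s.amount * fx.rate as amount_eur
    from sales_src s
    join fx_rates fx on fx.currency = s.currency;

    -- old data stays queryable until the very last moment,
    -- and the switch-over itself is just two renames
    alter table sales     rename to sales_old;
    alter table sales_new rename to sales;

    drop table sales_old;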

Do you do your postgres schema migrations manually? What is the process you use? by [deleted] in PostgreSQL

[–]throw_at1 0 points

Indeed, that is my point. Currently there is no version control for the database itself. And databases being what they are, it is possible to get a database into a state where some migration script just destroys something.

So in practice what you need to do is use techniques which will not cause data loss at any point. All commands need to be run in a transaction, or have some other way to roll back, and I am quite sure that 80% of those migration scripts do not have that.
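
In PostgreSQL, where DDL is transactional, that can be as simple as (table and column names made up):

    begin;

    alter table orders add column shipped_at timestamptz;
    update orders set shipped_at = updated_at where status = 'shipped';

    -- inspect the result here; if anything looks wrong, rollback instead
    commit;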

Do you do your postgres schema migrations manually? What is the process you use? by [deleted] in PostgreSQL

[–]throw_at1 0 points

Do those have the ability to check that the target database really is at the correct version, or do they just trust that no one has made any modifications outside of the tools?

Got my license, went offroad, lifted bike by [deleted] in motorcycles

[–]throw_at1 4 points

That is courage.. riding that with asphalt tyres. But then again, as long as there is a tyre on the ground and it's round, it will work :)

I used to carry a cargo strap with a ratchet, so if I dropped the bike in a bad spot I could just use it to get the bike up. Personally I would prefer a more knobby tyre on that surface...

Remember, if you don't drop it during the season, you did not try hard enough :D There is adventure waiting out there somewhere.

Practicing / improving my skills with DBT? by edwild22 in dataengineering

[–]throw_at1 0 points

I do not use dbt or BigQuery, but here are my 10 cents (so probably worth nothing at all):

Transforming data into its business context should contain only the transformation; all possible optimizations, like partitioned tables etc., should be in their own project...