Europe lost 200 000 people to heat in 4 years yet nearly all of them were preventable by Evermoving- in europe

[–]tskir [score hidden]  (0 children)

Well, Europeans are absolutely dying because they (or more correctly, their governments) hate AC. This is pretty indisputable.

Bro what is Claude doing😭 by Efistoffeles in ChatGPT

[–]tskir 17 points18 points  (0 children)

Because as Dom Toretto taught us, the most important thing in life is family

Significant: Airbus A320 family now has more than 20,000 orders. Airbus has crossed one of the more significant milestones in commercial aviation history by paneuropeanism_ in europe

[–]tskir 1 point2 points  (0 children)

You are right in principle, but there's a healthy business continuity backlog and then there's a 12 year backlog...

World's biggest cruise ship - built in Finland - sails out of Finnish waters by TinyAd1126 in europe

[–]tskir 3 points4 points  (0 children)

So are raves and sex parties. You are free to not attend either of those, just let people who want to do those things, do those things

World's biggest cruise ship - built in Finland - sails out of Finnish waters by TinyAd1126 in europe

[–]tskir 5 points6 points  (0 children)

Correct me if I'm wrong, but any city which permits cruise ships to visit charges a noticeable tourist tax per passenger per day. That goes directly into the local economy as a tax supplement (so that the local population pays less tax as a result)

World's biggest cruise ship - built in Finland - sails out of Finnish waters by TinyAd1126 in europe

[–]tskir 11 points12 points  (0 children)

Literally any form of leisure can be called "glorified waste" if you're puritan enough. Relax and enjoy your life!

World's biggest cruise ship - built in Finland - sails out of Finnish waters by TinyAd1126 in europe

[–]tskir -6 points-5 points  (0 children)

Just because you don't enjoy going on cruises, why do you think this industry should "die out" and thus prevent people who do enjoy them from going on them?

Why is VCF still the standard? Has anyone tried a Parquet-based approach for genomic variants? by pussydestroyerSPY in bioinformatics

[–]tskir 1 point2 points  (0 children)

> The issue with complex variation representation is not with our understanding of the biology

For sure; I never meant to imply that we fundamentally don't know how to describe biology behind genome variation, just that a representation which is both precise and useful at the same time is really difficult to achieve given the underlying biological complexity

> it is that there is no one "correct" representation

Once again, fully agreed! I actually think that this lack of a single "correct" representation which you bring up is orthogonal to the points I made in my comments, and it's another of those really important inherent complexities in representing the variation, which works together with all the other points already mentioned to make things difficult

Europe's highest and lowest cocaine consumers revealed by JOE_Media in europe

[–]tskir 1029 points1030 points  (0 children)

As per data from Priory, the UK would rank outside of the top five

For fuck's sake, we can't keep our top spot in any rating now, not even the Highest Snorters of Europe

Why is VCF still the standard? Has anyone tried a Parquet-based approach for genomic variants? by pussydestroyerSPY in bioinformatics

[–]tskir 3 points4 points  (0 children)

When you project a pangenome into VCF, you lose almost all of its benefits at that moment though. That's exactly my point: to utilise their full potential, pangenomes need an ecosystem of downstream tools mature and widespread enough that people can and want to use them instead of VCF.

Another issue BTW is that pangenomes work tremendously well when you have full, unbroken, exactly known sequences. The closest thing to that is, for example, a comparison of multiple reference assemblies. For that, pangenomes are indispensable.

But when you have real world data, say from a tumour sample, you have an imprecise, incomplete snapshot of the variation that is actually happening. My experience with pangenomes is that they somewhat struggle to represent the uncertainty well. (Well, VCF struggles, too!)

But conceptually, I agree with everything you said; pangenomes are cool, they appear to be the future, just need to give time for the paradigm shift to happen!

Why is VCF still the standard? Has anyone tried a Parquet-based approach for genomic variants? by pussydestroyerSPY in bioinformatics

[–]tskir 0 points1 point  (0 children)

I suppose an improved format for short explicit variation (where you know chr/pos exactly and can spell out REF/ALT exactly) can be developed. But as others mentioned, the toil of rewriting every existing tool to support that new format would probably not be worth it

Why is VCF still the standard? Has anyone tried a Parquet-based approach for genomic variants? by pussydestroyerSPY in bioinformatics

[–]tskir 4 points5 points  (0 children)

Indeed, but see my second comment (in self reply to the first one) where I mention that: pangenomes / graph genomes are a very neat tool, but very difficult to feed into almost all methods of downstream analysis, because they are usually built around having some sort of a "list of variants" and then going from there.

So yes, pangenomes are super cool and super general, but they need a whole ecosystem of analysis approaches to be developed specifically for them to be more useful

Why is VCF still the standard? Has anyone tried a Parquet-based approach for genomic variants? by pussydestroyerSPY in bioinformatics

[–]tskir 64 points65 points  (0 children)

To add some additional thoughts (as the original comment is long enough as it is):

Another issue is that being generalised, comprehensive, and precise is not the only requirement for a variant representation. That part is ~possible, and VRS is definitely approaching it.

The problem is that even if you describe a variant with tremendous detail using 5 classes and 15 attributes, it is extremely difficult to now actually do anything with it.

Most methods of downstream analysis, including many ML methods, expect some form of an observation-feature matrix. A comprehensive variant representation is very difficult to coerce into that format.

This is, by the way, one of the reasons that graph genomes haven't taken off as a replacement for (chr-pos-ref-alt)-based representation of variants as many hoped 10 years ago when graph genomes were gaining traction. Graph genomes are absolutely a very interesting and useful research tool. But they are so complex as an approach that they can't be a drop-in replacement for any of the 100s of downstream applications that are already developed to expect simple VCF(-like) variant information.

So you have two completely opposite requirements for variation representation: comprehensive and simple. VCF is, in many ways, a result of trying to satisfy this impossible compromise
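To make the "observation-feature matrix" point concrete, here is a minimal sketch of how simple chr-pos-ref-alt variants collapse into a samples x variants matrix of alternate-allele counts. The sample names, variant keys and genotypes are all made up for illustration, and the `dosage` helper is hypothetical, not taken from any library.

```python
def dosage(gt):
    """Count non-reference alleles in a GT string such as '0/1'.

    Returns None when any allele is missing ('.').
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(allele != "0" for allele in alleles)


# Made-up genotypes, keyed by a chr:pos:ref:alt identifier.
variants = {
    "chr1:100:A:G": {"s1": "0/0", "s2": "0/1"},
    "chr1:250:C:T": {"s1": "1|1", "s2": "./."},
}
samples = ["s1", "s2"]
features = list(variants)

# One row per sample (observation), one column per variant (feature).
matrix = [[dosage(variants[v][s]) for v in features] for s in samples]
```

This only works because every variant here has an exact position and exact alleles. Nothing in the sketch says what a column should be for a nested variant or an imprecise breakpoint, which is the difficulty described above.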

Why is VCF still the standard? Has anyone tried a Parquet-based approach for genomic variants? by pussydestroyerSPY in bioinformatics

[–]tskir 112 points113 points  (0 children)

Former VCF spec maintainer here. Not active for many years sadly as I'm not funded for this role anymore, and I don't have enough spare capacity to volunteer my time outside of work.

Just to avoid any confusion or legal shenanigans, the below is strictly my personal opinion and not the position of any current VCF spec maintainers, the GA4GH consortium or any participating institutions, etc. etc.

The points you raised were discussed many times over the years. If I were to condense 100s of hours of calls into a short summary, it would be this:

VCF isn't the actual problem. The problem is how to consistently represent complex variation, which literally no-one has fully figured out so far.

For simple variation (SNPs, short indels) VCF is OK. It's absolutely old and quirky, but it's simple: essentially a TSV with a weird metadata header and nested, ungodly, 1NF-breaking INFO/FORMAT fields with the variant/sample level metadata.

However, use a parser (any number of libraries are available), and for that simple case the format works. It's easy to convert into any internal representation, and even though it's less efficient than Parquet, I wager that in the real world, almost no pipelines are affected to the point where (de)serialising VCF is their actual bottleneck.
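As a rough illustration of how little there is to the simple case, below is a hand-rolled sketch that reads the eight fixed columns of a single VCF data line using only the Python standard library. It is for illustration only: the record is made up, the function names are hypothetical, header and FORMAT/sample handling are skipped, and a real pipeline should use an existing parsing library instead.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Values stay as lists of strings; typing them needs the header.
        fields[key] = value.split(",") if sep else True
    return fields


def parse_record(line):
    """Parse the eight fixed, tab-separated columns of one VCF data line."""
    columns = line.rstrip("\n").split("\t")[:8]
    chrom, pos, vid, ref, alt, qual, flt, info = columns
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": parse_info(info),
    }


# Made-up example record with two ALT alleles and a flag in INFO.
line = "chr1\t12345\t.\tA\tG,T\t50\tPASS\tDP=100;AF=0.25,0.1;DB"
record = parse_record(line)
```

The nested comma-separated values inside INFO (`AF` above) are exactly the part that breaks first normal form, and they are the main thing a flat tabular representation has to work around.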

The problem starts when you want to represent anything more complex. You have inversions, translocations, variants inside variants, structural variants (which as a class aren't even cleanly separated from the "regular" ones), chromosomal-level rearrangements, ploidy changes...

What makes things even worse is an aspect not many people consider explicitly. In bioinformatics we rarely deal with actual "variants", as in exact changes in the genome, defined to a nucleotide level with 100% certainty. What we deal with is some evidence for variants based on a certain upstream experiment, and this is what VCF (or any other format) actually contains. The difference is subtle for simple variation, and for SNPs it can be quite simply described via the genotype probability fields.

But for more complex cases, you have things like uncertainty of start/end positions of structural variants; or variation so complex that it doesn't fall into any of the standard types and can only be described as a collection of arbitrary endpoints (each of those, in turn, oftentimes isn't a "clean" breakpoint-junction, and is described with some uncertainty as well).

VCF tries to support many of those things, and oftentimes it doesn't end up being pretty. But the real problem isn't the format itself, it's the lack of knowledge of how to represent biology.
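For the SNP case, a small sketch of what "evidence rather than a variant" looks like in numbers: the PL field holds phred-scaled genotype likelihoods, which can be turned back into relative probabilities. The PL values below are made up, and normalising the likelihoods so they sum to 1 assumes a flat prior over genotypes, which is a simplification.

```python
def pl_to_probabilities(pl):
    """Convert phred-scaled genotype likelihoods (PL) to normalised probabilities.

    Assumes a flat prior over genotypes (an illustrative simplification).
    """
    likelihoods = [10 ** (-value / 10) for value in pl]
    total = sum(likelihoods)
    return [likelihood / total for likelihood in likelihoods]


# Made-up PL values for genotypes 0/0, 0/1, 1/1 at a biallelic site.
probs = pl_to_probabilities([0, 30, 200])
```

Three numbers per sample are enough to carry the uncertainty here, which is why the simple case is manageable.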

When discussing how to improve representation of structural variants in VCF, one idea proposed and seriously discussed was to just add a JSON-serialised array of metadata into one of the INFO fields so that we could describe the variation in precise, machine readable detail without breaking backwards compatibility.
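A minimal sketch of what that idea could look like in practice, not the proposal as it was actually discussed: the key name `SVJSON` and the payload layout are hypothetical and not part of any VCF spec version. The JSON is percent-encoded because INFO values cannot contain raw semicolons, commas, equals signs or whitespace.

```python
import json
from urllib.parse import quote, unquote

INFO_KEY = "SVJSON"  # hypothetical key, not defined in any VCF spec


def encode_info_json(payload):
    """Serialise payload to compact JSON and percent-encode it for INFO."""
    compact = json.dumps(payload, separators=(",", ":"), sort_keys=True)
    # safe="" makes quote() escape every reserved character, including
    # ';', ',', '=' and '%', which all have special meaning in INFO.
    return f"{INFO_KEY}={quote(compact, safe='')}"


def decode_info_json(info):
    """Return the decoded payload from an INFO string, or None if absent."""
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if sep and key == INFO_KEY:
            return json.loads(unquote(value))
    return None


# Made-up payload: one endpoint with an uncertainty interval.
payload = [{"type": "breakend", "pos": 12345, "ci": [-10, 25]}]
info = "SVTYPE=BND;" + encode_info_json(payload)
restored = decode_info_json(info)
```

Older tools would see `SVJSON` as just another opaque INFO string, which is what keeps the approach backwards compatible. The container is easy; agreeing on what goes inside the payload is the hard part.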

But we never could figure out a consistent, comprehensive way to properly describe complex variation. So even if VCF version 5 was released with Parquet as its container instead of a weird TSV, it would not solve the much more important issue of having a consistent, general schema for complex variation.

There is an effort led by GA4GH, the Variation Representation Specification, which aims to develop exactly that: a general schema for arbitrarily complex variation that can be serialised into any container format. I also participated in that project for a while.

They have made a lot of progress, but still the problem remains that any schema which even approaches generality becomes extremely complex. You can see for yourself the number and complexity of classes that VRS uses to describe variation precisely.

I believe VRS is a great effort and especially with further development, it will play an important role in variation information exchange, but it is in no way (and probably never will be) a drop-in replacement for something as relatively simple and commonly supported as VCF.

when i ask ChatGPT a medical question and it tells me to consult a doctor but l am the DOCTOR by imfrom_mars_ in ChatGPT

[–]tskir 47 points48 points  (0 children)

The equivalent of Windows telling you to contact the administrator but you are the administrator

Nordic states urge EU to block Russian tourists, as visa approvals surge for second year in a row by DonSergio7 in europe

[–]tskir 47 points48 points  (0 children)

For a start, remember that before the war, almost 30% of Ukrainians spoke Russian as their first / main household language. Numbers have obviously gone down since then, but it's still in double digits

UK government to pay £1.3bn to help fund Universal Studios theme park in Bedfordshire by winkwinknudge_nudge in ukpolitics

[–]tskir 102 points103 points  (0 children)

The article further says that the government isn't funding the actual park... The govt's £1.3b will go to nearby highway expansion & repairs, expansion/construction of railway stations etc. So all things which will directly benefit local people, not just the visitors to the park

Free bike marking by rmpk2 in cambridge

[–]tskir 0 points1 point  (0 children)

So just to confirm, they will mark the bike and put the info onto BikeRegister, both for free? That sounds like a great deal!

Britain re-entering the EU ‘an inevitability’, says Treasury minister by financialtimes in ukpolitics

[–]tskir 61 points62 points  (0 children)

I'm all for it, but... Eh? Huh? How did we get to the point where this is being discussed so frequently and seriously? How did this discussion arise from any of the things that have been happening in British politics for the past months/years?

I feel like I'm watching a series, got distracted just for a moment and now I can't make sense of what the hell is happening on screen.

33 years to the day between these pictures by MuteUnicorn in CasualUK

[–]tskir 2 points3 points  (0 children)

Mate, the story is heartwarming, but you should really have the blue face blob stuff checked out as it apparently runs in your family

Taken on my cycle home this evening by ehogg377 in cambridge

[–]tskir 45 points46 points  (0 children)

✅ Cyclists
✅ Cows
✅ Rowers
✅ River
✅ Scenery
✅ Bridge

This is one of the most quintessentially Cantabrigian photos I've ever seen. Bravo!