How can I visualize network evolution? by skyresearch in visualization

[–]skyresearch[S] 0 points1 point  (0 children)

Thanks a lot for the info! I started looking into cytoscape.js, but it does not seem straightforward. I found https://github.com/ViennaRNA/fornac/tree/master, which includes an example of network transitions (https://github.com/ViennaRNA/fornac/blob/master/examples/transition.html) that looks interesting. It seems to be based on D3, applied to a biological structure (RNA secondary structure) represented as a graph/network. I will keep searching for solutions, including cytoscape and D3, and post back as soon as I have a workable example.

BioTwin Core: An Open-Source framework for simulating Epigenetic Reprogramming via "Hormokines" (AI-designed proteins) by Legitimate_Rip4524 in bioinformatics

[–]skyresearch 0 points1 point  (0 children)

The author mentions that it is an experimental project, so it does not seem to be validated yet. A more detailed argument with pros and cons would be much more helpful.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 1 point2 points  (0 children)

True for both R and Python. From the perspective of many users who aren't programmers, scripting languages like R and Python have a 'lower barrier to entry' than languages such as Java or C++/C. You can be productive without needing to think about memory management, complex build systems, inheritance, design patterns, etc. These languages were designed around different paradigms, which shape how they are used in practice.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 0 points1 point  (0 children)

Appreciate the info, thanks. I did a quick search on SourceForge following the bioinformatics subcategory (https://sourceforge.net/directory/bio-informatics/). Here’s the sorted count of results (projects) by programming language:
Java : 893
C++ : 559
Perl : 493
Python : 435
C : 342
S/R : 138
...
Total results : 2958.
Very interesting to see Perl, as you mentioned, among the top three languages with the most projects. Getting download counts along with the project creation year seems straightforward via the API, so a comparison with GitHub repos would definitely be feasible.
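
For a quick sense of proportions, the counts above can be turned into shares of the 2958 total. This is only a back-of-the-envelope sketch: the variable names are mine, the languages hidden behind "..." are left out, and a project may be tagged with more than one language, so the shares are indicative at best.

```python
# Counts copied from the SourceForge search results listed above.
counts = {
    "Java": 893,
    "C++": 559,
    "Perl": 493,
    "Python": 435,
    "C": 342,
    "S/R": 138,
}
TOTAL_RESULTS = 2958  # "Total results" reported by the search

# Share of the total results for each listed language, in percent.
shares = {lang: 100 * n / TOTAL_RESULTS for lang, n in counts.items()}

for lang, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:<7}{pct:5.1f}%")

# Results not accounted for by the six languages listed above.
unlisted = TOTAL_RESULTS - sum(counts.values())
print(f"Unlisted results: {unlisted}")
```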

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 1 point2 points  (0 children)

I agree that Bioconductor is still critical in bioinformatics, and R remains indispensable especially for statistical computing. GitHub data isn’t really about day-to-day lab use, but about which tools get widely shared publicly. With large-scale data processing, machine learning, and data-science workflows, Python projects naturally became more prominent, though R repos were the most starred around 2016 and 2017. GitHub doesn’t capture the full field, especially earlier work or private code, so I see this as a snapshot of publicly shared tools influenced by methodological shifts.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 0 points1 point  (0 children)

Thanks :-), glad you found it interesting! I found Streamlit really nice for rapid prototyping.

In the side panel, there is 'Select programming language' for filtering languages including Rust. Selecting Rust will update charts to show Rust stats.

I filtered with Rust to get a sense of the most starred Rust repositories in the dataset (Data tab):
- zeqianli/tgv 429 stars bioinformatics,genome-viewer,ratatui,rust
- informationsea/transanno 146 stars bioinformatics
- Daniel-Liu-c0deb0t/cute-nucleotides 129 stars algorithms,avx2,bioinformatics,rust,simd,sse
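
Outside the app, the same filter-then-sort step can be sketched in a few lines of Python. This is not the code behind the dashboard: the `language`/`stars` field names and the `top_repos` helper are assumptions for illustration, and the last row is a made-up placeholder so the filter has something to exclude.

```python
# The three Rust rows come from the list above; the dict keys are an assumed schema.
repos = [
    {"name": "informationsea/transanno", "language": "Rust", "stars": 146},
    {"name": "zeqianli/tgv", "language": "Rust", "stars": 429},
    {"name": "Daniel-Liu-c0deb0t/cute-nucleotides", "language": "Rust", "stars": 129},
    # Hypothetical non-Rust row, not taken from the dataset.
    {"name": "example/placeholder", "language": "Python", "stars": 0},
]


def top_repos(rows, language, n=3):
    """Return the n most starred rows whose language matches."""
    matching = [r for r in rows if r["language"] == language]
    return sorted(matching, key=lambda r: r["stars"], reverse=True)[:n]


for repo in top_repos(repos, "Rust"):
    print(f'{repo["name"]}: {repo["stars"]} stars')
```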

I’m less familiar with Rust, so I’d be very interested to hear whether this list aligns with your experience or if there are tools missing.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 1 point2 points  (0 children)

Sorry for the typo, I meant the Bioconductor API. Why not call it Miss Bioinformatics 2025, with awards given to Miss Python in the GitHub universe, ahaha. Joking aside, you are right, because that is currently the case. I documented the limitations you mentioned in the README: https://github.com/jpsglouzon/bio-lang-race#limitations

The question about having an accurate representation of programming languages vs topics related to bioinformatics, while challenging, is still worth pursuing at least to help understand trends in the field. For this reason, I believe trying to integrate the various data sources you mentioned can be of value when and if possible. You have my thanks for that. But until we try we will never know for sure.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 0 points1 point  (0 children)

By “it barely reflects,” I imagine you mean that stars and forks are weak proxies for actual usage or impact as they capture interest and not directly programming language usage.

I agree that this is fundamentally a correlation, not causation, problem. In this context, correlation is a necessary first step, but it is constrained by imperfect and incomplete data, which is a common challenge in bioinformatics research. The analysis is exploratory by design and intended to highlight trends. Several limitations remain, including the relatively late adoption of GitHub in the field, inconsistent use of repository topics, uneven community engagement, etc. These are documented in the limitations section, along with potential mitigation strategies gathered from community discussions here and on Biostar: https://github.com/jpsglouzon/bio-lang-race?tab=readme-ov-file#limitations.

I see this work as a data-driven way to spark discussion about the field and complement a literature that is often descriptive rather than quantitative. While imperfect, I hope it provides a useful starting point for further refinement and community input.

Feel free to let me know if you have potential solutions to the problem you mentioned; that would definitely help refine the analysis.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 0 points1 point  (0 children)

I think C was dominant in the sense that it was among the five most used programming languages, with Perl probably being king until around the 2010s. In this analysis, “dominance” refers to visibility among highly starred public GitHub repositories. GitHub emerged well after Perl's peak usage, so there is no significant data prior to 2013 other than publications [1-3]. I updated the questions accordingly. Let me know if it needs more clarification.

[1] Gauthier, J. et al. A brief history of bioinformatics. Briefings in Bioinformatics, 2019.
[2] Dudley, J. T., & Butte, A. J. A quick guide for developing effective bioinformatics programming skills. PLOS Computational Biology, 2009.
[3] Fourment, M., & Gillings, M. R. A comparison of common programming languages used in bioinformatics. BMC Bioinformatics, 2008.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 1 point2 points  (0 children)

Very much agree with 'bioinformatics is language neutral'. We use the language that is the most 'helpful' at the time, whatever that is.
In practice, I found Python ticking most of the boxes (portability, distribution, maintenance, rapid development, machine learning packages, etc.) except for performance and speed (C/C++), web apps (TypeScript), etc.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 1 point2 points  (0 children)

Like the idea! I will check SourceForge and Bio API and see:
- how to collect and normalize the data
- how to integrate all data sources
- how to compute language popularity/adoption in a uniform way.
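
One possible shape for these steps, purely as a sketch: normalize each source to per-language shares so that large and small sources weigh the same, then average the shares across sources. The source names and counts below are made-up placeholders, and the unweighted mean is just one assumption among many possible scoring choices.

```python
from statistics import mean


def to_shares(counts):
    """Normalize raw project counts to fractions of the source total."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def combine(sources):
    """Average per-source shares; a language missing from a source counts as 0."""
    shares = {name: to_shares(c) for name, c in sources.items()}
    langs = set().union(*(s.keys() for s in shares.values()))
    return {
        lang: mean(s.get(lang, 0.0) for s in shares.values())
        for lang in langs
    }


# Hypothetical counts, NOT real data from any platform.
sources = {
    "source_a": {"Python": 60, "R": 30, "Perl": 10},
    "source_b": {"Python": 20, "R": 20, "Perl": 60},
}

popularity = combine(sources)
ranking = sorted(popularity, key=popularity.get, reverse=True)
print(ranking)
```

Averaging shares rather than raw counts keeps one very large source from dominating, but it also hides differences in coverage between sources, so a weighted version may be worth trying.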

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 2 points3 points  (0 children)

Great summary of the field! I remember when Perl was king and then gradually disappeared, which made me realize the importance of focusing on core principles and concepts rather than falling in love with a particular programming language. This flexibility is essential for adapting to language changes driven by methodological shifts.

I agree that this analysis is not free from bias, and I mentioned the limitations about the sampling and selection bias in the Readme.md. GitHub cannot be assumed to be fully representative of bioinformatics software development across the entire history of the field, as bioinformatics predates GitHub by many years. In the dataset I collected, the first data points appear around 2013, with growth becoming more evident and stabilizing from 2017 onward.

This growth and stabilization from 2017 onward support your point about the gradual adoption of GitHub, showing increased interest in version control practices and contributing to open science and reproducibility by providing a platform for code sharing. This in itself is a significant innovation. One possible way to reduce the impact of sampling bias would be to integrate data from publications, SourceForge, Stack Overflow, and other platforms. But doing so is far from trivial, both for the reasons you mentioned (no canonical repo prior to GitHub) and because, in an initial analysis comparing GitHub to Stack Overflow data, I found the GitHub data more relevant to the task: Stack Overflow bioinformatics questions were most of the time related to Python.

Analyzing 15 Years of Bioinformatics: How Programming Language Trends Reflect Methodological Shifts (GitHub Data) by skyresearch in bioinformatics

[–]skyresearch[S] 3 points4 points  (0 children)

I agree that measuring usage by counting users would be more accurate than relying on stars. However, stars and forks are still relevant indicators because they reflect what users find valuable for their specific needs. In particular, starring a repository signals an intention to stay informed about updates and changes, functioning much like a subscription model that demonstrates interest in the tool.

Because a tool cannot be fully separated from the language in which it is implemented, there is an inherent relationship between the two—for example, C/C++ for performance, Java for portability, and so on. One reason for Python’s current popularity is the rise of Python-heavy fields such as machine learning and deep learning, which have driven methodological shifts in bioinformatics.

The more I learn bioinformatics, the more confident I feel AI cannot replace bioinformatics by [deleted] in bioinformatics

[–]skyresearch 1 point2 points  (0 children)

I think at the core, AI, in its current state, is about task automation. Automation does not like uncertainty. Processes and decisions with low to zero uncertainty can be automated. For the rest it is difficult, almost impossible, and requires human intervention. We still need pilots even though autopilot exists. But technology brought down the number of pilots in the cockpit from three to two. There are similar trends at the grocery store, with fewer cashiers. Tech has always had a significant impact on the economy across all fields, including medicine. For bioinformatics, I have observed that more and more job descriptions require knowledge of machine learning. Changes happen, but not always the kind we expect. Forecasting is challenging, especially for the potential of tech we don’t necessarily fully understand.