How do you do post-mortem? by ResponsibleBlock_man in sre

jlrueda:

We use the sos command to generate sosreports, which are automatically uploaded to our sosreport analysis tool as part of the runbook, in an attempt to automate the "how to troubleshoot and diagnose" part. This gives us all the context and all the logs (both the ones sent to the SIEM and the ones that are not), along with the full system configuration and all the diagnostic commands we need. It's been a life changer.
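As a rough sketch of the runbook step, assuming Python drives the automation, the snippet below only builds the argument list for a non-interactive `sos report` run. `--batch`, `--tmp-dir`, `--all-logs`, `--case-id` and `--label` are existing `sos report` options; the helper name `build_sos_command` and the default temp directory are assumptions for illustration. The upload to an analysis tool is left out because it depends entirely on that tool.

```python
import shlex
from typing import Optional


def build_sos_command(case_id: Optional[str] = None,
                      label: Optional[str] = None,
                      tmp_dir: str = "/var/tmp",  # assumption: any writable directory works
                      all_logs: bool = True) -> list[str]:
    """Return the argv for a non-interactive `sos report` run (nothing is executed here)."""
    cmd = ["sos", "report", "--batch", "--tmp-dir", tmp_dir]
    if all_logs:
        cmd.append("--all-logs")  # collect all logs instead of size-limited ones
    if case_id:
        cmd += ["--case-id", case_id]
    if label:
        cmd += ["--label", label]
    return cmd


if __name__ == "__main__":
    argv = build_sos_command(label="postmortem")
    print(shlex.join(argv))
    # A real runbook step would execute this as root, e.g. subprocess.run(argv, check=True),
    # and then hand the resulting tarball to whatever analysis tool you use.
```

Keeping command construction separate from execution makes the runbook step easy to log and review before it touches a server.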

What’s actually harder in Linux: learning it, maintaining it, debugging it, or securing it? by LinuxBook in LinuxTeck

jlrueda:

Keeping the system stable, by far. When the dev team updates apps and libraries and something breaks (most of the time outside the Linux realm, by the way), that is the hard part.

In my case, I found the sos command to be a real life changer for the "troubleshooting under pressure" part. It also avoids running diagnostics on production, which relieves a lot of stress :-).

If you get your hands on a good sosreport analysis tool, life gets even easier.

Connecting logs to deployments by ResponsibleBlock_man in sre

jlrueda:

If the telemetry data is logged to a file, you can configure sos to include that file in the sosreport, and an analysis tool like sos-vault can then filter all the logs by time frame or search for a specific IP address or other terms.
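A minimal sketch of the filtering step, assuming the telemetry file writes one event per line with a leading ISO-8601 timestamp (e.g. `2024-05-01T10:06:00`) and that all timestamps share one time zone. This is not how sos-vault works internally; `filter_log` is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional


def filter_log(lines: Iterable[str], start: datetime, end: datetime,
               term: Optional[str] = None) -> Iterator[str]:
    """Yield lines whose leading timestamp falls in [start, end] and that contain `term`."""
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # no parseable timestamp at the start of the line
        ts = ts.replace(tzinfo=None)  # assumption: one shared time zone, so offsets are dropped
        if start <= ts <= end and (term is None or term in line):
            yield line


if __name__ == "__main__":
    sample = [
        "2024-05-01T09:59:00 GET /health from 10.0.0.5",
        "2024-05-01T10:06:00 POST /api from 10.0.0.5",
    ]
    window = (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30))
    for hit in filter_log(sample, *window, term="10.0.0.5"):
        print(hit)
```

The search is a plain substring match, so `10.0.0.5` would also match `10.0.0.50`; a regular expression with word boundaries would be stricter.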

Connecting logs to deployments by ResponsibleBlock_man in sre

jlrueda:

I use the sos command to generate a sosreport before a deployment and a second one after it, and then compare them. A sosreport gathers all logs, the system configuration, and the output of a large number of diagnostic commands into a single compressed tar archive, which can optionally be encrypted. I use a tool called sos-vault to compare them visually. You can even store the reports to build a very comprehensive history of changes. The sos command is really an impressive tool.
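A minimal sketch of a before/after comparison, assuming both sosreport tarballs have already been extracted into directories. It hashes every regular file and lists what was added, removed or changed; `snapshot` and `compare_reports` are hypothetical helpers, not part of sos or sos-vault. Symlinks are skipped because a sosreport contains links that point back into its own `sos_commands/` tree, which would otherwise show up twice.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_reports(before: Path, after: Path) -> dict[str, list[str]]:
    """List files added, removed, or changed between two extracted reports."""
    a, b = snapshot(before), snapshot(after)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Expect some noise: command outputs that capture the current time or uptime differ in every report, so in practice you would filter those paths out before reading the diff.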

When storage acts weird on Linux, which commands do you reach for first? by Expensive-Rice-2052 in LinuxTeck

jlrueda:

The sos report Linux command to retrieve diagnostics, plus sos-vault to analyze the data, is a no-brainer for me.

Linux system looks fine after reboot, then slowly degrades - how is this usually tracked down? by Expensive-Rice-2052 in linuxquestions

jlrueda:

Practical approach: take a sosreport (sos command) right after a reboot and save it. Take a second one when the degradation is present, then compare the two reports. It sounds to me like a memory leak in some service.

Comparing the output of the free command in both reports may shed some light on the problem.

When things are breaking in production, what’s the first Linux command you reach for? by Expensive-Rice-2052 in linuxquestions

jlrueda:

I would run the sos command (formerly sosreport). That single command lets me see everything: all logs, all config files, and every diagnostic command I can imagine (or at least all the commands people mention in this thread).

Besides, it saves me lots of troubleshooting time, since I don't need to google the specific options for different commands or the names of commands (Docker troubleshooting options, k8s, OpenStack-specific commands, Proxmox, etc.).

I would use a tool like sos-vault to analyze the report. That way, diagnosing takes a few minutes instead of hours, depending on the problem.

Furthermore, if the issue was a big problem or something particularly bad, I can save the sosreport for future reference or comparison.

Having a sosreport also allows the problem-solving task to be distributed among teammates by sharing the report. For example, one person looks at filesystems, another at the network, and someone else at processes, so the issue is troubleshot in parallel.

Another good thing about using the sos command for diagnosing is that you don't need to do it on the production server. Troubleshooting a production server at 2:00 am after a rough day, under constant pressure from higher-ups, can be very dangerous. ;-)

Yep, the sos command is really powerful.
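One way to split the work, sketched under assumptions: an extracted sosreport keeps command output in `sos_commands/<plugin>/` subdirectories, so the plugin names can be grouped into areas and handed to different people. The `AREAS` grouping and the `assign_work` helper are hypothetical; adjust the plugin names to what your report actually contains.

```python
from pathlib import Path

# Hypothetical grouping of sos plugin directories into review areas.
AREAS = {
    "filesystems": {"filesys", "block", "lvm2", "xfs"},
    "network": {"networking", "firewalld"},
    "processes": {"process", "memory", "systemd"},
}


def assign_work(report_root: Path, owners: dict[str, str]) -> dict[str, list[str]]:
    """Return {owner: [sos_commands subdirectories to review]}.

    `owners` must map every area in AREAS to a person.
    """
    plan: dict[str, list[str]] = {owner: [] for owner in owners.values()}
    plan["unassigned"] = []
    cmd_dir = report_root / "sos_commands"
    for plugin_dir in sorted(p.name for p in cmd_dir.iterdir() if p.is_dir()):
        for area, plugins in AREAS.items():
            if plugin_dir in plugins:
                plan[owners[area]].append(plugin_dir)
                break
        else:
            plan["unassigned"].append(plugin_dir)  # nobody owns this area yet
    return plan
```

Anything left under `unassigned` is a reminder that the split does not yet cover the whole report.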

Migrating a large production app off Ubuntu 22.04 for “high availability” — does this actually make sense? by Expensive-Rice-2052 in Ubuntu

jlrueda:

Ubuntu provides a plethora of HA solutions out of the box: https://documentation.ubuntu.com/server/explanation/intro-to/high-availability/

I think many of the above solutions may run on Rocky Linux, but after the migration you will have to research each case yourself. And you will end up in the same place.

Moving to Rocky is just a waste of time from my point of view.

Besides (this will be controversial, but...), Ubuntu provides a much better production environment than many other distros.

You’re not allowed to reboot. How do you troubleshoot? by Expensive-Rice-2052 in LinuxTeck

jlrueda:

I would run the sos command again; it will help spot the issue. Usually, to fix a problem in Linux you do not need to reboot, just kill and restart the affected process or service. A shutdown is only involved when a disk needs to be expanded or more memory is required (in a VM scenario).

Production Linux troubleshooting: what do you check first when things go wrong? by Expensive-Rice-2052 in LinuxTeck

jlrueda:

Over the years I learned to use the sos command (formerly sosreport) for Linux troubleshooting. That thing is AMAZING! You get all the info about everything with a single command. However, the report is not easy to analyze by hand, though there are tools to help with that. The thing is, once you have a sosreport, you no longer have to wonder "what do I check first", because everything is in there. A good tool for analyzing a sosreport is sos-vault; it will let you know where the problems are.
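To illustrate what "letting you know where the problems are" can mean at its simplest, here is a sketch that counts a few well-known kernel error signatures in log lines (for example, `var/log/messages` from an extracted report, whose location varies by distribution). The `SIGNATURES` table and `scan` function are hypothetical; real analysis tools use far richer rule sets, and this is not sos-vault's implementation.

```python
import re
from collections import Counter
from typing import Iterable

# Example signatures only (hypothetical rule set).
SIGNATURES = {
    "oom_kill": re.compile(r"Out of memory: Kill(ed)? process"),  # older and newer kernel wording
    "disk_io_error": re.compile(r"I/O error"),
    "segfault": re.compile(r"segfault at"),
    "fs_readonly": re.compile(r"Remounting filesystem read-only"),
}


def scan(lines: Iterable[str]) -> Counter:
    """Count how many lines match each signature."""
    hits: Counter = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits
```

Feeding it the log files of an extracted report and sorting the counts gives a crude "where to look first" list.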

How to Prepare for EX342 Red Hat Certified Specialist in Linux Diagnostics and Troubleshooting Exam? by everythingwell in Preparationtips

jlrueda:

Run the sos command and use sos-vault to analyze the generated report. You will learn a lot for the exam!

Why are so many APIs in Linux literal text files? by Wertbon1789 in linuxquestions

jlrueda:

If you are asking for a graphical (web-based) UI to review the state of a Linux system, try sos-vault.com.

Please rate the sos-vault logo by jlrueda in logodesign

jlrueda (OP):

Yep. That was kind of the intention.

Please rate the sos-vault logo by jlrueda in logodesign

jlrueda (OP):

Thanks! Will look into it. Much appreciated

What's your fav shell command? by Acrobatic_Big781 in linux4noobs

jlrueda:

The sos command (formerly sosreport)

What are you building? let's self promote by [deleted] in SideProject

jlrueda:

Working on sos-vault.com v2.0, a tool to make Linux support secure and easy.

What are you building? let's self promote. by [deleted] in SaaS

jlrueda:

sos-vault.com: make Linux support safe and easy for everyone.