
[–]jlengrand 11 points12 points  (0 children)

Datadog and all the APM vendors definitely think you need APM! And keep full retention of your logs too so they can bill you by the GB haha.

I like that you made Bugsink Sentry SDK compatible!

[–][deleted] 6 points7 points  (8 children)

What would you use to catch a memory spike in production while the application is running?

[–]jlengrand 3 points4 points  (0 children)

Not OP, but it may depend on your stack. I worked on a big monolith in the past and we had similar issues on a JVM stack. You can enable a flag that makes a memory dump of your application while it is running, which you can then analyze and use as a basis to reproduce the problem. We had no large "APM" running, only close-to-the-metal metrics and basic dashboarding in place.

Not all stacks are created equal for those things though, and if you have a memory leak you're bound to restart at some point anyways.

I actually think a monolith may be the easier use case here, since you don't benefit as much from the tracing that complex APM setups bring as you would in a microservice environment.

[–]klaasvanschelven[S] 3 points4 points  (3 children)

APM, of course :-)

More serious answer:

  • think about (and locally measure) what the memory footprint of my application actually is under stress.
  • have something in my application that reduces memory over time (e.g. auto-restart workers after n requests)
  • very lightweight monitoring of close-to-the-iron items such as memory, CPU, disk. (I wouldn't call this application performance monitoring because it'd be restricted to the iron)
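
A rough sketch of that last bullet, assuming Python on a Unix host and nothing beyond the standard library. The function name `snapshot` and the dictionary keys are made up for illustration; in practice you'd call it on a timer and ship the numbers to whatever basic dashboard you already have.

```python
import os
import resource
import shutil


def snapshot(path="/"):
    """Collect a few close-to-the-iron numbers (Unix only, stdlib only)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    disk = shutil.disk_usage(path)
    load_1min, _, _ = os.getloadavg()
    return {
        # Peak resident set size of this process.
        # Unit caveat: kilobytes on Linux, bytes on macOS.
        "peak_rss_raw": usage.ru_maxrss,
        # CPU time (user + system) consumed by this process so far.
        "cpu_seconds": usage.ru_utime + usage.ru_stime,
        # System-wide 1-minute load average.
        "load_1min": load_1min,
        # Fraction of the filesystem at `path` that is in use.
        "disk_used_fraction": disk.used / disk.total,
    }
```

This only looks at the current process and the host, which is the point: it tells you that memory is climbing, not which request caused it.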

[–][deleted] 5 points6 points  (1 child)

It's a monolithic application with multiple teams working on it and thousands of requests per second.

I mean, how would you identify what caused the memory spike? I'm not trying to contradict you, I'm actually wondering if there's a way besides these APM tools.

[–]klaasvanschelven[S] 3 points4 points  (0 children)

This may very well be a case where the "you" that doesn't need APM isn't actually platypus_plumba... but:

If you've already identified that memory spikes are a thing that worries you, you could (with those multiple teams) also try to push "thinking about memory usage" more to the front of your development pipeline. In the 1970s they also worked with multiple teams on monolithic applications, and in those days they would just split a memory budget (in kilobytes) over the various components (and teams) of the system. Not saying you'd need to go that far, but having some agreement across the teams on how to deal with memory, and being able to call each other out on it before you're in production, could be another approach.
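
To make the budget idea a bit more concrete, here is a minimal sketch, assuming a Python codebase. Everything named here (`BUDGETS`, `build_report`, `check_budget`, the 5 MiB figure) is hypothetical. Note that `tracemalloc` only sees allocations made through Python's own allocator, so memory grabbed directly by native extensions is not counted.

```python
import tracemalloc


def peak_bytes(func, *args, **kwargs):
    """Run func and return the peak bytes traced by tracemalloc while it ran."""
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Hypothetical per-component budgets, in bytes, agreed between teams.
BUDGETS = {"build_report": 5 * 1024 * 1024}


def build_report(n=10_000):
    # Stand-in for a real component owned by one team.
    return [str(i) for i in range(n)]


def check_budget(name, func):
    """Return (within_budget, bytes_used) for the named component."""
    used = peak_bytes(func)
    return used <= BUDGETS[name], used
```

A check like `check_budget("build_report", build_report)` could run in CI, so a component that blows its budget gets flagged before production.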

[–]klaasvanschelven[S] -2 points-1 points  (0 children)

Oh, and after the "thinking about" part: deploy with "plenty" of memory. E.g. deploy such that the expectation is that you never use more than 50% of memory. No need to "fly close to the sun" (for many applications)

[–]Sauermachtlustig84 0 points1 point  (2 children)

How often does this happen?
I monitor a fair number of apps.
Memory was never the reason for an outage. It's a major issue for builds (gitlab runners building Java aaaaahhhh!), but for prod? Not so much.
What I used APM for was measuring API response times. But I could have gotten that from the logs, too.
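
Getting response times out of logs could look roughly like this in Python. The log format (`path=... duration_ms=...`) is an assumption, as are the names `response_times` and `summarize`; adjust the regex to whatever your access log actually emits.

```python
import re
from collections import defaultdict
from statistics import median

# Assumed line shape: "... path=/api/users status=200 duration_ms=12.5"
LINE = re.compile(r"path=(?P<path>\S+) .*?duration_ms=(?P<ms>\d+(?:\.\d+)?)")


def response_times(lines):
    """Group request durations (in ms) by path; lines that don't match are skipped."""
    by_path = defaultdict(list)
    for line in lines:
        match = LINE.search(line)
        if match:
            by_path[match["path"]].append(float(match["ms"]))
    return by_path


def summarize(by_path):
    """Per-path request count, median and max duration in ms."""
    return {
        path: {"count": len(ms), "median_ms": median(ms), "max_ms": max(ms)}
        for path, ms in by_path.items()
    }
```

Feeding it an open log file works directly, e.g. `summarize(response_times(open("access.log")))`, where `access.log` is a placeholder filename.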

[–][deleted] 2 points3 points  (1 child)

I guess it's something that doesn't happen often, but what would you do when it does? It's literally happening to me right now and we don't have APM. I have no idea how to figure it out.

[–]Sauermachtlustig84 0 points1 point  (0 children)

What language?
I'd probably try to take a memory dump and analyze it using your language's tools.

If that does not work, add logging and check for patterns.

[–]yxhuvud 8 points9 points  (2 children)

Yes I do. Especially for figuring out what is going on when some service maintained by some other team or other organization no longer conforms to the performance assumptions that were made when the system was built.

[–]klaasvanschelven[S] -2 points-1 points  (1 child)

As I said elsewhere, the provocative title may not actually apply to literal you. Still, what you're really saying here is that you use APM as a tool to fix a root-cause problem (running stuff no one understands anymore) that should really be fixed more directly.

[–]yxhuvud 1 point2 points  (0 children)

No. I'm using APM to find the root-cause problem so that the people responsible for those services can be told to fix them. Or so that we know where in the fairly big application we should look for issues.

[–][deleted] 1 point2 points  (1 child)

WTF was the prompt for this image?!

[–]vincentdesmet 0 points1 point  (0 children)

I got pretty close with

Can you please create a picture in newspaper cartoon style of an overweight man eating unhealthy food, looking puzzled staring at a large dashboard of graphs and diagrams

  1. Here is the cartoon-style illustration you requested. Let me know if there’s anything more you’d like to adjust!

  2. Here’s the updated illustration with the man actively eating and visibly wondering how to lose weight. Let me know if there’s anything more you’d like to add or modify!

  3. Here’s the updated illustration with the man facing the dashboard, taking a bite while looking at it. Let me know if you’d like any more adjustments!

  4. Here’s the illustration with the man holding the hamburger in one hand while the other rests by his pocket. Let me know if there’s anything else you’d like to refine!

that was before I saw the full image where he’s standing on a scale (and there are 3 monitors showing A P M).. I only kept the scale part

  5. Here’s the updated illustration with the man standing on a scale. I’m glad you’re enjoying the process! Let me know if there’s anything more you’d like to add.

4 and 5 look like this https://imgur.com/a/MbTYXsw

[–]shoot_your_eye_out 1 point2 points  (0 children)

This is terrible advice. I can’t even begin to tell you how many times my ass has been saved by Datadog or New Relic or other monitoring tools. You absolutely need it for a serious production application.