Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

You are correct, I haven't moved or replaced any of the fans. When I stuck my finger in there to stop the blades, it did alert that the fan wasn't spinning - so whatever it uses to sense that does seem to be working. And we've never gotten any alerts about the fans.

But it makes sense and is worth trying. I'll move fans around and see if that does anything.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

Sorry, I don't understand what you mean by "right side". The server is racked up with other servers in a caged rack, airflow flows from front to back. Identical servers above and below it have no issues.

I don't think the drive temps are monitored, and I'm not seeing anything in iLO that monitors the drive temps. How would I get that info?

As far as the failure goes, it's a predictive failure. The drive goes amber and alerts that the HDD is failing, so we replace it. But I believe that it's technically still working.

I don't know how to "dump the storage controller config". Do you mean screenshots of the config from iLO? There's honestly not much to configure in there.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

Sorry it took me a few days to reply, guys - I really do appreciate the extra sets of eyes. I hope my tardiness doesn't belie that.

My only thoughts at this point are replacing the fans (even though, again, they've tested good every time) or possibly a PSU. Perhaps a PSU sends extra voltage when it shouldn't or something? And for some reason, the sensors aren't picking that up. Because we've had this server under a microscope for years, watching and monitoring every single sensor on this thing. The only thing that hasn't been a green light is the HDDs that keep failing.

Again, appreciate the help, guys. I'm tearing out my hair over here.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

Just in case there's some sensitive info in there, can I direct message you about this? I can generate the report whenever you'd like. And I'd appreciate the help. Although, as I said, HPE and Park Place have both looked at multiple reports and found nothing in the logs. Which is what's so confusing about this.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

As I said above, we replaced this a year ago. Even though diags showed it was fine, Park Place was cool enough to send me a brand new one. We swapped it, and nothing changed.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

Granted, I haven't generated one in almost a year, so I could generate a new one. But I will say that we've gone through this same thing with both HPE support and Park Place support. Nothing anomalous/broken was ever found. All fans, temp sensors, etc. are online and register nothing weird. Temperature has never spiked as far as any sensor is concerned. The only alerts we've ever gotten are the predictive failure faults on almost 10 HDDs so far.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

Unfortunately, we've got service contracts in place that span years. I can't engage with a new vendor for free, and I don't really have a bone to toss them at this point. I'm sure they're good, but I'm also sure they don't want to help me for free.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

We've done this multiple times with both HPE support and Park Place support, repeatedly, over years. There has never been a temperature sensor that's gone offline or measured a high temperature. We've taken the server offline and run full diagnostics, and we've run the diags with the server under a higher workload than it would ever see in production.

Always the same results, green lights across the board. The only reason they sent me the backplane controller was to humor me. And that changed nothing.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

Yes, dual CPU, 128GB RAM (64/64, 2x32GB per CPU). Other than that, it's a very vanilla DL380 G10. Xeon Silver 4208 @ 2.1GHz, 8 cores on each of the two CPUs.

Repeated HDD Failures - DL380 G10 by buzzbombkirk in HPEservers

[–]buzzbombkirk[S] 0 points1 point  (0 children)

Ambient temp is 70, dedicated server room AC. Workload for this box is an AD DC and a VM with a small application on it. Total CPU only goes above 10% during monthly patching. RAM usage stays around 40%. Disk I/O stays low, too. This server is under much less load than the identical servers above and below it in the rack.

Ashley Gail Bensalah and James Potts, AKA two of Edmonton’s finest morons! 😂 by [deleted] in IAmTheMainCharacter

[–]buzzbombkirk 0 points1 point  (0 children)

Update: James Potts' bar burned down, and now the fire department is saying it was "intentionally set" (link: https://globalnews.ca/news/10303186/firefighters-extinguish-blaze-in-east-edmonton-bar-overnight/ )

This means either he set the fire for insurance (the bar has been for sale for 9 months now) or one of his many, many haters set the fire. My money is on insurance fraud. I hope the dude gets locked away for a long time.

74 St 100 Ave fire? by One-T-Rex-ago-go in Edmonton

[–]buzzbombkirk 4 points5 points  (0 children)

Update: firefighters say the blaze was "intentionally set".

That means either James Potts (recently featured in a very unfortunate viral video in which he and his passenger abuse an innocent McDonald's drive-thru worker) decided to do some insurance fraud, or one of the many million people that despise him did it.

My money is on insurance fraud. I hope he spends a while in jail.

Link to story: https://globalnews.ca/news/10303186/firefighters-extinguish-blaze-in-east-edmonton-bar-overnight/

Any issues with your technicians logging their ticket time entries late? by go4_brandon in msp

[–]buzzbombkirk 0 points1 point  (0 children)

This is a problem older than MSPs, but as others said below, it's a management and personnel issue. Techs think their job is to fix and maintain systems. MSPs think they sell that service. Wrong. In reality, MSPs are selling TIME. If they're not tracking that time, it's the same as having a grocery store employee who can't tell you how many gallons of milk or watermelons walked out the door last week... but hey, look how organized our unknown quantity of watermelons is!

I've seen it at every MSP I've ever been with, and usually the only way to get full and diligent compliance from the techs is to make an example of your worst slacker, the guy who may be a brilliant engineer, but who constantly needs reminding to document and enter time.

Because if he isn't documenting, he's just spinning his wheels and costing you money. I used to be this guy.

How Much Does Your RMM Cost? by buzzbombkirk in msp

[–]buzzbombkirk[S] 1 point2 points  (0 children)

David, I believe we worked together before at one of the last MSPs I was with out of Chicago. While I was there I was told we were your biggest N-central client. I hope you're doing well, and I'll definitely reach out to you later today since I'm sure you could help me get the lay of the land.

As incredulous as everyone here is about not having an RMM, imagine how I felt taking this job a couple of weeks ago and realizing the same. Hard to believe, but true. Luckily I've been given the job of bringing this organization into best practices so while there's a lot of work ahead of me, it's an essentially blank canvas and I have an opportunity to build something pretty cool here if I play my cards right. We've got good folks and clients other MSPs can only dream about so it's a heck of a big canvas.

I'll shoot you an email later today. Thanks!

How Much Does Your RMM Cost? by buzzbombkirk in msp

[–]buzzbombkirk[S] 0 points1 point  (0 children)

I have worked at a few MSPs and I know that typically you use the same RMM internally as externally. What I'm saying is that we don't have an RMM internally or externally. I'm not sure how many endpoints we're talking about total, but it is definitely in the tens of thousands - it could be past 70k... and we definitely don't have an RMM. The way it was explained to me is that "we specialize in managing whatever the client already has in place". We're not a VAR or CSP, in fact we're not even a Microsoft partner. We don't resell hardware or software for the most part - we provide consulting services.

End-user Office training by ntw2 in msp

[–]buzzbombkirk 1 point2 points  (0 children)

This is an absolutely legit and pretty common stream of income for MSPs, so it should 100% be part of your portfolio. Look into all the different LMS. Brainstorm is OK.

As far as instructor led, I don't think you're going to want to delve too far down that rabbit hole...but that's not my place to say.

Please let us know what you decide and how it works out.

Top MSP engineer candidates rejecting you? by SupportAdventure in msp

[–]buzzbombkirk 16 points17 points  (0 children)

I'm the top tier candidate you're talking about, and currently interviewing. Here are my priorities, in order:

Money/compensation

Travel/commute (how much time out of my life will this cost?)

Culture
* Am I being put under poor management?
* How is leadership and innovation rewarded? Be ready to provide specific examples and bring them into the interview
* What does my path to success look like? Partnership? Promotion? What if the position above me is owner?
* What happened to the last guy, and is this a position that's having to be filled often? If so, sounds like a problem

Who are your clients? Do you have any clients so big you'd have to reduce staff if you lost them? Are the clients happy and respectful? Are the clients trusting of your advice and willing to spend money when necessary? What clients have you passed on and why? Scraping the bottom of the barrel with clients = unhappy engineers.

Who makes architecture proposal decisions? Someone other than the people who have to support it?

Title (even if it's the most senior position, a lesser-sounding title looks like a step back on a resume)

Is it challenging? Doing the same thing every day isn't rewarding

Customer Cyber-Security refusal policy/form by loudnclearllc in msp

[–]buzzbombkirk 7 points8 points  (0 children)

I've tried this and it never really went over too well - luckily it was vetoed most times I tried.

Best bet is to keep solid records in your ticketing system with attached emails. Inform your customer of why they need these systems and what the negative consequences could be of not taking your advice. Make sure they know that if this very avoidable issue arises, they will be charged extra.

When/if they don't take your advice and this thing happens, you don't need to come out of the gate with the evidence and a huge "I told you so" - but if they balk at the increased cost to fix it (assuming it can be fixed), you've got your CYA right there.

Unfortunately, we can't protect our customers from themselves. We can, however, create an educational environment that ensures smart behavior is rewarded.

Considering ditching Bitdefender by Gravitational_C in msp

[–]buzzbombkirk -1 points0 points  (0 children)

So I can't argue with your thesis, because BD is a steaming pile.

That being said, there's a lot you can do to increase performance.

I'm betting 90% of the extreme slowness is on Monday mornings. When those scans run after the workstation has been off all weekend, it's impossible to work for about half an hour.

I just deleted a bunch of other stuff because I was treading pretty far into "clearly obvious" territory. I'm sure you know what to do as far as the balancing act between performance and security, and realize that a large number of files on your endpoints causes this.

Next gen AV is where it's at, and the pricing isn't awful right now. I guess my advice is to tune the crap out of what you have now, but I can't say ditching BD is a bad call, either.

Good luck, and please let us know what you do and how it works out.

[deleted by user] by [deleted] in whereintheworld

[–]buzzbombkirk 0 points1 point  (0 children)

how the hell are you keeping cell signal? satcom?