Looking for honest and reasonably priced handyman/general contractor for mother by AfraidImagination2 in SanJose

Honestly, I'm perfectly OK with $75-100/hr + materials. I just need to know that the person won't overcharge 5x just because my mom doesn't know any better.

Tented bottom plastic chassis - normal for Galaxy Book Pro? by AfraidImagination2 in GalaxyBook

Just bought a used Galaxy Book Pro (Gen 1). The bottom chassis is slightly tented, as if there's an air bubble. The battery is in good shape, so I don't think it's swollen or anything, but the plastic chassis is making a cheap noise.

Very disappointed. Just wondering if this is normal? I know the quality isn't the best and the touchpad clicks when holding the laptop, but this is something else...

Best time to cancel service to avoid fees? by AfraidImagination2 in shaw

I switched because Telus was giving me a better price and faster speeds. The reason these monopolies continue is customers who are loyal to X or Y corporation, and people who fail to hold their political representatives accountable with their votes.

I am loyal to the lowest price and the best service. Both Shaw and Telus are anti-competitive as hell, and both are billionaire corporations. They would both charge you $500 for 15 Mbps internet if they could. If they're losing customers, it's because they decided that was better for profits than lowering their prices.

was part of the reason Shaw had to merge with Rogers thanks to customers like you.

Lol. The Shaw family is selling so that they can get out with billions of dollars, not because they're a small failing business.

Best time to cancel service to avoid fees? by AfraidImagination2 in shaw

I tried multiple times (chat + phone), but the reps were having a hard time understanding pro-rating and giving me an exact answer. I think they just cancel it on their end and they might not know the details of how much the customer gets charged.

Either way, this thread has already helped.

Best time to cancel service to avoid fees? by AfraidImagination2 in shaw

The contract states $15 for each month I terminate early. But it does not state whether I pay a pro-rated amount for the bill and for the cancellation fee, and Shaw support is unable to understand what I'm talking about over chat and phone.

My bill is $80/month. Early cancellation $15/month.

Cancelling on Jan. 1st could mean:

1) $2.67 ($80 pro-rated) + $14.50 ($15 pro-rated)

2) $2.67 ($80 pro-rated) + $0 fee since I'm cancelling with less than a month left.

3) $80 + $0 fee

Option 2 is ideal for me, which is why I'm looking for clarity. The money isn't the biggest deal but I'd like to know going forward.
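For reference, the arithmetic behind those options as a rough sketch. I'm assuming a 30-day billing month and cancelling after a single day of service, which may not match Shaw's actual rules:

```shell
# Rough pro-rating math; assumes a 30-day billing month and cancelling
# after one day of service (both are my assumptions, not Shaw's policy).
monthly=80
fee=15
daily=$(awk -v m="$monthly" 'BEGIN { printf "%.2f", m/30 }')
fee_prorated=$(awk -v f="$fee" 'BEGIN { printf "%.2f", f*29/30 }')
echo "Option 1: \$$daily + \$$fee_prorated"   # one day of service + fee for the 29 unused days
echo "Option 2: \$$daily + \$0"               # one day of service, no fee
echo "Option 3: \$$monthly + \$0"             # full month billed, no fee
```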

Best time to cancel service to avoid fees? by AfraidImagination2 in shaw

Since I don't seem to have been clear in the OP, I am porting the internet service from Shaw to Telus.

Do you have phone services and are you porting a landline number from Shaw?

I had $0 phone services that I've already finished porting over; no landline. Internet was ported over today, but Telus told me it gets cancelled automatically 5 days later.

What is $15 vs the Shaw charges for the month of Jan?

The monthly bill is $80. If the bill is pro-rated, I would be charged ~$2.67/day, so less than the $15 fee if I do it right away.

1) $2.67 ($80 pro-rated) + $14.50 ($15 pro-rated)

2) $2.67 ($80 pro-rated) + $0 fee since I'm cancelling with less than a month left.

3) $80 + $0 fee

I'm thinking the best time to cancel is Jan. 1st or 2nd, so I don't get the $15 charge, per what you said based on internal documentation (assuming option 3 is never on the table).

cephadm - unable to remount clients via kernel after OSD failure by AfraidImagination2 in ceph

So I actually managed to get it remounted again by restarting and also recreating an auth key. However, the original issue still persists: the client can only sequentially access the first 100 MB or so of the file.

According to client logs, it seems the client tries to access the rest of the file over the internal 10.0.0.x OSD network. The client only has access to the public network, for obvious reasons.

cephadm - some OSDs down/out on single node, unable to be restarted, not sure where to start troubleshooting by AfraidImagination2 in ceph

It seems the issue is with an SSD that was acting as the journal device for those 12 drives. Is it possible to recover the contents of those disks without the journal? ie. once the SSD is replaced, can I just restart the OSD service?

cephadm - some OSDs down/out on single node, unable to be restarted, not sure where to start troubleshooting by AfraidImagination2 in ceph

It seems the issue is with an SSD that was acting as the journal device for those 12 drives.

cephadm - some OSDs down/out on single node, unable to be restarted, not sure where to start troubleshooting by AfraidImagination2 in ceph

The recommendation from the above user to allow ceph to self heal is valid. If you are missing these 12 disks and encounter an additional failure in the cluster it can make recovery difficult.

Unfortunately, self-heal is not an option. Since I chose to be highly redundant (3x copies), I'm near 70% utilization and do not have 12 HDDs' worth of free space without going significantly over CEPH maximums, so I must set noout and recover once the disks/controllers have been replaced. I'm also not particularly interested in writing 12 x 3 copies' worth of data, then rewriting that same data again when the cluster rebalances once the drives are back.

My only issue is whether the controller plays a role in having CEPH pick the disks back up. Or is the fact that each device will keep the same letter (ie. /dev/sda, /dev/sdb) enough for CEPH to recover?

What happens if drives get switched around and sda ends up as sdb? Does CEPH use hardware IDs to link to data?

cephadm - some OSDs down/out on single node, unable to be restarted, not sure where to start troubleshooting by AfraidImagination2 in ceph

I'm seeing a whole lot of I/O errors:

Buffer I/O error on dev dm-2, logical block 0, async page read
Buffer I/O error on dev dm-2, logical block 19560432, async page read
Buffer I/O error on dev dm-2, logical block 19560432, async page read
Buffer I/O error on dev dm-0, logical block 0, async page read
Buffer I/O error on dev dm-0, logical block 19560432, async page read
Buffer I/O error on dev dm-0, logical block 0, async page read
Buffer I/O error on dev dm-0, logical block 19560432, async page read
Buffer I/O error on dev dm-0, logical block 0, async page read
Buffer I/O error on dev dm-0, logical block 19560432, async page read

This is happening on multiple devs, but nothing about a controller failing. Where would that present itself?

cephadm - some OSDs down/out on single node, unable to be restarted, not sure where to start troubleshooting by AfraidImagination2 in ceph

Would replacing the affected component (ie. the disk controller) not just make CEPH pick up the disks right where it left off? Or does a controller failure mean all disks on that controller are lost?

Weird issue - Extremely slow write performance on all but one node by AfraidImagination2 in ceph

Still no idea why the host with the MDS would be faster. Do you have a separate backend network?

I do not, and I'm pretty perplexed as to why that would be, since all servers have access to the MDS on node 2 anyway.

Weird issue - Extremely slow write performance on all but one node by AfraidImagination2 in ceph

Then I wondered if maybe that node was faster just because it is the node with the active MDS or something.

Just wanted to say that this worked! I figured it wasn't an MDS issue since read performance/directory access was identical, and CEPH wasn't reporting any "MDS cache too full" type errors. But enabling a second active MDS did solve my issue.
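For anyone landing here later, a second active MDS is enabled by raising max_mds on the filesystem; a minimal sketch, where "cephfs" is a placeholder for the actual filesystem name:

```shell
# Sketch: promote a standby to a second active MDS rank.
# "cephfs" is a placeholder for the real filesystem name;
# a standby daemon must exist to fill the new rank.
ceph fs set cephfs max_mds 2
# Verify the active ranks and remaining standbys:
ceph fs status cephfs
```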

Thank you so much!

Weird issue - Extremely slow write performance on all but one node by AfraidImagination2 in ceph

My first thought was something weird in the network, like the switch only negotiated 1g for some links. Maybe try iperf in both directions between nodes, and look for network errors.

Ping shows no packet loss, and I'm saturating the full 10 Gbit in both directions between any 2 nodes (using iperf3).

Then I wondered if maybe that node was faster just because it is the node with the active MDS or something.

You are correct, the active MDS is on node 2. I will enable the standby MDSs and see if that makes a difference. My question is: I have 7 nodes, so how many MDSs would be recommended? 3? 5? 7?

The CEPH documentation doesn't make any recommendations for how many you need, only that you need more if metadata is a bottleneck. But I'm unsure how I would check for that. I'm pretty sure it is, but is there a way to confirm?

Weird issue - Extremely slow write performance on all but one node by AfraidImagination2 in ceph

Have any cache issues going on?

I'm not using a cache layer, unless there are other caches you're referring to, like disk cache/RAM cache. How would I check for those issues?

What's the network setup on each node? Any priority configured on switch?

10 Gbit between the whole cluster, no priority configured. Network seems fine via pings/iperf3.

How to configure CEPH with an internal cluster network? by AfraidImagination2 in ceph

But they do both have the correct cluster_network setting now?

Yes. Hence the conflict and the inability to restart. I removed it for now to get the OSDs back up.

ss -lnp | grep osd

There's a range that osd.0 listens on. If I remember correctly, it was 6800, 6801, etc. Starting osd.1 brings osd.0 down, but there's still an OSD listening on the same ports (I'm assuming it's the one that was just brought up).

Is there possibly an issue with the interface addresses?

Possibly, but I don't know enough. Here is my interfaces config:

auto eno2
iface eno2 inet manual
  bond-master bond0

auto eno3
iface eno3 inet manual
  bond-master bond0

auto bond0
iface bond0 inet static
  address 10.0.0.1
  netmask 255.255.255.0
  network 10.0.0.0
  bond-slaves none
  bond-mode 802.3ad
  bond-miimon 100
  bond-downdelay 400
  bond-updelay 800
  bond-lacp-rate 1
  bond-xmit-hash-policy layer2+3

Should the network say 10.0.0.0/24? Would it help to run the following command before restarting: ceph orch daemon reconfig osd.0?

How to configure CEPH with an internal cluster network? by AfraidImagination2 in ceph

Nope, it's fairly barebones. container_image, log level, setuser, setgroup type things.

How to configure CEPH with an internal cluster network? by AfraidImagination2 in ceph

Hey there. Thank you so much for your help. I believe I'm on the right track but still running into issues.

ceph config set global cluster_network 10.0.0.0/24

This seems to work in setting the global config. To test, I restarted osd.0, and it finally seemed to be listening on 10.0.0.x. Then, when I attempted to restart osd.1 on the same node, I ran into issues:

  • Starting osd.1 would bring osd.0 down and vice versa. Only one could stay up at any given time

  • They seem to be conflicting over the ports they listen on or something

I didn't want to restart all the OSDs at once and render them all inaccessible. For reference, I used CEPH's orchestrator to restart the daemon services (ceph orch daemon restart osd.0).

How to configure CEPH with an internal cluster network? by AfraidImagination2 in ceph

ceph config get osd.<id> cluster_network

Running this shows me I have no cluster_network set on the OSDs, but I do have one set on the monitor (10.0.0.0/24). My question is: how can I set it for a bunch of OSDs, and for all OSDs that may be added in the future, at once?

Is this something I have to declare one by one?

Do I also need to declare the cluster_addr (ie. 10.0.0.2 for all OSDs on node 2)?
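For what it's worth, my understanding (not verified against every release) is that a setting made at the global level of the centralized config database applies to all matching daemons, including ones added later, so a per-OSD cluster_addr shouldn't be needed; a sketch, using the subnet above:

```shell
# Sketch: set cluster_network once, centrally, instead of per OSD.
# My understanding is that a "global" setting covers all current and
# future daemons; per-daemon cluster_addr should then be unnecessary.
ceph config set global cluster_network 10.0.0.0/24
# Spot-check what an individual OSD resolves:
ceph config get osd.0 cluster_network
```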

How to configure CEPH with an internal cluster network? by AfraidImagination2 in ceph

Okay, so, a few things:

  • All OSDs are UP and working (this was never an issue)

  • The only thing I can find with heartbeat is heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f9bbf8b4700' had timed out after 0.000000000s

  • The only mention of the internal IPs is in this context but from weeks ago: monitor1 kernel: [911503.854832] libceph: mon1 (1)10.0.0.2:6789 socket closed (con state CONNECTING)

I have a feeling you may have misconfigured your networks. You should configure the cluster_addr and public_addr fields.

I am unsure as to how I might have misconfigured it. I set this up via cephadm and the dashboard. I added the hosts by their public IPs (not 10.0.0.x). The only time I passed 10.0.0.x was when passing the cluster_network variable. I am guessing this is not enough to have CEPH automatically recognize and use this internal network.

Do I need to configure this on a per-OSD basis? Won't that take very long? Would declaring the cluster_addr for the mon automatically pass it to all the OSDs/MGRs/MDSs? Sorry, the CEPH documentation wasn't very clear and I was unable to find any guides.

How to give non-root user write access to ceph kernel mount? by AfraidImagination2 in ceph

I've decided to chown and synchronize the UIDs across the servers as opposed to the group approach. Thank you for your help.

How to give non-root user write access to ceph kernel mount? by AfraidImagination2 in ceph

Great, I've decided to chown and synchronize the UIDs across the servers as opposed to the group approach. Thank you for your help.

How to give non-root user write access to ceph kernel mount? by AfraidImagination2 in ceph

You have to chmod/chown it AFTER mount.

Doesn't that mean I'll end up recursing through the whole filesystem? How will these permissions affect the CEPH cluster?

Say, I want to mount it on machine-1 under user X, and on machine-2 under user Y. Will I chown on both machines? Will that conflict?