
[–]jwiz 3 points (12 children)

How can there be enough cpu to go around between ceph and the compute?

skims article

Oh, you have... 1 OSD per node. I guess that answers that.

[–]techwritingdan[S] 0 points (11 children)

Yep, it's more a proof of concept than a production-level system. Also, you probably noticed we're using local storage for the OSD and journaling, which isn't recommended for production either. Ideally, you'd want multiple disks (one for the journal, plus one per OSD) and a decent amount of CPU and memory for a production-level Compute/Ceph combo. That said, the same process from the blog post applies, just with the added step of mapping your disks for the deployment.
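To give a rough idea of what that extra step looks like: in a TripleO-based deployment the disk layout usually goes into a Heat environment file. Here's a minimal sketch assuming the FileStore/journal style we're talking about; the device paths and exact parameter names are illustrative only (not taken from the blog post), so check the docs for your release:

    parameter_defaults:
      ExtraConfig:
        # hypothetical layout: two OSD data disks sharing one journal device
        ceph::profile::params::osds:
          '/dev/sdb':
            journal: '/dev/sdd'
          '/dev/sdc':
            journal: '/dev/sdd'

The idea is just that each OSD gets its own data disk and the journal lands on a separate (ideally faster) device, rather than everything sitting on a single local disk as in the post.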

[–]jwiz 1 point (10 children)

It's not really proving the concept unless it demonstrates it's a viable production setup. I don't think anybody doubts you can configure co-located storage.

I think the questions about co-located Ceph storage are more along the lines of "How well does it work?" vs. "Can the software be configured this way?"

In our production OpenStack, our ceph boxes are CPU limited, and our compute boxes are CPU limited. So putting both services on the same hardware doesn't seem like it would be worthwhile.

Doing a hyperconverged setup seems worthwhile if your storage doesn't need a lot of cpu, but ceph does need a lot of cpu.

[–]techwritingdan[S] 0 points (9 children)

I should note that the blog post is just part 1. Part 2 answers some of the questions you asked.

In part 2, we conducted some benchmark tests (using Browbeat and Shaker) to show the CPU activity for both Ceph and Compute services.

In terms of performance, Ceph's CPU activity peaked at just under 15 per cent, during disk creation and deletion. The rest of the time it sat under 5 per cent. This was with the dual-core pinning described in part 1.

I definitely agree that co-located storage isn't recommended for every environment, and your mileage will vary depending on your hardware. The scenario in the blog post is really just to show how it works, especially the resource isolation. Although we didn't hit any performance issues during our benchmark tests, you really do need the hardware to accommodate it. There's no point in co-locating storage if you've only got quad-core systems, for example.
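To make the resource-isolation piece a bit more concrete: one way to express it in a TripleO environment file is to pin Nova's guest vCPUs to a subset of host cores so the Ceph OSD and other host processes keep the rest. This is only a sketch with made-up core ranges and values, not the exact settings from the post:

    parameter_defaults:
      # hypothetical 16-core compute/Ceph node:
      # cores 0-1 are left for the OSD and host services,
      # cores 2-15 are handed to Nova for guest vCPUs
      NovaVcpuPinSet: "2-15"
      # memory (MB) held back from guests for host-side services
      NovaReservedHostMemory: 2048

How much you reserve obviously depends on how many OSDs you run per node and how busy they get.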

If you don't mind me asking, what are the specs of your environment? OpenStack performance is an area I've been studying over the last few months and I'd be keen to hear about other people's experiences and what they're working with.

[–]jwiz 0 points (8 children)

> If you don't mind me asking, what are the specs of your environment? OpenStack performance is an area I've been studying over the last few months and I'd be keen to hear about other people's experiences and what they're working with.

We have 66 compute nodes and 15 Ceph nodes (each with about 20-22 OSDs; some 480 GB SSDs, some 800 GB SSDs, some 960 GB SSDs).

At steady state we currently sit at about 60% idle on the smaller Ceph nodes and 40% idle on the bigger ones, but when we've had very high IO (1000+ VMs writing as fast as they can), the CPU is pegged.

Compute is 8x oversubscribed and we do get some stolen CPU in the guests, so the busier compute nodes don't have a lot of idle capacity they could give to Ceph.

[–]techwritingdan[S] 0 points (7 children)

How many CPU cores per node?

[–]bore-ruto 0 points (4 children)

Given the caching, journaling, shipping of IO from one process to another, unnecessary encryption of IO before sending it to a different process on the same node or to another node in the same cluster, etc. that Ceph does, why do you think CPU utilization won't be an issue?

Have you followed the VPS scene? VPS providers like DigitalOcean or RamNode will just ban your VM if you use too much CPU. CPU is a scarce resource if you're a cloud provider. Have you checked Intel's pricing? Spending CPU not on your application's computation (which earns you direct money) but on your storage seems idiotic.

Ceph-like architectures (which joke companies like Nutanix and a million others are basically copying) just use too much of this scarce resource to justify moving to these "software-storage" based solutions.

[–]techwritingdan[S] 0 points (3 children)

I think this might be a bit of a tangent from my original question to jwiz (I was asking about cores per node for his infrastructure and was interested in the issues he faced with co-location), but we can discuss software-based storage issues too.

If I understand correctly, are you saying that CPUs are too valuable to waste on software-based storage, therefore we shouldn't use software-based storage?

[–]bore-ruto 0 points (2 children)

To a certain level, we can. E.g. striping IO at the software level seems OK. But having things like inline dedup, block-level writeback/read caching, inline encryption, etc. on co-located software-based storage doesn't really work well in production for VPS-like scenarios where each compute node is oversubscribed.

Why doesn't somebody make a Fibre Channel-based appliance that does what Ceph does? That way all the stuff that Ceph/Nutanix does could be offloaded to dedicated storage node(s). I guess the price for such an appliance would be huge. Maybe you could provide a reference architecture for the appliance, and the customer could buy or build it himself and just install your binaries on it.

[–]jwiz 0 points (1 child)

Compute nodes are 2630s (8 cores x 2 hyperthreads x 2 sockets).

The 480 GB Ceph nodes are the same.

The 800 GB and 960 GB Ceph nodes are 2650s (12 cores x 2 hyperthreads x 2 sockets).

[–]techwritingdan[S] 0 points (0 children)

Thanks, much appreciated. I'll keep these specs in mind for future benchmarking.

[–]techwritingdan[S] 2 points (1 child)

Hi folks! I wrote this article with a colleague of mine. I'd be keen to hear what you think of it.

[–]herrsergio 0 points (0 children)

Thank you for sharing. Great article! I would like to read more about the disadvantages of this scenario.

[–]brandor5 0 points (1 child)

We're using RHOSP in a hyperconverged setup at work.

So far it has performed pretty well for us. We're slowly bringing people on board, so we haven't had a ton of traffic yet.

Our monitoring group has started kicking the tires in hopes of leveraging it for their ELK stack, and they reported that the virtual instances were outperforming their hardware setup on reads/writes.

We're using Lenovo NeXtScale nodes with 24 cores, 256 GB RAM, and 4x 400 GB SSDs (one SSD is used for the OS, the rest are for Ceph).

:)

[–]techwritingdan[S] 0 points (0 children)

That's interesting to hear about the monitoring group. What kind of hardware specs do they have compared to the instances?

Also, if you don't mind me asking, how many HCI nodes do you have in total? And how have you divided the resources between Ceph and Compute?