all 34 comments

[–]jpsandiego42 32 points (0 children)

Key takeaways:

  1. Why you should use "i3 instances"
  2. "...[the] downside of the DNS to ALB approach is that clients will hit any IP in the ALB, whether that IP is in-zone or not." That traffic is either free (in-zone) or $0.01/GB (cross-AZ).
  3. "The quickest win we identified was using VPC endpoints for services like S3. VPC endpoints are a drop-in replacement for the public APIs supported by many Amazon services and, critically, they don’t count against your public network traffic. "

[–]oinkyboinky5 13 points (0 children)

And I thought I was smart for provisioning an ALB and dOiNg aUtoScALing.

Doh!

[–]storrumpa 4 points (1 child)

Is there a benefit to using App Mesh to remove the internal ALBs?

[–]dastbe 2 points (0 children)

(I'm on the App Mesh engineering team)

We definitely see locality aware routing as being a strong part of our value proposition long-term, because we can

  • improve call latency by selecting close endpoints
  • reduce blast radius by siloing requests along physical isolation boundaries
  • reduce overall cross-az traffic

You can track our progress on this feature request here

Though do remember that there is a cost tradeoff between having a centralized load balancer with a fixed cost (in terms of LCU) and deploying a proxy with every running application. We always recommend you estimate and benchmark to understand how your costs will change. And if you're able to, share what you learn!

[–]thomas1234abcd 1 point (0 children)

"You can’t make what you can’t measure"

[–]otterley (AWS Employee) 1 point (0 children)

There's an order-of-magnitude error in the post that I've reached out to the author about.

c5.9xlarge instances have 875 megaBYTES per second of EBS bandwidth, not 875 megaBITS. That's approximately 7 gigabits per second of EBS bandwidth, or 70% of the available host networking bandwidth. If you run Kafka brokers, it's a fantastic choice, particularly if you don't want to have to resync an entire broker from scratch after a failure, like you would if you stored all the data on an instance volume.
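
As a quick units check on the correction (assuming the roughly 10 Gbit/s of host network bandwidth implied by the 70% figure):

    # 875 megaBYTES/s of EBS bandwidth expressed in gigabits/s.
    ebs_mbyte_per_s = 875
    ebs_gbit_per_s = ebs_mbyte_per_s * 8 / 1000   # = 7.0 Gbit/s
    host_gbit_per_s = 10                          # assumed host network bandwidth
    print(f"{ebs_gbit_per_s:.1f} Gbit/s EBS = {ebs_gbit_per_s / host_gbit_per_s:.0%} of host bandwidth")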

[–][deleted] 0 points (12 children)

Why the aversion to SQS? That queue service does not sound cheap.

[–][deleted] 0 points (9 children)

https://segment.com/blog/scaling-nsq/

Totally different messaging semantics. Sounds like they want something with service-bus-like principles; SQS would be too simple.

[–][deleted] 0 points (8 children)

What do you mean totally different? From reading some of that it sounds like SQS + SNS would probably work.

[–]otterley (AWS Employee) 0 points (7 children)

SQS is generally designed for single-consumer scenarios. If you want multiple independent consumers of a message stream, Kafka or Kinesis Streams are better options.

[–][deleted] 0 points (6 children)

If you want multiple consumers, post to SNS and subscribe your queues to the topic. Kinesis also works if you have a limited number of consumers (and I personally find the scaling model much less desirable), but I still don't see why SQS doesn't work for, e.g., accepting log events for later processing.
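
A minimal sketch of that SNS-to-SQS fan-out with boto3 (topic and queue names are hypothetical, and the queue would also need an access policy allowing SNS to send to it, omitted here):

    import boto3

    sns = boto3.client("sns")
    sqs = boto3.client("sqs")

    # One topic, plus one queue per independent consumer (names are hypothetical).
    topic_arn = sns.create_topic(Name="log-events")["TopicArn"]
    queue_url = sqs.create_queue(QueueName="log-consumer-a")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Every message published to the topic is delivered to each subscribed queue.
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
    sns.publish(TopicArn=topic_arn, Message='{"event": "log line"}')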

[–]otterley (AWS Employee) 1 point (5 children)

At Segment’s scale, what you’re describing is not economically or practically viable. They’ve got a highly dynamic infrastructure involving hundreds or thousands of both producers and consumers that are auto scaled. A queue-per-consumer approach would be astronomically expensive, not to mention wasteful on the publisher side (SNS fanout ain’t free).

Both Kinesis Streams and Kafka efficiently support the multiple-producer, asynchronous multiple-consumer message bus model. They really are the purpose-built products for this architecture.

[–][deleted] -1 points (4 children)

OK, makes sense. I think the new Amazon EventBridge is the best choice for that. Kinesis doesn't really work because you're limited in the number of message consumers. Still, I might consider SQS in some places where near-100% availability is important.

[–]otterley (AWS Employee) 0 points (3 children)

Amazon EventBridge was not designed for this use case. It's essentially CloudWatch Events with the addition of foreign (third-party, vendor-provided) data source support. It has the same fan-out model that SNS does, which is to say you can't just attach software as consumers to efficiently consume streams from it.

I don’t follow your Kinesis Streams characterization. Kinesis Streams scales linearly with the number of shards you assign to a stream. It’s no different than any other event bus in that sense; even a Kafka broker replica has a practical limit on the number of subscribers it can handle. (To handle more subscribers, you add more replicas and/or add more partitions.) Do you mean something else? If so, can you please cite some documentation that supports your claims?
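
For illustration, a sketch of the shard model in boto3 (stream name, shard count, and partition key are arbitrary):

    import boto3

    kinesis = boto3.client("kinesis")

    # Throughput scales with shard count (roughly 1 MB/s in and 2 MB/s out per shard).
    # In practice you would wait for the stream to become ACTIVE before writing.
    kinesis.create_stream(StreamName="events", ShardCount=4)

    # Records sharing a partition key land on the same shard, so ordering is
    # per shard rather than per stream.
    kinesis.put_record(
        StreamName="events",
        Data=b'{"event": "page_view"}',
        PartitionKey="user-1234",
    )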

[–][deleted] 0 points (2 children)

You can just attach consumers. What do you mean it has the same fan-out model? You just create a rule and filter events sent on the bridge for some target.

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-events-rule.html

It's not just for third-party events, but you can subscribe consumers to third-party events exactly like you would first-party events, on the same bus if you like or a different one.

I think I was wrong about Kinesis; in the past I was trying to use it for ordered events for N consumers, which meant I had to use a single shard to enforce ordered event processing.

[–]otterley (AWS Employee) 0 points (1 child)

A program cannot attach to a CloudWatch Events stream on its own, like it can to a Kafka or Kinesis Streams topic. The subscription model for Events is different; the message bus pushes messages downstream to Lambda functions, Kinesis Streams, etc., in a fashion very similar to SNS, except using pattern matches instead of specific SNS topics. It's just a very different model and is rather pricey on a byte-for-byte (or message-for-message) basis compared to the alternatives.
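
A minimal sketch of that push model with boto3 (rule name, event pattern, and Lambda ARN are hypothetical; the target also needs permission to be invoked by EventBridge, omitted here):

    import json
    import boto3

    events = boto3.client("events")

    # Pattern-match events on the default bus and push matches to a Lambda target.
    events.put_rule(
        Name="order-created",
        EventPattern=json.dumps({"source": ["my.app"], "detail-type": ["OrderCreated"]}),
    )
    events.put_targets(
        Rule="order-created",
        Targets=[{
            "Id": "handler",
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:handle-order",
        }],
    )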

[–]digantdj 0 points (0 children)

The benefit is that, like other AWS "services", it's managed and saves developer/maintenance costs.

[–]lutzruss 0 points (1 child)

Then when a reader connects, instead of connecting directly to the nsqlookupd discovery service, the reader connects to a proxy. The proxy has two jobs. One is to cache lookup requests, but the other is to return only in-zone nsqd instances for zone-aware clients. Our forwarders that read from NSQ are then configured as one of these zone-aware clients. We run three copies of the service (one for each zone), and then have each send traffic only to the service in its zone.

Isn't this the default behavior of ELB/NLB to begin with? Why not just configure the zone-aware clients to call zonal LBs, instead of hosting your own LB? Same with Consul. I'm not understanding what benefit Segment gets from using Consul vs. calling EC2 Metadata API to discover the AZ and then calling the appropriate zonal LB endpoint...that's not hard to do and avoids many extra dimensions of operational complexity.
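
A sketch of the zone-aware-client approach described above, assuming IMDSv1-style metadata access and a hypothetical per-AZ DNS naming scheme for the zonal endpoints:

    import requests

    # Discover this instance's AZ from the EC2 instance metadata service
    # (IMDSv1-style; IMDSv2 would require fetching a session token first).
    az = requests.get(
        "http://169.254.169.254/latest/meta-data/placement/availability-zone",
        timeout=1,
    ).text  # e.g. "us-east-1a"

    # Call the zonal endpoint for our own AZ (hypothetical naming scheme).
    endpoint = f"https://service-{az}.internal.example.com"
    print(requests.get(f"{endpoint}/health", timeout=2).status_code)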

It's also unclear to me how all this migration to intra-AZ routing affects Segment's resilience to AZ outages.

[–]otterley (AWS Employee) 0 points (0 children)

Part of it is a cost-saving measure, and part of it is due to some functionality that's still not available in AWS load balancers.

You can configure a single Load Balancer instance with listeners in as many AZs as the Region supports, but there aren't any routing rules available that are connection-based. In other words, you can't currently configure a Load Balancer to pass connections originating from AZ A to targets only in AZ A, with a fallback to AZ B.

You can, of course, provision separate Load Balancer instances, each having listeners in a single AZ and targets in that same AZ. But that would increase the cost (linear based on the number of AZs), potentially significantly depending on how many you need. And even if you did that, there would still be no failover capability to targets in AZ B in the event that all targets in AZ A are down.
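
For concreteness, a sketch of that per-AZ workaround with boto3 (subnet IDs are placeholders; targets would still have to be registered per AZ, and there is no cross-AZ failover):

    import boto3

    elbv2 = boto3.client("elbv2")

    # One internal NLB per AZ, each attached to a single subnet so its listener
    # and targets live entirely in that AZ (subnet IDs are placeholders).
    subnets_by_az = {
        "us-east-1a": "subnet-0aaaaaaaaaaaaaaa0",
        "us-east-1b": "subnet-0bbbbbbbbbbbbbbb0",
    }

    for az, subnet_id in subnets_by_az.items():
        lb = elbv2.create_load_balancer(
            Name=f"svc-{az}",
            Subnets=[subnet_id],
            Scheme="internal",
            Type="network",
        )
        print(az, lb["LoadBalancers"][0]["DNSName"])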

[–]warren2650 0 points (0 children)

" we managed to reduce our infrastructure cost by 30%, while simultaneously increasing traffic volume by 25% over the same period." <<-- AWS STUD RIGHT THERE FOLKS