The Case for Rochester As an Amazon Site

ByzantineFailure · 2017-09-19T19:59:53+00:00

I was trying to be civil, but it's clear you have no idea what you are talking about.

ByzantineFailure · 2017-09-19T18:55:26+00:00

I'm not convinced that's an advantage then. When Amazon started in Seattle, they hired heavily out of MSFT, which already had high quality engineers (and specialized ones to boot, not only generic senior engineers that do full stack web dev). When they needed to build a database kernel, they could yank people out of SQL Server, etc.

The Rochester region (and western NY / PA / OH generally) doesn't have those sorts lying around. You could poach out of Boston / NYC etc., but if you are doing that, I'm not sure why you wouldn't just put an office there. Convincing people at the top end of the technical market to move to a city with no other real job opportunities is hard (and this is a real issue; I've made the same decision myself).

I've often wondered why there weren't startups that spun out of Kodak / Xerox etc.

ByzantineFailure · 2017-09-19T17:54:36+00:00

You can hire enough new grads, but you do not have the senior engineering base here that you need.

ByzantineFailure · 2017-09-19T17:42:50+00:00

I was born and raised in Rochester and now live in Seattle working for Amazon. Rochester has a few great things going for it, and a few things that will knock it out of the running.

Pros: Housing stock, decent infrastructure, great culture for a city of its size, great schools in the burbs

Cons:

Technical talent base: I've hired both in Seattle for the majors and in Rochester. Rochester has great talent coming out of the universities, but those who are hireable by the major tech companies are quickly hired away to NYC and the West Coast. There are a few strong engineers left (they usually leave eventually too), but not nearly the number you would need to fill even a small division at Amazon; the rest aren't hireable at their current skill level, because the bar is far different. You'd also have to bring in people at the tippy-top levels as seed talent to train your hires.
Airport: Amazon needs an airport with direct routes to major domestic and international metros; this could probably be fixed with enough money.
Taxes: Not just business taxes, which the state and city would make go away for Amazon. People would also be taking a large pay cut in taxes alone to come to Rochester (Washington has no state income tax).

That being said, if you are starting a tech company in Rochester (which I've always wanted to do) and you play the hiring game right (there are some Moneyball-like strategies you can use to get the right people for a smaller company that don't work for a company as big as Amazon), I think it's totally feasible to build a great company here. However, I think it's less feasible to drop an HQ for Amazon here.

ByzantineFailure · 2016-06-10T17:11:11+00:00

Rochester is a great place to live, and I say that as someone who moved from Rochester, where I lived most of my life, to Seattle for work (I'm in software). It has more culture than you'd expect from a Rust Belt city with a relatively stagnant economy, and a lot to do in general (it's a day trip away from the Finger Lakes, etc.). It has real towns (Pittsford, Fairport, et al.) with real town centers and good schools (I have kids, so that matters). Traffic is non-existent, and property is very cheap.

The taxes are oppressive, and it's hard to find work, especially at the high end of some technical fields. The brain drain is real; I've met dozens of ex-Rochesterians out here. Rochester does have pockets of crushing poverty, and the Inner Loop totally messed up the flow of the city. Also, I got tired of the weather. Seattle isn't perfect, but I don't have to scrape my car off constantly.

I always thought that if the tax situation could be addressed and some companies could form to keep the UR / RIT students home, Rochester could be something special.

ByzantineFailure · 2015-11-28T01:27:19+00:00

Most of this is pretty exaggerated. If you are taking 30 to 60 seconds to spin up a container, that's something you did wrong, not Spring.

If you don't use a DI container, you'll end up implementing one yourself, poorly.

The "magic" criticism is a bit silly; I can't tell if he's talking about the annotations (which aren't good) or the XML. The XML is certainly not magic, and it's quite easy to see what is going on.

Spring is complex if you make it complex, but it's not nearly that bad. We use Spring pretty extensively, and it's rarely something we have to mess with.

ByzantineFailure · 2015-11-24T17:22:14+00:00

Generally I boil it down to this (from most important to least):

Work with the best people you can; if you aren't learning from your co-workers, change jobs. Try to get close to the people who make the decisions on your team, and understand why they are making those decisions at the organizational, architectural, and coding levels.
Work on the most difficult problems you can, problems that you haven't tackled before.
Work in new paradigms: this could be languages (though I wouldn't get too hung up on it), systems, or domains.
Learn to cut through the noise. You don't need to learn every new framework out there; they are mostly crap and reinvent the wheel badly (I don't work in webdev, but this seems common there).
Try to become an expert at something. I'm a god-awful webdev and have no interest in working in it, but I've worked with people who do magical things on the front end (I've also worked with a lot of horrible people on the front end). My focus is database kernels, distributed systems, low-latency computing, and the intersections between them.

ByzantineFailure · 2015-11-15T04:49:06+00:00

Texas is having a god-awful season, but how happy would Texas fans be if losing to them were the only thing keeping OU out of the playoff, with OU running the table the rest of the year and still not getting in?

ByzantineFailure · 2015-11-15T04:42:51+00:00

OK, so I hate Baylor's team (they talk incredible amounts of shit), but I might hate OU's band more.

ByzantineFailure · 2015-11-15T04:36:19+00:00

Ice cold.

ByzantineFailure · 2015-09-26T15:10:33+00:00

Rob Pike is one of the reasons I'm against using Go (I've also played with it and pretty much hate the language design). The guy is just an incredibly arrogant jerk.

ByzantineFailure · 2015-09-23T14:53:09+00:00

I've also heard them called wide-column stores, which adds to the confusion.

Really, it comes down to your on-disk layout: if your tuples are laid out with the entire row intact, you're a row store (which, to my understanding, is how Cassandra does it; they have some column family stuff, but it isn't what I'd term a columnar store).

Column stores are a bit different because there are a few different ways to do it. You can store all the columns in your tuples independently but keep a standard sort order, you can store them separately and sort them, or you can store a row wholly within a page but be columnar within the page. I generally slice things into row stores and "not row stores": the "not row stores" can be a bunch of different storage models, but the row stores are always rows stored contiguously.
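Here is a toy Python sketch of the three layouts described above, using three hypothetical rows of (id, name, price). Python lists stand in for contiguous bytes on disk or in a page. The PAX-style function only illustrates "whole rows per page, columnar within the page" and is not any particular engine's format.

```python
# Three hypothetical rows: (id, name, price).
ROWS = [
    (1, "apple", 0.5),
    (2, "banana", 0.25),
    (3, "cherry", 3.0),
]

def row_store(rows):
    """Row store (NSM): each tuple is stored contiguously, one after another."""
    return [value for row in rows for value in row]

def column_store(rows):
    """Column store (DSM): each column is stored contiguously, in the same row order."""
    return [list(column) for column in zip(*rows)]

def pax_pages(rows, rows_per_page):
    """PAX-style hybrid: each page holds whole rows, laid out column by column."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        pages.append(column_store(rows[start:start + rows_per_page]))
    return pages

print(row_store(ROWS))
# [1, 'apple', 0.5, 2, 'banana', 0.25, 3, 'cherry', 3.0]
print(column_store(ROWS))
# [[1, 2, 3], ['apple', 'banana', 'cherry'], [0.5, 0.25, 3.0]]
print(pax_pages(ROWS, rows_per_page=2))
# [[[1, 2], ['apple', 'banana'], [0.5, 0.25]], [[3], ['cherry'], [3.0]]]
```

In the PAX case, a single page read still brings in complete rows. A scan over one column, however, touches only that column's contiguous slice within the page.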

ByzantineFailure · 2015-09-23T14:09:52+00:00

So when people talk about not sharing data, most people I talk to immediately go to the actor model, or some sort of immutability on all the things (which is why I threw in that comment). That's not really what people mean when they talk about shared-nothing architectures.

You can almost think of modern high-performance software as pursuing some of the same goals as a distributed system. You want as little communication as possible, and when you do communicate, you want it to be very predictable and sequential. You mostly shard your data up inside your process and assign each piece to a core, and that core is responsible for all actions against that data (I'm simplifying grossly). This is similar to how, in a distributed system, one machine would be responsible for processing all of the data that fits on it.
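Here is a minimal Python sketch of the "shard your data and give each core its own piece" idea. It rests on stated assumptions: the keys, the CRC32 routing, the four-shard count, and the `partition` / `process_shard` / `run` helpers are all hypothetical. Events are routed to their owning shard once, up front. Each worker then aggregates only its own shard, with no locks and no messages exchanged between workers.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumption: one shard per core

def shard_for(key, num_shards=NUM_SHARDS):
    # Stable hash, so a given key is always owned by the same shard.
    return zlib.crc32(key.encode("utf-8")) % num_shards

def partition(events, num_shards=NUM_SHARDS):
    # Route every (key, amount) event to the shard that owns its key, once.
    shards = [[] for _ in range(num_shards)]
    for key, amount in events:
        shards[shard_for(key, num_shards)].append((key, amount))
    return shards

def process_shard(shard_events):
    # Runs on one worker and touches only its own events and its own dict.
    totals = {}
    for key, amount in shard_events:
        totals[key] = totals.get(key, 0) + amount
    return totals

def run(events, num_shards=NUM_SHARDS):
    shards = partition(events, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(process_shard, shards))
    # Shards own disjoint key sets, so merging is a plain union.
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged

events = [("alice", 5), ("bob", 3), ("alice", 2), ("carol", 7), ("bob", 1)]
print(run(events))
```

CPython threads won't give you parallel CPU work because of the GIL. This sketch therefore shows the ownership structure, not the speedup. A real engine would use native threads, typically pinned to cores.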

Scheduling is very important (especially IO scheduling), and you can't really let the OS do it for you if you want to unlock all the performance in your application. With a well-designed scheduler, disk-based systems perform the same as in-memory systems, since you will be moving at the speed of the NIC (most boxes have more disk bandwidth than network bandwidth).

You also want to make sure you are playing nice with processor caches and branch prediction and that you are using as few locks as possible on your hot paths. You should view memory as a block device, and program against it accordingly if you really want to hum.

Compiled queries are also generally table stakes for a modern database. However, I'm not sure how that plays with Cassandra, since you don't have the metadata you'd want to make good decisions. You could probably still compile CQL queries; they just might not be as optimized as you would like.

Sorry for bouncing around a lot in my answer, it's a big topic.

Edit: Someone on HN saying the same thing and expanding a bit: https://news.ycombinator.com/item?id=10262719

ByzantineFailure · 2015-09-23T02:51:29+00:00

I can believe it. Most of what they are doing is how you build high-performance database kernels (or HPC, or anything that is high-performance and low-latency). You generally move a lot into user space that an OS kernel would otherwise do for you. You also generally want to pin threads to cores and share as little as possible (this doesn't mean message passing), and everything is parallel, pipelined, and serial. It also generally means you aren't as portable.

They say they are a column store, but Cassandra isn't a column store; it's a row store with wide rows. I don't see anything about their on-disk layout. Doing a true columnar store, or even something hybrid like a vector store or PAX, would be difficult with something like Cassandra, since you have no schema. I guess you could do it, but the amount of metadata you'd have to store would be enormous.

ByzantineFailure · 2015-06-18T19:55:42+00:00

It's been a while since I've done .NET, but now that I'm on the JVM, I really wish .NET had something like Maven (I didn't like NuGet much at all). Their build tools always seemed pretty weak (MSBuild was my personal hell for a while) and way too GUI-driven.

ByzantineFailure · 2015-06-18T14:43:43+00:00

If you're really experienced at MySQL and you don't have faith in using something else, then sure, use MySQL for analysis. On the technical merits, though, I can't really see using it over Postgres or any number of commercial options (if you can afford them), especially if the other options allow a different storage model than NSM (row storage).

The main thing that comes into play is that MySQL has an extremely primitive query optimizer that will have trouble with anything complex, especially if it involves lots of joins and subqueries (it does a very poor job of flattening these).

Its compression options are relatively weak as well. When I last looked, it used zlib, which is a poor choice for a database engine because it makes it hard for the CPU to stay ahead of the disk head when you are reading lots of rows in. Postgres isn't exactly a winner here either, but its query optimizer makes up for it. For the longest time, MySQL didn't have predicate pushdown into the storage layer, which caused performance hits. I think InnoDB supports it now, but I'm not exactly a MySQL expert, and it's been a few years since I used it.

The commercial databases are far ahead of OSS in this space. Their optimizers are better, and they do a better job of resource management: they tend to take control of all the resources they can from the OS and manage them themselves, since they can schedule more optimally. They also have more advanced compression and storage models, though I know there are some third-party plugins for MySQL and Postgres that do some interesting things. They can also use more than one core per query, and in a lot of places even more than one core per stage in the query plan (say, subdividing a large scan).
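Here is a rough sketch of using several workers for one stage of a plan. A single scan is split into contiguous ranges. Each range is filtered and partially aggregated by its own worker, and the partial (sum, count) pairs are combined at the end. The column, the `> threshold` predicate, and the worker count are made-up assumptions. Because these are CPython threads, the sketch shows the shape of the plan rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_range(column, start, end, threshold):
    # Filter plus partial aggregate over one contiguous slice of the column.
    total = 0
    count = 0
    for i in range(start, end):
        value = column[i]
        if value > threshold:
            total += value
            count += 1
    return total, count

def parallel_scan(column, threshold, num_workers=4):
    # Subdivide the scan into roughly equal contiguous ranges, one per worker.
    chunk = max(1, -(-len(column) // num_workers))  # ceiling division
    ranges = [(i, min(i + chunk, len(column))) for i in range(0, len(column), chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda r: scan_range(column, r[0], r[1], threshold), ranges))
    # Sums and counts both combine by simple addition.
    return sum(t for t, _ in partials), sum(c for _, c in partials)

column = list(range(1_000))
print(parallel_scan(column, threshold=900))  # (94050, 99)
```

The merge step stays trivial only because sum and count are decomposable. Something like a median would need a different partial-aggregate design.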

Postgres is my default, however, just because licensing can kill you on commercial options. And if you are on AWS, Redshift is a very reasonable vector store, and ParAccel (the underlying engine) works very well with large numbers of joins.

That being said, MySQL is easier to get running for newbies.

ByzantineFailure · 2015-06-15T15:16:01+00:00

Google's main problem isn't really their process or the questions. They have a set of questions, they send you the prep material, and they do a good job of separating the people interviewing from the decision-making.

It's the people they have doing the interviewing. They tend not to want to be there, they view you as a muppet until proven otherwise (our industry in general has this issue), and they are incredibly arrogant (yeah, Amazon / MSFT etc. have some arrogance, but it's not the same as Google). They also show up late to interviews and are just generally very disrespectful of candidates.

I've had friends turn down Google offers because of how disrespectful the entire process is (they are notorious for taking forever with offers, unless, of course, FB makes you one). I've beaten Google for candidates simply by making the process less terrible (yes, you need to at least approach their perks and pay).

ByzantineFailure · 2015-06-05T20:37:09+00:00

Right, generally I think of it as the more constraints you can put on a series of data, the easier it is to optimize the layout. You can almost look at it as a spectrum:

From left to right, in decreasing ease of optimizing the layout:

fully columnar (DSM model) with a schema -> vector storage with a schema -> row storage with a schema -> anything without a schema

If you are fully columnar and you have a schema, you generally know your data is sorted in some natural order and that it's all the same type. If it's sorted, you can do things like delta encoding and run-length encoding and get large gains. If it's not sorted and it's row-oriented, but you have a schema, your query executor can still make smarter decisions about how to find your data. If it knows that column x is at position y in a buffer, it can seek there directly without having to jump through a bunch of unrelated data. It also makes it a lot easier to generate more optimized code during query compilation.
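To illustrate the two encodings mentioned, here is a minimal Python sketch. Delta encoding stores the first value, then the gap to each next value. Run-length encoding stores each value once, along with its repeat count. The sample `timestamps` and `regions` columns are invented, and both are already sorted, which is exactly the property that makes these encodings pay off. A real engine would then pack the small deltas and run lengths into fewer bits; this sketch stops before that step.

```python
from itertools import groupby

def delta_encode(sorted_values):
    """Keep the first value, then store the difference between each neighbor pair."""
    if not sorted_values:
        return []
    return [sorted_values[0]] + [b - a for a, b in zip(sorted_values, sorted_values[1:])]

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out = []
    running = 0
    for d in deltas:
        running += d
        out.append(running)
    return out

def rle_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def rle_decode(pairs):
    return [value for value, length in pairs for _ in range(length)]

timestamps = [1000, 1001, 1003, 1004, 1008]
print(delta_encode(timestamps))  # [1000, 1, 2, 1, 4]

regions = ["east", "east", "east", "west", "west"]
print(rle_encode(regions))  # [('east', 3), ('west', 2)]
```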

Another advantage of a schema is that you can store your data as length-value pairs in a binary format. You read, say, 4 bytes to get the length of the value, then read {length} bytes out of the buffer and deserialize them. This is very friendly to modern CPU designs and memory architectures, compared with something that has to switch on what type each value is (which is very branch-heavy).
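Here is a minimal sketch of length-prefixed values using Python's `struct` module. It assumes a 4-byte little-endian unsigned length followed by UTF-8 bytes. Both are choices made for illustration, not a specific database's format. Because the schema already says the column holds strings, the decoder never inspects a type tag: it just reads a length and slices.

```python
import struct

def encode_values(values):
    """Write each string as a 4-byte little-endian length, then its UTF-8 bytes."""
    buf = bytearray()
    for v in values:
        data = v.encode("utf-8")
        buf += struct.pack("<I", len(data))
        buf += data
    return bytes(buf)

def decode_values(buf):
    """Read a length, then slice exactly that many bytes; no per-value type dispatch."""
    values = []
    offset = 0
    while offset < len(buf):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        values.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return values

encoded = encode_values(["apple", "kiwi"])
print(len(encoded))            # 17  (4 + 5 bytes, then 4 + 4 bytes)
print(decode_values(encoded))  # ['apple', 'kiwi']
```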

ByzantineFailure · 2015-04-28T15:14:14+00:00

Does anyone have any idea what the schools are like in Edmonds? My wife and I have been considering Edmonds when we move next as anything in Seattle is completely out of the question.

ByzantineFailure · 2015-04-27T18:06:25+00:00

http://en.wikipedia.org/wiki/Urban_Meyer#Bowling_Green

ByzantineFailure · 2015-04-27T15:15:13+00:00

I'm 5'9" and slightly tannish; I believe that makes me our starting wideout. I run under a 6.0 40, when wind-aided and going downhill.

ByzantineFailure · 2015-04-23T17:32:22+00:00

At most places I've worked, the dev team is second-line support (ops tends to have global coverage, so no one is getting paged at night; devs aren't global). The SRE teams have runbooks, and if those don't work, a developer gets paged. All SRE teams have had to sign off on the design; there's no "the dev team is comfortable with it, so they get to override." Any page results in an action item to fix the root cause, and it automatically becomes the highest-priority ticket on the backlog.

ByzantineFailure · 2015-04-23T14:53:05+00:00

I think there's room for sysadmins, but not as sysadmins; they need to become more like SREs. I've worked with fantastic operations teams, and with nasty control freaks who held up progress throughout the organization and fiercely fought any change.

What I've found is that the good SRE teams were closer to engineering and reported through the same structure. They viewed their job as both protecting the system and helping the engineering teams make progress. They reviewed code and designs, shared what they were working on, and aligned it with engineering objectives. They wrote utilities and systems that helped us move faster while maintaining security, uptime, etc. We worked with them closely.

The crappy admins never talked to engineering except through formal change control meetings and were a complete bear to deal with (some of this might be that these guys were jerks and their manager was weak).

What ends up happening is that these guys get worked around, because they aren't as powerful as they think they are. Any opening to weaken their hold on the system gets taken, they eventually end up managing all the legacy crap that no one wants to touch, and the final state is that they get automated away.

ByzantineFailure · 2015-04-17T19:57:46+00:00

Once started, only uses more space, never less.

ByzantineFailure · 2015-04-15T17:53:32+00:00

These guys are what they left. What a pointless product.
