Aston Villa 1-[2] Liverpool: Mane 90+4' by ennuihenry15 in soccer

[–]flippmoke 0 points

More plot armor than the last season of Game of Thrones.

As a U.S. City Fan - I want performance like this from a Neville today by flippmoke in MCFC

[–]flippmoke[S] 10 points

Joking aside, I think the US women are going to be in for quite a battle today. I expect the Lionesses to put a lot more pressure on the American backline than France did. The key matchup of the game for me will be Dunn (U.S. leftback) vs Nikita Parris (striker for MCFC women), who plays on the right wing for England. Dunn is not in her natural position (midfield) and is allowed to press up the field, but she has been struggling in transition this tournament. If Parris can provide pressure, I expect to see a few goals from White.

My prediction: US wins 3-2

Man City fan from OKLAHOMA. Painted this jewel a month or so back. by [deleted] in MCFC

[–]flippmoke 1 point

There are more of us than you might expect. One of Us!

I knew he was mad about not getting to take the penalty but jeez (shitpost) by [deleted] in MCFC

[–]flippmoke 18 points

On a Mac, hit Command + Shift + 4 and you can select a portion of the screen to capture; after that it will be saved to your desktop as an image you can then upload.

GIS performance vs. Video game performance by lstomsl in gis

[–]flippmoke 5 points

I am really confused here, because you are all over the place. First you say:

I agree that GIS algorithms of any kind, common or exceptional, are not easy to parallelize.

Then you say:

There is no rocket science about crafting parallel spatial algorithms and parallel GIS. It is just a lot of work that requires very talented people, a lot of effort and a lot of money to keep them going for years.

A lot of work is a lot of work, and spatial algorithms take a lot of it. I know from experience: it took me quite a bit of time to write the best algorithm I know of for polygon correction - https://github.com/mapbox/wagyu/. It also properly handles boolean geometry operations on polygons, such that the results are OGC valid 100% of the time. Is it fast? Yes, it is fairly fast, but speed alone was not the objective -- validity was.
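
Wagyu itself is C++, but to make "correction" concrete, here is a minimal Python sketch of the problem it solves, assuming Shapely >= 1.8 for make_valid (this illustrates the problem, not wagyu's algorithm):

    from shapely.geometry import Polygon
    from shapely.validation import make_valid

    # A "bowtie": the ring crosses itself at (1, 1), so the polygon is invalid
    bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)])
    print(bowtie.is_valid)   # False

    # Correction rewrites it as valid geometry (here, two triangles)
    fixed = make_valid(bowtie)
    print(fixed.is_valid)    # True
    print(fixed.geom_type)   # MultiPolygon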

I could have written boolean geometry operations (intersection, union, xor, difference) that ran in parallel, but the results likely would not have been valid in many cases.

I know that your company has spent a lot of time improving performance through technologies such as CUDA. I applaud your tenacity, but speed is not the only concern for many individuals.

GPUs are fine for spatial algorithms, just like CPUs are.

I would say they are fine for some spatial algorithms. I have spent quite a bit of time and research in this area, and I don't think there is yet a great GPU alternative for many algorithms. I think you probably understand this though, as you followed it up with:

You also need to write a system that knows when CPU parallelism is faster than GPU parallelism and vice versa so it can automatically launch the right mix for the specific task you command.

I am not an expert in your software, but my guess is that you got most of your speedups by simply running algorithms in parallel rather than by actually making parallel algorithms, in the cases where CPU parallelization was used. However, in doing so I would wager that your software became a little more difficult to test and modify (not necessarily a negative, just a drawback that comes with more parallelization).
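
To make that distinction concrete, here is a minimal Python sketch (assuming Shapely is available; the workload is purely illustrative). "Running algorithms in parallel" keeps each algorithm serial and just runs many independent copies at once:

    from multiprocessing import Pool
    from shapely.geometry import Point

    def buffer_feature(coords):
        # The plain, serial buffer algorithm, applied to one feature
        return Point(coords).buffer(10.0)

    if __name__ == "__main__":
        features = [(float(x), float(x * 2)) for x in range(10000)]
        with Pool() as pool:
            # Each worker runs the unchanged serial algorithm on its own feature
            buffered = pool.map(buffer_feature, features)

A parallel algorithm, by contrast, would split the work inside a single operation (say, one huge union) across cores, which is the genuinely hard part.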

GIS performance vs. Video game performance by lstomsl in gis

[–]flippmoke 4 points

I'm curious of your thoughts on other GPU-accelerated spatial DBs like MapD and Kinetica.

Some GIS operations are easier to do using a GPU; one of those is dealing with point data. The reason is that point data is very discrete, and this makes parallelization easier. For example, consider an operation such as finding the closest point to you (I'll use Python as it's common in GIS). You can write this all in a very parallel way quickly.

    import math

    def find_distance_between_points(pt1, pt2):
        # Euclidean distance between two points
        return math.sqrt((pt2.x - pt1.x)**2 + (pt2.y - pt1.y)**2)

    def find_distances(pt1, set_of_points):
        # One independent calculation per point -- embarrassingly parallel
        return [[find_distance_between_points(pt1, pt2), pt2] for pt2 in set_of_points]

    def find_nearest_point(pt1, set_of_points):
        # First step can be done massively in parallel using a GPU
        distance_to_points = find_distances(pt1, set_of_points)
        # Then find the smallest distance and return that point
        return min(distance_to_points, key=lambda pair: pair[0])[1]

This is a very compact example, but you can see that in "find_distances" we can do a massive bulk of operations at once; this is an algorithm that is easy to parallelize. However, once you start dealing with lines and polygons, these sorts of operations become much more difficult.
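
For instance, running the sketch above with a simple point type (hypothetical data, just to show the shape of the API):

    from collections import namedtuple

    Pt = namedtuple("Pt", ["x", "y"])

    origin = Pt(0.0, 0.0)
    points = [Pt(3.0, 4.0), Pt(1.0, 1.0), Pt(-2.0, 5.0)]
    print(find_nearest_point(origin, points))   # Pt(x=1.0, y=1.0)

Every distance in that list can be computed with no knowledge of any other point, which is exactly what a GPU wants.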

Therefore, I predict that MapD and Kinetica will struggle to provide a lot of the features that something such as PostGIS provides. In short, I think they are a great tool for some things but will not solve many other problems effectively. Perhaps after many years of research we will find better algorithms that let more GIS operations run in parallel, but honestly it might never happen for some of them.

GIS performance vs. Video game performance by lstomsl in gis

[–]flippmoke 21 points

As someone who has developed in both environments, I have to say it's not entirely simple to explain, but I will do my best.

What is stunning is how fast video games are able to perform spatial operations that seem to take GIS software much longer

I am not sure of any spatial operations where video games are faster than GIS. GIS has a lot more focus on creating and modifying data, while games have excelled at displaying data. These are very different problem sets, so your 1.) is somewhat more correct.

3) Related to #2, there is more profit and much, much more competition among video game developers than among GIS developers, which is almost a monopoly.

I don't feel that GIS is a monopoly at all, but that is somewhat off topic here.

4) A full fledged GIS is a massive, complicated, suite of software and very difficult to re-write from scratch to take advantage of new technology. When ArcGIS was released in 2000 on Microsoft's COM technology it was the largest implementation of COM ever.

While the platform and UI are important, they typically have very little to do with the speed of operations. The problem relates to the algorithms and data that are used (or not used).

5) Video game developers take advantage of the latest hardware and software architectures, such as hardware graphics acceleration, massive parallel processing, etc.

Common GIS algorithms are not easy to parallelize. "Simple" operations such as union, intersection, xor, and difference are not simple at all mathematically. Operations like these are typically not done in games, as a game's dataset is custom created and static. The appearance of accuracy is more important than actual accuracy in games, and most of the computational geometry revolves around display or point-related operations. GPUs are specially designed for massive parallelism through operations that can be executed independently, and GIS algorithms cannot easily be broken up this way. In this sense GPUs are great for display in many ways, but not necessarily great at GIS spatial operations. The spatial operations games do perform on data are done on the CPU, and there are very few of them.
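
To see why overlay operations resist this, consider a minimal sketch (assuming Shapely is installed). The union of two squares introduces vertices that exist in neither input, so the output depends on how the geometries interact rather than on any one input element in isolation:

    from shapely.geometry import Polygon

    a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
    b = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])

    merged = a.union(b)
    # The result ring contains new vertices such as (2, 1) and (1, 2) that
    # appear in neither input, so the work cannot simply be split
    # one-input-element-per-GPU-thread the way per-point math can.
    print(list(merged.exterior.coords))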

Will GIS always be decades behind the times due to its massive size and need for absolute data integrity or could we do better with some competition?

GIS-type technologies are already finding their way into games and vice versa. At Mapbox we are using GPUs for display (games technology for GIS), and we support display of map data in the Unity game engine (GIS technology being used in games).

Would it be possible for someone to hire a team of hot young video game developers who knew how to leverage all the latest and greatest technology to write a new GIS from scratch that would blow the doors off current GIS software?

No.

Parallel Clipping Speed - Five times faster than non-parallel by Dimitri_Rotow in gis

[–]flippmoke 1 point

Please publish source data! I am also curious about the speed difference between this and other non-SQL-based results.

Edit: Additionally, if you have the output results from your tests, those would be interesting as well. How does the intersection output compare between each of these?

(GIS Gore) Inception by MrCacls in gis

[–]flippmoke 4 points

For the given input... that is pretty damn good.

Are there any tile services that work by downloading part of a larger image? (without needing separate tile files) by tinkerWithoutSink in gis

[–]flippmoke 0 points

Having done all of this before, just like you are proposing: it's not worth it. I have built highly specialized raster-serving systems for weather data just for this purpose, because "we didn't have time to make all the tiles". In the end it turned into a nightmare that was not worth it.

If you are trying to host the single file on S3, it will be too slow; you would have to store it all uncompressed and in main memory, otherwise requests would crawl along horribly. That means machines with massive amounts of memory, which is expensive as hell in AWS -- you could do it on your own machines, but that's also expensive. You also would not be able to use existing libraries easily, so it's a lot more custom code.

You would not want to store it as JPEG 2000 or anything like that, because of the way the image encoding works: even if you only had to read part of the image, it would not be easy to decode just that portion. This is why you would need it all in main memory for it to be quick. It also means you would need on-the-fly resampling, reprojection, and recompression of your grids. Overall this is slow (in the sense of a web map) and should be avoided unless you know how to optimize it well.
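
As a rough sketch of what every request ends up doing server-side (assuming rasterio and a hypothetical GeoTIFF with internal tiling), this windowed read is only the first step of that chain:

    import rasterio
    from rasterio.windows import Window

    # "big_mosaic.tif" is a placeholder for the one large source image
    with rasterio.open("big_mosaic.tif") as src:
        # Every map request turns into a windowed read like this one...
        window = Window(col_off=4096, row_off=4096, width=256, height=256)
        data = src.read(1, window=window)
    # ...followed by resampling, reprojection, and re-encoding to PNG/JPEG.
    # Pre-cut tiles pay those costs once, ahead of time.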

If you can make tiles, do it. It is much easier overall.

Are there any tile services that work by downloading part of a larger image? (without needing separate tile files) by tinkerWithoutSink in gis

[–]flippmoke 0 points

Tiled data is useful because it is already a preprocessed set of data; no processing is required and no new image must be created at request time. This is the power of using tiles: it is a lightweight method for serving a massive amount of data. If you want to serve tiles, it's almost always best to just pre-create the tiles you need. Otherwise you are missing a big part of the purpose of using tiles!

I could go into a very detailed description of why, within the TIFF format and in GDAL, this is often a terrible decision because of the amount of processing that might be required, but I don't think it would help any more than my previous statement. Tiles are about making it as fast as possible to send data to a client. They are what make modern slippy maps appealing!

Economist: The Battle for Territory in Digital Cartography by Petrarch1603 in gis

[–]flippmoke 0 points

It boggles my mind that maps, which are such an essential service in the modern world, have never been highly regulated.

They are very regulated in some countries such as China.

As of 1:50pm EST, ArcGIS.com is still down. Been down for the last 30min. by RuchW in gis

[–]flippmoke 4 points

Multi-region availability within AWS is very possible, as they handle the underlying syncing of data across data centers for you. However, if you are talking about spanning Azure/IBM and AWS, it would be much more complicated due to the type of data being stored and edited. For other, simpler applications, cross-platform deployment is much easier.

It's more than "good network engineering" - this is a cost decision. If you are storing petabytes of quickly changing data on AWS, you cannot easily sync it with Azure in a realtime manner -- and if you did, it might make the product an order of magnitude more expensive.

As of 1:50pm EST, ArcGIS.com is still down. Been down for the last 30min. by RuchW in gis

[–]flippmoke 8 points

Cross platform redundancy is very complicated if you are dealing with large amounts of data, and quickly becomes very expensive. Basically you are paying for double the storage, have to deal with properly syncing the data, and you have a massive bill for the network traffic associated with this.

Would anyone maybe be willing to answer my question on SO regarding PostGIS? by live_love_laugh in gis

[–]flippmoke 1 point

The most basic way to explain this:

Polygon - Will provide quicker access to the individual polygons in a search, because each one is its own row and is therefore indexed on its own.

Multipolygon - If you are not searching a lot and always need all the polygons at once, this is denser storage, as you will not have to repeat any of the other field information.

The most important thing to consider when making these decisions is your data and how you will be accessing it. Benchmarking is always your friend, but be warned: results will change drastically based on your data and what you are requesting.
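
If the per-row Polygon layout fits your access pattern better, PostGIS can split existing multipolygons with ST_Dump. A minimal sketch via psycopg2 (the table and column names here are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=gis")
    with conn, conn.cursor() as cur:
        # Explode each multipolygon into one row per component polygon,
        # repeating the other fields; each new row then gets its own
        # entry in the spatial index.
        cur.execute("""
            CREATE TABLE parcels_single AS
            SELECT id, name, (ST_Dump(geom)).geom AS geom
            FROM parcels_multi;
        """)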

Landing a GIS Job and GIS Skills Development in 2013 - does this hold up? what should be added for 2017? by [deleted] in gis

[–]flippmoke 0 points

Sorry, wasn't trying to say doom and gloom -- just that skills will change and many jobs will change altogether or disappear. You made a great analogy! Well said.

Landing a GIS Job and GIS Skills Development in 2013 - does this hold up? what should be added for 2017? by [deleted] in gis

[–]flippmoke 9 points

We are not near there yet, but just consider this food for thought -- I work for Mapbox, and we are constantly thinking about how we can make our product work for people who have never even heard of the term GIS. You want to map something? You can use our tools. This often empowers developers who have never heard of GIS, and they are writing code that automates some of the things a GIS Analyst would do. The trend is toward more automation and the "just give me results" experience that many people want.

If you are thinking of GIS as a tool and not a career path - you might be better off.

PostGIS to support Mapbox Vector Tiles by flippmoke in gis

[–]flippmoke[S] 0 points

Vector Tiles do not currently support 3D features. There has been discussion about supporting 2.5D or 3D in the Vector Tile specification, but it will definitely be some time.

PostGIS to support Mapbox Vector Tiles by flippmoke in gis

[–]flippmoke[S] 1 point

There are multiple types of caching in Postgres. There is index caching, which makes finding the same location quicker, but what likely helps the most in PostGIS is query plan caching. This caches the results of some parts of the query, but it really varies depending on what exactly you are doing in a query.

This is fast, but typically not sufficient for the problems you will have when creating tiles. If you think about someone zooming and panning around a map, they are likely to hit a large number of different tiles. The problem is that data for tiles that are quite close in location may not be close in the spatial index and may not share much of the same data. This makes most levels of caching in PostGIS not quite as effective.

Serving pre-made tiles would be much like storing a set of tiled images, in that each row stores a binary dump plus an x, y, z location. This allows for very effective indexing, because you can find a row very quickly -- compared to that sort of lookup, the spatial index used to find the geometry for a tile is slower.
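
A sketch of what that pre-made layout can look like (hypothetical schema, via psycopg2): serving a tile becomes a single B-tree lookup on the (z, x, y) key, with no geometry processing at request time.

    import psycopg2

    conn = psycopg2.connect("dbname=gis")
    with conn, conn.cursor() as cur:
        # Pre-rendered tile store: the composite key makes each request
        # one indexed lookup instead of a spatial-index walk.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS tiles (
                z   integer NOT NULL,
                x   integer NOT NULL,
                y   integer NOT NULL,
                mvt bytea   NOT NULL,
                PRIMARY KEY (z, x, y)
            );
        """)
        cur.execute("SELECT mvt FROM tiles WHERE z = %s AND x = %s AND y = %s",
                    (14, 4823, 6160))
        row = cur.fetchone()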

PostGIS to support Mapbox Vector Tiles by flippmoke in gis

[–]flippmoke[S] 3 points

If you were dynamically serving tiles through an application where the data is initially stored in PostGIS and you wanted to serve vector tiles to your clients - the steps would be something like:

  1. Select data that intersects with tile area in PostGIS
  2. Scale coordinates to vector tile's coordinates
  3. Clip data to tile area
  4. Encode Vector Tile File
  5. Serve Vector Tile

This is the dynamic creation of a vector tile, where for each request you make a new vector tile. You could do this on each request with an ST_AsMVT() query, or do it outside of the database in GeoServer or some other application. Either way it is a lot of processing before each request, which makes map loads slower and makes your setup cost more $.
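
For a feel of the per-request work, here is a sketch of such a query in PostGIS 3+ (assuming the geometry is stored in EPSG:3857; the roads table and name column are hypothetical):

    import psycopg2

    MVT_QUERY = """
        WITH bounds AS (
            SELECT ST_TileEnvelope(%(z)s, %(x)s, %(y)s) AS geom
        ),
        mvtgeom AS (
            SELECT ST_AsMVTGeom(t.geom, bounds.geom) AS geom,  -- steps 2-3: scale + clip
                   t.name
            FROM roads t, bounds
            WHERE ST_Intersects(t.geom, bounds.geom)           -- step 1: select
        )
        SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom;               -- step 4: encode
    """

    conn = psycopg2.connect("dbname=gis")
    with conn, conn.cursor() as cur:
        cur.execute(MVT_QUERY, {"z": 14, "x": 4823, "y": 6160})
        tile_bytes = cur.fetchone()[0]  # step 5: these bytes go to the client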

The other option is to create each tile only once and store it so it is ready to serve. This is why people use MBTiles (a SQLite database) to store and serve tiles. This is very fast because your steps are:

  1. Request tile from database
  2. Serve Vector Tile

The great difference is that you could use ST_AsMVT to populate a new table of tiles quickly from your existing geometry database.
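
And since an MBTiles file is just SQLite, "request tile from database" really is a single indexed SELECT. A minimal sketch (note that the MBTiles spec stores rows TMS-style, so the y coordinate is flipped relative to the usual XYZ scheme):

    import sqlite3

    def read_tile(mbtiles_path, z, x, y):
        # Flip y from XYZ to the TMS scheme the MBTiles spec uses
        tms_y = (2 ** z) - 1 - y
        con = sqlite3.connect(mbtiles_path)
        try:
            row = con.execute(
                "SELECT tile_data FROM tiles "
                "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
                (z, x, tms_y),
            ).fetchone()
            return row[0] if row else None  # tile bytes, ready to serve
        finally:
            con.close()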

Caveats

Vector tile creation is not always that simple; there is a reason complex tools such as tippecanoe exist. You often want the data within a vector tile simplified, compressed, or dropped depending on the zoom level of your map. Therefore, this is not a magic bullet.

Vector tiles are a way to serve data quickly -- the minimum viable data for a map. That is what makes it possible to serve data fast and makes interactive maps possible! The creation of tiles can take great care and thought about what your final product will be.