I've had to build out some version of a geospatial vector embedding / latent variable dataset for at least 4 separate projects now. Come see the viewer I've built on top of it!
The embeddings come from globally available Copernicus land cover data.
Sure! The basic idea is that each hexagon is a discrete unit of space for which I obtain a vector embedding. This vector is supposed to represent a sort of data-based summary of that location, obtained in this case using deep learning.
When you run the search on a hex, it looks up the vector for that hex and then performs a similarity search against all other vectors within the circle, showing the ones that are most similar in terms of land cover. The dependence on land cover / land use data is just because that was easy to get.
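In pseudocode-ish Python, the lookup-then-rank step is basically the following (the hex IDs and vectors here are made up for illustration, not the actual model outputs; in practice the candidate set is every hex inside the search circle):

```python
import math

# Toy embeddings: hex ID -> vector (in reality these come from the model).
embeddings = {
    "hex_a": [0.9, 0.1, 0.0],
    "hex_b": [0.8, 0.2, 0.1],
    "hex_c": [0.0, 1.0, 0.3],
    "hex_d": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(query_hex, candidate_hexes, k=2):
    """Rank candidate hexes (e.g. all hexes inside the search circle)
    by similarity to the query hex's embedding, returning the top k."""
    q = embeddings[query_hex]
    scored = [(h, cosine(q, embeddings[h])) for h in candidate_hexes if h != query_hex]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

print(most_similar("hex_a", ["hex_b", "hex_c", "hex_d"]))
```

Here hex_b wins because its toy vector points in nearly the same direction as hex_a's.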
As other folks have pointed out here, raw satellite imagery is also a potential input source for this. I'm playing around with other sources and really want to integrate something like GeoVex (https://openreview.net/forum?id=7bvWopYY1H) into the embeddings as well.
Would it provide useful/interesting results if the similarity search was global? E.g. find me neighborhoods in London most similar to this one in Chicago?
It's a way to encode land to make predictions about it. E.g. is the land arable, is it rural, how similar is it to X, etc. Embeddings help encode data in formats more usable by ML models.
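For example, once a hex has an embedding, a downstream model just consumes the vector as features. A toy sketch (the weights and vectors are made up; a real model would learn them from labeled data):

```python
# Hypothetical learned weights for an "is this land arable?" classifier
# that takes a 3-dimensional embedding as input features.
weights = [0.5, -1.2, 0.3]
bias = 0.1

def predict_arable(embedding):
    # Simple linear classifier over the embedding: positive score -> arable.
    score = bias + sum(w * x for w, x in zip(weights, embedding))
    return score > 0

print(predict_arable([0.8, 0.1, 0.4]))  # -> True for this toy vector
```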
The original idea came from something I saw at work - we needed a way to build generic feature sets representing something about real estate, but beyond the data we had on prices, floors, and other house-specific details.
My guess is this site is simply a way to explore the embeddings. People make similar data visualization tools for word embeddings, so that's what I assumed this was.
The embeddings are used by algorithms, not people, generally. You could ask something like "what's the most similar place to X within Y", and it would use the embeddings (which cover a variety of facts) to calculate the answer. An embedding is an N-dimensional vector (where the dimensions may or may not be meaningful to us), and similarity can be implemented by measuring how close the vectors are to each other.
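Concretely (with made-up 4-dimensional vectors), "how close" can be as simple as the distance between two embeddings in that space:

```python
import math

# Two made-up 4-dimensional embeddings for places X and Z.
place_x = [0.2, 0.7, 0.1, 0.5]
place_z = [0.3, 0.6, 0.2, 0.4]

def euclidean_distance(u, v):
    """Straight-line distance between two vectors; smaller = more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Cosine similarity is the other common choice for comparing embeddings.
print(euclidean_distance(place_x, place_z))  # -> 0.2
```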
Yup, and while the similarity search is perhaps the most visually appealing way to work with it, the real use (in my opinion) is in providing generic sets of geospatial features which are reusable across applications. I've built out versions of H3-referenced feature sets at each of the jobs I've had over the last 10 years.
Looks like Copernicus updates yearly? I can't tell if they include elevation from the "technical" tab on their home page.
Having originally come from the world of geointelligence, let me tell you this is not an easy problem to solve. For rural land use, this is probably fairly reliable, but it depends on the granularity of change detection you want: cities often build new neighborhoods in the span of months, large construction projects finish, human movement happens in the span of hours or even minutes, and that's just for land. If you want maritime tracking, you need nearly continuous updates. We managed to do it for the Navy, but the infrastructure required for this is immense, much of the sensor technology is classified and not even available for commercial use, and the resource requirements are not remotely practical for a personal side project.
Of course, military intelligence is primarily trying to track the land use of other militaries, especially in active theaters of operations, and that changes even more frequently than in regular places, where people aren't constantly erecting and moving temporary headquarters, living under camouflage cover, and blowing up existing infrastructure.
I guess you're doing this for peacetime domestic real estate, like neighborhood X in city Y is similarity-ranked against neighborhood U in city V? Are you incorporating pricing and demographic data or just land use? It seems to me like the neighbors make the neighborhood, as much as or more than qualities of the land, along with things like the usability of the sidewalks, how quickly road disrepair gets fixed, crime rates, the level of visible homelessness, air quality, and vehicular traffic congestion.
I don't want to shit on the approach too much; usefulness is determined by the results you get. But given the heterogeneity of the data here (some of it ordinal, some of it nominal, discrete versus continuous, with irreconcilable scaling and dimensional analysis, and not necessarily coming from similar distributions if you tried to just z-score it all), I can think of ways to use pure numerical voodoo to put it all into the same vector space, but the statistical validity of doing so is dubious at best.
The embeddings were obtained using a CNN triplet loss model (~10M parameters) on the Copernicus land cover data. I haven't used DEM data yet, but I have done generative modeling on DEMs in other work and would like to do that too.
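For anyone unfamiliar, triplet loss just trains the network so an anchor's embedding ends up closer to a "positive" example (e.g. a hex with similar land cover) than to a "negative" one, by at least some margin. A toy numeric sketch, not the actual training code (the vectors and margin are made up):

```python
import math

def dist(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Loss is zero once the negative is at least `margin` farther
    # from the anchor than the positive is; otherwise the model is penalized.
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

anchor   = [0.0, 0.0]
positive = [0.1, 0.0]  # similar land cover -> should embed nearby
negative = [2.0, 0.0]  # different land cover -> should embed far away

print(triplet_loss(anchor, positive, negative))  # -> 0.0, triplet already satisfied
```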