Vision language model; Multimodal dataset; OpenStreetMap; Google Earth Engine; Large language models