Abstract
Automated methods for building function classification are essential due to restricted access to official building use data. Existing approaches utilize traditional Natural Language Processing (NLP) techniques to analyze textual data representing human activities, but they struggle with the ambiguity of semantic contexts. In contrast, Large Language Models (LLMs) excel at capturing the broader context of language. This study presents a method that uses LLMs to interpret OpenStreetMap (OSM) tags, combining them with physical and spatial metrics to classify urban building functions. We employed an XGBoost model trained on 32 features from six city datasets to classify urban building functions, demonstrating varying F1 scores from 67.80% in Madrid to 91.59% in Liberec. Integrating LLM embeddings enhanced the model's performance by an average of 12.5% across all cities compared to models using only physical and spatial metrics. Moreover, integrating LLM embeddings improved the model's performance by 6.2% over models that incorporate OSM tags as one-hot encodings, and when predicting based solely on OSM tags, the LLM approach outperforms traditional NLP methods in 5 out of 6 cities. These results suggest that deep contextual understanding, as captured by LLM embeddings more effectively than traditional NLP approaches, is beneficial for classification. Finally, a Pearson correlation coefficient of approximately -0.858 between population density and F1-scores suggests that denser areas present greater classification challenges. Moving forward, we recommend investigation into discrepancies in model performance across and within cities, aiming to identify generalized models.
Introduction
According to United Nations projections, the global trend of urbanization is accelerating, and by 2050, 68% of the world’s population is expected to live in urban areas (United Nations 2018). This shift increases the demand for housing, transportation, and public services, putting pressure on social and economic systems (Glaeser 2011). Identifying the functions of urban buildings provides insights into space utilization, enabling planners to optimize infrastructure, allocate resources efficiently, and anticipate future needs, thereby supporting strategic decisions for sustainable urban development (Lin et al. 2021; Platt 2014).
Building function classification involves assigning buildings to specific semantic categories, such as residential, commercial, administrative and public facilities, and educational services (Du et al. 2015). The availability of this data is typically limited, even in official records, and when it does exist, it is rarely accessible to the public (Xu et al. 2022). Additionally, building function is primarily determined through on-site surveys conducted by government agencies, which are labor-intensive and prone to subjective interpretations. Furthermore, the data is often aggregated by block rather than provided for each individual building, limiting its applicability (Hecht et al. 2015). To address these challenges, researchers have developed methods to automatically classify building functions.
The classification of building functions has traditionally relied on analyzing their physical (such as area and perimeter), morphological (shape complexity metrics such as squareness and compactness), and spatial characteristics (neighborhood related metrics such as adjacency and shared walls ratio) derived from Remote Sensing Images (RSIs), Street-View Images (SVIs), and vector-building data. These features are processed using a combination of expert-defined rules and Machine Learning (ML) models, which have been well-documented in recent studies (Du et al. 2015; Hoffmann et al. 2023; Kong et al. 2024; Steiniger et al. 2008). Additionally, buildings have also been categorized based on the density and categories of nearby Points of Interest (POIs) which reflect human activity patterns (Lin et al. 2021; Liu et al. 2018; Zhang X et al. 2023).
For cases where POI categories do not readily align with building functions, or when the textual data involves complex human activities such as interactions on social media, researchers have recently used Natural Language Processing (NLP) techniques. Existing studies have employed text similarity, topic modeling and most recently, word embedding methods (Chen et al. 2020; Häberle et al. 2022). While word embeddings improved building function classification, they struggle with capturing deep semantic contexts and polysemous words. In the context of geotagged text analysis, they may struggle to accurately interpret the subtle meanings that arise from diverse human activities. For instance, the word 'bank' might be misclassified as a river bank instead of a financial institution. Transformer-based Large Language Models (LLMs) overcome these limitations by generating contextual embeddings that consider the broad context of words. Our study proposes a novel multi-source building classification approach that combines building features, spatial arrangement, and local POI density with contextual embeddings generated from OpenStreetMap (OSM) building tags using the BERT-based General Text Embeddings (GTE) model. The models are developed and applied across six cities in the United States and Europe, for the classification of residential, commercial, industrial, and institutional building functions. The contribution of feature types to model accuracy is explored through ablation analysis, and the GTE model's performance is compared to traditional methods like FastText for building tag interpretation.
The remainder of this paper is structured as follows: Section 2 provides an overview of related studies on building function classification and NLP techniques. Section 3 introduces the experimental data and study areas. Section 4 describes the proposed building function classification method. Section 5 presents the experimental results. Finally, Section 6 presents a discussion and Section 7 the conclusions.
Background
Classification of building functions based on physical, morphological, and spatial attributes
The function of a building determines its need for structural strength, energy efficiency, and visual design. These requirements are met through the choice of building materials and architectural styles, each characterized by distinct spectral, textural, and geometric properties (Du et al. 2015). To capture these characteristics, some studies have employed semantic segmentation methods on remote sensing images to delineate building outlines and extract relevant metrics. Notable features extracted include roof materials (Xie and Zhou 2017), as well as the size, shape, and orientation of buildings (Yan et al. 2019). For example, taller structures are often used for commercial or residential high-rises, while circular buildings frequently house public arenas or theaters (Bachman 2004). In some cases, researchers utilize GIS databases that already contain vector representations of building outlines, along with useful attributes such as construction materials. Researchers have subsequently classified buildings into different functions using machine learning and rule-based models based on these features (Arunplod et al. 2017; Lu et al. 2014; Memduhoglu and Basaraner 2024). Additionally, SVIs, panoramic images taken continuously along streets, provide a ground-level perspective of urban spaces, revealing objects that are not visible from the airborne perspective (Li et al. 2017). Convolutional Neural Network (CNN) models have been utilized primarily with Google SVIs to identify building properties and classify into categories (Kang et al. 2018; Srivastava et al. 2018; Taoufiq et al. 2020).
The spatial arrangement and morphological characteristics of buildings within urban environments also indicate their function, as these environments are often intentionally designed for specific uses (Yan et al. 2019). For instance, commercial districts typically feature buildings that are closely spaced, with wide frontages and ample parking areas to accommodate frequent public access, while residential areas might prioritize privacy with detached houses and landscaped yards that foster more secluded environments (Davis 2009). The spatial relationship between buildings is assessed using measures such as adjacency and shared wall ratios, which can indicate the density typical of residential or commercial environments (Fleischmann 2019; Hamaina et al. 2012). Inter-building distances are utilized to identify open spaces or less densely populated areas, which are often indicative of industrial or institutional functions (Caruso et al. 2017). Street alignment refers to the orientation of buildings in relation to road networks, highlighting how placement can enhance or inhibit accessibility and connectivity—important considerations in the location of commercial and institutional buildings (Fleischmann 2019). Morphological properties are analyzed by grouping buildings into divisions such as neighborhoods and blocks (Yan et al. 2019). This includes metrics related to the shape and layout of these units. For example, larger blocks might suggest a lower density of development, which is often associated with suburban or industrial areas where more space is available or required.
Textual data and building function classification
The classification approaches previously discussed focus on the physical attributes of buildings and their spatial organization. Studies have enriched these models by incorporating information about human activity around the buildings of interest, including data from POIs, social media, and floating car trajectories. Researchers have developed methods to quantify the density of activity and patterns of vehicle movement at different times, utilizing datasets such as smart card usage in public transportation, taxi GPS (Global Positioning System) trajectories, and geolocated social media messages. This activity data serves as a proxy to infer the functions of nearby buildings. For example, high activity levels and vehicle traffic in the morning are often indicative of commercial areas or office buildings, while similar patterns in the evening might suggest residential zones (Liu et al. 2018; Zhong et al. 2014).
More recently, the focus has shifted towards leveraging textual data from POIs to determine building functions. Atwal et al. (2022) utilized structured OSM building tags, encoding each building’s attributes as one-hot vectors, where each tag is represented as a binary indicator (0 or 1) based on its presence or absence, and used this data in a decision tree model to predict residential and non-residential building functions. Other studies have mapped categories of POIs to specific functions, categorizing areas densely populated with food-related establishments as commercial zones and those with entertainment outlets as leisure areas (Lin et al. 2021; Liu et al. 2018; Zhang X et al. 2023). In contrast, other studies have applied NLP techniques to interpret complex textual data that is not easily mappable to building functions. Chen et al. (2020) assigned building types to unclassified POIs by comparing their names to those already classified, using the Jaro-Winkler text similarity measurement, which assesses short strings and names by considering character-level differences and common prefixes. For POIs without similar names, they employed the topic modeling TF-IDF method, which quantifies a word's importance in a document based on its frequency and rarity across a corpus, to identify key terms and match them to the most relevant building types. Although this marks an advancement from direct mapping of POI class to building function, these foundational NLP techniques do not capture deep contexts or semantic meanings (Ramos 2003).
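The one-hot encoding of tags described above can be illustrated with a minimal sketch. The function name and the toy tags are hypothetical and are not taken from Atwal et al. (2022); the sketch only shows how each distinct key:value pair becomes one binary column.

```python
def one_hot_encode_tags(buildings):
    """Encode each building's tags as a binary presence/absence vector."""
    # Vocabulary: every distinct "key:value" string in the dataset,
    # sorted so that the column order is stable.
    vocabulary = sorted(
        {f"{key}:{value}" for tags in buildings for key, value in tags.items()}
    )
    column = {tag: i for i, tag in enumerate(vocabulary)}

    vectors = []
    for tags in buildings:
        row = [0] * len(vocabulary)
        for key, value in tags.items():
            row[column[f"{key}:{value}"]] = 1  # 1 = tag present on this building
        vectors.append(row)
    return vocabulary, vectors


# Toy, made-up tags for three buildings (illustrative only).
buildings = [
    {"building": "apartments"},
    {"building": "retail", "shop": "bakery"},
    {"building": "apartments", "amenity": "cafe"},
]
vocabulary, vectors = one_hot_encode_tags(buildings)
```

Each vector grows by one dimension for every new key:value pair, and two semantically close tags (for example, two kinds of shop) share no information, which is the sparsity limitation discussed in the following paragraphs.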
Häberle et al. (2022) analyzed geotagged Tweets, converting them into a machine-readable format using FastText embeddings. These embeddings transform words into machine-readable low-dimensional vectors within a multi-dimensional space, positioning similar elements closer together and dissimilar ones further apart. They then used these embeddings to train a neural network to classify the functions of buildings. Word embeddings like Word2Vec and FastText are preferable to simpler vector embeddings such as one-hot encoding, which are sparse and high-dimensional with no inherent semantic information. They also have improved text representation over traditional NLP methods by creating dense vector representations of words from their contexts, capturing semantic and syntactic similarities through surrounding words and large corpora. However, these models are limited by their use of a single static vector per word, which restricts their ability to represent polysemous words (Mikolov et al. 2013; Pennington et al. 2014). This limitation may reduce their effectiveness in analyzing geotagged text that reflects diverse human activities, as they cannot fully capture the dynamic and nuanced meanings that arise from varying contexts.
Transformer-based LLMs such as BERT (Devlin et al. 2018), OpenAI's GPT (Brown et al. 2020), and Llama-2 (Touvron et al. 2023) overcome the limitations of traditional word embeddings by generating contextual embeddings that consider the full context of words from both the preceding and following text in a sentence. This approach utilizes extensive datasets to capture the nuances of human language at both the word and sentence levels, offering deeper semantic representations and a richer understanding of language subtleties compared to earlier technologies. LLM embeddings have proven useful for processing unstructured data in diverse applications, such as predicting protein structures from genetic sequences (Sadeghi et al. 2024) and analyzing customer sentiment from reviews for marketing purposes (Zhou et al. 2023). Recent work on the application of LLMs in geography has illustrated their ability to translate geospatial tasks into procedural operations and to demonstrate some capabilities in spatial reasoning, although they face challenges in precise numerical reasoning and in managing abstract spatial tasks (Fulman et al. 2024a; Fulman et al. 2024b; Li and Ning 2023; Mai et al. 2022; Roberts et al. 2023; Zhang Y et al. 2023). Leveraging these innovations, our study proposes a novel multi-source approach that integrates physical and morphological features, the spatial arrangement of buildings, and POI density with LLM embeddings that provide vector representations of OSM building tags.
Study area and data
Study area
We selected six cities for building function classification: Fairfax, VA; Mecklenburg, NC; and Boulder, CO in the United States; Berlin, Germany; Madrid, Spain; and Liberec, Czechia in Europe (Fig. 1). Our analysis considers buildings within the boundaries of these cities. We selected these cities based on the availability of detailed building function data, their geographic and cultural diversity, and the completeness of their building datasets within OpenStreetMap.
Ground truth building functions
The administrations of Fairfax County (Fairfax County Government 2024), Mecklenburg County (Mecklenburg County Government 2024), and the City of Boulder (The City of Boulder Government 2024) provide GIS layers of buildings, including their functions. The datasets for Berlin, Madrid, and Liberec were gathered from EUBUCCO, which also provides detailed information about building functions (Milojevic-Dupont et al. 2023). Detailed information about these cities and their metrics can be found in Table 1.
Building height
The World Settlement Footprint 3D (WSF3D) Building Height dataset was provided by the German Aerospace Center (DLR) (Esch et al. 2022). This dataset maps the average building height in every settlement worldwide using a 90-meter measurement grid.
Population density
We downloaded the WorldPop dataset for 2010-2011 in GeoTIFF format, at a resolution of 3 arc-seconds (approximately 100 meters at the equator) (Bondarenko et al. 2020).
OSM building tags
OSM is a geographic information platform that offers volunteered open-source geospatial data for the entire planet. OSM provides geometric information as well as semantic details about buildings, contributed by users and presented as tags in a key:value format. For example, a tag such as 'building:apartment' may indicate that the building serves a residential function. While OSM provides a wiki-style guide for using predefined keys and values for certain types of features, users have the flexibility to create and apply their own tags. OSM data generally exhibits heterogeneity in terms of spatial distribution, the information provided by the tags, and their completeness. We collected OSM buildings data using the Ohsome (2024) API provided by HeiGIT (2024), a non-profit organization specializing in humanitarian aid, smart mobility and climate change research. We selected only those OSM buildings that overlapped with buildings from the official administrative datasets. Overall, we gathered an average of 4 tags per building from the OSM building attributes.
POI data
In addition to building data, OSM contains data on POIs, such as restaurants, shops, amenities, and more. Although these POIs are often associated with specific buildings, they are distinctly represented as point geometries in OSM. We utilized a filter through the Ohsome API to selectively extract POIs that involve human activities. This selection includes a diverse range of practical amenities (e.g., cafes, banks, cinemas, restaurants), healthcare facilities, historic places, leisure places (e.g. ice rinks, amusement parks, swimming pools), and various types of shops. Note that in our analysis, we focus on their spatial distribution and do not use the tag data associated with these POIs.
Methods
The methodology for building classification involves two main steps (Fig. 2). First, 32 building metrics are calculated from various data sources. Second, a machine learning model is used to classify the building functions.
Calculating building metrics
In this paper, the terms 'metrics' and 'features' are used interchangeably. For each city, the building functions from administrative datasets were first transferred to the corresponding OSM datasets via a spatial join operation. These OSM datasets were then used to calculate all metrics for the ML model. The building metrics were categorized into four groups: Physical attributes, shape complexity, spatial relationships, and text embeddings.
Physical attributes
Building size metrics
Initially, height information was derived from the WSF3D raster file, which provides height information for buildings in global settlements. The WSF3D raster data was overlaid onto the buildings, and the mean height value was calculated for each building. Subsequently, five size features pertaining to individual buildings were calculated: area, height, perimeter, volume, and longest axis length. These size metrics and their descriptions are detailed in Table 2.
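The five size metrics can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation behind Table 2: the footprint is a simple polygon given as ordered (x, y) vertices in a projected, metric coordinate system, volume is taken as area times mean height, and the longest axis is approximated by the largest vertex-to-vertex distance, which may differ from the definition used in the study.

```python
import math
from itertools import combinations


def polygon_area(coords):
    """Shoelace formula; coords are ordered (x, y) vertices, ring not closed."""
    n = len(coords)
    twice_area = sum(
        coords[i][0] * coords[(i + 1) % n][1] - coords[(i + 1) % n][0] * coords[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_perimeter(coords):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))


def longest_axis_length(coords):
    """Assumption: approximated by the largest distance between any two vertices."""
    return max(math.dist(p, q) for p, q in combinations(coords, 2))


def size_metrics(coords, mean_height):
    """Return the five size features for one building footprint."""
    area = polygon_area(coords)
    return {
        "area": area,
        "height": mean_height,
        "perimeter": polygon_perimeter(coords),
        "volume": area * mean_height,  # assumption: prism with constant height
        "longest_axis_length": longest_axis_length(coords),
    }


# Toy 20 m x 10 m rectangular footprint with a made-up mean height of 9 m.
footprint = [(0, 0), (20, 0), (20, 10), (0, 10)]
metrics = size_metrics(footprint, mean_height=9.0)
```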
Shape complexity features
The shape complexity features pertain to individual buildings and building blocks. Building blocks were delineated using the Momepy library, an open-source Python library designed for the quantitative analysis and visualization of urban form and morphology (Fleischmann 2019). Momepy creates Voronoi diagrams around buildings to delineate blocks. These diagrams are constrained by a distance threshold, in our case 100 meters around each building, and are further clipped to ensure that blocks do not extend across major roads or other barriers (Fig. 3).
Ten shape complexity features pertaining to individual buildings (circular compactness, convexity, elongation, equivalent rectangular index (ERI), fractal dimension, orientation, rectangularity, roughness index, square compactness, and squareness) and seven block-based metrics (block area, block perimeter, block convexity, block elongation, block ERI, block fractal dimension, and block squareness) were calculated using the formulas provided in Table 3. Some of these formulas were already implemented in the Momepy library, while others were implemented manually in Python. Block-based metrics were calculated using the same equations as those for buildings, but with block boundaries substituted for building footprints.
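Two of the shape complexity features can be sketched from area and perimeter alone. The formulas below are commonly used definitions of square compactness and fractal dimension and are stated here as assumptions; the exact formulas in Table 3 may differ.

```python
import math


def square_compactness(area, perimeter):
    """(4 * sqrt(area) / perimeter) ** 2; equals 1 for a perfect square."""
    return (4.0 * math.sqrt(area) / perimeter) ** 2


def fractal_dimension(area, perimeter):
    """2 * log(perimeter / 4) / log(area); equals 1 for a perfect square.

    Undefined when area == 1, because log(1) == 0.
    """
    return 2.0 * math.log(perimeter / 4.0) / math.log(area)


# Toy shapes: a 10 m x 10 m square and a 20 m x 10 m rectangle.
square = {"area": 100.0, "perimeter": 40.0}
rectangle = {"area": 200.0, "perimeter": 60.0}

square_sc = square_compactness(**square)
rectangle_sc = square_compactness(**rectangle)
square_fd = fractal_dimension(**square)
rectangle_fd = fractal_dimension(**rectangle)
```

The same functions apply to blocks when block area and perimeter are passed instead of footprint values, in line with the substitution described above.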
Spatial relationship metrics
The spatial relationship metrics describe buildings in relation to their surrounding buildings and streets. This category also includes the population density around the building, which is an indicator of human activity and service demand. High-density areas might indicate residential zones or commercial hubs, while lower-density areas could correspond to industrial or institutional regions (Luo et al. 2019). Finally, we applied the Gaussian Kernel Density Estimation (KDE) method to estimate density values of POIs across city boundaries. This non-parametric method transforms discrete POI data into a continuous density surface that reveals spatial patterns of amenities and infrastructure, such as central places or hot spots within the urban environment (Lin et al. 2021; Miao et al. 2021). The KDE was calculated as a raster with a 10 m resolution for each city dataset (Fig. 4); this resolution was chosen as a balance between detail and computational efficiency. These KDE values were then intersected with building footprints to compute mean density values for each building. In total, nine spatial relationship metrics were calculated: adjacency, inter-building distance, neighbor distance, shared walls ratio, street alignment, block building count, block building density, block population density, and KDE mean. These metrics, along with their equations and descriptions, are detailed in Table 4.
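The KDE step can be sketched as follows. This is a minimal illustration, not the study's implementation: the bandwidth of 20 m, the toy POI coordinates, and the representation of a building as a list of raster cells are all assumptions. Only the 10 m cell size comes from the text.

```python
import math


def gaussian_kde_at(x, y, pois, bandwidth):
    """Sum of 2D Gaussian kernels centred on each POI, evaluated at (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    return sum(
        norm * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
        for px, py in pois
    )


def kde_raster(pois, xmin, ymin, ncols, nrows, cell_size, bandwidth):
    """Return raster[row][col] holding the density at each cell centre."""
    return [
        [
            gaussian_kde_at(
                xmin + (col + 0.5) * cell_size,
                ymin + (row + 0.5) * cell_size,
                pois,
                bandwidth,
            )
            for col in range(ncols)
        ]
        for row in range(nrows)
    ]


def mean_density(raster, cells):
    """Mean KDE value over the (row, col) cells covered by one building."""
    return sum(raster[row][col] for row, col in cells) / len(cells)


# Toy POIs in metres: two close together, one far away (made-up values).
pois = [(15.0, 15.0), (25.0, 15.0), (85.0, 85.0)]
raster = kde_raster(pois, xmin=0.0, ymin=0.0, ncols=10, nrows=10,
                    cell_size=10.0, bandwidth=20.0)  # bandwidth is hypothetical

near_cluster = mean_density(raster, [(1, 1), (1, 2)])
mid_city = mean_density(raster, [(5, 5)])
```

A building whose cells lie close to the POI cluster receives a higher KDE mean than one in the middle of the grid, which is the signal the KDE mean feature is meant to carry.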
Text embeddings
Although the OSM wiki provides tag usage guidelines, users often generate a diverse array of tags, deviating from these recommendations and complicating interpretation. Using LLM word embeddings, tags are converted to a machine-readable format that preserves their semantic and syntactic features by placing them within a multi-dimensional space, positioning similar elements closer together and dissimilar ones further apart (Zhou et al. 2023).
We utilized the GTE-large model to generate embeddings from OSM building tags (Fig. 5). Developed by Alibaba DAMO Academy, the GTE models are primarily based on the BERT framework and come in three sizes: GTE-large, GTE-base, and GTE-small (Li et al. 2023). These models are trained on an extensive corpus of relevant text pairs, encompassing a diverse array of domains and scenarios. This comprehensive training allows GTE models to be used effectively in various downstream tasks involving text embeddings, such as information retrieval, semantic textual similarity, and text reranking. The GTE-large model has a size of 0.67 GB, generates a 1024-dimensional vector for each input, and supports a sequence length of up to 512 tokens; longer text is truncated to this limit. A token can be any meaningful unit in a text sequence, such as a word, punctuation mark, or emoji. The model primarily caters to English texts.
We chose GTE-large because, at the time of writing, it was the leading model for text classification in benchmarks used to assess LLMs (Li et al. 2023). It consistently outperformed other models, including those significantly larger, on the Massive Text Embedding Benchmark (MTEB). This benchmark tests models across diverse tasks such as text retrieval, classification, and semantic similarity. GTE-large also excelled in zero-shot text classification on the SST-2 sentiment analysis task, which evaluates a model's ability to classify text without specific fine-tuning.
As a preliminary step, we cleaned the OSM tags by applying a series of targeted filters that retain only the data most pertinent to building function classification. This process involved iterating through each key-value pair in the OSM tags dictionary and excluding entries that did not contribute meaningful information for classification purposes. First, we excluded tags whose key contains the substring ‘wiki’, as these tags often reference external sources or metadata that are not directly relevant to functional classification. Similarly, tags with keys prefixed by ‘source’ were removed, since these typically indicate the origin of the information rather than intrinsic characteristics of the feature itself. We also filtered out tags with values of ‘yes’ or ‘no’, as these binary indicators lack the specificity required to distinguish between building functions. Additionally, tags with purely numeric values (whether representing quantities, codes, house numbers, or other numerical data) were excluded, as they generally do not offer descriptive information that is useful for functional classification. The remaining key:value pairs were then tokenized to create embeddings using the GTE-large model.
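The four filters described above can be sketched as a small function. The function names, the case-insensitive matching of ‘yes’ and ‘no’, the regular expression used to detect purely numeric values, and the example tags are assumptions made for illustration.

```python
import re

# Assumption: "purely numeric" means an optionally signed integer or decimal.
NUMERIC_VALUE = re.compile(r"[+-]?\d+([.,]\d+)?")


def filter_tags(tags):
    """Keep only key-value pairs that may carry functional information."""
    kept = {}
    for key, value in tags.items():
        if "wiki" in key:                       # references to external sources
            continue
        if key.startswith("source"):            # provenance of the information
            continue
        if value.strip().lower() in {"yes", "no"}:  # uninformative binary flags
            continue
        if NUMERIC_VALUE.fullmatch(value.strip()):  # house numbers, codes, counts
            continue
        kept[key] = value
    return kept


def tags_to_text(tags):
    """Serialise the remaining tags as key:value strings for the embedding model."""
    return [f"{key}:{value}" for key, value in tags.items()]


# Made-up example tags for a single building.
raw_tags = {
    "building": "apartments",
    "wikipedia": "de:Example",
    "source": "survey",
    "wheelchair": "yes",
    "addr:housenumber": "12",
    "building:levels": "5",
    "name": "Riverside Court",
}
clean_tags = filter_tags(raw_tags)
tag_texts = tags_to_text(clean_tags)
```

The resulting strings would then be passed to the embedding model; how the study aggregates several tag embeddings into one vector per building is not specified here and is left out of the sketch.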
To assess the capabilities of LLM text embeddings, we compare model outputs using them directly with those using one-hot encoding, which lacks inherent semantic information (Mokhtarani 2021), and with NLP models like GloVe, Word2Vec, and FastText, which are limited in capturing deep contextual relationships.
Building metric completeness
To maximize the effectiveness of the ML model, we ensured that all feature values for each building were complete. We employed a K-nearest neighbor (KNN) imputer to fill in missing values by averaging the values from the four nearest neighboring buildings that have such values. For heights and volumes, which do not exhibit spatial dependence and hence should not be interpolated using KNN, zeros were used as fill values. Similarly, missing tag vectors were filled with zero vectors.
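The imputation logic can be sketched as follows. This is an illustration only: it assumes that "nearest" means nearest by centroid distance, whereas a library KNN imputer may instead measure distance in feature space. The function names and toy values are hypothetical.

```python
import math


def knn_impute(centroids, values, k=4):
    """Fill None entries with the mean of the k nearest buildings that have a value.

    Assumption: neighbours are ranked by Euclidean distance between centroids.
    """
    known = [(c, v) for c, v in zip(centroids, values) if v is not None]
    if not known:
        raise ValueError("at least one building must have a value")

    filled = []
    for centroid, value in zip(centroids, values):
        if value is not None:
            filled.append(value)
            continue
        nearest = sorted(known, key=lambda cv: math.dist(centroid, cv[0]))[:k]
        filled.append(sum(v for _, v in nearest) / len(nearest))
    return filled


def zero_fill(values):
    """Used for heights and volumes, which are filled with zeros instead."""
    return [0.0 if v is None else v for v in values]


# Toy centroids in metres and a made-up feature with one missing value.
centroids = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (50, 50)]
feature = [1.0, 2.0, 3.0, 4.0, None, 100.0]
imputed = knn_impute(centroids, feature, k=4)
heights = zero_fill([12.0, None, 7.5])
```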
Building function classification
Function categories
Building functions were categorized into four main types for simplicity: residential, commercial, industrial, and institutional. The classes and sub-types included in each of these categories are detailed in Table 5. We excluded mixed-type buildings from our analysis, as their inclusion would have introduced additional functional complexity.
Each building’s 1024-dimensional vector embedding makes predicting building functions computationally expensive. To allow classification within reasonable computational resources, we employed eXtreme Gradient Boosting (XGBoost) for its speed and efficiency in handling large datasets and high-dimensional feature spaces (Chen and Guestrin 2016). XGBoost is an ensemble learning method that builds a predictive model from multiple decision trees using a gradient boosting framework. In this iterative learning process, the decision trees are grown sequentially: each new tree is fitted to the errors (the gradients of the loss) of the ensemble built so far, so that later trees concentrate on the data points that earlier trees predicted poorly. We compared the performance and computation time of XGBoost with those of other models: Random Forest (Breiman 2001), Decision Tree (Quinlan 1986), KNN (Cover and Hart 1967), and Logistic Regression (Cox 1958), all of which we expected to perform worse.
The dataset was split into 80% for training and 20% for testing. Residential buildings dominate the data distribution, resulting in imbalanced data. Accuracy, while potentially higher, can be misleading in scenarios with imbalanced class distributions because it does not consider the disparities in class size (Branco et al. 2016; Galar et al. 2011). For this reason, macro average F1-score was used for performance evaluation. The F1-score is the harmonic mean of precision—which is the proportion of true positive results among all positive results—and recall, which is the proportion of true positive results among all relevant samples (Equation 1). This score was averaged across all classes to provide the macro average.
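The macro average F1-score can be sketched in a few lines, using F1 = 2 · precision · recall / (precision + recall) per class and then averaging over classes. The toy labels are made up and chosen to show why accuracy can mislead on imbalanced data; treating an undefined precision or recall as 0 is an assumption of this sketch.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0  # assumption: 0 if undefined
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy imbalanced test set: 8 residential, 2 commercial buildings.
y_true = ["residential"] * 8 + ["commercial"] * 2
# A trivial classifier that always predicts the majority class.
y_pred = ["residential"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

In this toy case the accuracy is 0.8, while the macro F1 is roughly 0.44 because the commercial class is never predicted correctly.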
We employed the XGBoost algorithm with the default parameters: a learning rate of 0.3, a maximum tree depth of 6, and 100 estimators, with a focus on optimizing the multi-class logarithmic loss (mlogloss). This metric evaluates model accuracy by penalizing the deviation of the predicted class probabilities from the true class labels. The experiments were conducted in a Python (v3.10) environment on macOS (v14.5) with an Apple M1 chip. For NLP processing, Apple’s MPS device was utilized.
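The multi-class logarithmic loss can be sketched as the mean negative log of the probability assigned to the true class. The function name, the clipping constant, and the toy probabilities are assumptions for illustration; this is not XGBoost's internal code.

```python
import math


def multiclass_log_loss(y_true, probabilities, eps=1e-15):
    """Mean of -log(p_true_class) over all samples.

    y_true holds integer class indices; probabilities holds one list of
    class probabilities per sample. Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Toy example with four classes (0..3) and two samples.
y_true = [0, 2]
confident = [[0.90, 0.05, 0.03, 0.02], [0.05, 0.05, 0.85, 0.05]]
uniform = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]

loss_confident = multiclass_log_loss(y_true, confident)
loss_uniform = multiclass_log_loss(y_true, uniform)
```

A model that spreads probability uniformly over four classes incurs a loss of ln(4), while confident correct predictions drive the loss toward zero.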
To analyze the impacts of each group of building features on classification accuracy, we conducted an ablation analysis. Ablation analysis is a systematic method of removing or "ablating" individual features or groups of features from a model to assess their impact on the model's performance and identify which components are most important for predictions (Girshick et al. 2014; Meyes et al. 2019).
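The ablation procedure can be sketched as a loop that removes one feature group at a time and records the drop in score. Here `evaluate` stands for any hypothetical callable that trains a model on the given features and returns its macro F1; the toy scoring function and its gains are made up so that the sketch runs without a real model.

```python
def ablation_study(feature_groups, evaluate):
    """Return the baseline score and the score drop when each group is removed."""
    all_features = [f for group in feature_groups.values() for f in group]
    baseline = evaluate(all_features)

    drops = {}
    for name, group in feature_groups.items():
        remaining = [f for f in all_features if f not in group]
        drops[name] = baseline - evaluate(remaining)  # larger drop = more important
    return baseline, drops


# Feature groups named after the four categories in the text;
# the individual feature names are illustrative.
feature_groups = {
    "physical": ["physical/area", "physical/height"],
    "shape": ["shape/convexity", "shape/elongation"],
    "spatial": ["spatial/adjacency", "spatial/kde_mean"],
    "text": ["text/embedding"],
}

# Made-up contribution of each group, used only by the toy evaluator.
TOY_GAIN = {"physical": 0.10, "shape": 0.05, "spatial": 0.08, "text": 0.12}


def toy_evaluate(features):
    """Stand-in for training and scoring a model: adds a fixed gain per group present."""
    present = {feature.split("/")[0] for feature in features}
    return 0.50 + sum(TOY_GAIN[group] for group in present)


baseline, drops = ablation_study(feature_groups, toy_evaluate)
most_important = max(drops, key=drops.get)
```

With a real model, `evaluate` would retrain on each reduced feature set, so the cost of the study grows with the number of groups ablated.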
Finally, we assessed the contribution of contextual understanding provided by text embeddings to model accuracy. We compared the performance of the model with GTE-Large embeddings to the use of one-hot encodings (direct tag approach), which lack contextual understanding, and to established NLP models such as GloVe, Word2Vec, and FastText, which offer limited contextual capabilities. The latter evaluation focused solely on the embeddings themselves, excluding physical and spatial attributes of the buildings.
Results
A total of 32 features, as detailed earlier in this paper, were prepared for training the XGBoost model. Despite some of these features being correlated with each other, the XGBoost model managed to extract valuable information from these correlations, enhancing its predictive capabilities. Although feature selection and optimization techniques were considered to further improve the model and reduce computational costs, they were not adopted because they yielded only negligible improvements. Ultimately, all features were used as input for the model and applied to six city datasets. The results are presented in Fig. 6. The model's performance reaches a low F1 score of 67.80% in Madrid and peaks at 91.59% in Liberec. These levels and the variation between them align with state-of-the-art benchmarks (Häberle et al. 2022; Atwal et al. 2022).
XGBoost demonstrated better accuracy and lower computation time compared to other models that were tested, across all cities in the study. Figure 7 presents the results from different models using their default parameters, applied to the full set of 32 features for the Fairfax dataset.
The ablation results for each city are shown in Fig. 8. Without text embeddings, Boulder, Mecklenburg, and Fairfax grouped together with F1 scores ranging from 72.44% to 84.82%. Conversely, Liberec, Madrid, and Berlin had lower scores between 52.59% and 59.49%. Incorporating text embeddings significantly improved classification accuracy across all cities. Excluding Liberec, which showed a dramatic improvement of 39%, Fairfax saw the smallest increase at 3.86% and Madrid the largest at 9.66%.
The results of the comparison of the direct tag approach to LLM-generated text embeddings are presented in Fig. 9. As expected, direct tags enhance results in all cases, adding between 1.8% in Fairfax and 8.6% in Madrid, with Liberec as an outlier at 32.2%. Language model embeddings provide additional improvements in all cities except Boulder, where direct tags outperform language model embeddings by 0.9%. The improvement from language model embeddings ranges from 1% in Berlin to 6.8% in Liberec.
Compared with GloVe, Word2Vec, and FastText, the GTE-Large embeddings performed better in most evaluations, with an average improvement of 6.1% over the next best approach (Fig. 10). The improvement ranged from 0.9% in Berlin to 11.0% in Fairfax. The only exception was Liberec, where performance was comparable across all embedding models, including GTE-Large.
The apparent relationship between F1 scores and population density across cities prompted a correlation analysis of the two variables. The Pearson correlation coefficient between population density and F1 score is approximately -0.858, indicating that F1 scores tend to decrease as population density increases.
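For reference, the Pearson coefficient can be computed as in the following minimal sketch. The density and F1 values are synthetic placeholders, not the study's measurements, so the printed value will not reproduce the reported -0.858; the function name `pearson_r` is likewise an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Synthetic placeholder values for six hypothetical cities
# (NOT the study's population densities or F1 scores).
density = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f1 = [0.90, 0.88, 0.80, 0.82, 0.70, 0.68]

print(pearson_r(density, f1))
```

A coefficient near -1 indicates a strong negative linear association. With only six cities, a single city can shift the coefficient noticeably, which is worth keeping in mind when interpreting the result.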
Discussion
Studies in the field of urban planning and geographic information science have increasingly focused on introducing new types of metrics, such as spatial and text embeddings, and implementing advanced methodologies that fuse various data sources. To make significant qualitative advances, we should study the discrepancies in model performance between cities, and within the same cities, when utilizing different attributes. Through this analysis, it may become possible to identify innovative features and appropriate modeling approaches for generating more generalized models that will be applicable across various urban settings.
In our case, we observed dramatic variations in model performance across different cities, both with and without the inclusion of text embeddings in the models. In addition, text embeddings led to an exceptionally high accuracy improvement in one city, Liberec, compared to the others. These variations in model performance might stem from differences in tag quality and availability, but also from the spatial properties of building functions and urban layout. Our observation that cities with lower population density are easier to predict accurately than those with higher densities suggests a potential direction for investigation. A possible explanation is that, in densely built-up cities, the physical composition is optimized for efficiency and thus does not reflect function as clearly as in less dense environments. A more thorough analysis is required to understand the underlying causes of these disparities.
The generalizability of future research could benefit from extending the analysis to cities in more diverse regions. Our study focused on six cities in the United States, Germany, Spain, and Czechia, all of which feature well-structured urban planning. This focus may overlook the complexities and irregularities of more diverse urban morphologies. Expanding this research faces several challenges. First, OSM tags have poor coverage in certain locations, particularly in underdeveloped countries. However, our approach can incorporate various forms of geotagged text, allowing for broader application beyond OSM tags. For example, Häberle et al. (2022) demonstrated the effectiveness of using FastText embeddings for analyzing geotagged tweets in urban environments. Flickr images could also serve as a potentially useful source of geotagged text. Applying LLMs to such textual data, which is typically longer and more complex than OSM tags, may improve classification accuracy, since LLMs are better than simpler models at capturing deeper semantic contexts and managing polysemy.
An additional challenge in less developed regions is that more tags may be in languages other than English, which are less well represented in LLMs, and in GTE-Large in particular (Li et al. 2023). This could lead to biases in processing linguistic data. To address this, future research could test more advanced LLMs as they become available. As LLMs evolve, a potential, though highly futuristic, approach could involve directly prompting an LLM to predict a building's function from physical attributes and OSM tags or other textual data provided in the prompt, relying on the knowledge acquired during its training. However, at present, LLMs are not typically trained on specific physical metrics of buildings and their surroundings, which are essential for this task.
Conclusion
This study has demonstrated that integrating text embeddings with traditional spatial and physical metrics can significantly enhance the accuracy of building function classification. Applying LLMs to OpenStreetMap tags proved more effective than traditional NLP approaches, revealing the potential of advanced NLP techniques in geographic information science and urban planning. Variations in model performance across different cities suggest that factors like population density and the physical characteristics of cities influence classification outcomes. These insights indicate that a deeper analysis of these factors could lead to more generalized models that are adaptable to various urban settings. Moving forward, continued exploration of multilingual capabilities and the integration of additional geotagged data sources are recommended to further enhance the accuracy of these classification systems.
Data availability
The data used in this study are publicly available and can be accessed through the links provided within the text. The code utilized for the analysis is available from the authors upon reasonable request.
References
Arunplod C, Nagai M, Honda K, Warnitchai P (2017) Classifying building occupancy using building laws and geospatial information: A case study in Bangkok. Int J Disaster Risk Reduct 24:419–427
Atwal KS, Anderson T, Pfoser D, Züfle A (2022) Predicting building types using OpenStreetMap. Sci Rep 12:19976
Bachman LR (2004) Integrated buildings: The systems basis of architecture. John Wiley & Sons, Hoboken
Basaraner M, Cetinkaya S (2017) Performance of shape indices and classification schemes for characterising perceptual shape complexity of building footprints in GIS. Int J Geogr Inf Sci 31:1952–1977. https://doi.org/10.1080/13658816.2017.1346257
Bondarenko M, Kerr D, Sorichetta A, Tatem A (2020) Census/projection-disaggregated gridded population datasets, adjusted to match the corresponding UNPD 2020 estimates, for 183 countries in 2020 using Built-Settlement Growth Model (BSGM) outputs. University of Southampton. https://eprints.soton.ac.uk/444005/
Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv 49:31. https://doi.org/10.1145/2907070
Breiman L (2001) Random forests. Mach Learn 45:5–32
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Amodei D (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901
Caruso G, Hilal M, Thomas I (2017) Measuring urban forms from inter-building distances: Combining MST graphs with a Local Index of Spatial Association. Landscape Urban Plan 163:80–89
Chen W, Zhou Y, Wu Q, Chen G, Huang X, Yu B (2020) Urban building type mapping using geospatial data: A case study of Beijing, China. Remote Sens 12:2805
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785-794
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
Cox DR (1958) The regression analysis of binary sequences. J R Stat Soc Ser B (Methodol) 20:215–232
Davis H (2009) The commercial-residential building and local urban form. Urban Morphol 13:89
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Dibble J, Prelorendjos A, Romice O, Zanella M, Strano E, Pagel M, Porta S (2019) On the origin of spaces: Morphometric foundations of urban form evolution. Environ Plan B Urban Anal City Sci 46:707–730. https://doi.org/10.1177/2399808317725075
Du S, Zhang F, Zhang X (2015) Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach. ISPRS J Photogramm Remote Sens 105:107–119
Esch T, Brzoska E, Dech S, Leutner B, Palacios-Lopez D, Metz-Marconcini A, Marconcini M, Roth A, Zeidler J (2022) World Settlement Footprint 3D - A first three-dimensional survey of the global building stock. Remote Sens Environ 270:112877. https://doi.org/10.1016/j.rse.2021.112877
Fairfax County Government (2024) Fairfax County Open Geospatial Data. https://www.fairfaxcounty.gov/maps/open-geospatial-data. Accessed 17 June 2024
Feliciotti A (2018) Resilience and urban design: A systems approach to the study of resilience in urban form: learning from the case of the Gorbals [Doctoral Thesis]. University of Strathclyde
Fleischmann M (2019) momepy: Urban Morphology Measuring Toolkit. J Open Source Softw 4:1807. https://doi.org/10.21105/joss.01807
Fulman N, Memduhoğlu A, Zipf A (2024) Distortions in Judged Spatial Relations in Large Language Models. In Press, Prof Geogr
Fulman N, Memduhoğlu A, Zipf A (2024a) Evidence for systematic bias in the spatial memory of large language models. In: Proceedings of the Second International Workshop on Geographic Information Extraction from Texts (GeoExT), pp 57-62
Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2011) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern C Appl Rev 42:463–484
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580-587
Glaeser E (2011) Triumph of the city: How urban spaces make us human. Pan Macmillan
Häberle M, Hoffmann EJ, Zhu XX (2022) Can linguistic features extracted from geo-referenced tweets help building function classification in remote sensing? ISPRS J Photogramm Remote Sens 188:255–268
Hamaina R, Leduc T, Moreau G (2012) Towards urban fabrics characterization based on buildings footprints. In: Gensel J, Josselin D, Vandenbroucke D (eds) Bridging the geographic information sciences. Lecture Notes in Geoinformation and Cartography. Springer, Berlin, Heidelberg, pp 317–336. https://doi.org/10.1007/978-3-642-29063-3_18
Hecht R, Meinel G, Buchroithner M (2015) Automatic identification of building types based on topographic databases–a comparison of different data sources. Int J Cartogr 1:18–31
HeiGIT (2024) Heidelberg Institute for Geoinformation Technology. https://www.heigit.org. Accessed 17 June 2024
Hoffmann EJ, Abdulahhad K, Zhu XX (2023) Using social media images for building function classification. Cities 133:104107
Kang J, Körner M, Wang Y, Taubenböck H, Zhu XX (2018) Building instance classification using street view images. ISPRS J Photogramm Remote Sens 145:44–59
Kong B, Ai T, Zou X, Yan X, Yang M (2024) A graph-based neural network approach to integrate multi-source data for urban building function classification. Comput Environ Urban Syst 110:102094
Li X, Ratti C, Seiferling I (2017) Mapping urban landscapes along streets using Google Street View. In: Advances in Cartography and GIScience: Selections from the International Cartographic Conference 2017, vol 28. Springer International Publishing, pp 341-356
Li Z, Zhang X, Zhang Y, Long D, Xie P, Zhang M (2023) Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281
Li Z, Ning H (2023) Autonomous GIS: the next-generation AI-powered GIS. Int J Digit Earth 16:4668–4686
Lin A, Sun X, Wu H, Luo W, Wang D, Zhong D, Zhu J (2021) Identifying urban building function by integrating remote sensing imagery and POI data. IEEE J Sel Top Appl Earth Observ Remote Sens 14:8864–8875
Liu X, Niu N, Liu X, Jin H, Ou J, Jiao L, Liu Y (2018) Characterizing mixed-use buildings based on multi-source big data. Int J Geogr Inf Sci 32:738–756
Lu Z, Im J, Rhee J, Hodgson M (2014) Building type classification using spatial and landscape attributes derived from LiDAR remote sensing data. Landscape Urban Plan 130:134–148
Luo P, Zhang X, Cheng J, Sun Q (2019) Modeling population density using a new index derived from multi-sensor image data. Remote Sens 11:2620
Mai G, Cundy C, Choi K, Hu Y, Lao N, Ermon S (2022) Towards a foundation model for geospatial artificial intelligence (vision paper). In: Proceedings of the 30th International Conference on Advances in Geographic Information Systems, pp 1-4
McGarigal K, Marks BJ (1995) FRAGSTATS: spatial pattern analysis program for quantifying landscape structure. U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station. https://doi.org/10.2737/pnw-gtr-351
Mecklenburg County Government (2024) Mecklenburg County Open Data. http://maps.co.mecklenburg.nc.us/openmapping/data.html. Accessed 17 June 2024
Memduhoglu A, Basaraner M (2024) Semantic enrichment of building functions through geospatial data integration and ontological inference. Environ Plan B Urban Anal City Sci 51:923–938
Meyes R, Lu M, de Puiseau CW, Meisen T (2019) Ablation studies in artificial neural networks. arXiv preprint arXiv:1901.08644
Miao R, Wang Y, Li S (2021) Analyzing urban spatial patterns and functional zones using Sina Weibo POI data: A case study of Beijing. Sustainability 13:647
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Milojevic-Dupont N, Wagner F, Nachtigall F, Hu J, Brüser GB, Zumwald M, Biljecki F, Heeren N, Kaack LH, Pichler P-P, Creutzig F (2023) EUBUCCO v0.1: European building stock characteristics in a common and open database for 200+ million individual buildings. Sci Data 10:1. https://doi.org/10.1038/s41597-023-02040-2
Mokhtarani S (2021) Embeddings in Machine Learning: Everything You Need to Know | FeatureForm. https://www.featureform.com/post/the-definitive-guide-to-embeddings. Accessed 10 June 2024
Ohsome (2024) Ohsome OpenStreetMap History Analytics Platform. http://www.ohsome.org. Accessed 13 March 2024
Pennington J, Socher R, Manning CD (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532-1543
Platt RH (2014) Land use and society. Island Press, Washington, DC
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Ramos J (2003) Using tf-idf to determine word relevance in document queries. In: Proceedings of the first instructional conference on machine learning, vol 242, no 1, pp 29-48
Roberts J, Lüddecke T, Das S, Han K, Albanie S (2023) GPT4GEO: How a Language Model Sees the World's Geography. arXiv preprint arXiv:2306.00020
Sadeghi S, Bui A, Forooghi A, Lu J, Ngom A (2024) Comparative analysis of LLaMA and ChatGPT embeddings for molecule embedding. arXiv preprint arXiv:2402.00024
Schirmer PM, Axhausen KW (2015) A multiscale classification of urban morphology. J Transp Land Use 9. https://doi.org/10.5198/jtlu.2015.667
Srivastava S, Lobry S, Tuia D, Munoz JV (2018) Land-use characterisation using Google Street View pictures and OpenStreetMap. In: Proceedings of the 21st AGILE conference, Lund, Sweden, 12-15 June 2018
Steiniger S, Lange T, Burghardt D, Weibel R (2008) An approach for the classification of urban building structures based on discriminant analysis techniques. Trans GIS 12:31–59
Taoufiq S, Nagy B, Benedek C (2020) HierarchyNet: Hierarchical CNN-based urban building classification. Remote Sens 12:3794
The City of Boulder Government (2024) The City of Boulder Open Data. https://open-data.bouldercolorado.gov. Accessed 17 June 2024
Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Scialom T (2023) Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288
United Nations (2018) World urbanization prospects 2018 (keyfacts). https://population.un.org/Wup/Publications/Files/WUP2018-KeyFacts.pdf. Accessed March 2023
Vanderhaegen S, Canters F (2017) Mapping urban form and function at city block level using spatial metrics. Landscape Urban Plan 167:399–409
Xie J, Zhou J (2017) Classification of urban building type from high spatial resolution remote sensing imagery using extended MRS and soft BP network. IEEE J Sel Top Appl Earth Observ Remote Sens 10:3515–3528
Xu Y, He Z, Xie X, Xie Z, Luo J, Xie H (2022) Building function classification in Nanjing, China, using deep learning. Trans GIS 26:2145–2165
Yan X, Ai T, Yang M, Yin H (2019) A graph convolutional neural network for classification of building patterns using spatial vector data. ISPRS J Photogramm Remote Sens 150:259–273
Zhang X, Liu X, Chen K, Guan F, Luo M, Huang H (2023) Inferring building function: A novel geo-aware neural network supporting building-level function classification. Sustain Cities Soc 89:104349
Zhang Y, Wei C, Wu S, He Z, Yu W (2023) GeoGPT: Understanding and processing geospatial tasks through an autonomous GPT. arXiv preprint arXiv:2307.07930
Zhong C, Huang X, Arisona SM, Schmitt G, Batty M (2014) Inferring building functions from a probabilistic model using public transportation data. Comput Environ Urban Syst 48:124–137
Zhou W, Zhang C, Wu L, Shashidhar M (2023) ChatGPT and marketing: Analyzing public discourse in early Twitter posts. J Mark Anal 11:693–706
Acknowledgements
We extend our sincere gratitude to Dr. Nizar Polat for his valuable review of the manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. A. Memduhoğlu was supported by the Scientific and Technological Research Council of Türkiye (TUBITAK) under the program 2219 (1059B192202917).
N. Fulman was supported by the Health + Life Science Alliance Heidelberg Mannheim and received state funds approved by the State Parliament of Baden-Württemberg.
Author information
Contributions
A.M. conceived the study and developed the methodology. N.F. contributed to the analysis and investigation. A.Z. contributed to data curation and methodology. All authors contributed to writing, reviewing, and editing the manuscript. A.M. supervised the process. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Communicated by Hassan Babaie.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Memduhoğlu, A., Fulman, N. & Zipf, A. Enriching building function classification using Large Language Model embeddings of OpenStreetMap Tags. Earth Sci Inform (2024). https://doi.org/10.1007/s12145-024-01463-8
DOI: https://doi.org/10.1007/s12145-024-01463-8