Geodata Classification

Geodata Classification#

Spatial data classification in GIS involves categorising geographic information into distinct groups or classes based on shared characteristics or attributes. Each class can be assigned a distinct symbol or colour. This process enhances the organization and interpretation of spatial data.

The attributes of geospatial data are stored in a specific column within the attribute table. Essentially, we choose a column containing the specific characteristics of interest, allowing QGIS to group the data based on these selected attributes.

../../_images/classification_basic.drawio.png — Fig. 58 Basic classification. Source: HeiGIT#

Nominal, Ordinal and Metric scales#

In geographic data classification, nominal, ordinal, and metric scales are used to categorize and measure spatial features based on different levels of precision and hierarchy:

The nominal scale (categorical data) is the simplest form of measurement where features are grouped into distinct categories based on qualitative attributes. These categories don’t have any inherent order or ranking. They don’t have any numerical significance: Values or labels are just names or identifiers (in some cases landcover classes can be identified with numbers).
- Examples: Land cover classes, vegetation types, soil types, amenity type (hospital, church, school, etc.)
Fig. 59 Examples for nominal data and their representation (Source: Dickmann (2018) Kartographie, Westermann)#
The ordinal scale (ranked data) involves categorizing data, but in this case, the categories have a meaningful order or rank. However, the intervals between the ranks are not necessarily equal or known. Rank order is important: Features can be ranked or ordered from lowest to highest, but the actual difference between ranks isn’t measured. You can compare and rank data (i.e., which feature is ranked higher or lower).
- Examples: Land suitability, hierarchical road network, population size classes, vulnerability classes (e.g. for administrative units)
Fig. 60 Examples for ordinal data and their representation (Source: Dickmann (2018) Kartographie, Westermann)#
The metric scale (quantitative data) deals with data that have both order and exact differences between values. Data is represented with precise numerical values and the differences between values are consistent across the scale. Metric data can be further divided into:
- Interval scale: Numerical data where the difference between values is meaningful, but there is no true zero point (e.g., temperature in Celsius).
- Ratio scale: Numerical data where both differences and ratios are meaningful, and there is a true zero point (e.g., distance, area).
- Examples: Elevation data, distance, area, population data
Fig. 61 Examples for metric data and their representation (Source: Dickmann (2018) Kartographie, Westermann)#

Depending on the type of scale, you will use different methods of classification. Below, we will go over the different types of classification that are available in QGIS, and when to use them for which data. We will also go over some scenarios you might come across in your GIS career.

Single symbol classification#

By default, QGIS visualises all layers in the Single symbol setting. This means all the features of a layer are visualised the same. In this setting, you can change many parameters like colour or opacity but you can not classify the data into multiple groups!

Single Symbol classification is useful when you have a simple dataset. For example, you load a polygon layer with the administrative boundaries of a region, and a point layer with the major cities. In this case, you can choose single symbol classification and adjust the symbol for the each layer.

Categorised classification#

Categorised classification in QGIS groups spatial data into distinct categories based on specific attributes.

This classification organises the features into categories based specific values in the attribute table. By specifying a symbol to each category, you can facilitate the interpretation of geospatial information on your map for clearer insights.

../../_images/fr_simple_classification_example_map.png — Fig. 63 Niger – Régions de Tillabéri et de Tahoua Education: infrastructures scolaires fermées ou endommagées pour cause d’insécurité entre 2019 et 2022 (Source: REACH)#

In the map above, the main roads have been assigned a single symbol. The schools have been classified into two categories: primary school (fr.: école primaire) and secondary school (fr.: école secondaire).

Categorised classification is usually used for nominal and ordinal scaled data.

Data Scale	Definition	Example	Typical Data Format
Nominal Scale	Categories without inherent order or ranking	Land cover types, districts, livelihood zones	Text (“Desert”) or Integer (5)
Ordinal Scale	Categories with a meaningful order or ranking	Ranks (e.g., low, medium)	Text (“high”) or Integer (5)

../../_images/Categorized_district_map_SierraLeone.png — Fig. 64 Categorised classification.#

Graduated classification#

Graduated classification in GIS involves categorising spatial data into classes or ranges based on a progression of values. This method is particularly useful for visualising quantitative data, allowing for the differentiation of intensity, density, or magnitude across a spectrum, facilitating a nuanced representation of geographic phenomena.

../../_images/example_classification_hexagons.png — Fig. 65 REACH, Ukraine, IDP Collective Site Monitoring Map, Actives, July 2024 (Source: Reach)#

In Fig. 65, each hexagonal cell holds a value for “Number of sites per 150km²” ranging from 1 to 91. The cells have been organised into 5 categories, making it easy to distinguish between the different values in each cell. By keeping the amount of classes to a minimum, the reader can read and understand the map quicker.

Graduated classification is usually used for quantitative data that is interval or ratio scaled.

Data Scale	Definition	Example	Typical Data Format
Interval Scale	Equal intervals between values, no true zero point	Temperature (Celsius)	Float (44.5 Degree)
Ratio Scale	Equal intervals with a true zero point	Population, Length, Number of trees	Integer (5 Trees) or Float (12.5 km of Road)

To classify quantitative data there are many methods how to set up the classes. There is no single best way to select a method or to decide how many classes you like to use. It all comes down to what you want to show.

Tip

A good range for number of classes is 3 to 7. Do not use more than 9 classes, as it becomes difficult to distinguish between the classes, making the map harder to understand.

Take the example below. You see a histogram of the district population. That means we have a dataset with districts and how many people living in each district. Just based on the histogram we can make a few general statements.

There are no districts with no or zero population
There are just a few districts with very low population
It seems that there are three general groups of districts

../../_images/Histogramm_example.drawio.svg — Fig. 66 Histogramm of population data. Source: Axis Maps#

However, if we want to show which districts have a higher population than others on a map, we need to classify the districts.

There are seven ways in QGIS to split quantitative data into classes. The four most important ones are: Equal intervals, Quantile, Natural breaks, Manual. Let’s have a look at how the classes of the district population would look like if we split the data into three classes using these methods.

../../_images/classification_method_map.drawio.svg — Fig. 67 Different classifications. Source: HeiGIT (adapted from Axis Maps)#

Equal Interval

Equal Interval classification divides data into uniform size classes, such as 0-10, 10-20, 20-30, and so on. It is effective for evenly distributed data across the entire range. However, caution is advised when data is skewed or has significant outliers, as this may result in empty classes. The population data used here, lacking large outliers, is suitable for Equal Interval classification.

Equal Count (Quantile)

Quantiles classification ensures an equal number of observations in each class, creating visually appealing maps. However, it may result in classes with significantly different numerical ranges, and in some cases, similar rates may be separated while different rates are grouped together. It’s advisable to use a histogram to assess potential issues. In the district population example, the quantile classification produced a questionable break, combining a portion of the third cluster with class 2 despite its closer numerical proximity to other observations in class 3.

Natural Breaks

Natural Breaks is an optimal classification method that aims to minimize within-class variance and maximize between-class differences for a given number of classes. However, it produces a unique classification solution for each dataset, which can be a drawback when making comparisons across maps, such as in a series or atlas. In such cases, a consistent classification scheme applied across all maps might be preferable.

Manual

Manual classification allows users to set one or all of the class breaks based on specific needs. This approach is useful when certain break points need to be predetermined, such as aligning with the mean or maintaining consistency across a series of maps. Manual classification is recommended when other methods provide a good solution but may benefit from slight adjustments to better suit specific requirements or visualisation methods.

Logarithmic scale

Logarithmic scale classification is employed when the data spans multiple orders of magnitude, and a linear scale may not effectively represent the variations. This scale applies logarithmic transformation to the data, compressing larger values while expanding smaller ones. It is useful for visualising data with exponential growth or decay. However, interpreting values on a logarithmic scale may require a nuanced understanding. Consider using a logarithmic scale when there is a wide range of values, and a linear scale may obscure important patterns or trends.

Pretty Breaks

Pretty Breaks is a classification method designed to create visually appealing and easily interpretable maps. This approach seeks to generate class breaks that align with “round” numbers, making the map more intuitive for viewers. Pretty Breaks is particularly useful when communicating complex spatial data to a broad audience, as it enhances the clarity and understandability of the map. Keep in mind that the choice of ‘pretty’ breaks may depend on the specific context and the preferences of the intended audience.

Standard Deviation

Standard Deviation classification is a method that determines class breaks based on the standard deviation of the data values. This approach organizes data into classes by considering the distribution of values around the mean. Each class represents a certain number of standard deviations from the mean, providing a statistical basis for categorising data. Standard Deviation classification is effective when wanting to highlight variability within the dataset. However, it’s important to consider the nature of the data distribution and whether this method aligns with the analytical goals of the map.