Non-Spatial processing#
Introduction:#
Non-spatial geodata processing in QGIS refers to the manipulation, subsetting, and analysis of attribute data within a GIS environment without directly involving spatial components or information. It operates on the non-geometric attributes of geospatial datasets and can include data cleaning, transformation, enrichment, and analysis based on the associated attribute information, such as population statistics, land use classifications, or economic indicators. It can be used to perform calculations, generate statistics, and gain insights into the non-spatial aspects of geospatial datasets. QGIS offers a variety of tools for non-spatial geodata processing to assist users in managing and analysing attribute data effectively.
Non-spatial joins#
A lot of analysis can be done with just a single layer. Sometimes, however, the information we need for our analysis is split across different datasets or layers. With QGIS, these layers can be combined to perform the analysis we want. The simplest way to combine layers is via an attribute join. This operation looks up information from a second data source based on a shared attribute value. This value functions as a common unique identifier, also known as an ID, UID, or key (see Fig. 158).
Humanitarian example:
A common GIS workflow in humanitarian work involving non-spatial joins is joining data on administrative boundaries using P-codes as the common identifier/shared attribute.
P-codes are identifying codes for administrative units (e.g. country (adm0), region (adm1), district (adm2)) that were introduced to simplify joining tabular data on administrative regions. Each code uniquely identifies one administrative unit, which makes non-spatial joins straightforward.
For example: We have a spatial dataset containing the administrative boundaries of districts (adm2) in Nigeria and a data table containing the population per district, but without the polygons. By using the P-codes as the identifying attribute, we can easily join the population data with the vector dataset.
Attention
An attribute join in QGIS only works properly when the attribute values match exactly.
For example: “S. Sudan” will not match with “South Sudan”.
Where possible, it’s best to use attributes that have been designed for joining, such as P-codes or IDs, which are not susceptible to spelling mistakes.
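To make the matching behaviour concrete, the following is a minimal sketch in plain Python (not the QGIS API) of what an attribute join does conceptually. The helper `attribute_join`, the P-code values, and all population figures are made up for illustration; only the field names `ADM1_PCODE` and `T_TL` are taken from the exercise below.

```python
def attribute_join(features, table, key, fields_to_copy):
    """Copy selected fields from table rows onto features that share the same key value."""
    # Build a lookup from key value to table row (assumes keys are unique in the table).
    lookup = {row[key]: row for row in table}
    joined = []
    for feature in features:
        match = lookup.get(feature[key])  # exact match only, no fuzzy comparison
        extra = {f: (match[f] if match else None) for f in fields_to_copy}
        joined.append({**feature, **extra})
    return joined


# Hypothetical attribute rows of a boundary layer and a population table.
boundaries = [
    {"ADM1_PCODE": "NG001", "name": "Abia"},
    {"ADM1_PCODE": "NG002", "name": "Adamawa"},
]
population = [
    {"ADM1_PCODE": "NG001", "T_TL": 1000},
    {"ADM1_PCODE": "NG002", "T_TL": 2000},
]
joined = attribute_join(boundaries, population, "ADM1_PCODE", ["T_TL"])

# Joining on names fails as soon as the spelling differs.
name_joined = attribute_join(
    [{"name": "South Sudan"}],
    [{"name": "S. Sudan", "T_TL": 3000}],
    "name",
    ["T_TL"],
)
```

In this sketch, every boundary row receives its `T_TL` value when joined on the P-code, while the name-based join leaves `T_TL` empty (`None`) because the two spellings are different strings.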
Exercise: Performing a non-spatial join#
In this short follow-along exercise, we will add the population data to the administrative boundaries layer (adm1).
Download the necessary layers here, unzip them, and add them to your QGIS-project.
Tip
The population layer needs to be added as a delimited text layer (Layer > Add Layer > Add Delimited Text Layer) with no geometry.
Open the “Join Attributes by Field Value” tool from the Processing Toolbox.
As Input Layer 1, select the layer nga_admbnda_adm1_osgof_20190417 and set the “Table Field” to ADM1_PCODE.
As Input Layer 2, select the layer nga_adm1pop_2022 and set the “Table Field” to ADM1_PCODE. Additionally, under “Layer 2 fields to copy”, select F_TL, M_TL, and T_TL.
Click Run. A new layer called “Joined Layer” will appear in your layer panel.
Open the attribute table for the new layer and scroll to the right. Here you will find the joined attributes.
Great! We have successfully added the population data to our administrative boundaries layer. Now, we can visualise the population distribution or continue to analyse our data.
Table functions#
Table functions usually involve only a single data layer. You can add new fields, delete unwanted fields, or calculate new fields using the Field Calculator.
For a comprehensive overview on the attribute table’s functionality and its purpose, you’re invited to explore the Wiki article on it.
Add field#
The information within a vector layer can be accessed through its attribute table, and it can be enhanced by introducing new fields to this table. These additional fields may be derived from calculations, as exemplified in the following case, where population density is computed to provide deeper insights into spatial population distributions.
Attention
The selection of the appropriate data type should align with the information being added to the new attribute field. Please keep this in mind while watching the example video.
Possible data types:
The most common ones are:
Whole number: Integer (32 and 64 bit)
Decimal number (real)
Text (string)
Additional options:
Date and Date and time
Boolean
Example: Add a field for population density
Delete field#
It is also possible to delete fields from the attribute table. A common practice is to remove all unused or unnecessary fields from a layer before starting to work on it. This keeps the dataset organised and easier to read.
Example: Delete all unused/unnecessary fields from a vector layer
Calculate field#
An important practice is to calculate the attribute values for a field, e.g., based on the values of other fields. In QGIS, you can create a new field or update an existing field.
Note
It is necessary to check if the data type of the field (new or updated) and your calculation match. For example, if you are calculating a ratio (e.g. density), the field should not be of type integer but rather of type decimal number.
An example could be to calculate the population density based on the already existing fields Population and Area.
The Field Calculator is the central tool for such calculations. It lets you compute values from existing attribute values or from built-in functions, for example the length or area of a geometry, or, as in the example above, the population density from the existing fields Population and Area. The results can be written into a new field or used to update an existing field.
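The population density calculation and the data type issue from the note above can be sketched in plain Python. This is only an illustration of the arithmetic, not Field Calculator syntax; the function name `population_density` and the sample numbers are assumptions.

```python
def population_density(population, area_km2):
    """Return inhabitants per square kilometre as a decimal number."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


# Made-up values: 1,500 inhabitants on 400 square kilometres.
density = population_density(1500, 400)

# Storing the result as a whole number loses the fractional part.
# int() truncates here; the exact conversion QGIS applies to an
# integer field may differ, but the decimals are lost either way.
as_integer = int(density)
```

Here `density` is 3.75, while `as_integer` is 3. This is why a ratio such as density should go into a field of type decimal number.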
The most important groups and their respective functionality that are provided with the field calculator are listed below:
Fields and Values
Contains a list of fields from the layer
Geometry
Calculates the area of a polygon feature:
$area
Calculates the length of a line feature:
$length
Calculates the centroid of a polygon feature:
centroid($geometry)
Calculates the bounding box of a feature:
bounds($geometry)
Calculates the distance between two points:
distance(point_a, point_b)
Maths
Calculates the square root of a field:
sqrt("field")
Calculates the minimum and maximum values:
min and max
Example: Calculate the population density
Basic statistics for fields#
The tool Basic statistics for fields generates statistics for a specific field of the attribute table of a vector layer. The results are generated as an HTML file and can be accessed by using the file path link in the Results Viewer. This operation is valuable for understanding the data you intend to work with: it shows the range of values and lets you pinpoint the minimum and maximum. In the provided example, the tool is applied to the population density of countries worldwide, which makes it easy to find the highest density value and thus the most densely populated region.
Example: Calculate statistics for the field population density for countries worldwide.
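As a rough illustration of the kind of summary this tool reports, the sketch below computes a few basic statistics with Python's standard library. The density values are invented, and the set of statistics shown is an assumption; the actual QGIS report may contain more or different measures.

```python
import statistics

# Made-up population densities (inhabitants per square kilometre).
densities = [3.75, 120.0, 45.5, 8.25]

summary = {
    "count": len(densities),
    "min": min(densities),
    "max": max(densities),
    "range": max(densities) - min(densities),
    "mean": statistics.mean(densities),
    "median": statistics.median(densities),
}
```

The minimum and maximum bound the range of the field, and comparing the mean with the median gives a first hint of how skewed the values are.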
Statistics by categories#
To calculate statistics of a field depending on a parent class you can use the tool Statistics by categories. The parent class is a combination of values from other fields.
Questions that need to be considered when doing these calculations:
For which fields should the statistics be calculated in the attribute table?
Which field in the attribute table contains which information?
Statistics by categories gives more detailed insights than the basic statistics described earlier, because the values are summarised separately for each category. In this example, it becomes simple to determine the number of cities per country with over 300,000 inhabitants and, for each country, the population living in the largest urban agglomeration.
Example: Cities with more than 300,000 inhabitants and the amount of population in the largest agglomerations
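The grouping idea behind this tool can be sketched in plain Python as follows. The helper `stats_by_category`, the field names, and the city records are hypothetical; the sketch only shows how values are collected per category before the statistics are computed.

```python
from collections import defaultdict


def stats_by_category(rows, category_field, value_field):
    """Group rows by a category field and summarise a numeric field per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_field]].append(row[value_field])
    return {
        category: {"count": len(values), "max": max(values), "sum": sum(values)}
        for category, values in groups.items()
    }


# Made-up cities with more than 300,000 inhabitants.
cities = [
    {"country": "A", "city": "A1", "pop": 500_000},
    {"country": "A", "city": "A2", "pop": 2_000_000},
    {"country": "B", "city": "B1", "pop": 350_000},
]
result = stats_by_category(cities, "country", "pop")
```

For country "A", `count` is the number of listed cities (2) and `max` is the population of its largest agglomeration (2,000,000).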
Non-spatial queries#
In GIS, you can query (filter) data based on specific attribute information. Once the filtering is successful, only the desired features that correspond to the chosen attribute are displayed. Data filtering is a valuable technique for creating subsets of features that can be exported as a new layer.
Manual selection#
It is possible to manually select specific rows by clicking the row number on the left side of the attribute table. This is convenient for selecting a small number of rows. Successfully selected rows are highlighted in yellow.
Example: Manual selection of rows
Select by expression#
In this dialog, you can build your expressions to query the data. There are several operators that can be used to filter your vector layer.
operator | functionality |
---|---|
+ | addition |
- | subtraction |
* | multiplication |
/ | division |
% | remainder of division |
operator | functionality |
---|---|
= | equals |
!= | not equal |
< | less than |
> | greater than |
<= | less than or equal to |
>= | greater than or equal to |
Logical operators such as AND and OR can be used to combine different queries or criteria.
operator | functionality |
---|---|
AND | logical AND |
OR | logical OR |
NOT | logical NOT |
operator | functionality |
---|---|
LIKE | pattern matching |
IN | checks if a value is in a list of values |
IS NULL | checks for null values |
BETWEEN | checks if a value is within a specified range |
CASE WHEN | conditional expressions |
Querying your data to answer more complex questions is of great importance. This can be accomplished using the “Select by expression” tool, which lets you combine several conditions in one expression. In the provided example, we aim to answer the question: Which cities, excluding those with a population of one million inhabitants in 1950, had grown to over ten million inhabitants by 2015?
Example: Cities, excluding those with a population of one million inhabitants in 1950, that have grown to over ten million inhabitants by 2015
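The logic of such a combined query can be sketched in plain Python. The field names `pop1950` and `pop2015` and the city records are invented, and the sketch assumes the question means cities with fewer than one million inhabitants in 1950; the dataset used in the example may use different field names and thresholds.

```python
# Made-up city records with population figures for 1950 and 2015.
cities = [
    {"name": "City A", "pop1950": 400_000, "pop2015": 12_000_000},
    {"name": "City B", "pop1950": 5_000_000, "pop2015": 20_000_000},
    {"name": "City C", "pop1950": 200_000, "pop2015": 3_000_000},
]

# Both conditions must hold, which corresponds to combining two comparisons with AND.
selected = [
    city["name"]
    for city in cities
    if city["pop1950"] < 1_000_000 and city["pop2015"] > 10_000_000
]
```

Only "City A" satisfies both conditions: "City B" was already above one million in 1950, and "City C" did not reach ten million by 2015.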
SQL#
Another possibility to build your expressions is to use SQL.
SQL (Structured Query Language) is a standardised programming language that is used to manage databases and perform various operations on the data in them. In the Query Builder in QGIS, you can use SQL expressions to use one or more conditions to filter a layer.
You can easily access essential SQL statements by referring to this handy Cheat Sheet. This offers a concise overview of the core functionalities.
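To get a feel for SQL conditions outside QGIS, the following sketch uses Python's built-in `sqlite3` module with an in-memory table. The table `states`, its columns, and its values are made up. In the QGIS Query Builder you would typically enter only the condition, that is, the part after `WHERE`.

```python
import sqlite3

# Create a throwaway in-memory database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?)",
    [("State A", 500000), ("State B", 1200000), ("State C", 90000)],
)

# Combine a comparison and a pattern match with AND.
rows = conn.execute(
    "SELECT name FROM states "
    "WHERE population > 100000 AND name LIKE 'State %' "
    "ORDER BY name"
).fetchall()
names = [row[0] for row in rows]
conn.close()
```

The query returns "State A" and "State B"; "State C" is filtered out because its population is below the threshold.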
Query Builder#
The Query Builder provides an interface that allows you to define a subset of the features in the layer using SQL-like statements and to display the results in the main window. As long as the query is active, only the features corresponding to its result are available in the project. You can use one or more layer attributes to define the filter in the Query Builder. The Query Builder is built as follows:
The Fields list contains all the fields of the layer. To add an attribute column to the expression window, double-click its name or just type it into the box.
The Values frame lists the values of the currently selected field.
To list all unique values of a field, click the All button.
To list the first 25 unique values of the column, click the Sample button.
To add a value to the expression window, double-click it in the Values list. You can use the search box at the top of the “Values” panel to browse and find attribute values in the list.
The Operators section contains all usable operators. To add an operator, click the appropriate button.
The Test button helps you to check your query and displays a message box with the number of features satisfying the current query.
Use the Clear button to revert the layer to its original state.
Note
When a filter is applied with the Query Builder, QGIS treats the resulting subset as if it were the entire layer.
In this short video, you will discover the location of the query builder and learn how to create a straightforward query for isolating a particular state from a dataset that covers the entire country. The example focuses on a dataset related to South Sudan and serves as a basic illustration.