5.5. Non-Spatial processing#

5.5.1. Introduction#

Non-spatial data processing in QGIS refers to the manipulation of attribute data without directly involving spatial components or information, such as the spatial relationships or geometries.

It changes the non-geometric attributes of datasets (i.e., the attribute table).
Non-spatial processing can be used to perform calculations, generate statistics, and gain insights into the non-spatial aspects of geospatial datasets.
QGIS offers a variety of tools for non-spatial processing to assist users in managing and analysing attribute data effectively.
This can include data cleaning, transformation, enrichment, and analysis based on the associated attribute information, such as population statistics, land use classifications, or economic indicators.

../../_images/en_attribute_table_large.PNG — Fig. 5.20 Screenshot of an attribute table for QGIS version 3.28.4#

5.5.2. Non-spatial joins (Join Attributes by Field Value)#

A lot of analysis can be done with just a single layer. Sometimes, however, the information we need for our analysis is split across different datasets or layers.
With QGIS, these layers can be combined to perform the analysis we want. The simplest way to combine layers is via an attribute join. This operation looks up information from a second data source based on a shared attribute value. This value functions as a common unique identifier, also known as an ID, UID, or key (see Fig. 5.21).

../../_images/simple_attr_join_example.png — Fig. 5.21 The entries in the two data tables can be joined via the common ID-field#
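The lookup behind an attribute join can be sketched in a few lines of plain Python. This is only an illustration of the principle, not how QGIS implements the tool: the two tables, the field names `id`, `name` and `population`, and the helper `attribute_join` are all made up for this example.

```python
# Hypothetical example tables; IDs and values are invented for illustration.
layer_rows = [
    {"id": "A1", "name": "District A"},
    {"id": "B2", "name": "District B"},
    {"id": "C3", "name": "District C"},
]
table_rows = [
    {"id": "A1", "population": 1200},
    {"id": "B2", "population": 3400},
]


def attribute_join(left, right, key, fields):
    """Copy the chosen fields from `right` onto each row of `left` via `key`."""
    lookup = {row[key]: row for row in right}  # index the second table by its key
    joined = []
    for row in left:
        match = lookup.get(row[key])  # None when no row shares the key value
        extra = {f: (match[f] if match else None) for f in fields}
        joined.append({**row, **extra})
    return joined


joined = attribute_join(layer_rows, table_rows, "id", ["population"])
for row in joined:
    print(row)
```

Rows without a partner in the second table (here `C3`) keep their original attributes and receive an empty value for the joined field. Options such as discarding unmatched records are not modelled in this sketch.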

Humanitarian example:

A common GIS workflow in humanitarian work involving non-spatial joins is joining data on administrative boundaries using P-codes as the common identifier (shared attribute).

P-codes are identifying codes for administrative units (e.g. country (adm0), region (adm1), district (adm2)), which were introduced to simplify joining tabular data on administrative regions. These codes uniquely identify each administrative unit, which facilitates non-spatial joins.

For example: We have a spatial dataset containing the administrative boundaries of districts (adm2) in Nigeria and a data table containing the population per district, but without the polygons. By using the P-codes as the identifying attribute, we can easily join the population data with the vector dataset.

../../_images/en_attribute_join_pcode_example.png — Fig. 5.22 The P-code associated with the district Edo South is NG01201#

Attention

An attribute join in QGIS only works properly when the attribute values match exactly.
For example: “S. Sudan” will not match with “South Sudan”.
Where possible, it is best to use attributes that have been designed for joining, such as P-codes or IDs, which are not susceptible to spelling mistakes.
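Before running a join, it can help to check which key values have no exact partner in the other table. The following sketch shows the idea in plain Python. The rows, the code values `XX01` and `XX02`, and the helper `unmatched_keys` are hypothetical and only serve to illustrate the exact-match behaviour described above.

```python
# Made-up rows for illustration; the code values are hypothetical.
boundaries = [
    {"pcode": "XX01", "name": "South Sudan"},
    {"pcode": "XX02", "name": "Nigeria"},
]
stats = [
    {"pcode": "XX01", "name": "S. Sudan", "value": 10},
    {"pcode": "XX02", "name": "Nigeria", "value": 20},
]


def unmatched_keys(left, right, key):
    """Return the key values of `left` that have no exact match in `right`."""
    right_keys = {row[key] for row in right}
    return sorted(row[key] for row in left if row[key] not in right_keys)


by_name = unmatched_keys(boundaries, stats, "name")
by_code = unmatched_keys(boundaries, stats, "pcode")
print(by_name)  # ['South Sudan']  (the spelling differs, so no match)
print(by_code)  # []               (the codes match exactly)
```

Joining on the name would silently leave one record without data, while joining on the code matches every record.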

5.5.2.1. Exercise: Performing a non-spatial join#

In this short follow-along exercise, we will add the population data to the administrative boundaries layer (adm1).

Download the necessary layers here, unzip them, and add them to your QGIS-project.

Tip

The population layer needs to be added as a delimited text layer (Layer > Add Layer > ) with no geometry.

Open the “Join Attributes by Field Value” tool from the Processing Toolbox.
As Input Layer 1, select the layer nga_admbnda_adm1_osgof_20190417 and set the “Table Field” to ADM1_PCODE.
As Input Layer 2, select the layer nga_adm1pop_2022 and set the “Table Field” to ADM1_PCODE. Additionally, under “Layer 2 fields to copy”, select F_TL, M_TL, and T_TL.
Click Run. A new layer called “Joined Layer” will appear in your layer panel.

../../_images/en_3.36_pcode_join.png — Fig. 5.23 Setting the parameters for the P-code join#

Open the attribute table of the new layer and scroll to the right. Here you will find the joined attributes.

Great! We have successfully added the population data to our administrative boundaries layer. Now, we can visualise the population distribution or continue to analyse our data.

../../_images/nga_pop_join.png — Fig. 5.24 The joined data classified using the graduated symbology for the population value.#

5.5.3. Table functions#

Table functions usually involve only a single data layer and manipulate its attribute table. You can add new fields, delete unwanted fields, or calculate the values of a field using the field calculator.

For a comprehensive overview of the attribute table’s functionality and purpose, you’re invited to explore the Wiki article on it.

5.5.3.1. Add field#

The information within a vector layer can be accessed through its attribute table, and it can be enhanced by introducing new fields to this table. These additional fields may be derived from calculations, as exemplified in the following case, where population density is computed to provide deeper insights into spatial population distributions.

Attention

The selection of the appropriate data type should align with the information being added to the new attribute field. Please keep this in mind while watching the example video.

Possible data types:

The most common ones are:

Whole number: Integer (32 and 64 bit)
Decimal number (real)
Text (string)

Additional options:

Date and Date and time
Boolean

5.5.3.2. Delete field#

It is also possible to delete fields from the attribute table. A common practice is to remove all unused or unnecessary fields from a layer before starting to work on it. This keeps the dataset organised and easier to read.

5.5.3.3. Calculate field#

An important practice is to calculate the attribute values for a field, e.g., based on the values of other fields. In QGIS, you can create a new field or update an existing field.

Note

It is necessary to check if the data type of the field (new or updated) and your calculation match. For example, if you are calculating a ratio (e.g. density), the field should not be of type integer but rather of type decimal number.

An example could be to calculate the population density based on the already existing fields Population and Area.
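The effect of the field type on such a ratio can be illustrated with a small Python sketch. The numbers are hypothetical, and `int()` is used here only to show that a whole-number field cannot hold the fractional part. It is not meant to reproduce the exact conversion rule QGIS applies.

```python
population = 1500   # hypothetical value of the Population field
area = 400.0        # hypothetical value of the Area field (e.g. in km²)

density = population / area        # the ratio is a decimal number
stored_as_integer = int(density)   # a whole-number field loses the fraction

print(density)            # 3.75
print(stored_as_integer)  # 3
```

With a decimal (real) field the value 3.75 is preserved, which is why ratios such as densities should not be written to an integer field.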

A very important tool for such calculations is the Field Calculator. It allows you to perform calculations based on existing attribute values or built-in functions, for example to calculate the length or area of a geometry feature or, as in the example above, the population density from the existing fields Population and Area. The results can be written into a new field or used to update an existing field.

../../_images/en_field_calculator_red_boxes.png — Fig. 5.25 Screenshot of the Field calculator#

The most important function groups of the Field Calculator and some of their functions are listed below:

Fields and Values
- Contains a list of fields from the layer
Geometry
- Calculates the area of a polygon feature: $area
- Calculates the length of a line feature: $length
- Calculates the centroid of a polygon feature: centroid($geometry)
- Calculates the bounding box of a feature: bounds($geometry)
- Calculates the distance between two points: distance(point_a, point_b)
Maths
- Calculates the square root of a field: sqrt("field")
- Calculates the minimum or maximum of several values: min("field_a", "field_b"), max("field_a", "field_b")
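To give a feeling for what the geometry functions listed above compute, here is a minimal Python sketch for simple planar coordinates. It is an assumption-laden illustration: the helper names `polygon_area`, `line_length` and `bounds` are invented for this example, and the sketch ignores map projections and ellipsoidal measurements, which QGIS may take into account depending on the project settings.

```python
import math


def polygon_area(ring):
    """Planar area of a simple polygon given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def line_length(points):
    """Sum of the straight segment lengths along a line."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def bounds(points):
    """Bounding box as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


square = [(0, 0), (4, 0), (4, 3), (0, 3)]
path = [(0, 0), (3, 4), (3, 10)]

print(polygon_area(square))         # 12.0
print(line_length(path))            # 11.0
print(math.dist((0, 0), (3, 4)))    # 5.0, distance between two points
print(bounds(square))               # (0, 0, 4, 3)
print(math.sqrt(16))                # 4.0
```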

5.5.3.4. Basic statistics for fields#

The tool Basic statistics for fields generates statistics for a specific field of the attribute table of a vector layer. The results are generated as an HTML file and can be accessed by using the file path link in the Results Viewer. This operation is highly valuable for gaining a comprehensive understanding of the data you intend to work with. For example, it allows you to determine the range of values and to pinpoint the minimum and maximum values. In the provided example, the tool is applied to global population density data, which makes it easy to identify the most densely populated region worldwide.
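The kind of summary such a tool produces can be approximated with Python's standard library. The sketch below is only an illustration: the density values are invented, `None` stands in for an empty (NULL) attribute, and the selection of statistics is an assumption rather than the tool's full output.

```python
import statistics

# Hypothetical values of a population density field; None marks a NULL entry.
densities = [12.5, 80.0, 45.5, 300.0, None, 62.0]

values = [v for v in densities if v is not None]  # ignore NULL entries
summary = {
    "count": len(values),
    "null_count": len(densities) - len(values),
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "sum": sum(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The maximum (300.0 in this made-up list) corresponds to the most densely populated record, and the range shows how widely the values are spread.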

5.5.3.5. Statistics by categories#

To calculate statistics of a field depending on a parent class, you can use the tool Statistics by categories. The parent class is defined by the values of one field or by a combination of values from several fields.

Questions that need to be considered when doing these calculations:

For which fields should the statistics be calculated in the attribute table?
Which field in the attribute table contains which information?

“Statistics by categories” gives more detailed insights than the basic statistics described earlier, because the values are summarised per category rather than for the whole layer. In the example, it becomes simple to determine the number of cities per country with over 300,000 inhabitants and, for each country, the population living in the largest urban agglomeration.
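The grouping logic behind per-category statistics can be sketched in plain Python. Everything in this example is hypothetical: the city records, the field names `country`, `city` and `population`, and the fact that the 300,000 threshold is applied inside the sketch (in QGIS, the layer might instead be filtered beforehand or already contain only such cities).

```python
from collections import defaultdict

# Invented city records for illustration.
cities = [
    {"country": "Country A", "city": "City 1", "population": 250_000},
    {"country": "Country A", "city": "City 2", "population": 900_000},
    {"country": "Country A", "city": "City 3", "population": 4_000_000},
    {"country": "Country B", "city": "City 4", "population": 350_000},
    {"country": "Country B", "city": "City 5", "population": 120_000},
]

# Group the population values by the category field.
groups = defaultdict(list)
for row in cities:
    groups[row["country"]].append(row["population"])

# Compute statistics separately for each category.
stats = {
    country: {
        "count": len(pops),
        "cities_over_300k": sum(1 for p in pops if p > 300_000),
        "largest_city_population": max(pops),
    }
    for country, pops in groups.items()
}

for country, result in stats.items():
    print(country, result)
```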

5.5.4. Non-spatial queries#

In GIS, you can query (filter) data based on specific attribute information. Once a filter is applied, only the features that match the chosen attribute criteria are displayed. Data filtering is a valuable technique for creating subsets of features that can be exported as a new layer.

5.5.4.1. Manual selection#

It is possible to manually select specific rows by clicking the row number on the left side of the attribute table. This is a convenient way to select a small number of rows. Selected rows are highlighted in yellow.

5.5.4.2. Select by expression#

In this dialog, you can build your expressions to query the data. There are several operators that can be used to filter your vector layer.

Arithmetic operators

operator	functionality
+	addition
-	subtraction
*	multiplication
/	division
%	remainder of division

Comparison operators

operator	functionality
=	equals
!=	not equal
<	less than
>	greater than
<=	less than or equal to
>=	greater than or equal to

Logical operators

Operators such as AND and OR can be used to combine different queries or criteria.

operator	functionality
AND	logical AND
OR	logical OR
NOT	logical NOT

Special operators

operator	functionality
LIKE	pattern matching
IN	checks if a value is in a list of values
IS NULL	checks for null values
BETWEEN	checks if a value is within a specified range
CASE WHEN	conditional expressions

Querying your data to answer more complex questions is of great importance. This can be accomplished using the “Select by expression” tool. In the provided example, we aim to answer the question: Which cities, excluding those with a population of one million inhabitants in 1950, had grown to over ten million inhabitants by 2015?
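The logic of such a combined query can be sketched in Python. This is an illustration under assumptions: the field names `pop1950` and `pop2015` and the city records are invented, and the question is interpreted here as "fewer than one million inhabitants in 1950 AND more than ten million in 2015". With these hypothetical field names, a matching expression could look like "pop1950" < 1000000 AND "pop2015" > 10000000.

```python
# Invented records; field names and values are hypothetical.
cities = [
    {"name": "City A", "pop1950": 400_000, "pop2015": 12_000_000},
    {"name": "City B", "pop1950": 5_000_000, "pop2015": 20_000_000},
    {"name": "City C", "pop1950": 300_000, "pop2015": 2_000_000},
]


def matches(row):
    """Both conditions must hold, which mirrors the logical AND operator."""
    return row["pop1950"] < 1_000_000 and row["pop2015"] > 10_000_000


selected = [row["name"] for row in cities if matches(row)]
print(selected)  # ['City A']
```

City B is excluded because it was already large in 1950, and City C because it did not pass ten million by 2015.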

5.5.4.2.1. SQL#

Another way to build your expressions is to use SQL.

SQL introduction

SQL (Structured Query Language) is a standardised language that is used to manage databases and perform various operations on the data in them. In the Query Builder in QGIS, you can write SQL expressions with one or more conditions to filter a layer.
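To try out SQL conditions without QGIS, the following sketch uses Python's built-in `sqlite3` module with a small in-memory table. The table `places`, its columns and all values are hypothetical. Note that this runs SQLite's SQL dialect; the dialect available in the Query Builder depends on the data source of the layer and may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, kind TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [
        ("Northville", "city", 1_200_000),
        ("Northfield", "town", 40_000),
        ("Southport", "city", 650_000),
        ("Eastham", "village", None),  # None is stored as NULL
    ],
)


def names(where):
    """Return the names of all rows that satisfy the given WHERE condition."""
    sql = f"SELECT name FROM places WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


print(names("name LIKE 'North%'"))
# ['Northfield', 'Northville']
print(names("kind IN ('city', 'town') AND population > 100000"))
# ['Northville', 'Southport']
print(names("population BETWEEN 10000 AND 700000"))
# ['Northfield', 'Southport']
print(names("population IS NULL"))
# ['Eastham']
```

Only the part after WHERE corresponds to what is typed into a filter dialog; the surrounding SELECT statement is needed here because the sketch talks to a database directly.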

SQL Cheat Sheet

You can easily access essential SQL statements by referring to this handy Cheat Sheet. It offers a concise overview of the core functionalities.

5.5.4.3. Query Builder#

The Query Builder provides an interface that allows you to define a subset of the features in the layer using SQL-like statements and to display the results in the main window. As long as the query is active, only the features corresponding to its result are available in the project. You can use one or more layer attributes to define the filter in the Query Builder. The Query Builder is built as follows:

../../_images/en_query_builder_comment.png — Fig. 5.26 Screenshot of the Query Builder#

The Fields list contains all the fields of the layer. To add an attribute column to the expression window, double-click its name or just type it into the box.
The Values frame lists the values of the currently selected field.
- To list all unique values of a field, click the All button.
- To list the first 25 unique values of the column, click the Sample button.
- To add a value to the expression window, double-click it in the Values list. You can use the search box at the top of the “Values” panel to easily browse and find attribute values in the list.
The Operators section contains all usable operators. To add an operator, click the appropriate button.
The Test button helps you to check your query and displays a message box with the number of features satisfying the current query.
Use the Clear button to revert the layer to its original state.