Non-Spatial processing#

Introduction:#

Non-spatial geodata processing in QGIS refers to the manipulation, subsetting, and analysis of attribute data without directly involving the geometry of the features. It operates on the non-geometric attributes of geospatial datasets, such as population statistics, land use classifications, or economic indicators, and includes data cleaning, transformation, enrichment, and analysis. It can be used to perform calculations, generate statistics, and gain insights into the non-spatial aspects of a dataset. QGIS offers a variety of tools to help users manage and analyse attribute data effectively.

../../_images/en_attribute_table_large.PNG

Fig. 156 Screenshot of an attribute table for QGIS version 3.28.4#

Non-spatial joins#

../../_images/en_join_attributes_by_field_values.PNG

Fig. 157 Screenshot of the Join attributes by field value tool in QGIS 3.36.#

A lot of analysis can be done with just a single layer. Sometimes, however, the information we need for our analysis is split across different datasets or layers. With QGIS, these layers can be combined to perform the analysis we want. The simplest way to combine layers is an attribute join. This operation looks up information from a second data source based on a shared attribute value. This value functions as a common unique identifier, also known as an ID, UID, or key (see Fig. 158).

../../_images/simple_attr_join_example.png

Fig. 158 The entries in the two data tables can be joined via the common ID-field#
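The idea behind an attribute join can be sketched in a few lines of plain Python. This is not how QGIS implements it internally; the table contents, the field names (`id`, `name`, `population`) and the helper `join_by_field` are made up for illustration.

```python
# Illustrative data only: two tables that share the key field "id".
boundaries = [
    {"id": 1, "name": "District A"},
    {"id": 2, "name": "District B"},
    {"id": 3, "name": "District C"},
]
population = [
    {"id": 1, "population": 12000},
    {"id": 2, "population": 8500},
]


def join_by_field(left, right, key, fields_to_copy):
    """Copy selected fields from `right` onto the `left` rows with the same key."""
    # Build a lookup from key value to row (assumes keys in `right` are unique).
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key])
        new_row = dict(row)
        for field in fields_to_copy:
            # Rows without a partner still get the field, but with no value (None).
            new_row[field] = match[field] if match is not None else None
        joined.append(new_row)
    return joined


joined = join_by_field(boundaries, population, "id", ["population"])
for row in joined:
    print(row)
```

District C has no partner in the second table, so its `population` stays empty. This is the situation to look out for after a join.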

Humanitarian example:

A common GIS workflow in humanitarian work involving non-spatial joins is joining data on administrative boundaries using P-codes as the common identifier/shared attribute.

P-codes are identifying codes for administrative units (e.g. country (adm0), region (adm1), district (adm2)) that were introduced to simplify joining tabular data to administrative regions. Because the codes identify each administrative unit unambiguously, they facilitate non-spatial joins.

For example: We have a spatial dataset containing the administrative boundaries of districts (adm2) in Nigeria and a data table containing the population per district, but without the polygons. By using the P-codes as the identifying attribute, we can easily join the population data to the vector dataset.

../../_images/en_attribute_join_pcode_example.png

Fig. 159 The P-code associated with the district Edo South is NG01201#

Attention

  • An attribute join in QGIS only works properly when the attribute values match exactly.

  • For example: “S. Sudan” will not match “South Sudan”.

  • Where possible, it’s best to use attributes that have been designed for joining, such as P-codes or IDs, which are not susceptible to spelling mistakes.
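Before joining, it can help to check which key values have no exact partner in the other table. The following is a minimal sketch in plain Python with made-up values, not a QGIS tool.

```python
# Illustrative key values only.
boundary_keys = ["South Sudan", "Nigeria", "Chad"]
table_keys = ["S. Sudan", "Nigeria", "Chad "]  # note the trailing space in "Chad "

# Keys on each side that have no exact partner on the other side.
unmatched_boundaries = sorted(set(boundary_keys) - set(table_keys))
unmatched_table = sorted(set(table_keys) - set(boundary_keys))

print("No match in table:", unmatched_boundaries)
print("No match in boundaries:", unmatched_table)
```

Even an invisible difference such as a trailing space is enough to prevent an exact match.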

Exercise: Performing a non-spatial join#

In this short follow-along exercise, we will add the population data to the administrative boundaries layer (adm1).

  1. Download the necessary layers here, unzip them, and add them to your QGIS-project.

Tip

The population layer needs to be added as a delimited text layer (Layer > Add Layer > ) with no geometry.

  2. Open the “Join Attributes by Field Value”-tool from the processing toolbox.

  3. As the Input Layer 1, select the layer nga_admbnda_adm1_osgof_20190417, set the “Table Field” to ADM1_PCODE.

  4. As Input Layer 2, select the layer nga_adm1pop_2022, set the “Table Field” to ADM1_PCODE. Additionally, under “Layer 2 fields to copy”, select F_TL, M_TL, and T_TL.

  5. Click Run. A new layer will appear in your layer panel called “Joined Layer”.

../../_images/en_3.36_pcode_join.png

Fig. 160 Setting the parameters for the P-code join#

  6. Open the attribute table for the new layer and scroll to the right. Here you will find the joined attributes.

Great! We have successfully added the population data to our administrative boundaries layer. Now, we can visualise the population distribution or continue to analyse our data.

../../_images/nga_pop_join.png

Fig. 161 The joined data classified using the graduated symbology for the population value.#
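The same join can, in principle, be scripted from the QGIS Python console. The sketch below assumes the Processing algorithm ID `native:joinattributestable` and its parameter names as found in recent QGIS 3 releases, and that both layers are loaded in the project under the names used in the exercise. Please check these assumptions against your QGIS version. `build_join_params` and `run_join` are hypothetical helper names.

```python
def build_join_params(output="TEMPORARY_OUTPUT"):
    """Parameters mirroring the dialog settings used in the exercise."""
    return {
        "INPUT": "nga_admbnda_adm1_osgof_20190417",  # Input Layer 1
        "FIELD": "ADM1_PCODE",                        # table field of layer 1
        "INPUT_2": "nga_adm1pop_2022",                # Input Layer 2
        "FIELD_2": "ADM1_PCODE",                      # table field of layer 2
        "FIELDS_TO_COPY": ["F_TL", "M_TL", "T_TL"],   # layer 2 fields to copy
        "METHOD": 1,  # assumed: 1 = take attributes of the first matching feature
        "DISCARD_NONMATCHING": False,
        "PREFIX": "",
        "OUTPUT": output,
    }


def run_join():
    """Run the join. Only works inside QGIS, where `processing` is available."""
    import processing  # provided by QGIS, not installable via pip

    result = processing.run("native:joinattributestable", build_join_params())
    return result["OUTPUT"]
```

Keeping the parameters in a separate function makes them easy to inspect without running QGIS. Inside the console, calling `run_join()` should return the joined layer.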

Table functions#

Table functions usually involve only a single data layer. You can add new fields, delete unwanted fields, or even calculate new fields using the field calculator.

For a comprehensive overview on the attribute table’s functionality and its purpose, you’re invited to explore the Wiki article on it.

Add field#

The information within a vector layer can be accessed through its attribute table, and it can be enhanced by introducing new fields to this table. These additional fields may be derived from calculations, as exemplified in the following case, where population density is computed to provide deeper insights into spatial population distributions.

Attention

The selection of the appropriate data type should align with the information being added to the new attribute field. Please keep this in mind while watching the example video.

Possible data types:

The most common ones are:

  • Whole number: Integer (32 and 64 bit)

  • Decimal number (real)

  • Text (string)

Additional options:

  • Date and Date and time

  • Boolean

Example: Add a field for population density

Delete field#

It is also possible to delete fields from the attribute table. A common practice is to remove all unused or unnecessary fields from a layer before starting to work on it. This keeps the dataset organised and easier to work with.

Example: Delete all unused/unnecessary fields from a vector layer

Calculate field#

An important practice is to calculate the attribute values for a field, e.g., based on the values of other fields. In QGIS, you can create a new field or update an existing field.

Note

It is necessary to check if the data type of the field (new or updated) and your calculation match. For example, if you are calculating a ratio (e.g. density), the field should not be of type integer but rather of type decimal number.

An example could be to calculate the population density based on the already existing fields Population and Area.
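The calculation itself, and why the field type matters, can be sketched in plain Python. The records are made up, the area is assumed to be in square kilometres, and the names `density_decimal` and `density_integer` are purely illustrative.

```python
# Illustrative records; "Population" and "Area" follow the field names in the text.
# Area is assumed to be in square kilometres.
regions = [
    {"name": "Region A", "Population": 1500, "Area": 400.0},
    {"name": "Region B", "Population": 90000, "Area": 1200.0},
]

for region in regions:
    density = region["Population"] / region["Area"]
    # A decimal number (real) field can hold the fractional part.
    region["density_decimal"] = round(density, 2)
    # An integer field cannot. int() truncates here; the point is only
    # that the decimals are lost, whatever conversion is applied.
    region["density_integer"] = int(density)

for region in regions:
    print(region["name"], region["density_decimal"], region["density_integer"])
```

For Region A, the decimal field keeps 3.75 inhabitants per square kilometre, while the integer version keeps only 3.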

A very important tool for such calculations is the Field Calculator. It allows you to perform calculations based on existing attribute values or built-in functions, for example to calculate the length or area of a geometry feature. The results of these calculations can be written into a new field or used to update an existing field.

../../_images/en_field_calculator_red_boxes.png

Fig. 162 Screenshot of the Field calculator#

The most important function groups of the Field Calculator are listed below:

  • Fields and Values

    • Contains a list of fields from the layer

  • Geometry

    • Calculates the area of a polygon feature: $area

    • Calculates the length of a line feature: $length

    • Calculates the centroid of a polygon feature: centroid($geometry)

    • Calculates the bounding box of a feature: bounds($geometry)

    • Calculates the distance between two points: distance(point_a, point_b)

  • Maths

    • Calculates the square root of a field: sqrt("field")

    • Calculates the minimum or maximum of several values: min("field_a", "field_b"), max("field_a", "field_b")
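To give a feeling for what the geometry functions compute, here is a minimal planar sketch in plain Python. It is not QGIS code: the coordinates are made up, the helper names are hypothetical, and QGIS itself may calculate $area and $length on an ellipsoid depending on the project settings, so results can differ.

```python
import math

# A made-up rectangular polygon in a projected (planar) coordinate system, in metres.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def planar_area(ring):
    """Shoelace formula: area of a simple polygon from its vertex list."""
    total = 0.0
    # Pair every vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def planar_length(points):
    """Sum of the straight segment lengths along an open line."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def bounding_box(points):
    """Smallest axis-aligned rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


print(planar_area(polygon))
print(planar_length([(0.0, 0.0), (3.0, 4.0)]))
print(bounding_box(polygon))
print(math.sqrt(16))
```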

Example: Calculate the population density

Basic statistics for fields#

The tool Basic statistics for fields generates statistics for a specific field in the attribute table of a vector layer. The results are written to an HTML file, which can be opened via the file path link in the Results Viewer. This operation is highly valuable for gaining an understanding of the data you intend to work with: it allows you to determine the range of values and pinpoint the minimum and maximum. In the provided example, the tool is applied to the population density of countries worldwide, which makes it easy to identify the most densely populated region.

Example: Calculate statistics for the field population density for countries worldwide.
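A few of the figures such a report typically contains can be reproduced with Python's standard library. The values below are made up, and the exact set of statistics and their definitions in QGIS may differ from this sketch.

```python
import statistics

# Made-up population densities (inhabitants per square kilometre).
densities = [3.75, 75.0, 18.2, 412.5, 0.9]

summary = {
    "count": len(densities),
    "min": min(densities),
    "max": max(densities),
    "range": max(densities) - min(densities),
    "mean": statistics.mean(densities),
    "median": statistics.median(densities),
    "stdev": statistics.stdev(densities),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```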

Statistics by categories#

To calculate statistics for a field grouped by a parent class, you can use the tool Statistics by categories. The parent class is a combination of values from other fields.

Questions that need to be considered when doing these calculations:

  • For which fields should the statistics be calculated in the attribute table?

  • Which field in the attribute table contains which information?

Because the statistics are calculated separately for each category, “Statistics by categories” offers more detailed insights than the Basic statistics for fields tool. In this case, it becomes simple to determine the number of cities per country with over 300,000 inhabitants and, for each country, the population living in the largest urban agglomeration.

Example: Cities with more than 300,000 inhabitants and the population of the largest agglomerations
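The grouping logic can be sketched in plain Python. The city records and field names (`country`, `city`, `population`) are made up, and only the count and maximum are computed here.

```python
from collections import defaultdict

# Made-up city records; field names are illustrative.
cities = [
    {"country": "A", "city": "A1", "population": 350_000},
    {"country": "A", "city": "A2", "population": 2_400_000},
    {"country": "A", "city": "A3", "population": 120_000},
    {"country": "B", "city": "B1", "population": 900_000},
]

# Group the population values by the category field ("country"),
# keeping only cities above the 300,000 threshold.
by_country = defaultdict(list)
for row in cities:
    if row["population"] > 300_000:
        by_country[row["country"]].append(row["population"])

# One set of statistics per category.
stats = {
    country: {"count": len(values), "max": max(values)}
    for country, values in by_country.items()
}
print(stats)
```

Here `count` answers how many cities per country exceed the threshold, and `max` gives the population of the largest one.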

Non-spatial queries#

In GIS, you can query (filter) data based on specific attribute information. Once the filtering is successful, only the desired features that correspond to the chosen attribute are displayed. Data filtering is a valuable technique for creating subsets of features that can be exported as a new layer.

Manual selection#

It is possible to manually select specific rows by clicking the row number on the left-hand side of the attribute table. This is a convenient way to select a small number of rows. Successfully selected rows appear in yellow.

Example: Manual selection of rows

Select by expression#

In this dialog, you can build your expressions to query the data. There are several operators that can be used to filter your vector layer.

| operator | functionality |
|---|---|
| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| % | remainder of division |

| operator | functionality |
|---|---|
| = | equals |
| != | not equal |
| < | less than |
| > | greater than |
| <= | less than or equal to |
| >= | greater than or equal to |

Operators such as AND and OR can be used to combine different queries or criteria.

| operator | functionality |
|---|---|
| AND | logical AND |
| OR | logical OR |
| NOT | logical NOT |

| operator | functionality |
|---|---|
| LIKE | pattern matching |
| IN | checks if a value is in a list of values |
| IS NULL | checks for null values |
| BETWEEN | checks if a value is within a specified range |
| CASE WHEN | conditional expressions |

Querying your data to answer more complex questions is of great importance. This can be accomplished using the “Select by expression” tool. In the provided example, we aim to answer the question: Which cities, excluding those with a population of one million inhabitants in 1950, had grown to over ten million inhabitants by 2015?

Example: Cities, excluding those with a population of one million inhabitants in 1950, that have grown to over ten million inhabitants by 2015
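The logic of such a combined query can be sketched in plain Python. The sketch assumes that the question means cities with fewer than one million inhabitants in 1950, and it uses made-up records with hypothetical field names `pop_1950` and `pop_2015`. Under the same assumptions, the corresponding QGIS expression would read `"pop_1950" < 1000000 AND "pop_2015" > 10000000`.

```python
# Made-up city records; field names are hypothetical.
cities = [
    {"name": "City A", "pop_1950": 400_000, "pop_2015": 12_500_000},
    {"name": "City B", "pop_1950": 5_000_000, "pop_2015": 20_000_000},
    {"name": "City C", "pop_1950": 300_000, "pop_2015": 2_000_000},
]

# Both conditions must hold, which corresponds to combining them with AND.
selected = [
    c["name"]
    for c in cities
    if c["pop_1950"] < 1_000_000 and c["pop_2015"] > 10_000_000
]
print(selected)
```

City B is excluded because it was already large in 1950, and City C because it did not pass ten million by 2015.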

SQL#

Another way to build your expressions is to use SQL.

SQL (Structured Query Language) is a standardised language used to manage databases and perform various operations on the data in them. In the Query Builder in QGIS, you can write SQL expressions that filter a layer by one or more conditions.

You can easily access essential SQL statements by referring to this handy Cheat Sheet. This offers a concise overview of the core functionalities.
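To try out typical filter conditions outside QGIS, Python's built-in `sqlite3` module can serve as a small playground. The SQL dialect accepted by the Query Builder depends on the layer's data provider, so SQLite is only a convenient stand-in here. The table `states`, its columns and its rows are made up, and `names_where` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?)",
    [
        ("Alpha", "North", 120000),
        ("Alba", "South", 80000),
        ("Beta", "North", None),  # population unknown (NULL)
        ("Gamma", "East", 300000),
    ],
)


def names_where(condition):
    """Return the names of all rows that satisfy a WHERE condition.

    The condition is pasted into the query as-is, which is fine for a
    demonstration but should not be done with untrusted input.
    """
    query = f"SELECT name FROM states WHERE {condition} ORDER BY name"
    return [row[0] for row in conn.execute(query)]


print(names_where("region = 'North' AND population > 100000"))
print(names_where("name LIKE 'Al%'"))
print(names_where("region IN ('South', 'East')"))
print(names_where("population BETWEEN 50000 AND 150000"))
print(names_where("population IS NULL"))
```

Note that the row with an unknown population is not returned by the comparison `population > 100000`. It only shows up when asking for `IS NULL` explicitly.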

Query Builder#

The Query Builder provides an interface that allows you to define a subset of the features in the layer using SQL-like statements and to display the results in the main window. As long as the query is active, only the features corresponding to its result are available in the project. You can use one or more layer attributes to define the filter in the Query Builder. The Query Builder is built as follows:

../../_images/en_query_builder_comment.png

Fig. 163 Screenshot of the Query Builder#

  1. The Fields list contains all the fields of the layer. To add an attribute column to the expression window, double-click its name or just type it into the box.

  2. The Values frame lists the values of the currently selected field.

    • To list all unique values of a field, click the All button.

    • To list the first 25 unique values of the column, click the Sample button.

    • To add a value to the expression window, double-click it in the Values list. You can use the search box at the top of the “Values”-panel to easily browse and find attribute values in the list.

  3. The Operators section contains all usable operators. To add an operator, click the appropriate button.

  4. The Test button helps you to check your query and displays a message box with the number of features satisfying the current query.

  5. Use the Clear button to revert the layer to its original state.

Note

When a filter is applied with the Query Builder, QGIS treats the resulting subset as if it were the entire layer.
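What this means in practice can be illustrated with a small stand-in written in plain Python. The class `FilteredLayer` is hypothetical and has nothing to do with the actual QGIS classes. It only mimics the behaviour described in the note.

```python
class FilteredLayer:
    """Hypothetical stand-in for a layer with an optional filter."""

    def __init__(self, features):
        self._all_features = list(features)
        self._filter = None  # no filter: every feature is available

    def set_filter(self, predicate):
        self._filter = predicate

    def clear_filter(self):
        self._filter = None

    def features(self):
        if self._filter is None:
            return list(self._all_features)
        return [f for f in self._all_features if self._filter(f)]

    def feature_count(self):
        # Counts only what passes the filter, as if nothing else existed.
        return len(self.features())


layer = FilteredLayer(
    [{"state": "State A"}, {"state": "State B"}, {"state": "State C"}]
)
layer.set_filter(lambda f: f["state"] == "State B")
print(layer.feature_count())  # only the filtered subset is counted
layer.clear_filter()
print(layer.feature_count())  # all features are available again
```

Any tool that works on the layer while the filter is active therefore only sees the subset, so it is worth clearing the filter before running an analysis on the full dataset.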

In this short video, you will discover the location of the query builder and learn how to create a straightforward query for isolating a particular state from a dataset that covers the entire country. The example focuses on a dataset related to South Sudan and serves as a basic illustration.

Example: Simple usage of the Query Builder.