Non-Spatial Geodataprocessing#

Introduction:#

Non-spatial geodataprocessing in QGIS refers to the manipulation, subsetting and analysis of attribute data within a GIS environment without directly involving spatial components. It involves operations on the non-geometric attributes of geospatial datasets. This can include data cleaning, transformation, enrichment and analysis based on the associated attribute information, such as population statistics, land use classifications or economic indicators. Non-spatial geodataprocessing can be used to perform calculations, generate statistics and gain insights into the non-spatial aspects of geospatial datasets. QGIS offers a variety of tools for non-spatial geodataprocessing to assist users in managing and analyzing attribute data effectively.

This segment of Module 5 will start with the introduction of table functions. It will then progress into techniques for querying data, ultimately introducing the important concept of non-spatial joins.


Fig. 131 Screenshot of an attribute table for QGIS version 3.28.4#

Table functions#

For a comprehensive overview of the attribute table's functionality and purpose, see the Wiki article on the attribute table.

Add field#

The information within a vector layer can be accessed through its attribute table, and it can be enhanced by introducing new fields to this table. These additional fields may be derived from calculations, as exemplified in the following case, where population density is computed to provide deeper insights into spatial population distribution.

Attention

The selection of the appropriate data type should align with the information being added to the new attribute field. Please keep this in mind while watching the example video.

Possible data types:

The most common ones are:

  • Whole number: Integer (32 and 64 bit)

  • Decimal number (real)

  • Text (string)

Additional options:

  • Date and Date and time

  • Boolean

Example: Add a field for population density

Delete field#

It is also possible to delete fields from the attribute table. A commonly used practice is to remove all unused or unnecessary fields from a layer before starting to work on it. This makes the dataset much more organized.

Example: Delete all unused/unnecessary fields from a vector layer

Calculate field#

An important practice is to calculate the attribute values for a field, e.g., based on the values of other fields. In QGIS, you can create a new field or update an existing field.

Note

It is necessary to check if the data type of the field (new or updated) and your calculation match. For example, if you are calculating a ratio (e.g. density), the field should not be of type integer but rather of type decimal number.

An example could be to calculate the population density based on the already existing fields Population and Area.

The Field Calculator is the central tool for such calculations. It lets you compute values from existing attribute values or from built-in functions, for example the length or area of a geometry feature, or the population density from the Population and Area fields. The results can be written to a new field or used to update an existing field.
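As a rough illustration of why the field type matters, the following Python sketch mimics the calculation outside QGIS. The rows and their values are hypothetical, and the field names simply mirror the Population and Area fields mentioned above. In the Field Calculator itself, the corresponding expression would be `"Population" / "Area"`.

```python
# Hypothetical attribute rows; the field names mirror the example fields above.
rows = [
    {"Name": "Region A", "Population": 1500, "Area": 40.0},
    {"Name": "Region B", "Population": 900, "Area": 7.0},
]

for row in rows:
    # A ratio belongs in a decimal field: true division keeps the fraction.
    row["PopDensity"] = row["Population"] / row["Area"]
    # Forcing the same ratio into a whole-number field drops the decimals.
    row["PopDensityInt"] = int(row["Population"] / row["Area"])

for row in rows:
    print(row["Name"], round(row["PopDensity"], 2), row["PopDensityInt"])
```

The whole-number variant loses the fractional part of every density, which is why a decimal field is the safer choice for ratios.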


Fig. 132 Screenshot of the Field calculator#

The most important groups and their respective functionality that are provided with the field calculator are listed below:

  • Fields and Values

    • Contains a list of fields from the layer

  • Geometry

    • Calculates the area of a polygon feature: $area

    • Calculates the length of a line feature: $length

    • Calculates the centroid of a polygon feature: centroid($geometry)

    • Calculates the bounding box of a feature: bounds($geometry)

    • Calculates the distance between two points: distance(point_a, point_b)

  • Maths

    • Calculates the square root of a field: sqrt("field")

    • Calculates the minimum or maximum of several values: min("field_a", "field_b"), max("field_a", "field_b")
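To make the geometry and maths functions listed above more tangible, here is a minimal plain-Python sketch of what such functions compute for simple planar coordinates. It is not how QGIS implements them (QGIS takes the layer's coordinate reference system and project settings into account); the helper names `polygon_area`, `line_length` and `bounding_box` are made up for this illustration.

```python
import math


def polygon_area(ring):
    """Planar area of a simple polygon (shoelace formula)."""
    total = 0.0
    # Pair every vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def line_length(points):
    """Sum of the straight segment lengths along a line."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def bounding_box(points):
    """Return (xmin, ymin, xmax, ymax) of a list of coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical geometries in planar map units
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
path = [(0, 0), (3, 4), (3, 10)]

print(polygon_area(rectangle))       # area of a 4 x 3 rectangle
print(line_length(path))             # two segments: 5 + 6
print(bounding_box(rectangle))       # extent of the rectangle
print(math.dist((0, 0), (3, 4)))     # distance between two points
print(math.sqrt(16))                 # square root, as in sqrt("field")
```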

Example: Calculate the population density

Basic statistics for fields#

The Basic statistics for fields tool generates statistics for a single field in the attribute table of a vector layer. The results are written to an HTML file, which can be opened via the file path link in the Results Viewer. This operation is a valuable way to get to know the data you intend to work with: it lets you determine the range of values and identify the minimum and maximum. In the provided example, the tool is applied to the population density field of a worldwide dataset, which makes it easy to identify the most densely populated region.
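The kind of summary this tool produces can be sketched in a few lines of Python. This only illustrates the idea using the standard `statistics` module; the density values and region names are hypothetical, and the actual QGIS report contains further measures.

```python
import statistics

# Hypothetical population densities (inhabitants per square kilometre)
densities = {"Region A": 12.0, "Region B": 250.5, "Region C": 48.0, "Region D": 3.5}
values = list(densities.values())

summary = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
}

# The key with the highest value is the most densely populated region.
densest = max(densities, key=densities.get)

for name, value in summary.items():
    print(f"{name}: {value}")
print("densest:", densest)
```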

Example: Calculate statistics for the field population density for countries worldwide.

Statistics by categories#

To calculate statistics for a field depending on a parent class, you can use the tool Statistics by categories. The parent class is a combination of values from one or more other fields.

Questions that need to be considered when doing these calculations:

  • For which fields should the statistics be calculated in the attribute table?

  • Which field in the attribute table contains which information?

Compared with the basic statistics described earlier, statistics by categories give more detailed insights because the values are summarized separately for each category. In the provided example, this makes it simple to determine the number of cities with over 300,000 inhabitants per country and, for each country, the population living in the largest urban agglomeration.
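The grouping logic behind such a calculation can be sketched in plain Python as follows. The city records, the field names `country` and `pop`, and the variable names are assumptions made for this illustration; the QGIS tool itself is configured through its dialog rather than through code like this.

```python
from collections import defaultdict

# Hypothetical city records; "country" plays the role of the category field.
cities = [
    {"city": "Alpha", "country": "X", "pop": 350_000},
    {"city": "Beta", "country": "X", "pop": 1_200_000},
    {"city": "Gamma", "country": "X", "pop": 120_000},
    {"city": "Delta", "country": "Y", "pop": 800_000},
]

THRESHOLD = 300_000

# One statistics record per category value
stats = defaultdict(lambda: {"count": 0, "max_pop": 0})
for city in cities:
    if city["pop"] > THRESHOLD:
        group = stats[city["country"]]
        group["count"] += 1
        group["max_pop"] = max(group["max_pop"], city["pop"])

for country, group in sorted(stats.items()):
    print(country, group["count"], group["max_pop"])
```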

Example: Cities with more than 300,000 inhabitants and the population of the largest agglomerations

Non-spatial queries#

In GIS, you can query (filter) data based on specific attribute information. Once the filtering is successful, only the desired features that correspond to the chosen attribute are displayed. Data filtering is a valuable technique for creating subsets of features that can be exported as a new layer.

Manual selection#

You can manually select a row by clicking on its row number on the left side of the attribute table. This is a convenient way to select a small number of rows. Selected rows are highlighted in yellow.

Example: Manual selection of rows

Select by expression#

In this dialog, you can build your expressions to query the data. There are several operators that can be used to filter your vector layer.

| operator | functionality |
| --- | --- |
| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| % | remainder of division |

| operator | functionality |
| --- | --- |
| = | equals |
| != | not equal |
| < | less than |
| > | greater than |
| <= | less than or equal to |
| >= | greater than or equal to |

Logical operators such as AND and OR can be used to combine different queries or criteria.

| operator | functionality |
| --- | --- |
| AND | logical AND |
| OR | logical OR |
| NOT | logical NOT |

| operator | functionality |
| --- | --- |
| LIKE | pattern matching |
| IN | checks if a value is in a list of values |
| IS NULL | checks for null values |
| BETWEEN | checks if a value is within a specified range |
| CASE WHEN | conditional expressions |

Querying your data is essential for answering more complex questions. This can be accomplished using the “Select by expression” tool. In the provided example, we aim to answer the question: Which cities had fewer than one million inhabitants in 1950 but had grown to over 10 million inhabitants by 2015?
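The logic of such a combined condition can be sketched in Python. The field names `pop1950` and `pop2015` and all records are hypothetical; with fields of that name, the corresponding QGIS expression would look like `"pop1950" < 1000000 AND "pop2015" > 10000000`.

```python
# Hypothetical city records with population figures for two years
cities = [
    {"name": "City A", "pop1950": 400_000, "pop2015": 12_500_000},
    {"name": "City B", "pop1950": 2_300_000, "pop2015": 15_000_000},
    {"name": "City C", "pop1950": 150_000, "pop2015": 900_000},
]

# Both criteria must hold, which corresponds to combining them with AND.
selected = [
    city["name"]
    for city in cities
    if city["pop1950"] < 1_000_000 and city["pop2015"] > 10_000_000
]

print(selected)  # ['City A']
```

Only City A satisfies both criteria: City B was already above one million in 1950, and City C never reached 10 million.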

Example: Cities that had fewer than one million inhabitants in 1950 but over 10 million inhabitants by 2015

SQL#

Another possibility to build your expressions is to utilize SQL.

SQL (Structured Query Language) is a standardized language used to manage databases and perform various operations on the data in them. In the Query Builder in QGIS, you can use SQL expressions to filter a layer by one or more conditions.

You can easily access essential SQL statements by referring to this handy Cheat Sheet. This offers a concise overview of the core functionalities.
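To get a feeling for how an SQL condition selects rows, the following sketch runs a filter against a small in-memory SQLite table using Python's built-in `sqlite3` module. The table `states`, its columns and all values are hypothetical. The sketch assumes that a layer filter corresponds roughly to the condition after `WHERE`; the exact syntax accepted by QGIS depends on the data provider of the layer.

```python
import sqlite3

# Build a small in-memory table that stands in for an attribute table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?)",
    [("State A", 1_400_000), ("State B", 600_000), ("State C", 700_000)],
)

# The filter condition: the part that would follow WHERE in a full statement.
where_clause = '"population" BETWEEN 500000 AND 1000000'

query = f"SELECT name FROM states WHERE {where_clause} ORDER BY name"
matches = [row[0] for row in conn.execute(query)]
conn.close()

print(matches)  # ['State B', 'State C']
```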

Query Builder#

The Query Builder provides an interface that allows you to define a subset of the features in the layer using SQL-like statements and to display the results in the main window. As long as the query is active, only the features corresponding to its result are available in the project. You can use one or more layer attributes to define the filter in the Query Builder. The Query Builder is built as follows:


Fig. 133 Screenshot of the Query Builder#

  1. The Fields list contains all the fields of the layer. To add an attribute column to the expression window, double-click its name or just type it into the box.

  2. The Values frame lists the values of the currently selected field.

    • To list all unique values of a field, click the All button.

    • To list the first 25 unique values of the column, click the Sample button.

    • To add a value to the expression window, double click it in the Values list. You can use the search box at the top of the “Values”-panel to easily browse and find attribute values in the list.

  3. The Operators section contains all usable operators. To add an operator, click the appropriate button.

  4. The Test button helps you to check your query and displays a message box with the number of features satisfying the current query.

  5. Use the Clear button to revert the layer to its original state.

Note

When a filter is applied with the Query Builder, QGIS treats the resulting subset as if it were the entire layer.

In this short video, you’ll discover the location of the query builder and learn how to create a straightforward query for isolating a particular state from a dataset that covers the entire country. The example focuses on a dataset related to South Sudan and serves as a basic illustration.

Example: Simple usage of the Query Builder.

Non-spatial joins#

A lot of analysis can be done with a single layer. Sometimes, however, the information needed for an analysis is split across different datasets or layers. With QGIS, these layers can be combined to perform the desired analysis. The simplest way to combine layers is via an attribute join. This operation looks up information from a second data source based on a shared attribute value. This value functions as a common unique identifier, also known as an ID, UID or key.

In QGIS the tool Join attributes by field value is often used for such operations:


Fig. 134 Screenshot of the Join attributes by field value tool#

Attention

  • An attribute join in QGIS only works properly when the attribute values match exactly.

  • For example: “S. Sudan” will not match with “South Sudan”.

  • Where possible, it is best to use attributes that have been designed for joining, such as P-codes or IDs, which are not susceptible to spelling mistakes.
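The exact-match behaviour of an attribute join can be illustrated with a small Python sketch. The helper `join_by_field`, the field names and the codes `C01` and `C02` are all made up for this example (the codes are placeholders, not real P-codes), and the sketch does not reproduce the actual QGIS tool.

```python
def join_by_field(layer_rows, table_rows, layer_key, table_key, fields):
    """Copy `fields` from the table row whose key exactly equals the layer row's key."""
    lookup = {row[table_key]: row for row in table_rows}
    joined = []
    for row in layer_rows:
        match = lookup.get(row[layer_key])  # exact comparison only
        extra = {f: match[f] for f in fields} if match else {}
        joined.append({**row, **extra})
    return joined


# Hypothetical layer attributes and a second, non-spatial table
layer = [
    {"code": "C01", "name": "South Sudan"},
    {"code": "C02", "name": "Kenya"},
]
table = [
    {"code": "C01", "country": "S. Sudan", "value": 10},
    {"code": "C02", "country": "Kenya", "value": 20},
]

by_name = join_by_field(layer, table, "name", "country", ["value"])
by_code = join_by_field(layer, table, "code", "code", ["value"])

print(by_name)  # the "South Sudan" row receives no value
print(by_code)  # every row receives its value
```

Joining on the name field leaves “South Sudan” without a value because the spelling in the second table differs, while joining on the code field matches every row.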