Running SQL queries on the Spanish cadastre

on Wednesday, February 11, 2026

It turns out that, for various reasons, I needed to search the Spanish cadastre for houses that met certain criteria. For example: houses between a minimum and a maximum size, without a pool, with a minimum parcel size, within a specific postal code, and of a certain age.

Immediately, my programmer brain thought: that sounds exactly like a SQL query. Obviously, it would be too good to be true if the Cadastre allowed anyone to run SQL queries directly on their database, but I started thinking about how to automate the search in a similar way, since a manual search was unfeasible given the number of properties in the cadastre.
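As an illustration, the kind of query those criteria imply might look like the sketch below. The `house` table, its column names, and the sample rows are all hypothetical, invented here to make the idea concrete; they are not the schema the script produces.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE house ("
    "cadastral_ref TEXT, postal_code TEXT, built_area_m2 INTEGER, "
    "parcel_area_m2 INTEGER, year_built INTEGER, has_pool INTEGER)"
)
conn.executemany(
    "INSERT INTO house VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("REF-A", "28001", 180, 600, 1995, 0),  # meets every criterion
        ("REF-B", "28001", 150, 500, 2001, 1),  # has a pool
        ("REF-C", "28001", 90, 700, 1990, 0),   # house too small
        ("REF-D", "28002", 200, 800, 1998, 0),  # wrong postal code
    ],
)

QUERY = """
    SELECT cadastral_ref
    FROM house
    WHERE built_area_m2 BETWEEN :min_size AND :max_size
      AND has_pool = 0
      AND parcel_area_m2 >= :min_parcel
      AND postal_code = :postal_code
      AND year_built BETWEEN :min_year AND :max_year
    ORDER BY cadastral_ref
"""
params = {
    "min_size": 120,
    "max_size": 250,
    "min_parcel": 400,
    "postal_code": "28001",
    "min_year": 1980,
    "max_year": 2005,
}
matches = [row[0] for row in conn.execute(QUERY, params)]
print(matches)
```

Named parameters keep the search criteria separate from the SQL text, so the same query can be rerun with different limits.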

Maybe there’s an API? Ideally, a REST API. Well, no, it seems not. The closest thing is a service where you can send an XML file with the query parameters you want to run, they queue it up, and when they process the query (usually in less than an hour, they say), they provide the results. This feels like something from another era and totally unfeasible for my needs, especially since there’s not even a testing environment to try the generated XML request file. You send it, wait an hour, and with luck, you get the results you wanted. If not, try again in another hour.

However, there was another service that proved useful for my purpose: the download of all cadastre data by province. Since the query I needed to run was limited to one province, I could download the files for the province of interest and run my queries on that file.

Of course, it still wasn’t that easy, as the download files obviously weren’t in SQL format. Nor in CSV. Not even Excel. The Cadastre uses its own format, which it calls “alphanumeric information in CAT format”: basically a text data file with fixed-width fields.

So I ended up writing a simple Python script that reads the lines in the CAT files, splits each line according to the width of each column, and imports the content into a database.
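The read, split, and import steps can be sketched roughly as follows. This is a minimal illustration and not the script from the repository: the field names, offsets, widths, sample lines, and the choice of SQLite are all assumptions. The real offsets come from the official CAT documentation.

```python
import sqlite3

# Hypothetical layout: (column name, start offset, width). The real offsets
# come from the official CAT documentation and are not reproduced here.
PARCEL_FIELDS = [
    ("municipality_code", 0, 3),
    ("cadastral_ref", 3, 14),
    ("area_m2", 17, 7),
]


def split_fixed_width(line, fields):
    """Slice one fixed-width line into a dict of whitespace-stripped strings."""
    return {name: line[start:start + width].strip() for name, start, width in fields}


def import_parcels(conn, lines):
    """Parse each non-empty line and insert it into a `parcel` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parcel ("
        "municipality_code TEXT, cadastral_ref TEXT, area_m2 INTEGER)"
    )
    rows = []
    for line in lines:
        if not line.strip():
            continue
        row = split_fixed_width(line, PARCEL_FIELDS)
        row["area_m2"] = int(row["area_m2"])  # zero-padded digits in this sample
        rows.append(row)
    conn.executemany(
        "INSERT INTO parcel VALUES (:municipality_code, :cadastral_ref, :area_m2)",
        rows,
    )
    return len(rows)


# Made-up sample lines that follow the hypothetical layout above.
sample_lines = [
    "0011234567AB1234C0000850",
    "0027654321CD4321E0001200",
]
conn = sqlite3.connect(":memory:")
imported = import_parcels(conn, sample_lines)
large = conn.execute(
    "SELECT cadastral_ref FROM parcel WHERE area_m2 >= 1000"
).fetchall()
print(imported, large)
```

Keeping the layout in a plain list of tuples means a new record type only needs a new list, not new parsing code.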

The data file downloaded from the Cadastre website is a ZIP archive with several files, each containing different information: plots, properties, buildings, crops, etc. So the first thing the script does is read the file header to know what type of file it’s reading and, based on that, interpret the rest of the data and split each row into columns.
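One way to picture the header-driven step is a lookup from file type to column layout. The sketch below assumes a made-up header with a type code in its first 8 characters, plus invented type codes and layouts. The actual header structure is defined in the CAT documentation and may look quite different.

```python
# Hypothetical type codes and layouts; the real header structure is defined
# in the CAT documentation and is not reproduced here.
LAYOUTS = {
    "PARCEL": [("cadastral_ref", 0, 14), ("area_m2", 14, 7)],
    "BUILDING": [("cadastral_ref", 0, 14), ("use_code", 14, 3), ("year_built", 17, 4)],
}
HEADER_TYPE_WIDTH = 8  # assumed: the type code sits in the first 8 characters


def parse_file(lines):
    """Read the header line, pick the matching layout, and split the other lines."""
    lines = iter(lines)
    header = next(lines, "")
    file_type = header[:HEADER_TYPE_WIDTH].strip()
    try:
        layout = LAYOUTS[file_type]
    except KeyError:
        raise ValueError(f"unsupported file type: {file_type!r}") from None
    rows = [
        {name: line[start:start + width].strip() for name, start, width in layout}
        for line in lines
        if line.strip()
    ]
    return file_type, rows


# Made-up file contents: one header line followed by two data lines.
building_file = [
    "BUILDING",
    "1234567AB1234CRCT1850",
    "7654321CD4321ETCM1962",
]
file_type, rows = parse_file(building_file)
print(file_type, rows)
```

Failing loudly on an unknown file type makes an unsupported file obvious right away, instead of producing rows split at the wrong offsets.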

The script has some limitations. First, I created it for my own one-off need, so I focused on the information that mattered to me. I think I implemented the import for almost everything, including info I didn’t care about (e.g., agricultural crops), but I left out some parts that I didn’t need and that would have complicated the implementation (e.g., info on the distribution of common elements).

There are other limitations beyond my control, related to inconsistencies in the official documentation on the CAT format and the data itself, for example:

- The documentation doesn’t say which fields can be null.
- The data type the documentation specifies for a field isn’t always respected. For example, the docs might say a field is numeric, but some records contain letters in it.
- Some enumerated fields contain values that aren’t documented. For example, the docs list codes for building use like RCT for cathedral, TCM for cinema, EBL for library, etc. But you can also find buildings with code YPS, which isn’t documented. I tried to add all the undocumented codes I found, but the script will fail on any new undocumented ones.
- Cadastral references, which are supposed to be unique, aren’t always. It’s not common, but duplicates exist. That’s why I used the cadastral reference together with the municipality code to identify parcels. Primary key conflicts could still occur if duplicates appear within the same municipality, though I haven’t seen that yet.
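The sketch below shows one possible way to cope with these quirks: a composite primary key on municipality code plus cadastral reference, tolerant numeric parsing, and recording unknown codes instead of failing. It is an alternative illustration under assumed table and helper names, not what the script does (the script fails on undocumented codes). Only the three use codes named above are included.

```python
import sqlite3

# Codes named in the text; any other code is treated as undocumented here.
KNOWN_USE_CODES = {"RCT": "cathedral", "TCM": "cinema", "EBL": "library"}


def to_int_or_none(raw):
    """Return an int for ASCII digit-only text, None for blank or other text."""
    text = raw.strip()
    return int(text) if text.isascii() and text.isdigit() else None


def describe_use(code, unknown_seen):
    """Look up a use code, recording undocumented ones instead of failing."""
    if code not in KNOWN_USE_CODES:
        unknown_seen.add(code)
        return None
    return KNOWN_USE_CODES[code]


conn = sqlite3.connect(":memory:")
# Hypothetical table: the key combines municipality code and cadastral reference.
conn.execute(
    "CREATE TABLE parcel ("
    "municipality_code TEXT NOT NULL, cadastral_ref TEXT NOT NULL, "
    "area_m2 INTEGER, "
    "PRIMARY KEY (municipality_code, cadastral_ref))"
)

records = [
    ("001", "1234567AB1234C", "0000850"),
    ("002", "1234567AB1234C", "0001200"),  # same reference, other municipality
    ("001", "1234567AB1234C", "00A0850"),  # duplicate key, non-numeric area
]
conflicts = []
for municipality, ref, raw_area in records:
    try:
        conn.execute(
            "INSERT INTO parcel VALUES (?, ?, ?)",
            (municipality, ref, to_int_or_none(raw_area)),
        )
    except sqlite3.IntegrityError:
        # Duplicate within the same municipality: keep going and report later.
        conflicts.append((municipality, ref))

unknown = set()
uses = [describe_use(code, unknown) for code in ("RCT", "YPS")]
stored = conn.execute("SELECT COUNT(*) FROM parcel").fetchone()[0]
print(stored, conflicts, uses, unknown)
```

Collecting conflicts and unknown codes in lists lets a full import finish, and the leftovers can be reviewed afterwards.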

The rest of the information needed to run the script is in the code repository:

Catastro SQL