To find out how many streets are dedicated to men and how many to women, it is necessary to:

  1. find a dataset that includes all street names of a given city or country
  2. find out in a systematic (and ideally, automatic) way which of those streets are dedicated to persons, and among them which are dedicated to either a man or a woman.
  3. manually check if the gender extracted in the previous step is correct
  4. visualise the results

The package `genderedstreetnames` proposes an approach to deal with each of these steps.

At the current stage, the package addresses instances when a street is dedicated to a single person, and that person can be identified as either male or female. This reductionist approach is of course far from unproblematic, as it leaves out street names dedicated to more than one person, to gendered groups of people, and to people with non-binary gender identities. Yet it may provide useful inputs for a public conversation around gender and toponymy. Future versions of the package may consider alternative approaches, and of course users can adapt these functions to other purposes.

You can install the package from GitHub:

remotes::install_github(repo = "giocomai/genderedstreetnames")
library("genderedstreetnames")

Before starting, let’s load all required libraries:

library("genderedstreetnames") # this package!
library("dplyr") # for data processing
library("purrr") # for serialising data processing
library("sf") # for processing geographic data
library("tmap") # for plotting maps

Step 1: Get street names

There are different possible sources for street names. Relevant open data are often made available by national or local authorities. However, in order to have geolocalised street names, the most straightforward source is OpenStreetMap.

The following commands facilitate downloading relevant data.

As an example, in this vignette I will process data on street names in Romania. You can do the same for other countries simply by changing the relevant parameter.

The following commands download the archive of OpenStreetMap data for a given country and extract the data related to streets.

Be aware that data will be stored in subfolders of the current working directory.

The following command downloads from Geofabrik’s website a zip file containing the shapefiles of all OpenStreetMap data for a given country.


download_OSM(countries = "Romania")
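To give an idea of what such a download involves, here is a minimal sketch that builds the address of a country-level shapefile archive on Geofabrik's download server. The helper name geofabrik_shp_url is invented for this illustration and is not part of the package. The address pattern is an assumption that holds for countries such as Romania, while some larger countries are only offered split into regions.

```r
# Hypothetical helper (not part of genderedstreetnames): build the address of
# the shapefile archive that Geofabrik publishes for a given country.
geofabrik_shp_url <- function(continent, country) {
  paste0("https://download.geofabrik.de/",
         tolower(continent), "/",
         tolower(country), "-latest-free.shp.zip")
}

romania_url <- geofabrik_shp_url(continent = "Europe", country = "Romania")

# The archive is large, so the actual download is left commented out:
# download.file(url = romania_url, destfile = "romania-latest-free.shp.zip", mode = "wb")
```

The package function download_OSM takes care of this step, including the choice of folder, so the sketch is only meant to show where the data come from.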

To process data further, files related to roads need to be extracted from the zip file, which can conveniently be done with the dedicated function, extract_roads.
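As a rough illustration of what extracting the roads layer means, the following sketch picks the road-related files out of a list of file names. It assumes that the files of the roads layer in a Geofabrik archive contain "roads" in their name (for example gis_osm_roads_free_1.shp). The helper select_road_files and the variable zip_path are hypothetical and not part of the package.

```r
# Hypothetical helper: keep only the files that belong to the roads layer.
select_road_files <- function(files) {
  files[grepl("roads", basename(files), fixed = TRUE)]
}

archive_files <- c("gis_osm_roads_free_1.shp",
                   "gis_osm_roads_free_1.dbf",
                   "gis_osm_water_a_free_1.shp")

road_files <- select_road_files(archive_files)

# With a real archive, assuming `zip_path` points to the downloaded zip file:
# road_files <- select_road_files(utils::unzip(zip_path, list = TRUE)$Name)
# utils::unzip(zip_path, files = road_files, exdir = "roads_shp")
```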

Here you can see a preview of the data imported from OpenStreetMap on all streets in Romania:

Step 2: Find out which streets are dedicated to either a man or a woman

There are various approaches to identifying gender from names, and most of them use the first name as a hint [1].

This is an option, but it may lead to some mistakes. I propose a multi-layered approach that first tries to match the name of the person to Wikidata (after all, if a street has been named after a person, it is quite likely that they will also have at least a Wikipedia page), and tries to predict gender based on the first name only if this does not work. Besides, using Wikidata should also make it possible to match cases where the dataset does not include the first name of the person to whom the street is dedicated (e.g. Mozartstraße). Finally, using Wikidata as a reference should not only lead to fewer mistakes, but also provides a brief description of the individual, which can then be used at a later stage in the analysis.
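The layering logic can be sketched in a few lines. The two input vectors below are invented for illustration (NA means that a layer could not determine a gender), and combine_layers is a hypothetical helper, not a function of the package.

```r
# Illustrative results of the two layers for three cleaned street names.
wikidata_gender   <- c("male", NA, NA)
first_name_gender <- c("male", "female", NA)

# Prefer the Wikidata match; fall back on the first-name guess only when
# Wikidata returned nothing.
combine_layers <- function(primary, fallback) {
  ifelse(is.na(primary), fallback, primary)
}

combined_gender <- combine_layers(wikidata_gender, first_name_gender)
```

Names for which both layers fail stay NA, which is what leaves them open for the manual check in step 3.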

First, however, we need to “clean” street names in order to keep only the part of the street name that may potentially refer to a person. This is language specific and may involve removing the last word (e.g. remove “street” from “James Joyce Street” in order to look for “James Joyce” on Wikipedia, which will correctly result in identifying the male Irish novelist and poet).

genderedstreetnames includes a few functions to facilitate this process. Some of them are generic (e.g. remove the first or the last word) and can be applied to various languages. Others need to be language-specific and may be added in future updates (e.g. in German, the street type is merged with the name of the person, so a custom, language-specific solution is needed to remove “straße” from “Mozartstraße” in order to correctly identify the male composer Wolfgang Amadeus Mozart).
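To make the two cases concrete, here is a minimal sketch based on regular expressions in base R. Both helpers, drop_last_word and drop_street_suffix, are hypothetical names chosen for this illustration and do not claim to reproduce the package's own functions. The German pattern only covers the "straße"/"strasse" ending.

```r
# Generic case: remove the last word, e.g. "Street" in English street names.
drop_last_word <- function(x) {
  sub("\\s+\\S+$", "", x)
}

# Language-specific case: remove a German street suffix merged with the name.
# "\u00df" is the letter ß, written as an escape to avoid encoding issues.
drop_street_suffix <- function(x) {
  sub("stra(\u00dfe|sse)$", "", x, ignore.case = TRUE)
}

english_example <- drop_last_word("James Joyce Street")
german_example <- drop_street_suffix("Mozartstra\u00dfe")
```

A pattern like this would also shorten names that do not refer to a person, so it only prepares the query and does not decide anything about who the street is dedicated to.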

In the case of Romania, it is quite easy to deal with this issue, as it suffices to remove the first word of the street names:
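As a minimal sketch of what this cleaning step amounts to, the following base R helper drops everything up to the first run of whitespace. The name drop_first_word is hypothetical; the package's own function for this, used later in this vignette, is remove_first_word, and its implementation may differ.

```r
# Hypothetical stand-in for the cleaning step: drop the first word
# (the street type, such as "Strada" or "Bulevardul") and keep the rest.
drop_first_word <- function(x) {
  sub("^\\S+\\s+", "", x)
}

cleaned_names <- drop_first_word(c("Strada Mircea Eliade", "Bulevardul Victoriei"))
```

Names made of a single word are left unchanged, since the pattern requires at least one space after the first word.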

Now we are ready to feed the street names into Wikidata and see if it can reliably determine to whom a given street or square is dedicated, and their gender. The language parameter sets the language in which Wikidata is queried, as results vary across languages. Let’s try first with a single name:

Mircea Eliade is correctly identified as a male writer. In the background, this function searches for “Mircea Eliade” on Wikidata (the structured database that backs Wikipedia), retrieves the resulting entity, which can be seen at https://www.wikidata.org/wiki/Q41590, and extracts its “sex or gender” field.
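For readers who want to see the moving parts, here is a rough sketch of such a lookup against Wikidata's public endpoints, using the jsonlite package. It is not the implementation of find_gender: the three function names are invented for this illustration, and the network calls are only defined, not run. On Wikidata, “sex or gender” is property P21, and the items Q6581097 and Q6581072 stand for male and female.

```r
# Wikidata items used as values of property P21 ("sex or gender").
gender_labels <- c(Q6581097 = "male", Q6581072 = "female")

# Step 1: search Wikidata for a name and return the ID of the first hit.
# Requires the jsonlite package and an internet connection when called.
search_wikidata_id <- function(query, language = "ro") {
  url <- paste0("https://www.wikidata.org/w/api.php",
                "?action=wbsearchentities&format=json",
                "&language=", language,
                "&search=", utils::URLencode(query, reserved = TRUE))
  hits <- jsonlite::fromJSON(url, simplifyVector = FALSE)$search
  if (length(hits) == 0) NA_character_ else hits[[1]]$id
}

# Step 2: download the entity and read the value of its first P21 claim.
get_gender_qid <- function(id) {
  url <- paste0("https://www.wikidata.org/wiki/Special:EntityData/", id, ".json")
  entity <- jsonlite::fromJSON(url, simplifyVector = FALSE)$entities[[id]]
  claim <- entity$claims$P21
  if (is.null(claim)) NA_character_ else claim[[1]]$mainsnak$datavalue$value$id
}

# Step 3: translate the Wikidata ID of the gender into a readable label;
# anything other than the two items listed above becomes NA.
label_gender <- function(qid) {
  unname(gender_labels[qid])
}

# Hypothetical usage (needs network access):
# label_gender(get_gender_qid(search_wikidata_id("Mircea Eliade")))
```

Taking the first search hit is a simplification: the first result is not always a person, which is one reason why a manual check remains necessary.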

By default, the function find_gender caches queries and results locally and waits a fraction of a second between queries, which reduces the load on Wikidata’s servers and speeds up subsequent runs. This also means that if you run this command with a long list of queries, you can interrupt the process and resume it in another session without starting over.
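The general pattern of caching and pausing can be sketched as follows. This is an illustration of the idea rather than the package's code: cached_lookup, its arguments, and the one-file-per-query layout are all assumptions made for this example.

```r
# Hypothetical wrapper: run `lookup(query)` at most once per query, storing
# each result as an .rds file in `cache_dir`.
cached_lookup <- function(query, lookup, cache_dir, wait = 0.1) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  # one file per query; encoding the query keeps the file name safe
  cache_file <- file.path(cache_dir,
                          paste0(utils::URLencode(query, reserved = TRUE), ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file)) # cached: no request, no waiting
  }
  Sys.sleep(wait) # pause before contacting the remote server
  result <- lookup(query)
  saveRDS(result, cache_file)
  result
}
```

Because every result is written to disk as soon as it is available, an interrupted run loses at most the query that was in progress.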

Here is the output of querying the beginning of our dataset:

We can now process the whole dataset. Since this process is time consuming, it may be best to store the resulting output locally.

wiki_street_names_path <- file.path("data", "wiki_street_names", "romania", "wiki_street_names-romania.rds")

if (!file.exists(wiki_street_names_path)) {
  # query Wikidata once for each unique cleaned street name
  wiki_street_names <- purrr::map_dfr(.x = street_names %>% pull(name_clean) %>% unique(),
                                      .f = find_gender,
                                      language = "ro",
                                      quietly = TRUE)

  # store the results locally so that later runs can skip the queries
  dir.create(path = dirname(wiki_street_names_path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(object = wiki_street_names, file = wiki_street_names_path)
} else {
  wiki_street_names <- readRDS(wiki_street_names_path)
}

Given that OpenStreetMap may be far from complete in smaller villages, and that smaller villages sometimes dedicate streets to local figures, for the rest of this analysis we will focus on cities.

Let’s try with a single city, then see how to expand this to more.

For a start, we need the borders of the municipality we want to focus on. Let’s take the city of Sibiu, in Transylvania. This is how the borders of the municipality look:

city_boundary <- get_city_boundaries(city = "Sibiu", country = "Romania")

tmap::tm_shape(city_boundary) + 
  tmap::tm_polygons() 

We can query our dataset, and find out which of all the streets and squares in Romania are located within the municipality of Sibiu. These are all the streets located in the municipality of Sibiu:
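For readers curious about the spatial operation behind such a subset, here is a minimal sketch with sf::st_within on toy geometries. The objects boundary_toy and roads_toy are invented for illustration, and this is not necessarily how the package's subset_roads function is implemented.

```r
library("sf")

# toy boundary: a unit square standing in for the municipality
boundary_toy <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))))

# two toy "streets": one inside the square, one outside
roads_toy <- st_sf(
  name = c("inside street", "outside street"),
  geometry = st_sfc(
    st_linestring(rbind(c(0.2, 0.2), c(0.8, 0.8))),
    st_linestring(rbind(c(2, 2), c(3, 3)))
  )
)

# st_within() lists, for each street, the polygons that fully contain it;
# streets with an empty list are outside the boundary and are dropped
is_within <- lengths(st_within(roads_toy, boundary_toy)) > 0
roads_in_boundary <- roads_toy[is_within, ]
```

With st_within, a street that crosses the boundary is excluded. Using st_intersects instead would keep it, so the choice of predicate affects streets on the edge of the municipality.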

Which of these streets are dedicated to women? Let’s have a look:

city_roads <- subset_roads(boundary = city_boundary, roads = roads) %>% 
  mutate(name_clean = remove_first_word(name)) %>% 
  left_join(wiki_street_names %>% rename(name_clean = Query), by = "name_clean")
#> although coordinates are longitude/latitude, st_within assumes that they are planar


city_roads_gender <- city_roads %>% 
  st_set_geometry(NULL) %>% 
  distinct(name, .keep_all = TRUE) %>% 
  arrange(Gender) %>% 
  select(name, Gender, Description) 

city_roads_gender %>% 
  count(Gender) %>% 
  knitr::kable()

Gender    n
NA      499
female    6
male    142

Not many streets dedicated to women, it seems.

Who are they?

Step 3: Manually fixing what’s not right

Unsurprisingly, this is not completely accurate. Let’s export the data and fix mistakes in an interactive interface, which makes it possible to quickly set the record straight.

When you’re done fixing, click on the “Download as .rds” button at the bottom of the table to save your changes. By default, changes are stored locally under data/gendered_street_names_fixed. Additionally, you can save a copy elsewhere.

Cities usually have just a few hundred streets, and since the vast majority are already correctly categorised, it takes just a few minutes to fix any remaining mistakes.

dir.create(path = file.path("data", "gendered_street_names"), showWarnings = FALSE)
dir.create(path = file.path("data", "gendered_street_names", "romania"), showWarnings = FALSE)
saveRDS(object = city_roads_gender, file = file.path("data", "gendered_street_names", "romania", "city_roads_gender-sibiu.rds"))
fix_street_names(city = "Sibiu", country = "Romania")

Here is how the interface looks:

screenshot of fix_street_names shiny app


Fixed what was not right?

Now let’s reimport the updated data, and finally plot the maps.

Step 4: Visualise on a map

fixed_roads <- readRDS(file = file.path("data", "gendered_street_names_fixed", "romania", "city_roads_gender_fixed-sibiu.rds"))

city_roads <- city_roads %>% 
  select(-Gender) %>% 
  mutate(name = as.character(name)) %>% 
  left_join(y = fixed_roads %>% rename(name = `Street_name`), by = "name") %>% 
  mutate(Gender = if_else(condition = Gender == "Other", true = NA_character_, false = Gender))

dir.create(path = file.path("data", "gendered_street_names_fixed_geo"), showWarnings = FALSE)
dir.create(path = file.path("data", "gendered_street_names_fixed_geo", "romania"), showWarnings = FALSE)

saveRDS(object = city_roads, file = file.path("data", "gendered_street_names_fixed_geo", "romania", "city_roads_gender_fixed_geo-sibiu.rds"))
tmap::tmap_mode(mode = "plot")
#> tmap mode set to plotting

city_map <- tmap::tm_shape(city_boundary) + 
  tmap::tm_polygons() +
  tmap::tm_shape(city_roads) +
  tmap::tm_lines(col = "Gender", palette = "Set1", textNA = "other")

city_map

Let’s have a closer look with an interactive version of this map:

tmap::tmap_mode("view")
#> tmap mode set to interactive viewing

tmap::tm_shape(city_boundary) + 
  tmap::tm_polygons(col = "lightblue", alpha = 0.2, popup.vars = FALSE) +
  tmap::tm_shape(city_roads) +
  tmap::tm_lines(col = "Gender",
                 lwd = 2,
                 palette = "Set1",
                 textNA = "other",
                 id = "name",
                 popup.vars = c("Gender", "Description"))

After fixing miscategorised street names, it appears there are even fewer streets dedicated to women:

city_roads_gender_balance <- 
  city_roads %>% 
  st_set_geometry(NULL) %>% 
  distinct(name, .keep_all = TRUE) %>% 
  count(Gender)

knitr::kable(city_roads_gender_balance)

Gender    n
NA      456
Female    4
Male    187

Of the 647 streets in Sibiu, only 4 are dedicated to a woman. In other words, less than 1 percent of the streets in Sibiu are dedicated to a woman, while about 29 percent (187) are dedicated to a man.
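The shares can be recomputed directly from the counts in the table above. The sketch below also reports the share among the streets that are dedicated to a man or a woman, a figure that the text does not discuss. The object names are chosen for this illustration only.

```r
# counts from the table above; "other" corresponds to the NA row
gender_counts <- c(other = 456, female = 4, male = 187)

total_streets <- sum(gender_counts)

# share of each category among all streets, in percent
share_of_all <- round(100 * gender_counts / total_streets, 1)

# share among the streets dedicated to a man or a woman only, in percent
dedicated <- gender_counts[c("female", "male")]
share_among_dedicated <- round(100 * dedicated / sum(dedicated), 1)
```

Here share_of_all gives 0.6 for women and 28.9 for men, while share_among_dedicated shows that women account for about 2.1 percent of the 191 streets dedicated to a man or a woman.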

Even if a couple of streets have been miscategorised, the overall picture is unlikely to change much.

For reference, here is the updated list of street names:

city_roads %>%
  arrange(Gender) %>% 
  sf::st_set_geometry(NULL) %>% 
  select(name, Gender, Description) %>% 
  distinct(name, .keep_all = TRUE) %>% 
  DT::datatable(rownames = FALSE)

A separate vignette will outline how to repeat this procedure systematically and create the relevant maps for multiple cities in one go.


  1. See for example the R package gender (https://github.com/ropensci/gender) for an approach based on historical data. See also the Python package SexMachine (https://pypi.org/project/SexMachine/), which is also based on a database of names.