genderedstreetnames.Rmd
To find out how many streets are dedicated to men and how many to women, several steps are necessary.
The package `genderedstreetnames` proposes an approach to deal with each of these steps.
At the current stage, the package addresses instances when a street is dedicated to a single person, and that person can be identified as either male or female. This reductionist approach is of course far from unproblematic, as it leaves out street names dedicated to more than one person, gendered groups of people, as well as people with non-binary gender identities, yet it may provide useful inputs for a public conversation around gender and toponymy. Future versions of the package may consider alternative approaches, and of course users can adapt these functions to other purposes.
You can install the package from GitHub:
Before starting, let’s load all required libraries:
library("genderedstreetnames") # this package!
library("dplyr") # for data processing
library("purrr") # for serialising data processing
library("sf") # for processing geographic data
library("tmap") # for plotting maps
There are different possible sources for street names. Relevant open data are often made available by national or local authorities. However, in order to have geolocalised street names, the most straightforward source is OpenStreetMap.
The following commands facilitate downloading relevant data.
As an example, in this vignette I will process data on street names in Romania. You can do the same with other countries simply by changing the relevant parameter.
The following commands download the archive of OpenStreetMap data for a given country and extract the data related to streets.
Be aware that data will be stored in subfolders of the current working directory.
The following command downloads the shapefiles for all OpenStreetMap data related to a given country from Geofabrik’s website as a zip file.
To process the data further, files related to roads need to be extracted from the zip file, which can conveniently be done with the dedicated function, `extract_roads()`.
Here you can see a preview of the data imported from OpenStreetMap on all streets in Romania:
roads <- extract_roads(countries = "Romania")
#> Reading layer `gis_osm_roads_free_1' from data source `/home/g/Nextcloud/R_nc/genderedstreetnames/vignettes/data/roads_shp/romania' using driver `ESRI Shapefile'
#> Simple feature collection with 605603 features and 10 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 20.19862 ymin: 43.58925 xmax: 29.72713 ymax: 48.41649
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
head(roads)
#> Simple feature collection with 6 features and 10 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 26.08665 ymin: 44.41261 xmax: 27.44931 ymax: 44.74973
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> osm_id code fclass name ref oneway maxspeed
#> 1 1349 5115 tertiary Bulevardul Mircea Eliade <NA> B 50
#> 2 1467 5115 tertiary Bulevardul Primăverii <NA> F 30
#> 3 1759 5122 residential Strada Răcari <NA> B 30
#> 4 1760 5113 primary Calea Dorobanților <NA> F 60
#> 5 1915 5113 primary Bulevardul Camil Ressu <NA> F 60
#> 6 2160 5112 trunk <NA> DN21 B 100
#> layer bridge tunnel geometry
#> 1 0 F F LINESTRING (26.09352 44.469...
#> 2 0 F F LINESTRING (26.09298 44.469...
#> 3 0 F F LINESTRING (26.13535 44.415...
#> 4 0 F F LINESTRING (26.08665 44.465...
#> 5 0 F F LINESTRING (26.14636 44.417...
#> 6 0 F F LINESTRING (27.44931 44.749...
There are various approaches for identifying gender starting from names, and most of them use the first name as a hint.[1]
This is an option, but may lead to some mistakes. I propose a multi-layered approach that first tries to match the name of the person to Wikidata (after all, if a street has been named after a person, it is quite likely that they will also have at least a Wikipedia page), and only if this does not work tries to predict gender based on the first name. Besides, using Wikipedia should also make it possible to match cases where the street name does not include the first name of the person to whom it is dedicated (e.g. Mozartstraße). Finally, using Wikipedia as a reference should not only lead to fewer mistakes, but also provides a brief characterisation of the individual, which can then be used at a later stage in the analysis.
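The layered logic described above can be illustrated with a minimal base R sketch. The two lookup tables and the `guess_gender()` function are hypothetical stand-ins for Wikidata and a first-name database; they are not part of the package and only show the order in which the layers are tried.

```r
# Toy lookup tables standing in for Wikidata and a first-name database
# (illustrative assumptions, not real data sources).
known_people <- c("Mircea Eliade" = "male", "Marie Curie" = "female")
first_names <- c("Ana" = "female", "Ion" = "male")

# Hypothetical helper: try the full name first, then fall back to the first name.
guess_gender <- function(name) {
  # Layer 1: exact match on the full name
  if (name %in% names(known_people)) {
    return(c(gender = unname(known_people[name]), source = "full name"))
  }
  # Layer 2: treat the first word as a first name
  first <- strsplit(name, "\\s+")[[1]][1]
  if (first %in% names(first_names)) {
    return(c(gender = unname(first_names[first]), source = "first name"))
  }
  # No layer matched: leave the gender undetermined
  c(gender = NA_character_, source = NA_character_)
}

guess_gender("Mircea Eliade") # matched on the full name
guess_gender("Ana Popescu")   # matched on the first name only
guess_gender("Dorobantilor")  # no match
```

Recording which layer produced the match makes it easier to prioritise manual checks later, since first-name matches are more likely to be wrong.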
First, however, we need to “clean” street names in order to keep only the part of the street name that may potentially refer to a person. This is language specific and may involve removing the last word (e.g. remove “street” from “James Joyce Street” in order to look for “James Joyce” on Wikipedia, which will correctly result in identifying the male Irish novelist and poet).
`genderedstreetnames` includes a few functions to facilitate this process. Some of them are generic (e.g. remove the first or the last word) and can be applied to various languages; others will need to be language specific and may be added in future updates (e.g. in German, street names are merged with the name of the person, so a custom, language-specific solution will be needed to remove “straße” from “Mozartstraße” in order to correctly identify the male composer Wolfgang Amadeus Mozart).
In the case of Romania, it is quite easy to deal with this issue, as it suffices to remove the first word of the street names:
street_names <- roads %>%
  st_set_geometry(NULL) %>%
  transmute(name, name_clean = remove_first_word(name))
head(street_names)
#> name name_clean
#> 1 Bulevardul Mircea Eliade Mircea Eliade
#> 2 Bulevardul Primăverii Primăverii
#> 3 Strada Răcari Răcari
#> 4 Calea Dorobanților Dorobanților
#> 5 Bulevardul Camil Ressu Camil Ressu
#> 6 <NA> <NA>
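For readers who want to see what such cleaning steps amount to, here is a minimal base R sketch using regular expressions. The helpers `drop_first_word()`, `drop_last_word()` and `drop_suffix()` are hypothetical names introduced for illustration; they are not the package's functions and may behave differently from `remove_first_word()` in edge cases.

```r
# Hypothetical helpers, written with base R regular expressions only.

# Remove the first word and the whitespace that follows it
drop_first_word <- function(x) sub("^\\S+\\s+", "", x)

# Remove the last word and the whitespace that precedes it
drop_last_word <- function(x) sub("\\s+\\S+$", "", x)

# Remove a fixed suffix merged into the name (e.g. German street names)
drop_suffix <- function(x, suffix) sub(paste0(suffix, "$"), "", x)

drop_first_word("Bulevardul Mircea Eliade") # "Mircea Eliade"
drop_last_word("James Joyce Street")        # "James Joyce"
drop_suffix("Mozartstra\u00dfe", "stra\u00dfe")
```

Note that these helpers leave single-word names unchanged and return `NA` for missing names, so the output can still be joined back to the original data.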
Now we are ready to feed the street names into Wikidata and see if it can reliably determine to whom a given street or square is dedicated, and their gender. The `language` parameter sets the language in which Wikidata is queried, since results vary by language. Let’s try first with a single name:
find_gender(search = "Mircea Eliade", language = "ro")
#> # A tibble: 1 x 4
#> Query Gender Description WikidataID
#> <chr> <chr> <chr> <chr>
#> 1 Mircea Eli… male Romanian historian of religion, fiction wr… Q41590
Mircea Eliade is correctly identified as a male writer. What this function does in the background is look for “Mircea Eliade” on Wikidata (which stores metadata of Wikipedia pages), get the resulting entity on the database - which can be seen at this link https://www.wikidata.org/wiki/Q41590 - and extract the “sex or gender” field.
By default, the function `find_gender()` caches queries and results locally and waits a fraction of a second between each query, reducing the load on Wikidata’s server and speeding up subsequent runs. This also means that if you run this command with a long list of queries, you can interrupt the process and resume it in another session without starting from scratch.
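The general pattern of caching plus a short pause can be sketched in base R as follows. This is not how `find_gender()` is implemented internally; `cached_lookup()`, its `cache_dir` and `wait` arguments, and the fake lookup function are assumptions made for illustration, and the sketch expects a non-missing query string.

```r
# Hypothetical wrapper: store each result in its own .rds file and
# only call the (slow) lookup function when no cached file exists.
cached_lookup <- function(query, lookup,
                          cache_dir = file.path(tempdir(), "lookup_cache"),
                          wait = 0.1) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # Percent-encode the query to obtain a filesystem-safe file name
  cache_file <- file.path(cache_dir,
                          paste0(utils::URLencode(query, reserved = TRUE), ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  Sys.sleep(wait) # be gentle with the remote server
  result <- lookup(query)
  saveRDS(result, cache_file)
  result
}

# A fake lookup that counts how often it is actually called
calls <- 0
slow_lookup <- function(query) {
  calls <<- calls + 1
  data.frame(Query = query, Gender = "male", stringsAsFactors = FALSE)
}

demo_cache <- tempfile("lookup_cache")
first <- cached_lookup("Mircea Eliade", slow_lookup, cache_dir = demo_cache)
second <- cached_lookup("Mircea Eliade", slow_lookup, cache_dir = demo_cache)
calls # the second call was served from the cache
```

Because each query is written to disk as soon as it completes, an interrupted run loses at most the query that was in progress.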
Here is the output of querying the beginning of our dataset:
purrr::map_dfr(.x = street_names %>% pull(name_clean) %>% head(),
               .f = find_gender,
               language = "ro")
#> Warning in .f(.x[[i]], ...):
#> "Primăverii" does not match any person with either male or female gender.
#> Warning in .f(.x[[i]], ...):
#> "Răcari" does not match any person with either male or female gender.
#> Warning in .f(.x[[i]], ...):
#> "Dorobanților" does not match any person with either male or female gender.
#> Warning in .f(.x[[i]], ...):
#> "NA" does not match any person with either male or female gender.
#> # A tibble: 6 x 4
#> Query Gender Description WikidataID
#> <chr> <chr> <chr> <chr>
#> 1 Mircea Eli… male Romanian historian of religion, fiction wr… Q41590
#> 2 Primăverii <NA> <NA> <NA>
#> 3 Răcari <NA> <NA> <NA>
#> 4 Dorobanțil… <NA> <NA> <NA>
#> 5 Camil Ressu male Romanian painter and politician Q926472
#> 6 <NA> <NA> <NA> <NA>
We can now process the whole dataset. Since this process is time consuming, it may be best to store the resulting output locally.
# Path where the Wikidata results are stored between sessions
wiki_file <- file.path("data", "wiki_street_names", "romania", "wiki_street_names-romania.rds")
if (!file.exists(wiki_file)) {
  # Query Wikidata once per unique cleaned street name
  wiki_street_names <- purrr::map_dfr(.x = street_names %>% pull(name_clean) %>% unique(),
                                      .f = find_gender,
                                      language = "ro",
                                      quietly = TRUE)
  dir.create(path = dirname(wiki_file), showWarnings = FALSE, recursive = TRUE)
  saveRDS(object = wiki_street_names, file = wiki_file)
} else {
  wiki_street_names <- readRDS(wiki_file)
}
Given that OpenStreetMap may be far from complete in smaller villages, and that smaller villages sometimes dedicate streets to local figures, for the rest of this analysis we will focus on cities.
Let’s try with a single city, then see how to expand this to more.
For a start, we need the borders of the municipality we want to focus on. Let’s take the city of Sibiu, in Transylvania. This is how the borders of the municipality look:
city_boundary <- get_city_boundaries(city = "Sibiu", country = "Romania")
tmap::tm_shape(city_boundary) +
  tmap::tm_polygons()
We can query our dataset, and find out which of all the streets and squares in Romania are located within the municipality of Sibiu. These are all the streets located in the municipality of Sibiu:
city_roads <- subset_roads(boundary = city_boundary, roads = roads)
#> although coordinates are longitude/latitude, st_within assumes that they are planar
tmap::tm_shape(city_boundary) +
  tmap::tm_polygons() +
  tmap::tm_shape(city_roads) +
  tmap::tm_lines()
Which of these streets are dedicated to women? Let’s have a look:
city_roads <- subset_roads(boundary = city_boundary, roads = roads) %>%
  mutate(name_clean = remove_first_word(name)) %>%
  left_join(wiki_street_names %>% rename(name_clean = Query), by = "name_clean")
#> although coordinates are longitude/latitude, st_within assumes that they are planar
city_roads_gender <- city_roads %>%
  st_set_geometry(NULL) %>%
  distinct(name, .keep_all = TRUE) %>%
  arrange(Gender) %>%
  select(name, Gender, Description)
city_roads_gender %>%
  count(Gender) %>%
  knitr::kable()
| Gender | n   |
|--------|-----|
| NA     | 499 |
| female | 6   |
| male   | 142 |
Not many streets dedicated to women, it seems.
Who are they?
Unsurprisingly, this is not completely accurate. Let’s export this and fix mistakes in an interactive interface, which makes it possible to quickly set the record straight.
When you’re done fixing, click on the “Download as .rds” button at the bottom of the table to save your changes. By default, changes are stored locally under `data/gendered_street_names_fixed`. Additionally, you are given the chance to save them elsewhere.
Cities usually have just a few hundred streets, and since the vast majority are already correctly categorised, it takes just a few minutes to fix any outstanding mistakes.
dir.create(path = file.path("data", "gendered_street_names"), showWarnings = FALSE)
dir.create(path = file.path("data", "gendered_street_names", "romania"), showWarnings = FALSE)
saveRDS(object = city_roads_gender, file = file.path("data", "gendered_street_names", "romania", "city_roads_gender-sibiu.rds"))
Here is how the interface looks:
Fixed what was not right?
Now let’s reimport the updated data, and finally plot the maps.
fixed_roads <- readRDS(file = file.path("data", "gendered_street_names_fixed", "romania", "city_roads_gender_fixed-sibiu.rds"))
# Replace the automatic classification with the manually fixed one
city_roads <- city_roads %>%
  select(-Gender) %>%
  mutate(name = as.character(name)) %>%
  left_join(y = fixed_roads %>% rename(name = `Street_name`), by = "name") %>%
  mutate(Gender = if_else(condition = Gender == "Other", true = NA_character_, false = Gender))
# Store the fixed data together with the geometry
dir.create(path = file.path("data", "gendered_street_names_fixed_geo", "romania"), showWarnings = FALSE, recursive = TRUE)
saveRDS(object = city_roads, file = file.path("data", "gendered_street_names_fixed_geo", "romania", "city_roads_gender_fixed_geo-sibiu.rds"))
tmap::tmap_mode(mode = "plot")
#> tmap mode set to plotting
city_map <- tmap::tm_shape(city_boundary) +
  tmap::tm_polygons() +
  tmap::tm_shape(city_roads) +
  tmap::tm_lines(col = "Gender", palette = "Set1", textNA = "other")
city_map
Let’s have a closer look with an interactive version of this map:
tmap::tmap_mode("view")
#> tmap mode set to interactive viewing
tmap::tm_shape(city_boundary) +
  tmap::tm_polygons(col = "lightblue", alpha = 0.2, popup.vars = FALSE) +
  tmap::tm_shape(city_roads) +
  tmap::tm_lines(col = "Gender",
                 lwd = 2,
                 palette = "Set1",
                 textNA = "other",
                 id = "name",
                 popup.vars = c("Gender", "Description"))
After fixing miscategorised street names, it appears there are even fewer street names dedicated to women:
city_roads_gender_balance <-
  city_roads %>%
  st_set_geometry(NULL) %>%
  distinct(name, .keep_all = TRUE) %>%
  count(Gender)
knitr::kable(city_roads_gender_balance)
| Gender | n   |
|--------|-----|
| NA     | 456 |
| Female | 4   |
| Male   | 187 |
Of the 647 distinct street names in Sibiu, 191 are dedicated to a person, and only 4 of those to a woman. In other words, less than 1 percent of the streets in Sibiu are dedicated to a woman, while about 29% are dedicated to men.
Even if a couple of streets have been miscategorised, the overall picture is unlikely to change much.
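The shares quoted above can be recomputed directly from the counts in the table. The snippet below is a small worked check in base R; the vector `counts` simply restates the table, with the `NA` row labelled `other`.

```r
# Counts taken from the table above (NA row labelled "other")
counts <- c(other = 456, female = 4, male = 187)

total <- sum(counts)                                          # 647 distinct street names
dedicated_to_person <- counts[["female"]] + counts[["male"]]  # 191

# Share of all streets dedicated to women and to men, in percent
share_female <- round(100 * counts[["female"]] / total, 1)    # 0.6
share_male <- round(100 * counts[["male"]] / total, 1)        # 28.9

# Share of women among streets dedicated to a person, in percent
share_female_among_people <- round(100 * counts[["female"]] / dedicated_to_person, 1) # 2.1
```

Looking only at streets dedicated to a person, women account for roughly 2 percent of the total.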
For reference, here is the updated list of street names:
city_roads %>%
  arrange(Gender) %>%
  sf::st_set_geometry(NULL) %>%
  select(name, Gender, Description) %>%
  distinct(name, .keep_all = TRUE) %>%
  DT::datatable(rownames = FALSE)
A separate vignette will outline how to repeat this procedure systematically and create relevant maps for multiple cities in a single go.
[1] See for example the R package gender (https://github.com/ropensci/gender) for an approach based on historical data. See also the Python package SexMachine (https://pypi.org/project/SexMachine/), which is also based on a database of names.