Introduction to DemografixeR

API	R function	Estimated variable
https://genderize.io	`genderize(name)`	Gender
https://agify.io	`agify(name)`	Age
https://nationalize.io	`nationalize(name)`	Nationality

Get Started

Setup

First, we need to load the package:

library("DemografixeR")

API credentials

This step is optional. It is only necessary if you plan to estimate gender, age or nationality for more than 1000 different names a day. In that case, you need to obtain an API key from the following link:

genderize.io store

To use the API key, save it once with the save_api_key(key) function and you’re all set. All the functions will automatically retrieve the key once it is saved:

save_api_key(key = "__YOUR_API_KEY__")

Please be careful when dealing with secrets/tokens/credentials and do not share them publicly. If you need to check which API key you’ve saved, retrieve it with the get_api_key() function. To fully remove the saved key, use the remove_api_key() function.
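
One way to keep the key out of scripts and version control is to read it from an environment variable. The sketch below assumes the key is stored in a variable named GENDERIZE_API_KEY; that name is hypothetical and not something the package defines.

```r
# Read the key from an environment variable instead of hard-coding it.
# GENDERIZE_API_KEY is a hypothetical name chosen for this sketch.
api_key <- Sys.getenv("GENDERIZE_API_KEY", unset = "")

if (nzchar(api_key)) {
  # Only store the key when one was actually provided
  DemografixeR::save_api_key(key = api_key)
} else {
  message("No API key found; continuing without one.")
}
```

With this approach the key itself never appears in the script, only the name of the variable that holds it.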

Gender

We start by predicting the gender of our customers. For this we use the genderize(name) function:

customers_names <- c("Maria", "Ben", "Claudia", 
                     "Adam", "Hannah", "Robert")
customers_predicted_gender <- genderize(name = customers_names)
customers_predicted_gender # Print results
#> [1] "female" "male"   "female" "male"   "female" "male"

We see that genderize(name) returns the estimated gender for each name as a character vector:

class(customers_predicted_gender)
#> [1] "character"

It is also possible to obtain a detailed data.frame object with additional information by setting simplify = FALSE. DemografixeR also works with ‘pipes’:

gender_df <- genderize(name = customers_names, simplify = FALSE)
customers_names %>% 
  genderize(simplify = FALSE) %>% 
  knitr::kable(row.names = FALSE)

name	type	gender	probability	count
Maria	gender	female	0.98	334287
Ben	gender	male	0.95	77991
Claudia	gender	female	0.98	118604
Adam	gender	male	0.98	116396
Hannah	gender	female	0.97	13198
Robert	gender	male	0.99	177418
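
The probability column can be used to set aside uncertain predictions. The following sketch rebuilds the table above by hand so that it runs without calling the API, and applies a cutoff of 0.97, which is an arbitrary value chosen for illustration.

```r
# Same values as the table above, typed in by hand
gender_df <- data.frame(
  name = c("Maria", "Ben", "Claudia", "Adam", "Hannah", "Robert"),
  type = "gender",
  gender = c("female", "male", "female", "male", "female", "male"),
  probability = c(0.98, 0.95, 0.98, 0.98, 0.97, 0.99),
  count = c(334287, 77991, 118604, 116396, 13198, 177418),
  stringsAsFactors = FALSE
)

min_probability <- 0.97  # arbitrary threshold for this sketch

# Split into predictions we accept and those that need a manual review
confident <- gender_df[gender_df$probability >= min_probability, ]
uncertain <- gender_df[gender_df$probability < min_probability, ]

uncertain$name
#> [1] "Ben"
```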

Age

We continue with the age estimation of our customers. As with the genderize(name) function, the simplify parameter also works with the agify(name) function to retrieve a data.frame:

customers_predicted_age <- agify(name = customers_names, simplify = FALSE)

customers_names %>% 
  agify(simplify = FALSE) %>% 
  knitr::kable(row.names = FALSE)

name	type	age	count
Maria	age	21	517258
Ben	age	48	75632
Claudia	age	45	110105
Adam	age	34	110754
Hannah	age	27	12843
Robert	age	59	160915

Nationality

Last but not least, we finish with the nationality estimation. As with the genderize(name) and agify(name) functions, the simplify parameter also works with the nationalize(name) function to retrieve a data.frame:

customers_predicted_nationality <- nationalize(name = customers_names, simplify = FALSE)

customers_names %>% 
  nationalize(simplify = FALSE) %>% 
  knitr::kable(row.names = FALSE)

name	type	country_id	probability
Maria	nationality	CY	0.0550798
Ben	nationality	AU	0.0665534
Claudia	nationality	CL	0.0559340
Adam	nationality	PL	0.0905836
Hannah	nationality	SL	0.2673254
Robert	nationality	US	0.0909442

Other parameters

`country_id` parameter

Estimates are often more accurate when the data is narrowed to a specific country. Both the genderize(name) and agify(name) functions support passing a country code parameter (following the common ISO 3166-1 alpha-2 country code convention). The nationalize(name) function does not, since the country is what it estimates:

us_customers_predicted_gender <- genderize(name = customers_names, 
                                           country_id = "US")
us_customers_predicted_gender
#> [1] "female" "male"   "female" "male"   "female" "male"

us_customers_predicted_age <- agify(name = customers_names,
                                    country_id = "US")
us_customers_predicted_age
#> [1] NA 67 69 65 54 70
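
In the output above, the US-specific age estimate for the first name is NA. One possible way to handle such gaps is to fall back to the estimate obtained without a country code. The sketch below uses the values printed in this document instead of calling the API; the fallback strategy is a suggestion, not a feature of the package.

```r
customers_names <- c("Maria", "Ben", "Claudia", "Adam", "Hannah", "Robert")

# Values copied from the outputs shown in this document
global_age <- c(21, 48, 45, 34, 27, 59)   # agify() without country_id
us_age     <- c(NA, 67, 69, 65, 54, 70)   # agify() with country_id = "US"

# Prefer the country-specific estimate, fall back to the global one
combined_age <- ifelse(is.na(us_age), global_age, us_age)

data.frame(name = customers_names, age = combined_age)
```

Only the first name changes: it takes the global estimate of 21, while the other five keep their US-specific values.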

To obtain a data.frame of all supported countries, use the supported_countries(type) function. Here are the first five countries:

supported_countries(type = "genderize") %>% 
  head(5) %>%
  knitr::kable(row.names = FALSE)

country_id	name	total
AD	Andorra	29783
AE	United Arab Emirates	145847
AF	Afghanistan	23531
AG	Antigua and Barbuda	1723
AI	Anguilla	1081

In this case the total column reflects the number of observations the API has for each country. The country_id parameter accepts either a single character string or a character vector of the same length as the name parameter. An example illustrates this better:

agify(name = c("Hannah", "Ben"),
      country_id = c("US", "GB"),
      simplify = FALSE) %>%
  knitr::kable(row.names = FALSE)

name	type	age	count	country_id
Hannah	age	54	67	US
Ben	age	38	1980	GB

In the previous example we passed two names - Hannah & Ben - and two country codes - US & GB. The functions are thus vectorized over both arguments, which is especially useful for workflows where a data.frame holds names in one variable and country codes in another.
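
Such a data.frame workflow might look like the following sketch. The helper add_estimated_age() and the column names are hypothetical. The estimator argument defaults to agify() but can be swapped for a stand-in function, which makes it possible to try the helper without calling the API.

```r
# Hypothetical helper: adds an estimated age column to a data.frame that
# has a `name` column and a `country_id` column.
add_estimated_age <- function(df, estimator = DemografixeR::agify) {
  # Both vectors have the same length, so each name is paired
  # with the country code in the same row
  df$estimated_age <- estimator(name = df$name, country_id = df$country_id)
  df
}

customers <- data.frame(
  name = c("Hannah", "Ben"),
  country_id = c("US", "GB"),
  stringsAsFactors = FALSE
)

# Stand-in estimator returning the ages from the table above,
# so that the sketch runs offline
fake_agify <- function(name, country_id) {
  lookup <- c("Hannah|US" = 54, "Ben|GB" = 38)
  unname(lookup[paste(name, country_id, sep = "|")])
}

add_estimated_age(customers, estimator = fake_agify)
```

Dropping the estimator argument, as in add_estimated_age(customers), would send the names to the API through agify() instead.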

`meta` parameter

All three functions have a meta parameter. When set to TRUE, it returns information about the API itself, such as:

The number of names available in the current time window
The number of names left in the current time window
Seconds remaining until a new time window opens

Here’s an example:

genderize(name = "Hannah", 
          simplify = FALSE, 
          meta = TRUE) %>% 
  knitr::kable(row.names = FALSE)

name	type	gender	probability	count	api_rate_limit	api_rate_remaining	api_rate_reset	api_request_timestamp
Hannah	gender	female	0.97	13198	1000	977	7218	2020-05-05 21:59:42
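
These columns can be used to check whether a batch of names still fits into the current time window. The sketch below copies the values from the row above rather than calling the API; the batch size of 1200 is made up for illustration.

```r
# Rate-limit values copied from the example row above
meta_info <- data.frame(
  api_rate_limit = 1000,
  api_rate_remaining = 977,
  api_rate_reset = 7218   # seconds until the next window opens
)

names_to_send <- 1200  # hypothetical batch size

# How many names fit now, and how many must wait for the next window
send_now     <- min(names_to_send, meta_info$api_rate_remaining)
send_later   <- names_to_send - send_now
wait_minutes <- ceiling(meta_info$api_rate_reset / 60)

c(send_now = send_now, send_later = send_later, wait_minutes = wait_minutes)
```

With these numbers, 977 names could be sent right away and the remaining 223 would have to wait about 121 minutes.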

`sliced` parameter

The nationalize(name) function has the useful sliced parameter. Names can have multiple estimated nationalities, and the nationalize(name) function automatically ranks them by probability. This logical parameter ‘slices’ the result, keeping only the value with the highest probability so that there is a single estimate for each name (one country per name). It is set to TRUE by default. If you wish to see all the potential countries a name can be associated with, set the parameter to FALSE:

nationalize(name = "Matthias", 
            simplify = FALSE, 
            sliced = FALSE) %>% 
  knitr::kable(row.names = FALSE)

name	type	country_id	probability
Matthias	nationality	DE	0.4161638
Matthias	nationality	AT	0.2650625
Matthias	nationality	CH	0.1106922

In this last example, instead of returning a single country code, the function returns multiple country codes with their associated probabilities.
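
To make the effect of sliced more concrete, the following sketch applies the same idea by hand to the unsliced result: sort by probability and keep the first row per name. The data.frame is typed in from the table above so that no API call is needed. How the package implements slicing internally is not shown here and may differ.

```r
# Values typed in from the sliced = FALSE table above
unsliced <- data.frame(
  name = "Matthias",
  type = "nationality",
  country_id = c("DE", "AT", "CH"),
  probability = c(0.4161638, 0.2650625, 0.1106922),
  stringsAsFactors = FALSE
)

# Sort by name, then by descending probability
ranked <- unsliced[order(unsliced$name, -unsliced$probability), ]

# Keep only the first (most probable) row for each name
top_country <- ranked[!duplicated(ranked$name), ]

top_country$country_id
#> [1] "DE"
```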

Matthias Brenninkmeijer

2020-05-06

Customers example

Customers:	Maria	Ben	Claudia	Adam	Hannah	Robert
Estimated gender:	female	male	female	male	female	male
Estimated age:	21	48	45	34	27	59
Estimated nationality:	CY	AU	CL	PL	SL	US
