Hipster Names

Load a few packages

library(ggplot2)
library(readr)
library(dplyr) 
library(ggvis)

Read in my data

names <- read.csv("D:/NationalNames.csv", stringsAsFactors = FALSE)

I am looking for a name that is quintessential hipster name. Here are the criteria:

Very popular 100 years ago
Very unpopular 30 years ago
Becoming much more popular in the last five years

I started by looking for a name that seems to follow that pattern - Hazel.

hazel <- subset(names, Name == "Hazel" & Gender =="F")
hazel  %>%
select (Name, Year, Count) %>%
ggvis(~Year, ~Count, stroke = ~factor(Name)) %>%
layer_lines()

center

That’s a great start. Hazel reached it’s peak popularity in the late 1910’s and then declined sharply, now it’s back. Let’s try another guess to look for good count thresholds.

violet <- subset(names, Name == "Violet" & Gender =="F")
violet  %>%
select (Name, Year, Count) %>%
ggvis(~Year, ~Count, stroke = ~factor(Name)) %>%
layer_lines()

center

Both names seem to have the same trend.

At least 3000 at some point between 1915 and 1930
Less than 1000 around 1980
More than 2000 at any point after 2010.

That’s the criteria I want to use to look for those names.

df1 <- subset(names , Gender =="F" & Year >= 1915 & Year <= 1935 & Count > 3000)
df2 <- subset(names, Gender == "F" & Year == 1980 & Count <= 1000)
df3 <- subset(names , Gender == "F" & Year >= 2010 & Year <= 2014 & Count > 2000)

Created three data frames to match my criteria.

Now, let’s see how many names will be charted

names   %>%
filter(Gender == 'F', Name %in% df1$Name, Name %in% df2$Name, Name %in% df3$Name)  %>%
select (Name, Year, Count) %>%
ggvis(~Year, ~Count, stroke = ~factor(Name)) %>%
layer_lines()

center

That’s a solid set of names and many seem to fit into the mold that I was envisioning.

Let’s try the same with boy’s names.

df4 <- subset(names , Gender =="M" & Year >= 1915 & Year <= 1935 & Count > 3000)
df5 <- subset(names, Gender == "M" & Year == 1980 & Count <= 1000)
df6 <- subset(names , Gender == "M" & Year >= 2010 & Year <= 2014 & Count > 2000)
names   %>%
filter(Gender == 'M', Name %in% df4$Name, Name %in% df5$Name, Name %in% df6$Name)  %>%
select (Name, Year, Count) %>%
ggvis(~Year, ~Count, stroke = ~factor(Name)) %>%
layer_lines()

center

Well, that’s a much different outcome than I was looking for. Just one name.

I tried to add more names by changing several of each Count threshold.

df4 <- subset(names , Gender =="M" & Year >= 1915 & Year <= 1935 & Count > 1500)
df5 <- subset(names, Gender == "M" & Year == 1980 & Count <= 1000)
df6 <- subset(names , Gender == "M" & Year >= 2010 & Year <= 2014 & Count > 1000)
names   %>%
filter(Gender == 'M', Name %in% df4$Name, Name %in% df5$Name, Name %in% df6$Name)  %>%
select (Name, Year, Count) %>%
ggvis(~Year, ~Count, stroke = ~factor(Name)) %>%
layer_lines()

center

Thanks to a nice comment from FlorianGD, I realized that by halving the lower threshold in df5 I actually made it harder rather than easier for a name to show up. This moved my total number of boys names from 3 to 6. That’s reflected in the new figure above.

Social Science + Data Science

Code Snippets

Hipster Names