Lesson 4 half
parent 56f9478edc
commit 99092a5dc4
@ -12,7 +12,11 @@ Notes:
|
|||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
```{r Scatterplots}
|
```{r Scatterplots}
|
||||||
|
library(ggplot2)
|
||||||
|
pf <- read.csv('pseudo_facebook.tsv', sep = '\t')
|
||||||
|
|
||||||
|
ggplot(aes(x = age, y = friend_count), data = pf) +
|
||||||
|
geom_point()
|
||||||
```
|
```
|
||||||
|
|
||||||
***
|
***
|
||||||
@ -20,28 +24,34 @@ Notes:
|
|||||||
#### What are some things that you notice right away?
|
#### What are some things that you notice right away?
|
||||||
Response:
|
Response:
|
||||||
|
|
||||||
***
|
All of the data points fall into vertical lines, and younger users are more likely to have high friend counts.
|
||||||
|
|
||||||
### ggplot Syntax
|
### ggplot Syntax
|
||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
```{r ggplot Syntax}
|
```{r ggplot Syntax}
|
||||||
|
ggplot(aes(x = age, y = friend_count), data = pf) +
|
||||||
|
geom_point() +
|
||||||
|
xlim(13, 90)
|
||||||
|
|
||||||
|
summary(pf$age)
|
||||||
```
|
```
|
||||||
|
|
||||||
***
|
Build the plot one layer at a time so that errors are easier to find.
|
||||||
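One way to follow this advice is to store the base plot in a variable and add one layer per step, checking the result after each addition. The sketch below assumes ggplot2 is installed and uses a small made-up data frame (`toy`, hypothetical) in place of `pf` so that it runs on its own.

```r
library(ggplot2)

# Hypothetical stand-in for pf: a handful of made-up rows.
toy <- data.frame(age = c(13, 20, 35, 50, 69, 90),
                  friend_count = c(300, 250, 120, 80, 150, 40))

# Step 1: base plot with aesthetics only; printing it would show empty axes.
p <- ggplot(aes(x = age, y = friend_count), data = toy)

# Step 2: add the point layer and check the result before going further.
p_points <- p + geom_point()

# Step 3: add the axis limits last.
p_limits <- p_points + xlim(13, 90)

print(p_limits)
```

If a step fails, the error points at the layer that was just added rather than at the whole chain.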
|
|
||||||
### Overplotting
|
### Overplotting
|
||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
```{r Overplotting}
|
```{r Overplotting}
|
||||||
|
ggplot(aes(x = age, y = friend_count), data = pf) +
|
||||||
|
geom_jitter(alpha = 1/20) +
|
||||||
|
xlim(13, 90)
|
||||||
```
|
```
|
||||||
|
|
||||||
#### What do you notice in the plot?
|
#### What do you notice in the plot?
|
||||||
Response:
|
Response:
|
||||||
|
|
||||||
***
|
The vertical bar at age 69 is still clearly visible, and it is more obvious that friend count generally decreases as age increases.
|
||||||
|
|
||||||
### Coord_trans()
|
### Coord_trans()
|
||||||
Notes:
|
Notes:
|
||||||
@ -53,18 +63,34 @@ Notes:
|
|||||||
#### Look up the documentation for coord_trans() and add a layer to the plot that transforms friend_count using the square root function. Create your plot!
|
#### Look up the documentation for coord_trans() and add a layer to the plot that transforms friend_count using the square root function. Create your plot!
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
ggplot(aes(x = age, y = friend_count), data = pf) +
|
||||||
|
geom_point(alpha = 1/20) +
|
||||||
|
xlim(13, 90) +
|
||||||
|
coord_trans(y = "sqrt")
|
||||||
```
|
```
|
||||||
|
|
||||||
#### What do you notice?
|
#### What do you notice?
|
||||||
|
|
||||||
***
|
First, plain geom_jitter() does not work well with coord_trans(y = "sqrt"). Second, the data points near the bottom are spread out more vertically, which brings them into focus.
|
||||||
|
|
||||||
|
To use jitter, you need more explicit syntax that jitters only the ages. Jittering friend_count could push a value of 0 below zero, and the square root of a negative number is undefined.
|
||||||
|
To do this, pass `position = position_jitter(h = 0)` to `geom_point()`; `h` sets the vertical jitter height, so 0 leaves `friend_count` untouched.
|
||||||
|
|
||||||
|
```{r coord_trans_advanced}
|
||||||
|
ggplot(aes(x = age, y = friend_count), data = pf) +
|
||||||
|
geom_point(alpha = 1/20, position = position_jitter(h = 0)) +
|
||||||
|
xlim(13, 90) +
|
||||||
|
coord_trans(y = "sqrt")
|
||||||
|
```
|
||||||
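The reason for `h = 0` can be checked without drawing a plot. The sketch below uses base R only; the jitter amount of 0.4 is an arbitrary illustrative value, not what ggplot2 uses internally.

```r
# A friend count of 0 shifted down by a hypothetical jitter of 0.4.
jittered_down <- 0 - 0.4

# sqrt() of a negative number is NaN (R also emits a warning), so such a
# point cannot be placed on a square-root axis.
result <- suppressWarnings(sqrt(jittered_down))
is.nan(result)

# With no vertical jitter the value stays at 0, and sqrt(0) is 0.
sqrt(0)
```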
|
|
||||||
### Alpha and Jitter
|
### Alpha and Jitter
|
||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
```{r Alpha and Jitter}
|
```{r Alpha and Jitter}
|
||||||
|
ggplot(aes(x = age, y = friendships_initiated, color = gender), data = pf) +
|
||||||
|
geom_point(alpha = 1/10, position = position_jitter(h = 0)) +
|
||||||
|
xlim(13, 90) +
|
||||||
|
coord_trans(y = "sqrt")
|
||||||
```
|
```
|
||||||
|
|
||||||
***
|
***
|
||||||
@ -72,34 +98,46 @@ Notes:
|
|||||||
### Overplotting and Domain Knowledge
|
### Overplotting and Domain Knowledge
|
||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
***
|
Plot values as a percentage of the whole.
|
||||||
|
|
||||||
### Conditional Means
|
### Conditional Means
|
||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
```{r Conditional Means}
|
```{r Conditional Means}
|
||||||
|
library(dplyr)
|
||||||
|
|
||||||
|
age_groups <- group_by(pf, age)
|
||||||
|
pf.fc_by_age <- summarise(age_groups,
|
||||||
|
friend_count_mean = mean(friend_count),
|
||||||
|
friend_count_median = median(friend_count),
|
||||||
|
n = n())
|
||||||
|
pf.fc_by_age <- arrange(pf.fc_by_age, age)
|
||||||
|
|
||||||
|
ggplot(aes(x = age, y = friend_count_mean), data = pf.fc_by_age) +
|
||||||
|
geom_line() +
|
||||||
|
xlim(13,90)
|
||||||
|
|
||||||
```
|
```
|
||||||
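The same group, summarise, and arrange steps can also be chained with dplyr's `%>%` operator. The sketch below is one possible equivalent, run on a made-up data frame (`toy`, hypothetical) rather than `pf` so that it is self-contained.

```r
library(dplyr)

# Hypothetical stand-in for pf with two users per age.
toy <- data.frame(age = c(30, 13, 13, 30),
                  friend_count = c(10, 100, 300, 50))

# Same steps as above, chained with %>%: group, summarise, then sort.
toy.fc_by_age <- toy %>%
  group_by(age) %>%
  summarise(friend_count_mean = mean(friend_count),
            friend_count_median = median(friend_count),
            n = n()) %>%
  arrange(age)

print(toy.fc_by_age)
```

Chaining avoids the intermediate `age_groups` variable, at the cost of not being able to inspect the grouped data on its own.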
|
|
||||||
Create your plot!
|
|
||||||
|
|
||||||
```{r Conditional Means Plot}
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
***
|
|
||||||
|
|
||||||
### Overlaying Summaries with Raw Data
|
### Overlaying Summaries with Raw Data
|
||||||
Notes:
|
Notes:
|
||||||
|
|
||||||
```{r Overlaying Summaries with Raw Data}
|
```{r Overlaying Summaries with Raw Data}
|
||||||
|
ggplot(aes(x = age, y = friendships_initiated), data = pf) +
|
||||||
|
geom_point(alpha = 1/10, position = position_jitter(h = 0), color = 'orange') +
|
||||||
|
xlim(13, 90) +
|
||||||
|
coord_trans(y = "sqrt") +
|
||||||
|
geom_line(stat = 'summary', fun = mean) +
|
||||||
|
geom_line(stat = 'summary', fun = median, color = 'blue') +
|
||||||
|
geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = 0.1), color = 'red', linetype = 2) +
|
||||||
|
geom_line(stat = 'summary', fun = quantile, fun.args = list(probs = 0.9), color = 'red', linetype = 2) +
|
||||||
|
coord_cartesian(xlim = c(13,70), ylim = c(0,1000))
|
||||||
```
|
```
|
||||||
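One thing to be aware of in the chunk above: a ggplot2 plot holds a single coordinate system, so adding `coord_cartesian()` after `coord_trans()` replaces the square-root transform (ggplot2 prints a message saying so). A possible alternative, sketched below on made-up data (`toy`, hypothetical), is to pass the limits to `coord_trans()` directly. The `xlim` and `ylim` arguments of `coord_trans()` assume a reasonably recent ggplot2 release.

```r
library(ggplot2)

# Hypothetical stand-in for pf.
toy <- data.frame(age = c(13, 20, 35, 50, 69),
                  friendships_initiated = c(0, 40, 90, 400, 900))

base <- ggplot(aes(x = age, y = friendships_initiated), data = toy) +
  geom_point()

# Adding a second coord replaces the first: the sqrt transform is lost.
replaced <- suppressMessages(
  base + coord_trans(y = "sqrt") + coord_cartesian(ylim = c(0, 1000))
)
class(replaced$coordinates)[1]  # "CoordCartesian"

# Keeping a single coord: transform and zoom together.
kept <- base + coord_trans(y = "sqrt", xlim = c(13, 70), ylim = c(0, 1000))
class(kept$coordinates)[1]
```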
|
|
||||||
#### What are some of your observations of the plot?
|
#### What are some of your observations of the plot?
|
||||||
Response:
|
Response:
|
||||||
|
|
||||||
***
|
I notice that the median is always lower than the mean and that the median is closer to the center of the main body of data points. The data appear to be long-tailed toward the high counts, which pulls the mean upward.
|
||||||
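The effect of a long right tail on the mean can be seen with a few made-up numbers. This is only an illustration in base R, not data from `pf`.

```r
# Made-up, right-skewed counts: most values are small, one is very large.
counts <- c(1, 2, 3, 4, 100)

mean(counts)    # 22
median(counts)  # 3
```

The single large value moves the mean far above the bulk of the data, while the median stays in the middle of it.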
|
|
||||||
### Moira: Histogram Summary and Scatterplot
|
### Moira: Histogram Summary and Scatterplot
|
||||||
See the Instructor Notes of this video to download Moira's paper on perceived audience size and to see the final plot.
|
See the Instructor Notes of this video to download Moira's paper on perceived audience size and to see the final plot.
|
||||||
|
|||||||
99004
lesson4/pseudo_facebook.tsv
Normal file
File diff suppressed because it is too large