290 lines
6.4 KiB
Plaintext
290 lines
6.4 KiB
Plaintext
Lesson 6
|
||
========================================================
|
||
|
||
### Welcome
|
||
Notes:
|
||
|
||
***
|
||
|
||
### Scatterplot Review
|
||
|
||
```{r Scatterplot Review}
|
||
|
||
```
|
||
|
||
***
|
||
|
||
### Price and Carat Relationship
|
||
Response:
|
||
|
||
***
|
||
|
||
### Frances Gerety
|
||
Notes:
|
||
|
||
#### A diamonds is
|
||
|
||
|
||
***
|
||
|
||
### The Rise of Diamonds
|
||
Notes:
|
||
|
||
***
|
||
|
||
### ggpairs Function
|
||
Notes:
|
||
|
||
```{r ggpairs Function}
|
||
# install these if necessary
|
||
install.packages('GGally')
|
||
install.packages('scales')
|
||
install.packages('memisc')
|
||
install.packages('lattice')
|
||
install.packages('MASS')
|
||
install.packages('car')
|
||
install.packages('reshape')
|
||
install.packages('plyr')
|
||
|
||
# load the ggplot graphics package and the others
|
||
library(ggplot2)
|
||
library(GGally)
|
||
library(scales)
|
||
library(memisc)
|
||
|
||
# sample 10,000 diamonds from the data set
|
||
set.seed(20022012)
|
||
diamond_samp <- diamonds[sample(1:length(diamonds$price), 10000), ]
|
||
ggpairs(diamond_samp, params = c(shape = I('.'), outlier.shape = I('.')))
|
||
```
|
||
|
||
What are some things you notice in the ggpairs output?
|
||
Response:
|
||
|
||
***
|
||
|
||
### The Demand of Diamonds
|
||
Notes:
|
||
|
||
```{r The Demand of Diamonds}
|
||
|
||
```
|
||
|
||
***
|
||
|
||
### Connecting Demand and Price Distributions
|
||
Notes:
|
||
|
||
***
|
||
|
||
### Scatterplot Transformation
|
||
|
||
```{r Scatterplot Transformation}
|
||
|
||
```
|
||
|
||
|
||
### Create a new function to transform the carat variable
|
||
|
||
```{r cuberoot transformation}
|
||
cuberoot_trans = function() trans_new('cuberoot', transform = function(x) x^(1/3),
|
||
inverse = function(x) x^3)
|
||
```
|
||
|
||
#### Use the cuberoot_trans function
|
||
```{r Use cuberoot_trans}
|
||
ggplot(aes(carat, price), data = diamonds) +
|
||
geom_point() +
|
||
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||
ggtitle('Price (log10) by Cube-Root of Carat')
|
||
```
|
||
|
||
***
|
||
|
||
### Overplotting Revisited
|
||
|
||
```{r Sort and Head Tables}
|
||
|
||
```
|
||
|
||
|
||
```{r Overplotting Revisited}
|
||
ggplot(aes(carat, price), data = diamonds) +
|
||
geom_point() +
|
||
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||
ggtitle('Price (log10) by Cube-Root of Carat')
|
||
```
|
||
|
||
***
|
||
|
||
### Other Qualitative Factors
|
||
Notes:
|
||
|
||
***
|
||
|
||
### Price vs. Carat and Clarity
|
||
|
||
Alter the code below.
|
||
```{r Price vs. Carat and Clarity}
|
||
# install and load the RColorBrewer package
|
||
install.packages('RColorBrewer')
|
||
library(RColorBrewer)
|
||
|
||
ggplot(aes(x = carat, y = price), data = diamonds) +
|
||
geom_point(alpha = 0.5, size = 1, position = 'jitter') +
|
||
scale_color_brewer(type = 'div',
|
||
guide = guide_legend(title = 'Clarity', reverse = T,
|
||
override.aes = list(alpha = 1, size = 2))) +
|
||
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||
ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
|
||
```
|
||
|
||
***
|
||
|
||
### Clarity and Price
|
||
Response:
|
||
|
||
***
|
||
|
||
### Price vs. Carat and Cut
|
||
|
||
Alter the code below.
|
||
```{r Price vs. Carat and Cut}
|
||
ggplot(aes(x = carat, y = price, color = clarity), data = diamonds) +
|
||
geom_point(alpha = 0.5, size = 1, position = 'jitter') +
|
||
scale_color_brewer(type = 'div',
|
||
guide = guide_legend(title = 'Clarity', reverse = T,
|
||
override.aes = list(alpha = 1, size = 2))) +
|
||
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||
ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
|
||
```
|
||
|
||
***
|
||
|
||
### Cut and Price
|
||
Response:
|
||
|
||
***
|
||
|
||
### Price vs. Carat and Color
|
||
|
||
Alter the code below.
|
||
```{r Price vs. Carat and Color}
|
||
ggplot(aes(x = carat, y = price, color = cut), data = diamonds) +
|
||
geom_point(alpha = 0.5, size = 1, position = 'jitter') +
|
||
scale_color_brewer(type = 'div',
|
||
guide = guide_legend(title = Cut, reverse = T,
|
||
override.aes = list(alpha = 1, size = 2))) +
|
||
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||
ggtitle('Price (log10) by Cube-Root of Carat and Cut')
|
||
```
|
||
|
||
***
|
||
|
||
### Color and Price
|
||
Response:
|
||
|
||
***
|
||
|
||
### Linear Models in R
|
||
Notes:
|
||
|
||
Response:
|
||
|
||
***
|
||
|
||
### Building the Linear Model
|
||
Notes:
|
||
|
||
```{r Building the Linear Model}
|
||
m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = diamonds)
|
||
m2 <- update(m1, ~ . + carat)
|
||
m3 <- update(m2, ~ . + cut)
|
||
m4 <- update(m3, ~ . + color)
|
||
m5 <- update(m4, ~ . + clarity)
|
||
mtable(m1, m2, m3, m4, m5)
|
||
```
|
||
|
||
Notice how adding cut to our model does not help explain much of the variance
|
||
in the price of diamonds. This fits with out exploration earlier.
|
||
|
||
***
|
||
|
||
### Model Problems
|
||
Video Notes:
|
||
|
||
Research:
|
||
(Take some time to come up with 2-4 problems for the model)
|
||
(You should 10-20 min on this)
|
||
|
||
Response:
|
||
|
||
***
|
||
|
||
### A Bigger, Better Data Set
|
||
Notes:
|
||
|
||
```{r A Bigger, Better Data Set}
|
||
install.package('bitops')
|
||
install.packages('RCurl')
|
||
library('bitops')
|
||
library('RCurl')
|
||
|
||
diamondsurl = getBinaryURL("https://raw.github.com/solomonm/diamonds-data/master/BigDiamonds.Rda")
|
||
load(rawConnection(diamondsurl))
|
||
```
|
||
|
||
The code used to obtain the data is available here:
|
||
https://github.com/solomonm/diamonds-data
|
||
|
||
## Building a Model Using the Big Diamonds Data Set
|
||
Notes:
|
||
|
||
```{r Building a Model Using the Big Diamonds Data Set}
|
||
|
||
```
|
||
|
||
|
||
***
|
||
|
||
## Predictions
|
||
|
||
Example Diamond from BlueNile:
|
||
Round 1.00 Very Good I VS1 $5,601
|
||
|
||
```{r}
|
||
#Be sure you’ve loaded the library memisc and have m5 saved as an object in your workspace.
|
||
thisDiamond = data.frame(carat = 1.00, cut = "V.Good",
|
||
color = "I", clarity="VS1")
|
||
modelEstimate = predict(m5, newdata = thisDiamond,
|
||
interval="prediction", level = .95)
|
||
```
|
||
|
||
Evaluate how well the model predicts the BlueNile diamond's price. Think about the fitted point estimate as well as the 95% CI.
|
||
|
||
***
|
||
|
||
## Final Thoughts
|
||
Notes:
|
||
|
||
***
|
||
|
||
Click **KnitHTML** to see all of your hard work and to have an html
|
||
page of this lesson, your answers, and your notes!
|
||
|