Initial Commit with Project Code
parent
bea57818a2
commit
6f865b5ff5
12
lesson2/What_is_a_RMD_file.Rmd
Normal file
@ -0,0 +1,12 @@
|
||||
Title
|
||||
========================================================
|
||||
|
||||
This is an R Markdown document or RMD. Markdown is a simple formatting syntax for authoring web pages (click the **Help** toolbar button for more details on using R Markdown).
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
When you click the **Knit HTML** button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document.
|
||||
261
lesson2/demystifying.R
Normal file
@ -0,0 +1,261 @@
|
||||
# The goal of this file is to introduce you to the
|
||||
# R programming language. Let's start by unraveling a
|
||||
# little mystery!
|
||||
|
||||
# 1. Run the code below to create the vector 'udacious'.
|
||||
# You need to highlight all of the lines of the code and then
|
||||
# run it. You should see "udacious" appear in the workspace.
|
||||
|
||||
udacious <- c("Chris Saden", "Lauren Castellano",
|
||||
"Sarah Spikes","Dean Eckles",
|
||||
"Andy Brown", "Moira Burke",
|
||||
"Kunal Chawla")
|
||||
|
||||
# You should see something like "chr[1:7]" in the 'Environment'
|
||||
# or 'Workspace' tab. This is because you created a 'vector' with
|
||||
# 7 names that have a 'type' of character. The arrow-like
|
||||
# '<-' symbol is the assignment operator in R, similar to the
|
||||
# equal sign '=' in other programming languages. The c() is a
|
||||
# generic function that combines arguments, in this case the
|
||||
# names of people, to form a vector.
|
||||
|
||||
# A 'vector' is one of the data types in R. Vectors must contain
|
||||
# the same type of data, that is the entries must all be of the
|
||||
# same type: character (most programmers call these strings),
|
||||
# logical (TRUE or FALSE), or numeric.
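
# A quick sketch of what "same type" means in practice. This is an
# illustrative aside, and the names 'mixed' and 'flags' are made up
# for this example. When c() is given entries of different types,
# R converts every entry to one common type.

mixed <- c(1, "two", TRUE)
class(mixed)   # character: the number and the logical became strings
mixed

flags <- c(TRUE, FALSE, 3)
class(flags)   # numeric: TRUE and FALSE became 1 and 0
flags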
|
||||
|
||||
# Print out the vector udacious by running this next line of code.
|
||||
|
||||
udacious
|
||||
|
||||
# Notice how there are numbers next to the output.
|
||||
# Each number corresponds to the index of the entry in the vector.
|
||||
# Chris Saden is the first entry so [1]
|
||||
# Dean Eckles is the fourth entry so [4]
|
||||
# Kunal Chawla is the seventh entry so [7]
|
||||
|
||||
# Depending on the size of your window you may see different numbers
|
||||
# in the output.
|
||||
|
||||
# ANOTHER HELPFUL TIP: You can add values to a vector.
|
||||
# Run each line of code one at a time below to see what is happening.
|
||||
|
||||
numbers <- c(1:10)
|
||||
|
||||
numbers
|
||||
|
||||
numbers <- c(numbers, 11:20)
|
||||
|
||||
numbers
|
||||
|
||||
|
||||
# 2. Replace YOUR_NAME with your actual name in the vector
|
||||
# 'udacious' and run the code. Be sure to use quotes around it.
|
||||
|
||||
udacious <- c("Chris Saden", "Lauren Castellano",
|
||||
"Sarah Spikes","Dean Eckles",
|
||||
"Andy Brown", "Moira Burke",
|
||||
"Kunal Chawla", YOUR_NAME)
|
||||
|
||||
# Notice how R updates 'udacious' in the workspace.
|
||||
# It should now say something like 'chr[1:8]'.
|
||||
|
||||
# 3. Run the following two lines of code. You can highlight both lines
|
||||
# of code and run them.
|
||||
|
||||
mystery = nchar(udacious)
|
||||
mystery
|
||||
|
||||
# You just created a new vector called mystery. What do you
|
||||
# think is in this vector? (scroll down for the answer)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# Mystery is a vector that contains the number of characters
|
||||
# for each of the names in udacious, including your name.
|
||||
|
||||
# 4. Run this next line of code.
|
||||
|
||||
mystery == 11
|
||||
|
||||
# Here we get a logical (or boolean) vector that tells us
|
||||
# which locations or indices in the vector contain a name
|
||||
# that has exactly 11 characters.
|
||||
|
||||
# 5. Let's use this boolean vector, mystery == 11, to subset our
|
||||
# udacious vector. What do you think the result will be when
|
||||
# running the line of code below?
|
||||
|
||||
# Think about the output before you run this next line of code.
|
||||
# Notice how there are brackets in the code. Brackets are often
|
||||
# used in R for subsetting.
|
||||
|
||||
udacious[mystery == 11]
|
||||
|
||||
|
||||
# Scroll down for the answer
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# It's your Udacious Instructors for the course!
|
||||
# (and you may be in the output if you're lucky enough
|
||||
# to have 11 characters in YOUR_NAME) Either way, we
|
||||
# think you're pretty udacious for taking this course.
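
# A small follow-up sketch, not part of the original exercise: a logical
# vector can also be counted or turned into positions. The names
# 'lengths_demo' and 'is_eleven' are invented here for illustration.

lengths_demo <- nchar(c("Chris Saden", "Andy Brown", "Moira Burke"))
is_eleven <- lengths_demo == 11
sum(is_eleven)     # counts the TRUE values
which(is_eleven)   # gives the indices of the TRUE values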
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# 6. Alright, all mystery aside...let's dive into some data!
|
||||
# The R installation has a few datasets already built into it
|
||||
# that you can play with. Right now, you'll load one of these,
|
||||
# which is named mtcars.
|
||||
|
||||
# Run this next command to load the mtcars data.
|
||||
|
||||
data(mtcars)
|
||||
|
||||
|
||||
# You should see mtcars appear in the 'Environment' tab with
|
||||
# <Promise> listed next to it.
|
||||
|
||||
# The object (mtcars) appears as a 'Promise' object in the
|
||||
# workspace until we run some code that uses the object.
|
||||
|
||||
# R has stored the mtcars data into a spreadsheet-like object
|
||||
# called a data frame. Run the next command to see what variables
|
||||
# are in the data set and to fully load the data set as an
|
||||
# object in R. You should see <Promise> disappear when you
|
||||
# run the next line of code.
|
||||
|
||||
# Visit http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Promise-objects
|
||||
# if you want the expert insight on Promise objects. You won't
|
||||
# need the info on Promise objects to be successful in this course.
|
||||
|
||||
names(mtcars)
|
||||
|
||||
# names(mtcars) should output all the variable
|
||||
# names in the data set. You might notice that the car names
|
||||
# are not a variable in the data set. The car names have been saved
|
||||
# as row names. More on this later.
|
||||
|
||||
# You should also see how many observations (obs.) are in the
|
||||
# data frame and the number of variables for each observation.
|
||||
|
||||
# 7. To get more information on the data set and the variables
|
||||
# run this next line of code.
|
||||
|
||||
?mtcars
|
||||
|
||||
# You can type a '?' before any command or a data set to learn
|
||||
# more about it. The details and documentation will appear in
|
||||
# the 'Help' tab.
|
||||
|
||||
|
||||
# 8. To print out the data, run this next line of code.
|
||||
|
||||
mtcars
|
||||
|
||||
# Scroll up and down in the console to check out the data.
|
||||
# This is the entire data frame printed out.
|
||||
|
||||
# 9. Run these next two functions, one at a time,
|
||||
# and see if you can figure out what they do.
|
||||
|
||||
str(mtcars)
|
||||
|
||||
dim(mtcars)
|
||||
|
||||
# Scroll down for the answer.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# The first command, str(mtcars), gives us the structure of the
|
||||
# data frame. It lists the variable names, the type of each variable
|
||||
# (all of these variables are numerics) and some values for each
|
||||
# variable.
|
||||
|
||||
|
||||
# The second command, dim(mtcars), should output '[1] 32 11'
|
||||
# to the console. The [1] indicates that 32 is the first value
|
||||
# in the output.
|
||||
|
||||
# R uses 1 to start indexing (AND NOT ZERO BASED INDEXING as is true
|
||||
# of many other programming languages.)
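
# A short sketch of 1-based indexing, added as an illustration. The name
# 'size' is made up for this example.

size <- dim(mtcars)
size[1]   # the first element: the number of rows
size[2]   # the second element: the number of columns
size[0]   # index 0 does not return the first element; it returns an empty vector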
|
||||
|
||||
# 10. Read the documentation for row.names if you want to know more.
|
||||
?row.names
|
||||
|
||||
# Run this code to see the current row names in the data frame.
|
||||
row.names(mtcars)
|
||||
|
||||
# Run this code to change the row names of the cars to numbers.
|
||||
row.names(mtcars) <- c(1:32)
|
||||
|
||||
# Now print out the data frame by running the code below.
|
||||
mtcars
|
||||
|
||||
# It's tedious to relabel our data frame with the right car names
|
||||
# so let's reload the data set and print out the first ten rows.
|
||||
|
||||
data(mtcars)
|
||||
head(mtcars, 10)
|
||||
|
||||
# The head() function prints out the first six rows of a data frame
|
||||
# by default. Run the code below to see.
|
||||
head(mtcars)
|
||||
|
||||
# I think you'll know what this does.
|
||||
tail(mtcars, 3)
|
||||
|
||||
|
||||
# 11. We've run nine commands so far:
|
||||
# c, nchar, data, str, dim, names, row.names, head, and tail.
|
||||
|
||||
# All of these commands took some inputs or arguments.
|
||||
# To determine if a command takes more arguments or to learn
|
||||
# about any default settings, you can look up the documentation
|
||||
# using '?' before the command, much like you did to learn about
|
||||
# the mtcars data set and the row.names function.
|
||||
|
||||
|
||||
|
||||
# 12. Let's examine our car data more closely. We can access an
|
||||
# individual variable (or column) from the data frame using
|
||||
# the '$' sign. Run the code below to print out the variable
|
||||
# miles per gallon. This is the mpg column in the data frame.
|
||||
|
||||
mtcars$mpg
|
||||
|
||||
# Print out any two other variables to the console.
|
||||
|
||||
|
||||
|
||||
# This is a vector containing the mpg (miles per gallon) of
|
||||
# the 32 cars. Run this next line of code to get the average mpg for
|
||||
# all the cars. What is it?
|
||||
|
||||
# Enter this number for the quiz on the Udacity website.
|
||||
# https://www.udacity.com/course/viewer#!/c-ud651/l-729069797/e-804129314/m-830829287
|
||||
|
||||
mean(mtcars$mpg)
|
||||
|
||||
|
||||
|
||||
179
lesson2/demystifyingR2.Rmd
Normal file
@ -0,0 +1,179 @@
|
||||
Demystifying R Part 2
|
||||
========================================================
|
||||
|
||||
You might see a warning message just above this file. Something like...
|
||||
"R Markdown requires the knitr package (version 1.2 or higher)"
|
||||
Don't worry about this for now. We'll address it at the end of this file.
|
||||
|
||||
1. Run the following command to see what it does.
|
||||
```{r}
|
||||
summary(mtcars)
|
||||
```
|
||||
|
||||
If you know about quantiles, then the output should look familiar.
|
||||
If not, you probably recognize the min (minimum), median, mean, and max (maximum).
|
||||
We'll go over quantiles in Lesson 3 so don't worry if the output seems overwhelming.
|
||||
|
||||
The str() and summary() functions are helpful commands when working with a new data set.
|
||||
The str() function gives us the variable names and their types.
|
||||
The summary() function gives us an idea of the values a variable can take on.
|
||||
|
||||
2. In 2013, the average mpg (miles per gallon) for a car was 23 mpg.
|
||||
The car models in the mtcars data set come from the years 1973-1974.
|
||||
Subset the data so that you create a new data frame that contains
|
||||
cars that get 23 or more mpg (miles per gallon). Save it to a new data
|
||||
frame called efficient.
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
3. How many cars get more than 23 mpg? Use one of the commands you
|
||||
learned in demystifying.R to answer this question.
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
4. We can also use logical operators to find out which car(s) get greater
|
||||
than 30 miles per gallon (mpg) and have more than 100 raw horsepower.
|
||||
```{r}
|
||||
subset(mtcars, mpg > 30 & hp > 100)
|
||||
```
|
||||
|
||||
There's only one car that gets more than 30 mpg and has more than 100 hp.
|
||||
|
||||
5. What do you think this code does? Scroll down for the answer.
|
||||
```{r}
|
||||
subset(mtcars, mpg < 14 | disp > 390)
|
||||
```
|
||||
|
||||
Note: You may be familiar with the && and || operators in Java. In R, a single & is the
element-wise logical operator AND, and a single | is the element-wise logical operator OR.
R also has && and ||, but they work on single values, not on whole columns.
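
As a rough illustration of how the single-character operators behave, here is a minimal sketch. The vectors `a` and `b` are made up for this example and are not part of the lesson. Each result has one entry per position, which is the kind of row-by-row condition that subset() works with.

```{r}
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)
a & b   # TRUE only where both a and b are TRUE
a | b   # TRUE where at least one of a or b is TRUE
```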
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
The command above creates a data frame of cars that have mpg less than 14
|
||||
OR a displacement of more than 390. Only one of the conditions for a car
|
||||
needs to be satisfied so that the car makes it into the subset. Any of the
|
||||
cars that fit the criteria are printed to the console.
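
For comparison, the same rows can be selected with bracket indexing instead of subset(). This is only a sketch, and the names `cond_or`, `by_bracket`, and `by_subset` are invented for the example. It assumes mtcars has no missing values in mpg or disp, which is the case for the built-in data set.

```{r}
# Build the logical condition once, then use it to pick rows.
cond_or <- mtcars$mpg < 14 | mtcars$disp > 390
by_bracket <- mtcars[cond_or, ]
by_subset <- subset(mtcars, mpg < 14 | disp > 390)
# Check that both approaches give the same data frame.
all.equal(by_bracket, by_subset)
```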
|
||||
|
||||
Now you try some.
|
||||
|
||||
6. Print the cars that have a 1/4 mile time (qsec) less than or equal to
|
||||
16.90 seconds to the console.
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
7. Save the subset of cars that weigh under 2000 pounds (weight is measured in lb/1000)
|
||||
to a variable called lightCars. Print the number of cars and the subset to the console.
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
8. You can also create new variables in a data frame. Let's say you wanted
|
||||
to have the year of each car's model. We can create the variable
|
||||
mtcars$year. Here we'll assume that all of the models were from 1974.
|
||||
Run the code below.
|
||||
```{r}
|
||||
mtcars$year <- 1974
|
||||
```
|
||||
|
||||
Notice how the number of variables changed in the workspace. You can
|
||||
also see the result by double clicking on mtcars in the workspace and
|
||||
examining the data in a table.
|
||||
|
||||
To drop a variable, subset the data frame and select the variable you
|
||||
want to drop with a negative sign in front of it.
|
||||
```{r}
|
||||
mtcars <- subset(mtcars, select = -year)
|
||||
```
|
||||
|
||||
Notice, we are back to 11 variables in the data frame.
|
||||
|
||||
9. What do you think this code does? Run it to find out.
|
||||
```{r}
|
||||
mtcars$year <- c(1973, 1974)
|
||||
```
|
||||
|
||||
Open the table of values to see what values year takes on.
|
||||
|
||||
Drop the year variable from the data set.
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
|
||||
10. Now you are going to get a preview of ifelse(). For those new
|
||||
to programming this example may be confusing. See if you can understand
|
||||
the code by running the commands one line at a time. Read the output and
|
||||
make sense of what the code is doing at each step.
|
||||
|
||||
If you are having trouble don't worry, we will review the ifelse statement
|
||||
at the end of Lesson 3. You won't be quizzed on it, and it's not essential
|
||||
to keep going in this course. We just want you to try to get familiar with
|
||||
more code.
|
||||
```{r}
|
||||
mtcars$wt
|
||||
cond <- mtcars$wt < 3
|
||||
cond
|
||||
mtcars$weight_class <- ifelse(cond, 'light', 'average')
|
||||
mtcars$weight_class
|
||||
cond <- mtcars$wt > 3.5
|
||||
mtcars$weight_class <- ifelse(cond, 'heavy', mtcars$weight_class)
|
||||
mtcars$weight_class
|
||||
```
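
As a smaller sketch of the same idea, ifelse() checks each element of a condition and picks the matching value from its second or third argument. The vector `demo_wt` and the result `demo_class` are made up for illustration and are not part of the lesson.

```{r}
demo_wt <- c(2.5, 3.2, 4.0)
# Nested ifelse(): 'light' below 3, 'heavy' above 3.5, 'average' otherwise.
demo_class <- ifelse(demo_wt < 3, 'light',
                     ifelse(demo_wt > 3.5, 'heavy', 'average'))
demo_class
```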
|
||||
|
||||
You have some variables in your workspace or environment like 'cond' and
|
||||
efficient. You want to be careful that you don't bring in too much data
|
||||
into R at once since R will hold all the data in working memory. We have
|
||||
nothing to worry about here, but let's delete those variables from the
|
||||
workspace.
|
||||
|
||||
```{r}
|
||||
rm(cond)
|
||||
rm(efficient)
|
||||
```
|
||||
|
||||
Save this file if you haven't done so yet.
|
||||
|
||||
|
||||
You'll have the opportunity to create one Rmd file for the final project in
|
||||
this class and submit the Rmd file and knitted output (or HTML file). You'll
|
||||
need the knitr package to do that so let's install that now. **Uncomment** the
|
||||
following two lines of code and run them.
|
||||
|
||||
```{r}
|
||||
# install.packages('knitr', dependencies = TRUE)
|
||||
# library(knitr)
|
||||
```
|
||||
|
||||
Once you've installed knitr, **comment** out the two lines of code above.
|
||||
When you click the **Knit HTML** button a web page will be generated that
|
||||
includes both content (text and text formatting from Markdown) as well as
|
||||
the output of any embedded R code chunks within the document.
|
||||
|
||||
|
||||
You've reached the end of the file so now it's time to write some code to
|
||||
answer a question to continue on in Lesson 2.
|
||||
|
||||
Which car(s) have an mpg (miles per gallon) greater than or equal to 30
|
||||
OR hp (horsepower) less than 60? Create an R chunk of code to answer the question.
|
||||
|
||||
|
||||
|
||||
Once you have the answer, go to the [Udacity website](https://www.udacity.com/course/viewer#!/c-ud651/l-729069797/e-804129319/m-811719066) to continue with Lesson 2.
|
||||
|
||||
Note: You use brackets around text followed by two parentheses to create a link.
|
||||
There must be no spaces between the brackets and the parentheses. Paste or type
|
||||
the link into the parentheses. This also works on the discussions!
|
||||
|
||||
And if you want to see all of your HARD WORK from this file, click
|
||||
the **KNIT HTML** button now. (You may or may not need to restart R).
|
||||
|
||||
# CONGRATULATIONS
|
||||
#### You'll be exploring data soon with your new knowledge of R.
|
||||
1
lesson2/reddit.csv
Normal file
File diff suppressed because one or more lines are too long
51
lesson2/stateData.csv
Normal file
@ -0,0 +1,51 @@
|
||||
"","state.abb","state.area","state.region","population","income","illiteracy","life.exp","murder","highSchoolGrad","frost","area"
|
||||
"Alabama","AL","51609","2","3615","3624","2.1","69.05","15.1","41.3","20","50708"
|
||||
"Alaska","AK","589757","4","365","6315","1.5","69.31","11.3","66.7","152","566432"
|
||||
"Arizona","AZ","113909","4","2212","4530","1.8","70.55","7.8","58.1","15","113417"
|
||||
"Arkansas","AR","53104","2","2110","3378","1.9","70.66","10.1","39.9","65","51945"
|
||||
"California","CA","158693","4","21198","5114","1.1","71.71","10.3","62.6","20","156361"
|
||||
"Colorado","CO","104247","4","2541","4884","0.7","72.06","6.8","63.9","166","103766"
|
||||
"Connecticut","CT","5009","1","3100","5348","1.1","72.48","3.1","56","139","4862"
|
||||
"Delaware","DE","2057","2","579","4809","0.9","70.06","6.2","54.6","103","1982"
|
||||
"Florida","FL","58560","2","8277","4815","1.3","70.66","10.7","52.6","11","54090"
|
||||
"Georgia","GA","58876","2","4931","4091","2","68.54","13.9","40.6","60","58073"
|
||||
"Hawaii","HI","6450","4","868","4963","1.9","73.6","6.2","61.9","0","6425"
|
||||
"Idaho","ID","83557","4","813","4119","0.6","71.87","5.3","59.5","126","82677"
|
||||
"Illinois","IL","56400","3","11197","5107","0.9","70.14","10.3","52.6","127","55748"
|
||||
"Indiana","IN","36291","3","5313","4458","0.7","70.88","7.1","52.9","122","36097"
|
||||
"Iowa","IA","56290","3","2861","4628","0.5","72.56","2.3","59","140","55941"
|
||||
"Kansas","KS","82264","3","2280","4669","0.6","72.58","4.5","59.9","114","81787"
|
||||
"Kentucky","KY","40395","2","3387","3712","1.6","70.1","10.6","38.5","95","39650"
|
||||
"Louisiana","LA","48523","2","3806","3545","2.8","68.76","13.2","42.2","12","44930"
|
||||
"Maine","ME","33215","1","1058","3694","0.7","70.39","2.7","54.7","161","30920"
|
||||
"Maryland","MD","10577","2","4122","5299","0.9","70.22","8.5","52.3","101","9891"
|
||||
"Massachusetts","MA","8257","1","5814","4755","1.1","71.83","3.3","58.5","103","7826"
|
||||
"Michigan","MI","58216","3","9111","4751","0.9","70.63","11.1","52.8","125","56817"
|
||||
"Minnesota","MN","84068","3","3921","4675","0.6","72.96","2.3","57.6","160","79289"
|
||||
"Mississippi","MS","47716","2","2341","3098","2.4","68.09","12.5","41","50","47296"
|
||||
"Missouri","MO","69686","3","4767","4254","0.8","70.69","9.3","48.8","108","68995"
|
||||
"Montana","MT","147138","4","746","4347","0.6","70.56","5","59.2","155","145587"
|
||||
"Nebraska","NE","77227","3","1544","4508","0.6","72.6","2.9","59.3","139","76483"
|
||||
"Nevada","NV","110540","4","590","5149","0.5","69.03","11.5","65.2","188","109889"
|
||||
"New Hampshire","NH","9304","1","812","4281","0.7","71.23","3.3","57.6","174","9027"
|
||||
"New Jersey","NJ","7836","1","7333","5237","1.1","70.93","5.2","52.5","115","7521"
|
||||
"New Mexico","NM","121666","4","1144","3601","2.2","70.32","9.7","55.2","120","121412"
|
||||
"New York","NY","49576","1","18076","4903","1.4","70.55","10.9","52.7","82","47831"
|
||||
"North Carolina","NC","52586","2","5441","3875","1.8","69.21","11.1","38.5","80","48798"
|
||||
"North Dakota","ND","70665","3","637","5087","0.8","72.78","1.4","50.3","186","69273"
|
||||
"Ohio","OH","41222","3","10735","4561","0.8","70.82","7.4","53.2","124","40975"
|
||||
"Oklahoma","OK","69919","2","2715","3983","1.1","71.42","6.4","51.6","82","68782"
|
||||
"Oregon","OR","96981","4","2284","4660","0.6","72.13","4.2","60","44","96184"
|
||||
"Pennsylvania","PA","45333","1","11860","4449","1","70.43","6.1","50.2","126","44966"
|
||||
"Rhode Island","RI","1214","1","931","4558","1.3","71.9","2.4","46.4","127","1049"
|
||||
"South Carolina","SC","31055","2","2816","3635","2.3","67.96","11.6","37.8","65","30225"
|
||||
"South Dakota","SD","77047","3","681","4167","0.5","72.08","1.7","53.3","172","75955"
|
||||
"Tennessee","TN","42244","2","4173","3821","1.7","70.11","11","41.8","70","41328"
|
||||
"Texas","TX","267339","2","12237","4188","2.2","70.9","12.2","47.4","35","262134"
|
||||
"Utah","UT","84916","4","1203","4022","0.6","72.9","4.5","67.3","137","82096"
|
||||
"Vermont","VT","9609","1","472","3907","0.6","71.64","5.5","57.1","168","9267"
|
||||
"Virginia","VA","40815","2","4981","4701","1.4","70.08","9.5","47.8","85","39780"
|
||||
"Washington","WA","68192","4","3559","4864","0.6","71.72","4.3","63.5","32","66570"
|
||||
"West Virginia","WV","24181","2","1799","3617","1.4","69.48","6.7","41.6","100","24070"
|
||||
"Wisconsin","WI","56154","3","4589","4468","0.7","72.48","3","54.5","149","54464"
|
||||
"Wyoming","WY","97914","4","376","4566","0.6","70.29","6.9","62.9","173","97203"
|
||||
|
283
lesson3/lesson3_student.rmd
Normal file
@ -0,0 +1,283 @@
|
||||
Lesson 3
|
||||
========================================================
|
||||
|
||||
***
|
||||
|
||||
### What to Do First?
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Pseudo-Facebook User Data
|
||||
Notes:
|
||||
|
||||
```{r Pseudo-Facebook User Data}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Histogram of Users' Birthdays
|
||||
Notes:
|
||||
|
||||
```{r Histogram of Users\' Birthdays}
|
||||
install.packages('ggplot2')
|
||||
library(ggplot2)
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
#### What are some things that you notice about this histogram?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Moira's Investigation
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Estimating Your Audience Size
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
#### Think about a time when you posted a specific message or shared a photo on Facebook. What was it?
|
||||
Response:
|
||||
|
||||
#### How many of your friends do you think saw that post?
|
||||
Response:
|
||||
|
||||
#### Think about what percent of your friends on Facebook see any posts or comments that you make in a month. What percent do you think that is?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Perceived Audience Size
|
||||
Notes:
|
||||
|
||||
***
|
||||
### Faceting
|
||||
Notes:
|
||||
|
||||
```{r Faceting}
|
||||
|
||||
```
|
||||
|
||||
#### Let’s take another look at our plot. What stands out to you here?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Be Skeptical - Outliers and Anomalies
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Moira's Outlier
|
||||
Notes:
|
||||
#### Which case do you think applies to Moira’s outlier?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Friend Count
|
||||
Notes:
|
||||
|
||||
#### What code would you enter to create a histogram of friend counts?
|
||||
|
||||
```{r Friend Count}
|
||||
|
||||
```
|
||||
|
||||
#### How is this plot similar to Moira's first plot?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Limiting the Axes
|
||||
Notes:
|
||||
|
||||
```{r Limiting the Axes}
|
||||
|
||||
```
|
||||
|
||||
### Exploring with Bin Width
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Adjusting the Bin Width
|
||||
Notes:
|
||||
|
||||
### Faceting Friend Count
|
||||
```{r Faceting Friend Count}
|
||||
# What code would you add to facet the histogram by gender?
|
||||
# Add it to the code below.
|
||||
qplot(x = friend_count, data = pf, binwidth = 10) +
|
||||
scale_x_continuous(limits = c(0, 1000),
|
||||
breaks = seq(0, 1000, 50))
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Omitting NA Values
|
||||
Notes:
|
||||
|
||||
```{r Omitting NA Values}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Statistics 'by' Gender
|
||||
Notes:
|
||||
|
||||
```{r Statistics \'by\' Gender}
|
||||
|
||||
```
|
||||
|
||||
#### Who on average has more friends: men or women?
|
||||
Response:
|
||||
|
||||
#### What's the difference between the median friend count for women and men?
|
||||
Response:
|
||||
|
||||
#### Why would the median be a better measure than the mean?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Tenure
|
||||
Notes:
|
||||
|
||||
```{r Tenure}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
#### How would you create a histogram of tenure by year?
|
||||
|
||||
```{r Tenure Histogram by Year}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Labeling Plots
|
||||
Notes:
|
||||
|
||||
```{r Labeling Plots}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### User Ages
|
||||
Notes:
|
||||
|
||||
```{r User Ages}
|
||||
|
||||
```
|
||||
|
||||
#### What do you notice?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### The Spread of Memes
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Lada's Money Bag Meme
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Transforming Data
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Add a Scaling Layer
|
||||
Notes:
|
||||
|
||||
```{r Add a Scaling Layer}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
|
||||
### Frequency Polygons
|
||||
|
||||
```{r Frequency Polygons}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Likes on the Web
|
||||
Notes:
|
||||
|
||||
```{r Likes on the Web}
|
||||
|
||||
```
|
||||
|
||||
|
||||
***
|
||||
|
||||
### Box Plots
|
||||
Notes:
|
||||
|
||||
```{r Box Plots}
|
||||
|
||||
```
|
||||
|
||||
#### Adjust the code to focus on users who have friend counts between 0 and 1000.
|
||||
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Box Plots, Quartiles, and Friendships
|
||||
Notes:
|
||||
|
||||
```{r Box Plots, Quartiles, and Friendships}
|
||||
|
||||
```
|
||||
|
||||
#### On average, who initiated more friendships in our sample: men or women?
|
||||
Response:
|
||||
#### Write about some ways that you can verify your answer.
|
||||
Response:
|
||||
```{r Friend Requests by Gender}
|
||||
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Getting Logical
|
||||
Notes:
|
||||
|
||||
```{r Getting Logical}
|
||||
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Analyzing One Variable
|
||||
Reflection:
|
||||
|
||||
***
|
||||
|
||||
Click **Knit HTML** to see all of your hard work and to have an html
|
||||
page of this lesson, your answers, and your notes!
|
||||
99004
lesson3/pseudo_facebook.tsv
Normal file
File diff suppressed because it is too large
BIN
lesson4/correlation_images.jpeg
Normal file
Binary file not shown.
Size: 166 KiB
268
lesson4/lesson4_student.rmd
Normal file
@ -0,0 +1,268 @@
|
||||
Lesson 4
|
||||
========================================================
|
||||
|
||||
***
|
||||
|
||||
### Scatterplots and Perceived Audience Size
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Scatterplots
|
||||
Notes:
|
||||
|
||||
```{r Scatterplots}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
#### What are some things that you notice right away?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### ggplot Syntax
|
||||
Notes:
|
||||
|
||||
```{r ggplot Syntax}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Overplotting
|
||||
Notes:
|
||||
|
||||
```{r Overplotting}
|
||||
|
||||
```
|
||||
|
||||
#### What do you notice in the plot?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Coord_trans()
|
||||
Notes:
|
||||
|
||||
```{r Coord_trans()}
|
||||
|
||||
```
|
||||
|
||||
#### Look up the documentation for coord_trans() and add a layer to the plot that transforms friend_count using the square root function. Create your plot!
|
||||
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
#### What do you notice?
|
||||
|
||||
***
|
||||
|
||||
### Alpha and Jitter
|
||||
Notes:
|
||||
|
||||
```{r Alpha and Jitter}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Overplotting and Domain Knowledge
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Conditional Means
|
||||
Notes:
|
||||
|
||||
```{r Conditional Means}
|
||||
|
||||
```
|
||||
|
||||
Create your plot!
|
||||
|
||||
```{r Conditional Means Plot}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Overlaying Summaries with Raw Data
|
||||
Notes:
|
||||
|
||||
```{r Overlaying Summaries with Raw Data}
|
||||
|
||||
```
|
||||
|
||||
#### What are some of your observations of the plot?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Moira: Histogram Summary and Scatterplot
|
||||
See the Instructor Notes of this video to download Moira's paper on perceived audience size and to see the final plot.
|
||||
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Correlation
|
||||
Notes:
|
||||
|
||||
```{r Correlation}
|
||||
|
||||
```
|
||||
|
||||
Look up the documentation for the cor.test function.
|
||||
|
||||
What's the correlation between age and friend count? Round to three decimal places.
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Correlation on Subsets
|
||||
Notes:
|
||||
|
||||
```{r Correlation on Subsets}
|
||||
with( , cor.test(age, friend_count))
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Correlation Methods
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
## Create Scatterplots
|
||||
Notes:
|
||||
|
||||
```{r}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Strong Correlations
|
||||
Notes:
|
||||
|
||||
```{r Strong Correlations}
|
||||
|
||||
```
|
||||
|
||||
What's the correlation between the two variables? Include the top 5% of values for the variable in the calculation and round to 3 decimal places.
|
||||
|
||||
```{r Correlation Calculation}
|
||||
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Moira on Correlation
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### More Caution with Correlation
|
||||
Notes:
|
||||
|
||||
```{r More Caution With Correlation}
|
||||
install.packages('alr3')
|
||||
library(alr3)
|
||||
```
|
||||
|
||||
Create your plot!
|
||||
|
||||
```{r Temp vs Month}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Noisy Scatterplots
|
||||
a. Take a guess for the correlation coefficient for the scatterplot.
|
||||
|
||||
b. What is the actual correlation of the two variables?
|
||||
(Round to the thousandths place)
|
||||
|
||||
```{r Noisy Scatterplots}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Making Sense of Data
|
||||
Notes:
|
||||
|
||||
```{r Making Sense of Data}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### A New Perspective
|
||||
|
||||
What do you notice?
|
||||
Response:
|
||||
|
||||
Watch the solution video and check out the Instructor Notes!
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Understanding Noise: Age to Age Months
|
||||
Notes:
|
||||
|
||||
```{r Understanding Noise: Age to Age Months}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Age with Months Means
|
||||
|
||||
```{r Age with Months Means}
|
||||
|
||||
```
|
||||
|
||||
Programming Assignment
|
||||
```{r Programming Assignment}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Noise in Conditional Means
|
||||
|
||||
```{r Noise in Conditional Means}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Smoothing Conditional Means
|
||||
Notes:
|
||||
|
||||
```{r Smoothing Conditional Means}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Which Plot to Choose?
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Analyzing Two Variables
|
||||
Reflection:
|
||||
|
||||
***
|
||||
|
||||
Click **Knit HTML** to see all of your hard work and to have an HTML
page of this lesson, your answers, and your notes!
|
||||
|
||||
253
lesson5/lesson5_student.rmd
Normal file
@ -0,0 +1,253 @@
|
||||
Lesson 5
|
||||
========================================================
|
||||
|
||||
### Multivariate Data
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Moira Perceived Audience Size Colored by Age
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Third Qualitative Variable
|
||||
Notes:
|
||||
|
||||
```{r Third Qualitative Variable}
# A histogram takes no y aesthetic; a boxplot shows the age distribution by gender.
ggplot(aes(x = gender, y = age),
       data = subset(pf, !is.na(gender))) + geom_boxplot()
```
|
||||
|
||||
***
|
||||
|
||||
### Plotting Conditional Summaries
|
||||
Notes:
|
||||
|
||||
```{r Plotting Conditional Summaries}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Thinking in Ratios
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Wide and Long Format
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Reshaping Data
|
||||
Notes:
|
||||
|
||||
```{r}
install.packages('reshape2')
library(reshape2)
```
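
As a rough illustration of reshaping from long to wide format, the sketch below uses `dcast()` from reshape2 on a tiny made-up data frame. The column names and values are hypothetical and are not taken from the lesson's data.

```r
library(reshape2)

# Hypothetical long-format data: one row per (age, gender) pair
long <- data.frame(
  age = c(13, 13, 14, 14),
  gender = c("female", "male", "female", "male"),
  median_friend_count = c(148, 55, 93, 96)
)

# Wide format: one row per age, one column per gender
wide <- dcast(long, age ~ gender, value.var = "median_friend_count")

# With both genders side by side, a ratio is a simple column operation
wide$ratio <- wide$female / wide$male
wide
```

The formula `age ~ gender` keeps `age` as the row identifier and spreads the values of `gender` into columns.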
|
||||
|
||||
|
||||
***
|
||||
|
||||
### Ratio Plot
|
||||
Notes:
|
||||
|
||||
```{r Ratio Plot}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Third Quantitative Variable
|
||||
Notes:
|
||||
|
||||
```{r Third Quantitative Variable}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Cut a Variable
|
||||
Notes:
|
||||
|
||||
```{r Cut a Variable}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Plotting it All Together
|
||||
Notes:
|
||||
|
||||
```{r Plotting it All Together}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Plot the Grand Mean
|
||||
Notes:
|
||||
|
||||
```{r Plot the Grand Mean}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Friending Rate
|
||||
Notes:
|
||||
|
||||
```{r Friending Rate}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Friendships Initiated
|
||||
Notes:
|
||||
|
||||
What is the median friend rate?
|
||||
|
||||
What is the maximum friend rate?
|
||||
|
||||
```{r Friendships Initiated}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Bias-Variance Tradeoff Revisited
|
||||
Notes:
|
||||
|
||||
```{r Bias-Variance Tradeoff Revisited}
# `fun` replaces `fun.y`, which was deprecated in ggplot2 3.3.0.

# Mean friending rate for every single day of tenure (noisy)
ggplot(aes(x = tenure, y = friendships_initiated / tenure),
       data = subset(pf, tenure >= 1)) +
  geom_line(aes(color = year_joined.bucket),
            stat = 'summary',
            fun = mean)

# Bin tenure to the nearest 7 days
ggplot(aes(x = 7 * round(tenure / 7), y = friendships_initiated / tenure),
       data = subset(pf, tenure > 0)) +
  geom_line(aes(color = year_joined.bucket),
            stat = "summary",
            fun = mean)

# Bin tenure to the nearest 30 days
ggplot(aes(x = 30 * round(tenure / 30), y = friendships_initiated / tenure),
       data = subset(pf, tenure > 0)) +
  geom_line(aes(color = year_joined.bucket),
            stat = "summary",
            fun = mean)

# Bin tenure to the nearest 90 days (smoothest, most bias)
ggplot(aes(x = 90 * round(tenure / 90), y = friendships_initiated / tenure),
       data = subset(pf, tenure > 0)) +
  geom_line(aes(color = year_joined.bucket),
            stat = "summary",
            fun = mean)
```
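
The `7 * round(tenure / 7)` expression in the plots above snaps each tenure value to the nearest multiple of the bin width. A small sketch on made-up tenure values (the vector is hypothetical) shows the effect:

```r
# Hypothetical tenure values, in days
tenure <- c(1, 3, 4, 10, 11, 17)

# Snap each value to the nearest multiple of 7
binned <- 7 * round(tenure / 7)
binned
```

Wider bins average over more users per point, which reduces noise in the conditional mean at the cost of hiding short-term structure.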
|
||||
|
||||
***
|
||||
|
||||
### Sean's NFL Fan Sentiment Study
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Introducing the Yogurt Data Set
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Histograms Revisited
|
||||
Notes:
|
||||
|
||||
```{r Histograms Revisited}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Number of Purchases
|
||||
Notes:
|
||||
|
||||
```{r Number of Purchases}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Prices over Time
|
||||
Notes:
|
||||
|
||||
```{r Prices over Time}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Sampling Observations
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Looking at Samples of Households
|
||||
|
||||
```{r Looking at Sample of Households}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### The Limits of Cross Sectional Data
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Many Variables
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Scatterplot Matrix
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Even More Variables
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Heat Maps
|
||||
Notes:
|
||||
|
||||
```{r}
nci <- read.table("nci.tsv")
colnames(nci) <- c(1:64)
```
|
||||
|
||||
```{r}
nci.long.samp <- melt(as.matrix(nci[1:200,]))
names(nci.long.samp) <- c("gene", "case", "value")
head(nci.long.samp)

ggplot(aes(y = gene, x = case, fill = value),
       data = nci.long.samp) +
  geom_tile() +
  scale_fill_gradientn(colours = colorRampPalette(c("blue", "red"))(100))
```
|
||||
|
||||
|
||||
***
|
||||
|
||||
### Analyzing Three or More Variables
|
||||
Reflection:
|
||||
|
||||
***
|
||||
|
||||
Click **Knit HTML** to see all of your hard work and to have an HTML
page of this lesson, your answers, and your notes!
|
||||
|
||||
6830
lesson5/nci.tsv
Normal file
File diff suppressed because it is too large
BIN
lesson5/scatterplotMatrix.pdf
Normal file
Binary file not shown.
2381
lesson5/yogurt.csv
Normal file
File diff suppressed because it is too large
598025
lesson6/diamondsbig.csv
Normal file
File diff suppressed because it is too large
BIN
lesson6/ggpairs_landscape.pdf
Normal file
Binary file not shown.
289
lesson6/lesson6_student.rmd
Normal file
@ -0,0 +1,289 @@
|
||||
Lesson 6
|
||||
========================================================
|
||||
|
||||
### Welcome
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Scatterplot Review
|
||||
|
||||
```{r Scatterplot Review}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Price and Carat Relationship
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Frances Gerety
|
||||
Notes:
|
||||
|
||||
#### A diamond is
|
||||
|
||||
|
||||
***
|
||||
|
||||
### The Rise of Diamonds
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### ggpairs Function
|
||||
Notes:
|
||||
|
||||
```{r ggpairs Function}
# install these if necessary
install.packages('GGally')
install.packages('scales')
install.packages('memisc')
install.packages('lattice')
install.packages('MASS')
install.packages('car')
install.packages('reshape')
install.packages('plyr')

# load the ggplot graphics package and the others
library(ggplot2)
library(GGally)
library(scales)
library(memisc)

# sample 10,000 diamonds from the data set
set.seed(20022012)
diamond_samp <- diamonds[sample(1:length(diamonds$price), 10000), ]

# GGally 1.0+ no longer accepts `params`; plot options go through wrap()
ggpairs(diamond_samp,
        lower = list(continuous = wrap("points", shape = I('.'))),
        upper = list(combo = wrap("box", outlier.shape = I('.'))))
```
|
||||
|
||||
What are some things you notice in the ggpairs output?
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### The Demand of Diamonds
|
||||
Notes:
|
||||
|
||||
```{r The Demand of Diamonds}
|
||||
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
### Connecting Demand and Price Distributions
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Scatterplot Transformation
|
||||
|
||||
```{r Scatterplot Transformation}
|
||||
|
||||
```
|
||||
|
||||
|
||||
### Create a new function to transform the carat variable
|
||||
|
||||
```{r cuberoot transformation}
cuberoot_trans = function() trans_new('cuberoot', transform = function(x) x^(1/3),
                                      inverse = function(x) x^3)
```
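
A quick way to sanity-check a custom transformation is to confirm that its inverse undoes it. The sketch below rebuilds `cuberoot_trans` as defined above and round-trips a few carat values; the `carats` vector is illustrative only, and the sketch assumes the scales package is installed.

```r
library(scales)

# Same definition as in the lesson
cuberoot_trans <- function() trans_new('cuberoot',
                                       transform = function(x) x^(1/3),
                                       inverse = function(x) x^3)

tr <- cuberoot_trans()

# Illustrative carat values (the axis breaks used in the plots)
carats <- c(0.2, 0.5, 1, 2, 3)
transformed <- tr$transform(carats)   # positions on the cube-root axis
recovered <- tr$inverse(transformed)  # back to the original carat values

all.equal(recovered, carats)
```

ggplot2 uses `transform` to position the points and `inverse` to label the axis in the original units, so the two functions have to agree.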
|
||||
|
||||
#### Use the cuberoot_trans function
|
||||
```{r Use cuberoot_trans}
ggplot(aes(carat, price), data = diamonds) +
  geom_point() +
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) +
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat')
```
|
||||
|
||||
***
|
||||
|
||||
### Overplotting Revisited
|
||||
|
||||
```{r Sort and Head Tables}
|
||||
|
||||
```
|
||||
|
||||
|
||||
```{r Overplotting Revisited}
ggplot(aes(carat, price), data = diamonds) +
  geom_point() +
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) +
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat')
```
|
||||
|
||||
***
|
||||
|
||||
### Other Qualitative Factors
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
### Price vs. Carat and Clarity
|
||||
|
||||
Alter the code below.
|
||||
```{r Price vs. Carat and Clarity}
# install and load the RColorBrewer package
install.packages('RColorBrewer')
library(RColorBrewer)

ggplot(aes(x = carat, y = price), data = diamonds) +
  geom_point(alpha = 0.5, size = 1, position = 'jitter') +
  scale_color_brewer(type = 'div',
                     guide = guide_legend(title = 'Clarity', reverse = TRUE,
                                          override.aes = list(alpha = 1, size = 2))) +
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) +
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
```
|
||||
|
||||
***
|
||||
|
||||
### Clarity and Price
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Price vs. Carat and Cut
|
||||
|
||||
Alter the code below.
|
||||
```{r Price vs. Carat and Cut}
ggplot(aes(x = carat, y = price, color = clarity), data = diamonds) +
  geom_point(alpha = 0.5, size = 1, position = 'jitter') +
  scale_color_brewer(type = 'div',
                     guide = guide_legend(title = 'Clarity', reverse = TRUE,
                                          override.aes = list(alpha = 1, size = 2))) +
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) +
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
```
|
||||
|
||||
***
|
||||
|
||||
### Cut and Price
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Price vs. Carat and Color
|
||||
|
||||
Alter the code below.
|
||||
```{r Price vs. Carat and Color}
ggplot(aes(x = carat, y = price, color = cut), data = diamonds) +
  geom_point(alpha = 0.5, size = 1, position = 'jitter') +
  scale_color_brewer(type = 'div',
                     guide = guide_legend(title = 'Cut', reverse = TRUE,
                                          override.aes = list(alpha = 1, size = 2))) +
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) +
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat and Cut')
```
|
||||
|
||||
***
|
||||
|
||||
### Color and Price
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Linear Models in R
|
||||
Notes:
|
||||
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### Building the Linear Model
|
||||
Notes:
|
||||
|
||||
```{r Building the Linear Model}
m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = diamonds)
m2 <- update(m1, ~ . + carat)
m3 <- update(m2, ~ . + cut)
m4 <- update(m3, ~ . + color)
m5 <- update(m4, ~ . + clarity)
mtable(m1, m2, m3, m4, m5)
```
|
||||
|
||||
Notice how adding cut to our model does not help explain much of the variance
in the price of diamonds. This fits with our exploration earlier.
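
One way to see how much each added term contributes is to compare R-squared across the nested models. This is a minimal sketch, assuming the `diamonds` data set that ships with ggplot2; the `r2` vector is a helper introduced here and is not part of the lesson.

```r
library(ggplot2)  # provides the diamonds data set

# The same nested models as in the chunk above
m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = diamonds)
m2 <- update(m1, ~ . + carat)
m3 <- update(m2, ~ . + cut)
m4 <- update(m3, ~ . + color)
m5 <- update(m4, ~ . + clarity)

# R-squared for each model, then the gain from each added term
r2 <- sapply(list(m1 = m1, m2 = m2, m3 = m3, m4 = m4, m5 = m5),
             function(m) summary(m)$r.squared)
round(r2, 3)
round(diff(r2), 3)
```

The entry of `diff(r2)` labelled `m3` is the gain from adding cut, which can be compared directly with the gains from color and clarity.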
|
||||
|
||||
***
|
||||
|
||||
### Model Problems
|
||||
Video Notes:
|
||||
|
||||
Research:
|
||||
(Take some time to come up with 2-4 problems for the model)
|
||||
(You should spend 10-20 min on this)
|
||||
|
||||
Response:
|
||||
|
||||
***
|
||||
|
||||
### A Bigger, Better Data Set
|
||||
Notes:
|
||||
|
||||
```{r A Bigger, Better Data Set}
install.packages('bitops')
install.packages('RCurl')
library('bitops')
library('RCurl')

diamondsurl = getBinaryURL("https://raw.github.com/solomonm/diamonds-data/master/BigDiamonds.Rda")
load(rawConnection(diamondsurl))
```
|
||||
|
||||
The code used to obtain the data is available here:
|
||||
https://github.com/solomonm/diamonds-data
|
||||
|
||||
## Building a Model Using the Big Diamonds Data Set
|
||||
Notes:
|
||||
|
||||
```{r Building a Model Using the Big Diamonds Data Set}
|
||||
|
||||
```
|
||||
|
||||
|
||||
***
|
||||
|
||||
## Predictions
|
||||
|
||||
Example Diamond from BlueNile:
|
||||
Round 1.00 Very Good I VS1 $5,601
|
||||
|
||||
```{r}
# Be sure you’ve loaded the library memisc and have m5 saved as an object in your workspace.
thisDiamond = data.frame(carat = 1.00, cut = "V.Good",
                         color = "I", clarity="VS1")
modelEstimate = predict(m5, newdata = thisDiamond,
                        interval="prediction", level = .95)
```
|
||||
|
||||
Evaluate how well the model predicts the BlueNile diamond's price. Think about the fitted point estimate as well as the 95% CI.
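
Because the model is fit to `log(price)`, the output of `predict()` is on the log scale and has to be exponentiated before it can be compared with a dollar price. The sketch below is an assumption-laden illustration: it fits `m5` on the `diamonds` data set from ggplot2 (where the cut level is spelled "Very Good"), and the names `this_diamond`, `est_log`, and `est_dollars` are introduced here.

```r
library(ggplot2)  # provides the diamonds data set

# Assumption: the same model form as m5, fit on ggplot2's diamonds
m5 <- lm(I(log(price)) ~ I(carat^(1/3)) + carat + cut + color + clarity,
         data = diamonds)

# One hypothetical diamond, with factor levels matching the training data
this_diamond <- data.frame(
  carat = 1.00,
  cut = factor("Very Good", levels = levels(diamonds$cut), ordered = TRUE),
  color = factor("I", levels = levels(diamonds$color), ordered = TRUE),
  clarity = factor("VS1", levels = levels(diamonds$clarity), ordered = TRUE)
)

# Point estimate and 95% prediction interval on the log scale
est_log <- predict(m5, newdata = this_diamond,
                   interval = "prediction", level = 0.95)

# Back-transform to dollars: columns are fit, lwr, upr
est_dollars <- exp(est_log)
est_dollars
```

The listed price can then be checked against the `lwr` and `upr` columns; `interval = "prediction"` gives a prediction interval for a single diamond, which is wider than a confidence interval for the mean.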
|
||||
|
||||
***
|
||||
|
||||
## Final Thoughts
|
||||
Notes:
|
||||
|
||||
***
|
||||
|
||||
Click **Knit HTML** to see all of your hard work and to have an HTML
page of this lesson, your answers, and your notes!
|
||||
|
||||