Initial Commit with Project Code
parent bea57818a2
commit 6f865b5ff5
12
lesson2/What_is_a_RMD_file.Rmd
Normal file
@@ -0,0 +1,12 @@
Title
========================================================

This is an R Markdown document or RMD. Markdown is a simple formatting syntax for authoring web pages (click the **Help** toolbar button for more details on using R Markdown).







When you click the **Knit HTML** button, a web page is generated that includes both the content and the output of any embedded R code chunks within the document.
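
For example, an embedded R code chunk looks like the one below. When the document is knit, both the code and its printed result appear in the web page. This chunk is only an illustrative sketch and is not part of the original template; `x` is a made-up variable.

```{r}
# A made-up vector, used only to show what chunk output looks like
x <- c(1, 2, 3, 4)
mean(x)   # 2.5
```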
261
lesson2/demystifying.R
Normal file
@@ -0,0 +1,261 @@
# The goal of this file is to introduce you to the
# R programming language. Let's start by unraveling a
# little mystery!

# 1. Run the code below to create the vector 'udacious'.
# You need to highlight all of the lines of the code and then
# run it. You should see "udacious" appear in the workspace.

udacious <- c("Chris Saden", "Lauren Castellano",
              "Sarah Spikes", "Dean Eckles",
              "Andy Brown", "Moira Burke",
              "Kunal Chawla")

# You should see something like "chr[1:7]" in the 'Environment'
# or 'Workspace' tab. This is because you created a 'vector' with
# 7 names that have a 'type' of character. The arrow-like
# '<-' symbol is the assignment operator in R, similar to the
# equal sign '=' in other programming languages. The c() is a
# generic function that combines arguments, in this case the
# names of people, to form a vector.

# A 'vector' is one of the data types in R. Vectors must contain
# the same type of data, that is the entries must all be of the
# same type: character (most programmers call these strings),
# logical (TRUE or FALSE), or numeric.
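
# A small illustrative sketch (not part of the original lesson): if you mix
# types inside c(), R converts every entry to one common type instead of
# raising an error. The name 'mixed' is made up for this example.
mixed <- c(1, "a", TRUE)
mixed          # every entry is now a character string
class(mixed)   # "character"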

# Print out the vector udacious by running this next line of code.

udacious

# Notice how there are numbers in brackets next to the output.
# Each number is the index of the first entry printed on that line.
# Chris Saden is the first entry so [1]
# Dean Eckles is the fourth entry so [4]
# Kunal Chawla is the seventh entry so [7]

# Depending on the size of your window you may see different numbers
# in the output.
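
# A minimal sketch of the same idea (the vector 'demo_names' is made up
# for illustration): the index shown in brackets is also what you use to
# pull a single entry out of a vector yourself.
demo_names <- c("Chris Saden", "Lauren Castellano",
                "Sarah Spikes", "Dean Eckles")
demo_names[1]        # first entry: "Chris Saden"
demo_names[4]        # fourth entry: "Dean Eckles"
length(demo_names)   # number of entries: 4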

# ANOTHER HELPFUL TIP: You can add values to a vector.
# Run each line of code one at a time below to see what is happening.

numbers <- c(1:10)

numbers

numbers <- c(numbers, 11:20)

numbers


# 2. Replace YOUR_NAME with your actual name in the vector
# 'udacious' and run the code. Be sure to use quotes around it.

udacious <- c("Chris Saden", "Lauren Castellano",
              "Sarah Spikes", "Dean Eckles",
              "Andy Brown", "Moira Burke",
              "Kunal Chawla", YOUR_NAME)

# Notice how R updates 'udacious' in the workspace.
# It should now say something like 'chr[1:8]'.

# 3. Run the following two lines of code. You can highlight both lines
# of code and run them.

mystery <- nchar(udacious)
mystery

# You just created a new vector called mystery. What do you
# think is in this vector? (scroll down for the answer)








# Mystery is a vector that contains the number of characters
# for each of the names in udacious, including your name.

# 4. Run this next line of code.

mystery == 11

# Here we get a logical (or boolean) vector that tells us
# which locations or indices in the vector contain a name
# that has exactly 11 characters.
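
# An illustrative sketch, separate from the lesson's own vectors (the names
# 'demo_names' and 'demo_lengths' are made up): a logical vector like this
# can be counted with sum() or turned into positions with which().
demo_names <- c("Chris Saden", "Andy Brown", "Moira Burke")
demo_lengths <- nchar(demo_names)   # 11 10 11
demo_lengths == 11                  # TRUE FALSE TRUE
sum(demo_lengths == 11)             # how many names have 11 characters: 2
which(demo_lengths == 11)           # their positions: 1 3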

# 5. Let's use this boolean vector, mystery == 11, to subset our
# udacious vector. What do you think the result will be when
# running the line of code below?

# Think about the output before you run this next line of code.
# Notice how there are brackets in the code. Brackets are often
# used in R for subsetting.

udacious[mystery == 11]


# Scroll down for the answer









# It's your Udacious Instructors for the course!
# (and you may be in the output if you're lucky enough
# to have 11 characters in YOUR_NAME) Either way, we
# think you're pretty udacious for taking this course.





# 6. Alright, all mystery aside...let's dive into some data!
# The R installation has a few datasets already built into it
# that you can play with. Right now, you'll load one of these,
# which is named mtcars.

# Run this next command to load the mtcars data.

data(mtcars)


# You should see mtcars appear in the 'Environment' tab with
# <Promise> listed next to it.

# The object (mtcars) appears as a 'Promise' object in the
# workspace until we run some code that uses the object.

# R has stored the mtcars data into a spreadsheet-like object
# called a data frame. Run the next command to see what variables
# are in the data set and to fully load the data set as an
# object in R. You should see <Promise> disappear when you
# run the next line of code.

# Visit http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Promise-objects
# if you want the expert insight on Promise objects. You won't
# need the info on Promise objects to be successful in this course.

names(mtcars)

# names(mtcars) should output all the variable
# names in the data set. You might notice that the car names
# are not a variable in the data set. The car names have been saved
# as row names. More on this later.

# You should also see how many observations (obs.) are in
# the data frame and the number of variables for each observation.

# 7. To get more information on the data set and the variables
# run this next line of code.

?mtcars

# You can type a '?' before any command or a data set to learn
# more about it. The details and documentation will appear in
# the 'Help' tab.


# 8. To print out the data, run this next line of code.

mtcars

# Scroll up and down in the console to check out the data.
# This is the entire data frame printed out.

# 9. Run these next two functions, one at a time,
# and see if you can figure out what they do.

str(mtcars)

dim(mtcars)

# Scroll down for the answer.









# The first command, str(mtcars), gives us the structure of the
# data frame. It lists the variable names, the type of each variable
# (all of these variables are numeric) and some values for each
# variable.


# The second command, dim(mtcars), should output '[1] 32 11'
# to the console. The [1] indicates that 32 is the first value
# in the output.

# R uses 1 to start indexing (AND NOT ZERO BASED INDEXING as is true
# of many other programming languages.)
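
# A short sketch of what 1-based indexing means for the dim() output above
# (the names 'shape', 'n_rows' and 'n_cols' are made up for illustration):
shape <- dim(mtcars)
n_rows <- shape[1]   # the FIRST value has index 1: the number of rows (32)
n_cols <- shape[2]   # the second value: the number of columns (11)
n_rows == nrow(mtcars)   # TRUE
n_cols == ncol(mtcars)   # TRUE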

# 10. Read the documentation for row.names if you want to know more.
?row.names

# Run this code to see the current row names in the data frame.
row.names(mtcars)

# Run this code to change the row names of the cars to numbers.
row.names(mtcars) <- c(1:32)

# Now print out the data frame by running the code below.
mtcars

# It's tedious to relabel our data frame with the right car names
# so let's reload the data set and print out the first ten rows.

data(mtcars)
head(mtcars, 10)

# The head() function prints out the first six rows of a data frame
# by default. Run the code below to see.
head(mtcars)

# I think you'll know what this does.
tail(mtcars, 3)


# 11. We've run nine commands so far:
# c, nchar, data, str, dim, names, row.names, head, and tail.

# All of these commands took some inputs or arguments.
# To determine if a command takes more arguments or to learn
# about any default settings, you can look up the documentation
# using '?' before the command, much like you did to learn about
# the mtcars data set and the row.names function.



# 12. Let's examine our car data more closely. We can access
# an individual variable (or column) from the data frame using
# the '$' sign. Run the code below to print out the variable
# miles per gallon. This is the mpg column in the data frame.

mtcars$mpg
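
# An illustrative aside (not from the original lesson): the '$' sign is one
# of several ways to pull a column out of a data frame. Double brackets with
# the column name in quotes return the same vector. The variable names
# 'mpg_dollar' and 'mpg_brackets' are made up for this sketch.
mpg_dollar <- mtcars$mpg
mpg_brackets <- mtcars[["mpg"]]
identical(mpg_dollar, mpg_brackets)   # TRUE: both are the same numeric vector
length(mpg_dollar)                    # one value per car: 32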

# Print out any two other variables to the console.



# mtcars$mpg is a vector containing the mpg (miles per gallon) of
# the 32 cars. Run this next line of code to get the average mpg
# for all the cars. What is it?

# Enter this number for the quiz on the Udacity website.
# https://www.udacity.com/course/viewer#!/c-ud651/l-729069797/e-804129314/m-830829287

mean(mtcars$mpg)
179
lesson2/demystifyingR2.Rmd
Normal file
@@ -0,0 +1,179 @@
Demystifying R Part 2
========================================================

You might see a warning message just above this file. Something like...
"R Markdown requires the knitr package (version 1.2 or higher)"
Don't worry about this for now. We'll address it at the end of this file.

1. Run the following command to see what it does.
```{r}
summary(mtcars)
```

If you know about quantiles, then the output should look familiar.
If not, you probably recognize the min (minimum), median, mean, and max (maximum).
We'll go over quantiles in Lesson 3 so don't worry if the output seems overwhelming.

The str() and summary() functions are helpful commands when working with a new data set.
The str() function gives us the variable names and their types.
The summary() function gives us an idea of the values a variable can take on.
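
As a small illustration of how summary() describes the values of a single variable, the sketch below applies it to a made-up vector, `toy`, which is not part of the lesson's data. The one large value pulls the mean well above the median.

```{r}
# 'toy' is a hypothetical vector invented for this example
toy <- c(2, 4, 6, 8, 100)
toy_summary <- summary(toy)
toy_summary

# Individual statistics can be looked up by name
toy_summary["Median"]   # 6
toy_summary["Mean"]     # 24
```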

2. In 2013, the average mpg (miles per gallon) for a car was 23 mpg.
The car models in the mtcars data set come from the years 1973-1974.
Subset the data so that you create a new data frame that contains
cars that get 23 or more mpg (miles per gallon). Save it to a new data
frame called efficient.
```{r}

```

3. How many cars get more than 23 mpg? Use one of the commands you
learned in demystifying.R to answer this question.
```{r}

```

4. We can also use logical operators to find out which car(s) get greater
than 30 miles per gallon (mpg) and have more than 100 raw horsepower.
```{r}
subset(mtcars, mpg > 30 & hp > 100)
```

There's only one car that gets more than 30 mpg and has more than 100 hp.
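
The same rows can also be selected by extending the bracket notation from demystifying.R to a data frame. This is a sketch of one equivalent form, not the only way to write it; the names `keep` and `fast_and_frugal` are made up for the example.

```{r}
# Build a logical vector with one TRUE/FALSE per car, then use it to pick rows.
# The comma followed by nothing means "keep every column".
keep <- mtcars$mpg > 30 & mtcars$hp > 100
fast_and_frugal <- mtcars[keep, ]
row.names(fast_and_frugal)
```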

5. What do you think this code does? Scroll down for the answer.
```{r}
subset(mtcars, mpg < 14 | disp > 390)
```

Note: You may be familiar with the || operator in Java. When comparing whole vectors,
R uses a single & for the logical operator AND and a single | for the logical operator OR.








The command above creates a data frame of cars that have mpg less than 14
OR a displacement of more than 390. Only one of the conditions for a car
needs to be satisfied so that the car makes it into the subset. Any of the
cars that fit the criteria are printed to the console.
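
To see why a single | is the right choice here, it may help to look at how it behaves on short vectors. The sketch below uses two made-up logical vectors, `slow` and `big`. The doubled forms && and || also exist in R, but they are meant for single TRUE/FALSE values (for example inside an if statement), not for whole columns.

```{r}
# Hypothetical logical vectors, one entry per (imaginary) car
slow <- c(TRUE, FALSE, FALSE)
big  <- c(FALSE, FALSE, TRUE)

slow | big   # element-wise OR:  TRUE FALSE TRUE
slow & big   # element-wise AND: FALSE FALSE FALSE

# The doubled operator combines two single values
TRUE || FALSE   # TRUE
```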

Now you try some.

6. Print the cars that have a 1/4 mile time (qsec) less than or equal to
16.90 seconds to the console.
```{r}

```

7. Save the subset of cars that weigh under 2000 pounds (weight is measured in lb/1000)
to a variable called lightCars. Print the number of cars and the subset to the console.
```{r}

```

8. You can also create new variables in a data frame. Let's say you wanted
to have the year of each car's model. We can create the variable
mtcars$year. Here we'll assume that all of the models were from 1974.
Run the code below.
```{r}
mtcars$year <- 1974
```

Notice how the number of variables changed in the workspace. You can
also see the result by double clicking on mtcars in the workspace and
examining the data in a table.

To drop a variable, subset the data frame and select the variable you
want to drop with a negative sign in front of it.
```{r}
mtcars <- subset(mtcars, select = -year)
```

Notice, we are back to 11 variables in the data frame.
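
Another common way to drop a column is to assign NULL to it. The sketch below works on a copy, here called `cars_copy` (a made-up name), so that mtcars itself is left untouched.

```{r}
cars_copy <- mtcars              # work on a copy of the data frame
cars_copy$year <- 1974           # add the column: one more variable than mtcars
ncol(cars_copy) - ncol(mtcars)   # 1

cars_copy$year <- NULL           # assigning NULL removes the column again
ncol(cars_copy) == ncol(mtcars)  # TRUE
```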

9. What do you think this code does? Run it to find out.
```{r}
mtcars$year <- c(1973, 1974)
```

Open the table of values to see what values year takes on.
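
What you see in the table is R's recycling rule at work: a vector that is shorter than the column is repeated until every row has a value. The sketch below shows the same behaviour on a small made-up data frame, `demo`. For data frame columns this only works when the number of rows is a multiple of the vector's length, which is the case for the 32 rows of mtcars and a vector of length 2.

```{r}
# 'demo' is a hypothetical six-row data frame used only for this illustration
demo <- data.frame(id = 1:6)
demo$year <- c(1973, 1974)   # two values are recycled across six rows
demo$year                    # 1973 1974 1973 1974 1973 1974
table(demo$year)             # each year appears three times
```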

Drop the year variable from the data set.
```{r}

```


10. Now you are going to get a preview of ifelse(). For those new
to programming this example may be confusing. See if you can understand
the code by running the commands one line at a time. Read the output and
make sense of what the code is doing at each step.

If you are having trouble, don't worry: we will review the ifelse statement
at the end of Lesson 3. You won't be quizzed on it, and it's not essential
to keep going in this course. We just want you to try to get familiar with
more code.
```{r}
mtcars$wt
cond <- mtcars$wt < 3
cond
mtcars$weight_class <- ifelse(cond, 'light', 'average')
mtcars$weight_class
cond <- mtcars$wt > 3.5
mtcars$weight_class <- ifelse(cond, 'heavy', mtcars$weight_class)
mtcars$weight_class
```
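
If the chunk above is hard to follow, the same two steps on a three-element vector may help. This is only a sketch: `wt_demo` and `class_demo` are made-up names, and the cut-offs 3 and 3.5 are the ones used above.

```{r}
wt_demo <- c(2.5, 3.2, 4.0)

# Step 1: anything under 3 is 'light', everything else starts as 'average'
class_demo <- ifelse(wt_demo < 3, 'light', 'average')
class_demo   # "light" "average" "average"

# Step 2: anything over 3.5 becomes 'heavy'; other entries keep their label
class_demo <- ifelse(wt_demo > 3.5, 'heavy', class_demo)
class_demo   # "light" "average" "heavy"
```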

You have some variables in your workspace or environment like 'cond' and
'efficient'. You want to be careful that you don't bring in too much data
into R at once since R will hold all the data in working memory. We have
nothing to worry about here, but let's delete those variables from the
workspace.

```{r}
rm(cond)
rm(efficient)
```
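
rm() also accepts several names at once, and exists() can confirm that an object is gone. A minimal sketch with two throwaway variables (`tmp_a` and `tmp_b` are made-up names):

```{r}
tmp_a <- 1
tmp_b <- "two"
rm(tmp_a, tmp_b)    # remove both objects in one call
exists("tmp_a")     # FALSE: the name no longer refers to anything
```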

Save this file if you haven't done so yet.


You'll have the opportunity to create one Rmd file for the final project in
this class and submit the Rmd file and knitted output (or HTML file). You'll
need the knitr package to do that so let's install that now. **Uncomment** the
following two lines of code and run them.

```{r}
# install.packages('knitr', dependencies = TRUE)
# library(knitr)
```

Once you've installed knitr, **comment** out the two lines of code above.
When you click the **Knit HTML** button a web page will be generated that
includes both content (text and text formatting from Markdown) as well as
the output of any embedded R code chunks within the document.


You've reached the end of the file so now it's time to write some code to
answer a question to continue on in Lesson 2.

Which car(s) have an mpg (miles per gallon) greater than or equal to 30
OR hp (horsepower) less than 60? Create an R chunk of code to answer the question.



Once you have the answer, go to the [Udacity website](https://www.udacity.com/course/viewer#!/c-ud651/l-729069797/e-804129319/m-811719066) to continue with Lesson 2.

Note: You use brackets around text followed by a pair of parentheses to create a link.
There must be no spaces between the brackets and the parentheses. Paste or type
the link into the parentheses. This also works on the discussions!

And if you want to see all of your HARD WORK from this file, click
the **KNIT HTML** button now. (You may or may not need to restart R).

# CONGRATULATIONS
#### You'll be exploring data soon with your new knowledge of R.
1
lesson2/reddit.csv
Normal file
File diff suppressed because one or more lines are too long
51
lesson2/stateData.csv
Normal file
@@ -0,0 +1,51 @@
"","state.abb","state.area","state.region","population","income","illiteracy","life.exp","murder","highSchoolGrad","frost","area"
"Alabama","AL","51609","2","3615","3624","2.1","69.05","15.1","41.3","20","50708"
"Alaska","AK","589757","4","365","6315","1.5","69.31","11.3","66.7","152","566432"
"Arizona","AZ","113909","4","2212","4530","1.8","70.55","7.8","58.1","15","113417"
"Arkansas","AR","53104","2","2110","3378","1.9","70.66","10.1","39.9","65","51945"
"California","CA","158693","4","21198","5114","1.1","71.71","10.3","62.6","20","156361"
"Colorado","CO","104247","4","2541","4884","0.7","72.06","6.8","63.9","166","103766"
"Connecticut","CT","5009","1","3100","5348","1.1","72.48","3.1","56","139","4862"
"Delaware","DE","2057","2","579","4809","0.9","70.06","6.2","54.6","103","1982"
"Florida","FL","58560","2","8277","4815","1.3","70.66","10.7","52.6","11","54090"
"Georgia","GA","58876","2","4931","4091","2","68.54","13.9","40.6","60","58073"
"Hawaii","HI","6450","4","868","4963","1.9","73.6","6.2","61.9","0","6425"
"Idaho","ID","83557","4","813","4119","0.6","71.87","5.3","59.5","126","82677"
"Illinois","IL","56400","3","11197","5107","0.9","70.14","10.3","52.6","127","55748"
"Indiana","IN","36291","3","5313","4458","0.7","70.88","7.1","52.9","122","36097"
"Iowa","IA","56290","3","2861","4628","0.5","72.56","2.3","59","140","55941"
"Kansas","KS","82264","3","2280","4669","0.6","72.58","4.5","59.9","114","81787"
"Kentucky","KY","40395","2","3387","3712","1.6","70.1","10.6","38.5","95","39650"
"Louisiana","LA","48523","2","3806","3545","2.8","68.76","13.2","42.2","12","44930"
"Maine","ME","33215","1","1058","3694","0.7","70.39","2.7","54.7","161","30920"
"Maryland","MD","10577","2","4122","5299","0.9","70.22","8.5","52.3","101","9891"
"Massachusetts","MA","8257","1","5814","4755","1.1","71.83","3.3","58.5","103","7826"
"Michigan","MI","58216","3","9111","4751","0.9","70.63","11.1","52.8","125","56817"
"Minnesota","MN","84068","3","3921","4675","0.6","72.96","2.3","57.6","160","79289"
"Mississippi","MS","47716","2","2341","3098","2.4","68.09","12.5","41","50","47296"
"Missouri","MO","69686","3","4767","4254","0.8","70.69","9.3","48.8","108","68995"
"Montana","MT","147138","4","746","4347","0.6","70.56","5","59.2","155","145587"
"Nebraska","NE","77227","3","1544","4508","0.6","72.6","2.9","59.3","139","76483"
"Nevada","NV","110540","4","590","5149","0.5","69.03","11.5","65.2","188","109889"
"New Hampshire","NH","9304","1","812","4281","0.7","71.23","3.3","57.6","174","9027"
"New Jersey","NJ","7836","1","7333","5237","1.1","70.93","5.2","52.5","115","7521"
"New Mexico","NM","121666","4","1144","3601","2.2","70.32","9.7","55.2","120","121412"
"New York","NY","49576","1","18076","4903","1.4","70.55","10.9","52.7","82","47831"
"North Carolina","NC","52586","2","5441","3875","1.8","69.21","11.1","38.5","80","48798"
"North Dakota","ND","70665","3","637","5087","0.8","72.78","1.4","50.3","186","69273"
"Ohio","OH","41222","3","10735","4561","0.8","70.82","7.4","53.2","124","40975"
"Oklahoma","OK","69919","2","2715","3983","1.1","71.42","6.4","51.6","82","68782"
"Oregon","OR","96981","4","2284","4660","0.6","72.13","4.2","60","44","96184"
"Pennsylvania","PA","45333","1","11860","4449","1","70.43","6.1","50.2","126","44966"
"Rhode Island","RI","1214","1","931","4558","1.3","71.9","2.4","46.4","127","1049"
"South Carolina","SC","31055","2","2816","3635","2.3","67.96","11.6","37.8","65","30225"
"South Dakota","SD","77047","3","681","4167","0.5","72.08","1.7","53.3","172","75955"
"Tennessee","TN","42244","2","4173","3821","1.7","70.11","11","41.8","70","41328"
"Texas","TX","267339","2","12237","4188","2.2","70.9","12.2","47.4","35","262134"
"Utah","UT","84916","4","1203","4022","0.6","72.9","4.5","67.3","137","82096"
"Vermont","VT","9609","1","472","3907","0.6","71.64","5.5","57.1","168","9267"
"Virginia","VA","40815","2","4981","4701","1.4","70.08","9.5","47.8","85","39780"
"Washington","WA","68192","4","3559","4864","0.6","71.72","4.3","63.5","32","66570"
"West Virginia","WV","24181","2","1799","3617","1.4","69.48","6.7","41.6","100","24070"
"Wisconsin","WI","56154","3","4589","4468","0.7","72.48","3","54.5","149","54464"
"Wyoming","WY","97914","4","376","4566","0.6","70.29","6.9","62.9","173","97203"
283
lesson3/lesson3_student.rmd
Normal file
@@ -0,0 +1,283 @@
Lesson 3
========================================================

***

### What to Do First?
Notes:

***

### Pseudo-Facebook User Data
Notes:

```{r Pseudo-Facebook User Data}

```

***

### Histogram of Users' Birthdays
Notes:

```{r Histogram of Users\' Birthdays}
install.packages('ggplot2')
library(ggplot2)
```

***

#### What are some things that you notice about this histogram?
Response:

***

### Moira's Investigation
Notes:

***

### Estimating Your Audience Size
Notes:

***

#### Think about a time when you posted a specific message or shared a photo on Facebook. What was it?
Response:

#### How many of your friends do you think saw that post?
Response:

#### Think about what percent of your friends on Facebook see any posts or comments that you make in a month. What percent do you think that is?
Response:

***

### Perceived Audience Size
Notes:

***
### Faceting
Notes:

```{r Faceting}

```

#### Let’s take another look at our plot. What stands out to you here?
Response:

***

### Be Skeptical - Outliers and Anomalies
Notes:

***

### Moira's Outlier
Notes:
#### Which case do you think applies to Moira’s outlier?
Response:

***

### Friend Count
Notes:

#### What code would you enter to create a histogram of friend counts?

```{r Friend Count}

```

#### How is this plot similar to Moira's first plot?
Response:

***

### Limiting the Axes
Notes:

```{r Limiting the Axes}

```

### Exploring with Bin Width
Notes:

***

### Adjusting the Bin Width
Notes:

### Faceting Friend Count
```{r Faceting Friend Count}
# What code would you add to facet the histogram by gender?
# Add it to the code below.
qplot(x = friend_count, data = pf, binwidth = 10) +
  scale_x_continuous(limits = c(0, 1000),
                     breaks = seq(0, 1000, 50))
```

***

### Omitting NA Values
Notes:

```{r Omitting NA Values}

```

***

### Statistics 'by' Gender
Notes:

```{r Statistics \'by\' Gender}

```

#### Who on average has more friends: men or women?
Response:

#### What's the difference between the median friend count for women and men?
Response:

#### Why would the median be a better measure than the mean?
Response:

***

### Tenure
Notes:

```{r Tenure}

```

***

#### How would you create a histogram of tenure by year?

```{r Tenure Histogram by Year}

```

***

### Labeling Plots
Notes:

```{r Labeling Plots}

```

***

### User Ages
Notes:

```{r User Ages}

```

#### What do you notice?
Response:

***

### The Spread of Memes
Notes:

***

### Lada's Money Bag Meme
Notes:

***

### Transforming Data
Notes:

***

### Add a Scaling Layer
Notes:

```{r Add a Scaling Layer}

```

***


### Frequency Polygons

```{r Frequency Polygons}

```

***

### Likes on the Web
Notes:

```{r Likes on the Web}

```


***

### Box Plots
Notes:

```{r Box Plots}

```

#### Adjust the code to focus on users who have friend counts between 0 and 1000.

```{r}

```

***

### Box Plots, Quartiles, and Friendships
Notes:

```{r Box Plots, Quartiles, and Friendships}

```

#### On average, who initiated more friendships in our sample: men or women?
Response:
#### Write about some ways that you can verify your answer.
Response:
```{r Friend Requests by Gender}

```

Response:

***

### Getting Logical
Notes:

```{r Getting Logical}

```

Response:

***

### Analyzing One Variable
Reflection:

***

Click **Knit HTML** to see all of your hard work and to have an HTML
page of this lesson, your answers, and your notes!
99004
lesson3/pseudo_facebook.tsv
Normal file
File diff suppressed because it is too large
BIN
lesson4/correlation_images.jpeg
Normal file
Binary file not shown.
Size: 166 KiB
268
lesson4/lesson4_student.rmd
Normal file
@ -0,0 +1,268 @@
|
|||||||
|
Lesson 4
|
||||||
|
========================================================
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Scatterplots and Perceived Audience Size
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Scatterplots
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Scatterplots}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
#### What are some things that you notice right away?
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### ggplot Syntax
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r ggplot Syntax}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Overplotting
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Overplotting}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
#### What do you notice in the plot?
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Coord_trans()
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Coord_trans()}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Look up the documentation for coord_trans() and add a layer to the plot that transforms friend_count using the square root function. Create your plot!
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
#### What do you notice?
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Alpha and Jitter
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Alpha and Jitter}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Overplotting and Domain Knowledge
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Conditional Means
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Conditional Means}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
Create your plot!
|
||||||
|
|
||||||
|
```{r Conditional Means Plot}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Overlaying Summaries with Raw Data
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Overlaying Summaries with Raw Data}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
#### What are some of your observations of the plot?
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Moira: Histogram Summary and Scatterplot
|
||||||
|
See the Instructor Notes of this video to download Moira's paper on perceived audience size and to see the final plot.
|
||||||
|
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Correlation
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Correlation}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
Look up the documentation for the cor.test function.
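The sketch below shows one way `cor.test()` can be called and how the coefficient can be pulled out of the result. The vectors `x` and `y` are synthetic and purely illustrative; they are not the lesson's `age` and `friend_count` columns, so the number produced here is not the answer to the question that follows.

```r
# Synthetic, illustrative data (not the lesson's data set).
set.seed(42)
x <- rnorm(200)
y <- -0.3 * x + rnorm(200)

# cor.test() defaults to Pearson's product-moment correlation.
result <- cor.test(x, y, method = "pearson")

# The estimate is stored as a named number; unname() drops the "cor" label.
r <- round(unname(result$estimate), 3)

# with() evaluates the call inside a data frame, so columns can be
# referenced by name, which is the pattern used later in this lesson.
same <- with(data.frame(x, y), cor.test(x, y))
```

The returned object also carries the p-value (`result$p.value`) and a confidence interval (`result$conf.int`).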
|
||||||
|
|
||||||
|
What's the correlation between age and friend count? Round to three decimal places.
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Correlation on Subsets
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Correlation on Subsets}
|
||||||
|
with( , cor.test(age, friend_count))
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Correlation Methods
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
## Create Scatterplots
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Strong Correlations
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Strong Correlations}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
What's the correlation between the two variables? Include the top 5% of values for the variable in the calculation and round to 3 decimal places.
|
||||||
|
|
||||||
|
```{r Correlation Calculation}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Moira on Correlation
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### More Caution with Correlation
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r More Caution With Correlation}
|
||||||
|
install.packages('alr3')
|
||||||
|
library(alr3)
|
||||||
|
```
|
||||||
|
|
||||||
|
Create your plot!
|
||||||
|
|
||||||
|
```{r Temp vs Month}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Noisy Scatterplots
|
||||||
|
a. Take a guess for the correlation coefficient for the scatterplot.
|
||||||
|
|
||||||
|
b. What is the actual correlation of the two variables?
|
||||||
|
(Round to the thousandths place)
|
||||||
|
|
||||||
|
```{r Noisy Scatterplots}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Making Sense of Data
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Making Sense of Data}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### A New Perspective
|
||||||
|
|
||||||
|
What do you notice?
|
||||||
|
Response:
|
||||||
|
|
||||||
|
Watch the solution video and check out the Instructor Notes!
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Understanding Noise: Age to Age Months
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Understanding Noise: Age to Age Months}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Age with Months Means
|
||||||
|
|
||||||
|
```{r Age with Months Means}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
Programming Assignment
|
||||||
|
```{r Programming Assignment}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Noise in Conditional Means
|
||||||
|
|
||||||
|
```{r Noise in Conditional Means}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Smoothing Conditional Means
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Smoothing Conditional Means}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Which Plot to Choose?
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Analyzing Two Variables
|
||||||
|
Reflection:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
Click **Knit HTML** to see all of your hard work and to have an HTML
|
||||||
|
page of this lesson, your answers, and your notes!
|
||||||
|
|
||||||
253
lesson5/lesson5_student.rmd
Normal file
@ -0,0 +1,253 @@
|
|||||||
|
Lesson 5
|
||||||
|
========================================================
|
||||||
|
|
||||||
|
### Multivariate Data
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Moira Perceived Audience Size Colored by Age
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Third Qualitative Variable
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Third Qualitative Variable}
|
||||||
|
ggplot(aes(x = gender, y = age),
|
||||||
|
data = subset(pf, !is.na(gender))) + geom_boxplot()
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Plotting Conditional Summaries
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Plotting Conditional Summaries}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Thinking in Ratios
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Wide and Long Format
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Reshaping Data
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
install.packages('reshape2')
|
||||||
|
library(reshape2)
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Ratio Plot
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Ratio Plot}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Third Quantitative Variable
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Third Quantitative Variable}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Cut a Variable
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Cut a Variable}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Plotting it All Together
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Plotting it All Together}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Plot the Grand Mean
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Plot the Grand Mean}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Friending Rate
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Friending Rate}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Friendships Initiated
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
What is the median friend rate?
|
||||||
|
|
||||||
|
What is the maximum friend rate?
|
||||||
|
|
||||||
|
```{r Friendships Initiated}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Bias-Variance Tradeoff Revisited
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Bias-Variance Tradeoff Revisited}
|
||||||
|
|
||||||
|
ggplot(aes(x = tenure, y = friendships_initiated / tenure),
|
||||||
|
data = subset(pf, tenure >= 1)) +
|
||||||
|
geom_line(aes(color = year_joined.bucket),
|
||||||
|
stat = 'summary',
|
||||||
|
fun.y = mean)
|
||||||
|
|
||||||
|
ggplot(aes(x = 7 * round(tenure / 7), y = friendships_initiated / tenure),
|
||||||
|
data = subset(pf, tenure > 0)) +
|
||||||
|
geom_line(aes(color = year_joined.bucket),
|
||||||
|
stat = "summary",
|
||||||
|
fun.y = mean)
|
||||||
|
|
||||||
|
ggplot(aes(x = 30 * round(tenure / 30), y = friendships_initiated / tenure),
|
||||||
|
data = subset(pf, tenure > 0)) +
|
||||||
|
geom_line(aes(color = year_joined.bucket),
|
||||||
|
stat = "summary",
|
||||||
|
fun.y = mean)
|
||||||
|
|
||||||
|
ggplot(aes(x = 90 * round(tenure / 90), y = friendships_initiated / tenure),
|
||||||
|
data = subset(pf, tenure > 0)) +
|
||||||
|
geom_line(aes(color = year_joined.bucket),
|
||||||
|
stat = "summary",
|
||||||
|
fun.y = mean)
|
||||||
|
|
||||||
|
```
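The expressions `7 * round(tenure / 7)`, `30 * round(tenure / 30)`, and `90 * round(tenure / 90)` in the chunk above snap each tenure value to the nearest multiple of the bin width. A minimal sketch of that arithmetic, using a made-up `tenure` vector rather than the lesson data:

```r
# Made-up tenure values in days (illustrative only).
tenure <- c(1, 3, 4, 10, 11, 17, 18, 44, 100)

# Snap each value to the nearest multiple of the bin width.
weekly  <- 7  * round(tenure / 7)
monthly <- 30 * round(tenure / 30)

# Wider bins leave fewer distinct x values, so each plotted mean
# averages over more observations.
n_weekly  <- length(unique(weekly))
n_monthly <- length(unique(monthly))
```

With wider bins each mean is estimated from more users, which tends to smooth the line (lower variance) at the cost of blurring real changes within a bin (higher bias).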
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Sean's NFL Fan Sentiment Study
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Introducing the Yogurt Data Set
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Histograms Revisited
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Histograms Revisited}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Number of Purchases
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Number of Purchases}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Prices over Time
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Prices over Time}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Sampling Observations
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Looking at Samples of Households
|
||||||
|
|
||||||
|
```{r Looking at Sample of Households}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### The Limits of Cross Sectional Data
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Many Variables
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Scatterplot Matrix
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Even More Variables
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Heat Maps
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
nci <- read.table("nci.tsv")
|
||||||
|
colnames(nci) <- c(1:64)
|
||||||
|
```
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
nci.long.samp <- melt(as.matrix(nci[1:200,]))
|
||||||
|
names(nci.long.samp) <- c("gene", "case", "value")
|
||||||
|
head(nci.long.samp)
|
||||||
|
|
||||||
|
ggplot(aes(y = gene, x = case, fill = value),
|
||||||
|
data = nci.long.samp) +
|
||||||
|
geom_tile() +
|
||||||
|
scale_fill_gradientn(colours = colorRampPalette(c("blue", "red"))(100))
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Analyzing Three or More Variables
|
||||||
|
Reflection:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
Click **Knit HTML** to see all of your hard work and to have an HTML
|
||||||
|
page of this lesson, your answers, and your notes!
|
||||||
|
|
||||||
6830
lesson5/nci.tsv
Normal file
File diff suppressed because it is too large
BIN
lesson5/scatterplotMatrix.pdf
Normal file
Binary file not shown.
2381
lesson5/yogurt.csv
Normal file
File diff suppressed because it is too large
598025
lesson6/diamondsbig.csv
Normal file
File diff suppressed because it is too large
BIN
lesson6/ggpairs_landscape.pdf
Normal file
Binary file not shown.
289
lesson6/lesson6_student.rmd
Normal file
@ -0,0 +1,289 @@
|
|||||||
|
Lesson 6
|
||||||
|
========================================================
|
||||||
|
|
||||||
|
### Welcome
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Scatterplot Review
|
||||||
|
|
||||||
|
```{r Scatterplot Review}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Price and Carat Relationship
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Frances Gerety
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
#### A diamond is
|
||||||
|
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### The Rise of Diamonds
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### ggpairs Function
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r ggpairs Function}
|
||||||
|
# install these if necessary
|
||||||
|
install.packages('GGally')
|
||||||
|
install.packages('scales')
|
||||||
|
install.packages('memisc')
|
||||||
|
install.packages('lattice')
|
||||||
|
install.packages('MASS')
|
||||||
|
install.packages('car')
|
||||||
|
install.packages('reshape')
|
||||||
|
install.packages('plyr')
|
||||||
|
|
||||||
|
# load the ggplot graphics package and the others
|
||||||
|
library(ggplot2)
|
||||||
|
library(GGally)
|
||||||
|
library(scales)
|
||||||
|
library(memisc)
|
||||||
|
|
||||||
|
# sample 10,000 diamonds from the data set
|
||||||
|
set.seed(20022012)
|
||||||
|
diamond_samp <- diamonds[sample(1:length(diamonds$price), 10000), ]
|
||||||
|
ggpairs(diamond_samp, params = c(shape = I('.'), outlier.shape = I('.')))
|
||||||
|
```
|
||||||
|
|
||||||
|
What are some things you notice in the ggpairs output?
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### The Demand of Diamonds
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r The Demand of Diamonds}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Connecting Demand and Price Distributions
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Scatterplot Transformation
|
||||||
|
|
||||||
|
```{r Scatterplot Transformation}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
### Create a new function to transform the carat variable
|
||||||
|
|
||||||
|
```{r cuberoot transformation}
|
||||||
|
cuberoot_trans = function() trans_new('cuberoot', transform = function(x) x^(1/3),
|
||||||
|
inverse = function(x) x^3)
|
||||||
|
```
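The transformation above pairs a cube root with its inverse. The sketch below checks that pairing on a few carat values using base R only; the helper names `cuberoot` and `cube` are hypothetical and exist only for this illustration.

```r
# Hypothetical helpers mirroring the transform/inverse pair passed to
# trans_new() above.
cuberoot <- function(x) x^(1/3)
cube     <- function(x) x^3

# A few carat values taken from the axis breaks used later in the lesson.
carat <- c(0.2, 0.5, 1, 2, 3)

transformed <- cuberoot(carat)   # positions on the transformed axis
recovered   <- cube(transformed) # inverse maps back to carat units
```

Note that `x^(1/3)` returns `NaN` for negative `x` in R, which is not a concern here because carat weights are positive.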
|
||||||
|
|
||||||
|
#### Use the cuberoot_trans function
|
||||||
|
```{r Use cuberoot_trans}
|
||||||
|
ggplot(aes(carat, price), data = diamonds) +
|
||||||
|
geom_point() +
|
||||||
|
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||||||
|
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||||||
|
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||||||
|
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||||||
|
ggtitle('Price (log10) by Cube-Root of Carat')
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Overplotting Revisited
|
||||||
|
|
||||||
|
```{r Sort and Head Tables}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
```{r Overplotting Revisited}
|
||||||
|
ggplot(aes(carat, price), data = diamonds) +
|
||||||
|
geom_point() +
|
||||||
|
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||||||
|
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||||||
|
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||||||
|
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||||||
|
ggtitle('Price (log10) by Cube-Root of Carat')
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Other Qualitative Factors
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Price vs. Carat and Clarity
|
||||||
|
|
||||||
|
Alter the code below.
|
||||||
|
```{r Price vs. Carat and Clarity}
|
||||||
|
# install and load the RColorBrewer package
|
||||||
|
install.packages('RColorBrewer')
|
||||||
|
library(RColorBrewer)
|
||||||
|
|
||||||
|
ggplot(aes(x = carat, y = price), data = diamonds) +
|
||||||
|
geom_point(alpha = 0.5, size = 1, position = 'jitter') +
|
||||||
|
scale_color_brewer(type = 'div',
|
||||||
|
guide = guide_legend(title = 'Clarity', reverse = T,
|
||||||
|
override.aes = list(alpha = 1, size = 2))) +
|
||||||
|
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||||||
|
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||||||
|
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||||||
|
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||||||
|
ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Clarity and Price
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Price vs. Carat and Cut
|
||||||
|
|
||||||
|
Alter the code below.
|
||||||
|
```{r Price vs. Carat and Cut}
|
||||||
|
ggplot(aes(x = carat, y = price, color = clarity), data = diamonds) +
|
||||||
|
geom_point(alpha = 0.5, size = 1, position = 'jitter') +
|
||||||
|
scale_color_brewer(type = 'div',
|
||||||
|
guide = guide_legend(title = 'Clarity', reverse = T,
|
||||||
|
override.aes = list(alpha = 1, size = 2))) +
|
||||||
|
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||||||
|
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||||||
|
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||||||
|
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||||||
|
ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Cut and Price
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Price vs. Carat and Color
|
||||||
|
|
||||||
|
Alter the code below.
|
||||||
|
```{r Price vs. Carat and Color}
|
||||||
|
ggplot(aes(x = carat, y = price, color = cut), data = diamonds) +
|
||||||
|
geom_point(alpha = 0.5, size = 1, position = 'jitter') +
|
||||||
|
scale_color_brewer(type = 'div',
|
||||||
|
guide = guide_legend(title = 'Cut', reverse = T,
|
||||||
|
override.aes = list(alpha = 1, size = 2))) +
|
||||||
|
scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
|
||||||
|
breaks = c(0.2, 0.5, 1, 2, 3)) +
|
||||||
|
scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
|
||||||
|
breaks = c(350, 1000, 5000, 10000, 15000)) +
|
||||||
|
ggtitle('Price (log10) by Cube-Root of Carat and Cut')
|
||||||
|
```
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Color and Price
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Linear Models in R
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Building the Linear Model
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Building the Linear Model}
|
||||||
|
m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = diamonds)
|
||||||
|
m2 <- update(m1, ~ . + carat)
|
||||||
|
m3 <- update(m2, ~ . + cut)
|
||||||
|
m4 <- update(m3, ~ . + color)
|
||||||
|
m5 <- update(m4, ~ . + clarity)
|
||||||
|
mtable(m1, m2, m3, m4, m5)
|
||||||
|
```
|
||||||
|
|
||||||
|
Notice how adding cut to our model does not help explain much of the variance
|
||||||
|
in the price of diamonds. This fits with out exploration earlier.
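One way to see this kind of effect numerically is to compare R-squared before and after `update()` adds a term. The sketch below uses synthetic data in which the added factor has no real effect by construction; the names `toy`, `size`, `grade`, `f1`, and `f2` are hypothetical and are not the lesson's `diamonds` data or its `m1` to `m5` models.

```r
# Synthetic data: log_price depends on the cube root of size only;
# grade is unrelated noise by construction.
set.seed(7)
n <- 300
toy <- data.frame(
  size  = runif(n, 0.2, 3),
  grade = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
toy$log_price <- 6 + 2 * toy$size^(1/3) + rnorm(n, sd = 0.2)

f1 <- lm(log_price ~ I(size^(1/3)), data = toy)
f2 <- update(f1, ~ . + grade)   # same model plus one extra predictor

# R-squared never decreases when a term is added to a nested model,
# so the interesting quantity is how small the gain is.
r2 <- c(f1 = summary(f1)$r.squared, f2 = summary(f2)$r.squared)
gain <- r2[["f2"]] - r2[["f1"]]
```

A very small `gain` suggests the added variable explains little beyond what the model already captures.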
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### Model Problems
|
||||||
|
Video Notes:
|
||||||
|
|
||||||
|
Research:
|
||||||
|
(Take some time to come up with 2-4 problems for the model)
|
||||||
|
(You should spend 10-20 min on this)
|
||||||
|
|
||||||
|
Response:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
### A Bigger, Better Data Set
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r A Bigger, Better Data Set}
|
||||||
|
install.packages('bitops')
|
||||||
|
install.packages('RCurl')
|
||||||
|
library('bitops')
|
||||||
|
library('RCurl')
|
||||||
|
|
||||||
|
diamondsurl = getBinaryURL("https://raw.github.com/solomonm/diamonds-data/master/BigDiamonds.Rda")
|
||||||
|
load(rawConnection(diamondsurl))
|
||||||
|
```
|
||||||
|
|
||||||
|
The code used to obtain the data is available here:
|
||||||
|
https://github.com/solomonm/diamonds-data
|
||||||
|
|
||||||
|
## Building a Model Using the Big Diamonds Data Set
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
```{r Building a Model Using the Big Diamonds Data Set}
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
## Predictions
|
||||||
|
|
||||||
|
Example Diamond from BlueNile:
|
||||||
|
Round 1.00 Very Good I VS1 $5,601
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
#Be sure you’ve loaded the library memisc and have m5 saved as an object in your workspace.
|
||||||
|
thisDiamond = data.frame(carat = 1.00, cut = "V.Good",
|
||||||
|
color = "I", clarity="VS1")
|
||||||
|
modelEstimate = predict(m5, newdata = thisDiamond,
|
||||||
|
interval="prediction", level = .95)
|
||||||
|
```
|
||||||
|
|
||||||
|
Evaluate how well the model predicts the BlueNile diamond's price. Think about the fitted point estimate as well as the 95% prediction interval.
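Because the model's response is `log(price)`, the values returned by `predict()` are on the log scale and need to be exponentiated before they can be compared with a dollar price. A minimal sketch on synthetic data follows; `toy`, `size`, and `fit` are hypothetical names and this is not the lesson's `m5` model.

```r
# Synthetic data with a log-linear relationship (illustrative only).
set.seed(11)
n <- 200
toy <- data.frame(size = runif(n, 0.2, 3))
toy$price <- exp(6 + 2 * toy$size^(1/3) + rnorm(n, sd = 0.2))

fit <- lm(I(log(price)) ~ I(size^(1/3)), data = toy)

# predict() returns a matrix with columns fit, lwr, upr on the log scale.
est_log <- predict(fit, newdata = data.frame(size = 1),
                   interval = "prediction", level = 0.95)

# Back-transform to the original price units.
est <- exp(est_log)
```

Applied to the lesson's objects, the analogous step would be `exp(modelEstimate)`, assuming `m5` was fit with `log(price)` as shown earlier.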
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
## Final Thoughts
|
||||||
|
Notes:
|
||||||
|
|
||||||
|
***
|
||||||
|
|
||||||
|
Click **Knit HTML** to see all of your hard work and to have an HTML
|
||||||
|
page of this lesson, your answers, and your notes!
|
||||||
|
|
||||||