udacity_eda/lesson2/demystifying.R

# The goal of this file is to introduce you to the
# R programming language. Let's start with by unraveling a
# little mystery!

# 1. Run the code below to create the vector 'udacious'.
# You need to highlight all of the lines of the code and then
# run it. You should see "udacious" appear in the workspace.

udacious <- c("Chris Saden", "Lauren Castellano",
              "Sarah Spikes","Dean Eckles",
              "Andy Brown", "Moira Burke",
              "Kunal Chawla")

# You should see something like "chr[1:7]" in the 'Environment'
# or 'Workspace' tab. This is because you created a 'vector' with
# 7 names that have a 'type' of character. The arrow-like
# '<-' symbol is the assignment operator in R, similar to the
# equal sign '=' in other programming languages. The c() is a
# generic function that combines arguments, in this case the
# names of people, to form a vector.

# A 'vector' is one of the data types in R. Vectors must contain
# the same type of data, that is the entries must all be of the
# same type: character (most programmers call these strings),
# logical (TRUE or FALSE), or numeric.

# Print out the vector udacious by running this next line of code.

udacious

# Notice how there are numbers next to the output.
# Each number corresponds to the index of the entry in the vector.
# Chris Saden is the first entry so [1]
# Dean Eckles is the fourth entry so [4]
# Kunal Chawla is the seventh entry so [7]

# Depending on the size of you window you may see different numbers
# in the output.

# ANOTHER HELPFUL TIP: You can add values to a vector.
# Run each line of code one at a time below to see what is happening.

numbers <- c(1:10)

numbers

numbers <- c(numbers, 11:20)

numbers


# 2. Replace YOUR_NAME with your actual name in the vector
# 'udacious' and run the code. Be sure to use quotes around it.

udacious <- c("Chris Saden", "Lauren Castellano",
              "Sarah Spikes","Dean Eckles",
              "Andy Brown", "Moira Burke",
              "Kunal Chawla", "Dustin Pianalto")

# Notice how R updates 'udacious' in the workspace.
# It should now say something like 'chr[1:8]'.

# 3. Run the following two lines of code. You can highlight both lines
# of code and run them.

mystery = nchar(udacious)
mystery

# You just created a new vector called mystery. What do you
# think is in this vector? (scroll down for the answer)


# Mystery is a vector that contains the number of characters
# for each of the names in udacious, including your name.

# 4. Run this next line of code.

mystery == 11

# Here we get a logical (or boolean) vector that tells us
# which locations or indices in the vector contain a name
# that has exactly 11 characters.

# 5. Let's use this boolean vector, mystery, to subset our
# udacious vector. What do you think the result will be when
# running the line of code below?

# Think about the output before you run this next line of code.
# Notice how there are brackets in the code. Brackets are often
# used in R for subsetting.

udacious[mystery == 11]


# Scroll down for the answer


# It's your Udacious Instructors for the course!
# (and you may be in the output if you're lucky enough
# to have 11 characters in YOUR_NAME) Either way, we
# think you're pretty udacious for taking this course.


# 6. Alright, all mystery aside...let's dive into some data!
# The R installation has a few datasets already built into it
# that you can play with. Right now, you'll load one of these,
# which is named mtcars.

# Run this next command to load the mtcars data.

data(mtcars)


# You should see mtcars appear in the 'Environment' tab with
# <Promise> listed next to it.

# The object (mtcars) appears as a 'Promise' object in the
# workspace until we run some code that uses the object.

# R has stored the mtcars data into a spreadsheet-like object
# called a data frame. Run the next command to see what variables
# are in the data set and to fully load the data set as an
# object in R. You should see <Promise> disappear when you
# run the next line of code.

# Visit http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Promise-objects
# if you want the expert insight on Promise objects. You won't
# need to the info on Promise objects to be successful in this course.

names(mtcars)

# names(mtcars) should output all the variable
# names in the data set. You might notice that the car names
# are not a variable in the data set. The car names have been saved
# as row names. More on this later.

# You should also see how many observations (obs.) are in the
# the data frame and the number of variables on each observation.

# 7. To get more information on the data set and the variables
# run the this next line of code.

?mtcars

# You can type a '?' before any command or a data set to learn
# more about it. The details and documentation will appear in
# the 'Help' tab.


# 8. To print out the data, run this next line as code.

mtcars

# Scroll up and down in the console to check out the data.
# This is the entire data frame printed out.

# 9. Run these next two functions, one at a time,
# and see if you can figure out what they do.

str(mtcars)

dim(mtcars)

# Scroll down for the answer.


# The first command, str(mtcars), gives us the structure of the
# data frame. It lists the variable names, the type of each variable
# (all of these variables are numerics) and some values for each
# variable.


# The second command, dim(mtcars), should output '[1] 32 11'
# to the console. The [1] indicates that 32 is the first value
# in the output.

# R uses 1 to start indexing (AND NOT ZERO BASED INDEXING as is true
# of many other programming languages.)

# 10. Read the documentation for row.names if you're want to know more.
?row.names

# Run this code to see the current row names in the data frame.
row.names(mtcars)

# Run this code to change the row names of the cars to numbers.
row.names(mtcars) <- c(1:32)

# Now print out the data frame by running the code below.
mtcars

# It's tedious to relabel our data frame with the right car names
# so let's reload the data set and print out the first ten rows.

data(mtcars)
head(mtcars, 10)

# The head() function prints out the first six rows of a data frame
# by default. Run the code below to see.
head(mtcars)

# I think you'll know what this does.
tail(mtcars, 3)


# 11. We've run nine commands so far:
#      c, nchar, data, str, dim, names, row.names, head, and tail.

# All of these commands took some inputs or arguments.
# To determine if a command takes more arguments or to learn
# about any default settings, you can look up the documentation
# using '?' before the command, much like you did to learn about
# the mtcars data set and the row.names


# 12. Let's examine our car data more closely. We can access an
# an individual variable (or column) from the data frame using
# the '$' sign. Run the code below to print out the variable
# miles per gallon. This is the mpg column in the data frame.

mtcars$mpg

# Print out any two other variables to the console.


# This is a vector containing the mpg (miles per gallon) of
# the 32 cars. Run this next line of code to get the average mpg for
# for all the cars. What is it?

# Enter this number for the quiz on the Udacity website.
# https://www.udacity.com/course/viewer#!/c-ud651/l-729069797/e-804129314/m-830829287

mean(mtcars$mpg)