R Exercises – 71-80 – Loops (For Loop, While Loop, Repeat Loop), If and Ifelse Statements in R

1. Simple ifelse statement: Create the data frame ‘student.df’ with the data provided below. Use a simple ‘ifelse’ statement to add a new column ‘male.teen’ to the data frame. This is a logical column, indicating T (TRUE) if the observation is a male younger than 20 years. 2. Double for loop: Write a double for loop which prints …

Continue Reading…
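The ‘ifelse’ part of exercise 1 might look like the sketch below. The teaser does not include the actual student data, so the values in ‘student.df’ here are hypothetical, and the column names ‘name’, ‘sex’ and ‘age’ are assumptions.

```r
# Hypothetical stand-in for the exercise data (the real values are in the full post)
student.df <- data.frame(
  name = c("Anna", "Ben", "Carl", "Dora"),
  sex  = c("F", "M", "M", "F"),
  age  = c(19, 18, 22, 17)
)

# TRUE only for males younger than 20; ifelse() works element-wise over the rows
student.df$male.teen <- ifelse(student.df$sex == "M" & student.df$age < 20, TRUE, FALSE)

print(student.df)
```

Since the condition is already logical, the comparison alone would produce the same column; ‘ifelse’ pays off when the two outcomes are something other than TRUE/FALSE, for example "yes"/"no".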

R Exercises – 61-70 – R String Manipulation | Working with ‘gsub’ and ‘regex’ | Regular Expressions in R

Required packages and datasets. 1. ‘College’ dataset: Colleges in Texas. a. Get familiar with the ‘College’ dataset and its row names. b. Get a vector with the college names (‘college.names’), which you will need in the later steps of this and the following exercises. c. Get a vector (‘texas.college’) which contains all colleges with ‘Texas’ in their names.

Continue Reading…
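Step c could be handled with base R pattern matching. A minimal sketch, assuming ‘college.names’ is a plain character vector; the four names below are a hypothetical stand-in, while in the exercise the vector would come from the row names of ‘College’ (for example `rownames(College)` after `library(ISLR)`).

```r
# Hypothetical stand-in for college.names; in the exercise this vector
# would come from the row names of the ISLR 'College' dataset
college.names <- c("Baylor University", "Texas Christian University",
                   "Rice University", "University of Texas at Austin")

# grep() with value = TRUE returns the matching strings instead of their positions
texas.college <- grep("Texas", college.names, value = TRUE)

print(texas.college)
```

Note that ‘grep’ is case-sensitive by default; `ignore.case = TRUE` would also catch spellings such as "TEXAS", and `grepl()` returns a logical vector instead of the matches.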

R Exercises – 51-60 – Data Pre-Processing with data.table

Required packages for the exercises. 1. ‘College’ dataset: Basic row manipulations. a. Transform ‘College’ from ‘ISLR’ into a data.table. Make sure to keep the university identifier. We will use this new data.table, called ‘dtcollege’, throughout this block of exercises. b. Get familiar with the dataset and its variables. c. Extract rows 40 to 60 as a new data.table (‘mysubset’).

Continue Reading…

R Exercises – 31-40 – Data Frame Manipulations

1. Working with the ‘mtcars’ dataset. a. Get a histogram of the ‘mpg’ values of ‘mtcars’. Which bin contains the most observations? b. Are there more automatic (0) or manual (1) transmission-type cars in the dataset? Hint: ‘mtcars’ has 32 observations. c. Get a scatter plot of ‘hp’ vs ‘wt’ (weight). 2. Working with the ‘iris’ dataset …

Continue Reading…
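The three ‘mtcars’ tasks map onto one base R call each. A minimal sketch using only base graphics; in ‘mtcars’ the transmission type is coded in column ‘am’ and the weight (in 1000 lbs) in column ‘wt’.

```r
# Part a: histogram of fuel economy; hist() also returns the bins and counts
mpg.hist <- hist(mtcars$mpg, main = "mtcars: mpg", xlab = "mpg")
# Lower and upper border of the bin with the most observations
print(mpg.hist$breaks[which.max(mpg.hist$counts) + 0:1])

# Part b: count transmission types (am: 0 = automatic, 1 = manual)
am.counts <- table(mtcars$am)
print(am.counts)

# Part c: horsepower plotted against weight
plot(hp ~ wt, data = mtcars, xlab = "wt (1000 lbs)", ylab = "hp")
```

Storing the return value of ‘hist’ gives access to the bin borders (`breaks`) and the counts per bin, so the fullest bin can be read off without eyeballing the plot.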

R Exercises – 21-30 – The Apply Family of Functions

1. Function ‘apply’ on a simple matrix: a. Create the following matrix of 5 rows and call it ‘mymatrix’. b. Get the mean of each row. c. Get the mean of each column. d. Sort the columns in ascending order. 2. Using ‘lapply’ on the data frame ‘mtcars’: a. Use three ‘apply’ family functions to get the minimum values …

Continue Reading…
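Part 1 could be sketched like this. The matrix from the exercise is not shown in the teaser, so ‘mymatrix’ below is a hypothetical 5 x 2 example.

```r
# Hypothetical 5-row matrix standing in for the one given in the full post
mymatrix <- matrix(c(3, 1, 5, 2, 4,
                     9, 7, 10, 6, 8), nrow = 5)

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column
row.means <- apply(mymatrix, 1, mean)
col.means <- apply(mymatrix, 2, mean)

# sort() returns one vector per column; apply() binds them back as columns
sorted.cols <- apply(mymatrix, 2, sort)

print(row.means)
print(col.means)
print(sorted.cols)
```

With MARGIN = 1, ‘apply’ would also bind the sorted rows as columns, so sorting rows needs a `t()` around the call. For plain means, `rowMeans()` and `colMeans()` are faster built-in alternatives.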


R Data Pre-Processing and Data Management

Data pre-processing is the very first step in data analytics. You cannot skip it; it is too important. Unfortunately, this topic is widely overlooked, and information on it is hard to find. With this course I will change that! Data pre-processing as taught in this course has the following steps: 1. Data import: this might sound trivial …

Continue Reading…

Famous and Very Useful Pre-Installed Exercise Datasets in R

As most of you surely know, R comes with many exercise datasets pre-installed. As soon as you have installed base R, which includes the package ‘datasets’, you have ample opportunity to explore R with real-world data frames. For me as a course content creator, these datasets help tremendously, because with them I can …

Continue Reading…
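To see which datasets ship with base R, the ‘datasets’ package can be queried directly. A short sketch:

```r
# data() with a package name returns an index of that package's datasets
ds <- data(package = "datasets")
ds.names <- ds$results[, "Item"]
print(length(ds.names))
print(head(ds.names))

# The data frames themselves are available right away, no library() call needed
str(mtcars)
print(head(iris))
```

`library(help = "datasets")` shows the same list together with short descriptions of each dataset.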

Machine Learning – The Pinnacle of Modern Statistics!

In my consulting work, during my research or while answering student questions, the topic of machine learning pops up constantly. Unfortunately, there are some misconceptions concerning this topic. In this article I am going to explain what machine learning actually is and how you can benefit from these tools. Machine learning is a collection of modern …

Continue Reading…


Trading Biotech Stocks – Understanding the Healthcare Sector

Do you plan on trading biotech stocks? Do you want to learn how to identify promising healthcare stocks? Do you want to read a company's pipeline like a professional? Do you want to know where to get all the relevant info for biotech/healthcare company evaluation? Do you want to tailor your investment strategy towards a healthcare portfolio?

Continue Reading…


Proprietary vs. Open Source Analytics Software – Which one should I choose?

When it comes to analytics software and languages, you will sooner or later have to decide which one to use. There are dozens of tools available, some more in demand than others. As a data scientist, you will find R and Python to be the popular open source options. On the proprietary side, you will find products like SPSS, MATLAB, Stata or SAS. Since it …

Continue Reading…

Quality R Training for You