Sunday 13 July 2014

Online R programming course

If I had my time again, I would have taken a gap year before commencing university. Rather than travel the world whilst supporting myself working data entry jobs in windowless offices, I would have been particularly nerdy and enrolled in a whole bunch of free online courses.

There were very few structured online courses back at the turn of the century. Google was barely a thing.

I didn't know what I wanted to do at university at the age of 17. Yet whilst completing my final year exams, I filled out my list of preferred degrees informed by short blurbs from a phone book-thick course guide. I ended up in a molecular biology/genetics/biochemistry degree and proceeded to hate it for the three-year duration. Good choice, me [1].

The availability of massive open online courses (MOOCs) allows me to say, "Hey, what's a course in Marine Biology like?", then find one and enrol. In the comfort of my home, I watch videos, attempt quizzes and submit assignments. If I don't understand any of the course content, I can engage with other students on discussion forums [2].

I have enrolled in and completed two Coursera courses to date: The Data Scientist's Toolbox and R Programming. The courses are really well-structured and the range of topics covered has been broad. R Programming included the following:
  • Data types
  • Subsetting
  • Reading and writing data
  • Control structures
  • Functions
  • Scoping rules
  • Vectorized operations
  • Debugging.

Course assignments drove the requirement to learn programming in R. One assignment provided hospital mortality data from hundreds of hospitals across the United States. The assignment was to write a function whereby users could enter the name of a US State and request a list that ranked the best (lowest mortality) or worst (highest mortality) hospitals. Since I have programming experience, I could list the logical steps required to fulfil the task:
  • Read the spreadsheet data
  • Point to the State and mortality columns
  • Get the user's input parameters (State and rank request)
  • Organise the data as specified by the user's parameters (subsetting and sorting)
  • Return the result as a list.
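
The steps above can be sketched in R roughly as follows. This is a minimal sketch, not the assignment solution: the function name rank_hospitals, the column names and the small in-memory table are all made up for illustration, and the table stands in for the spreadsheet that would normally be loaded with read.csv().

```r
# Hypothetical sample data standing in for the course spreadsheet;
# in practice this would come from read.csv() on the real file.
hospitals <- data.frame(
  hospital  = c("Alpha General", "Beta Clinic", "Gamma Medical", "Delta Health"),
  state     = c("TX", "TX", "TX", "CA"),
  mortality = c(12.1, 9.8, 15.3, 11.0),
  stringsAsFactors = FALSE
)

# Return a list of hospital names in one state, ranked by mortality.
# rank = "best" puts the lowest mortality first; "worst" puts the highest first.
rank_hospitals <- function(data, state, rank = "best") {
  # Validate the user's input parameters
  if (!state %in% data$state) {
    stop("invalid state")
  }
  if (!rank %in% c("best", "worst")) {
    stop("invalid rank")
  }

  # Subsetting: keep only the rows for the requested state
  in_state <- data[data$state == state, ]

  # Sorting: lowest mortality first, ties broken alphabetically by name
  ranked <- in_state[order(in_state$mortality, in_state$hospital), ]

  # Reverse the row order when the worst hospitals are requested
  if (rank == "worst") {
    ranked <- ranked[rev(seq_len(nrow(ranked))), ]
  }

  # Return the result as a list
  as.list(ranked$hospital)
}

best_tx  <- rank_hospitals(hospitals, "TX", "best")
worst_tx <- rank_hospitals(hospitals, "TX", "worst")
```

Here best_tx starts with the lowest-mortality hospital in the made-up Texas rows, and worst_tx is the same list in reverse order.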

I do know how to code the logical steps via VBA in Excel. However, the assignment required coding in R, so I had to learn the R syntax for each step: the syntax for a For loop, for an If Else statement, and so on. The screen-grab shows a simple For loop with an If Else statement as written in VBA (background) and R (foreground, using the RStudio IDE). One step at a time, putting it all together, I had a functioning function.
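
For anyone curious about the syntax, a simple For loop with an If Else statement in R looks something like the sketch below. It is not the code from the screen-grab; the mortality rates and the threshold of 10 are invented purely to have something to loop over.

```r
# Hypothetical mortality rates; the threshold of 10 is arbitrary
rates <- c(8.5, 12.0, 10.0)

# Pre-allocate a character vector to hold one label per rate
labels <- character(length(rates))

for (i in seq_along(rates)) {
  if (rates[i] < 10) {
    labels[i] <- "lower"
  } else {
    labels[i] <- "higher"
  }
}

print(labels)
```

The same labelling can be done without a loop using the vectorized ifelse(rates < 10, "lower", "higher"), which ties in with the vectorized operations topic from the course.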



Different programming languages can all perform the same basic tasks (I assume). Efficiency, tools and packages, time, usability, cost, experience, popularity – such factors influence the choice of a programming language. I can see where I would use R over VBA for some projects at work. Then I can enter the debate of whether one should use SAS/SPSS/Python/R for business needs. I think the first two have a greater market share. However, the last two are open source and have dedicated users with various Meetup groups. For now, I'm happy to sample different programming languages via MOOCs. But life still owes me a gap year.


References and notes
1. I believe that it is a tad unfair to ask a 17- or 18-year-old to make a choice that significantly dictates the course of their career for (most of) life. I had had very few major life experiences at 17; I was living at home, was not working, had barely travelled, and only knew how to make plain omelettes and cheese pizzas.
2. A disadvantage of online courses compared to "real life" courses is the lack of direct access to lecturers and tutors. Recognition of qualifications upon completing an online course is still under development.
