Showing posts with label course. Show all posts

Saturday, 27 June 2015

Probably a better way – Year one

It’s been a year since my first post, and a lot has changed. Looking over my early posts, I’m amused that I started with VBA in Excel. At the time I had completed a VBA training course and was writing a program at work. I saw a future of working exclusively in VBA – writing VB scripts where my customers would use my awfully clunky workbooks to manage and analyse their data. I suspect that if I searched for “VBA” across job ads today, I would find fewer hits than a year ago.

Soon I commenced the Data Science courses on Coursera. I learnt to program in R via these courses (aided by prior Matlab knowledge) then applied this knowledge at work. I managed my data with R and visualised data with R. I even built a web app (using the Shiny package). R is great, and it’s free. Not free the way Facebook is free, where one hands over personal data. R is proper free.

I am regularly using R in my new role. I do wonder whether I should learn Python. My understanding is that Python excels over R in web applications and text analytics. I wish I knew JavaScript – I would like to create custom interactive displays.

I’d love to know more about statistics and multiple regression. Not entirely sure how I would apply this knowledge. At the heart of it, I’d like to be in a position to receive a large amount of data and simply know what statistical methodology I should be applying towards uncovering insights. I have commenced reading my thick stats book mentioned in this post.

At the moment I am being exposed to different data types and methodologies at work. JSON, XML, MySQL. I’ll take them as they come. I’m fortunate that I am surrounded by developers that have advice when I get stumped. I just have to be more comfortable with asking for help.

Summary: Year one was moving away from VBA to R. Here’s to year two!

Wednesday, 22 October 2014

Git add, commit, push – Github

My first experience using GitHub was met with a sea of confusion. "GitHub is a Git repository web-based hosting service, which offers all of the distributed revision control and source code management (SCM) functionality of Git as well as adding its own features" [1]. It's used by my MOOCs for grade assessment. Students upload completed assignment files (such as code text documents and images) to GitHub, made available for viewing by other students. I submitted an assignment for the Reproducible Research course and pleasantly discovered that the GitHub submission went smoothly. This was largely due to recalling I needed to be in the right directory. The following are the steps I took to submit my work to GitHub. I have assumed the reader has set up a GitHub account and has installed Git Bash on their Windows machine [2].

First I was instructed to "fork" a repository. I need to define some terms here. A repository is "a central location in which data is stored and managed" [1]. "A fork is a copy of a repository. Forking a repository allows you to freely experiment with changes without affecting the original project" [3]. I won't go into the details, but to fork a repository, you follow the GitHub link to the repo (that's what the cool kids call it) and click on "Fork". That's it. The forked repo appears in your list of repositories. The contents of the forked repo are available to you to view. But you are not ready to make changes to the contents of these files and make these changes available to others. In order to do that, a local copy of the repo is required on your computer. In other words, the repo and its contents need to sit in a directory on your computer. The local copy and the repo on GitHub are linked. Changes made to the local copy can be synced with GitHub.

To make a local copy of a repo, you need to clone the repo.

Opening Git Bash [4] I typed "git clone" [5], space then a URL. The URL points to the forked repo (called "RepData_PeerAssessment1"). My username on Git is "DataMoose".
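As a rough sketch, the clone step has the shape below. The GitHub URL in the comment is an assumption pieced together from the username and repo name above. The runnable part clones from a throwaway local repository (a hypothetical stand-in for GitHub) so it can be tried offline.

```shell
# On GitHub the command would look something like this (URL assumed from the
# username and repo name, not copied from the original session):
#   git clone https://github.com/DataMoose/RepData_PeerAssessment1.git

# Offline stand-in: an empty bare repository plays the part of GitHub.
demo=$(mktemp -d)
git init --quiet --bare "$demo/github.git"

# Clone it into a folder named after the repo, inside the current directory.
cd "$demo"
git clone --quiet "$demo/github.git" RepData_PeerAssessment1

# The new folder now sits next to the stand-in remote.
ls
```

Git warns that the cloned repository is empty, which is expected for this stand-in; a real fork would arrive with its files.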


After hitting Enter, the repo appears in my Home directory, as seen below with the folder called "RepData_PeerAssessment1".


When I open the folder I can view the contents. Note that there exists a "PA1_template.Rmd" file. Later I will overwrite this file.


I mentioned that knowing which directory I was in made for smoother sailing. To check the current directory, type "pwd" (which stands for "print working directory").


I'm in my Home directory at C/Users/Karim [6]. I need to change the directory and point to the repo on my machine. Changing directories is achieved by typing "cd" followed by a space, then the name of the folder (which must exist in the current directory) I wish to jump into. Hit Enter.


As a sanity check, type "pwd" again to confirm you are now in the desired directory. A very useful thing to type to get a feel for this directory is "git status" [7]. This returned a status on whether any files in the repo had been modified.
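The navigation steps can be tried in one go with the minimal sketch below. It assumes nothing about my actual machine: a scratch folder holding an empty Git repository stands in for the Home directory and the cloned repo.

```shell
# Scratch folder standing in for the Home directory, with an empty Git
# repository standing in for the cloned repo (both are placeholders).
base=$(mktemp -d)
git init --quiet "$base/RepData_PeerAssessment1"
cd "$base"

pwd                           # print the current (parent) directory
cd RepData_PeerAssessment1    # jump into the repo folder
pwd                           # sanity check: the path now ends with the repo name
git status                    # reports the branch and any new or modified files
```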


Upon completing my assignment, three files ("PA1_template.Rmd", "PA1_template.html", "PA1_template.md") and a folder ("PA1_template_files") containing figures were produced. I copied then pasted the files and folder into my local repo folder, overwriting the existing "PA1_template.Rmd" file.

Typing "git add ." will stage any new or changed files, ready for the next commit. One can add a specific file by typing out the filename instead. Using "." will add all new or changed files in the current directory.
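To see the difference between adding one file and adding everything, here is a small sketch in a scratch repository. The file contents are placeholders, not my assignment.

```shell
# Scratch repository with two new files (placeholder contents).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
echo "report source" > PA1_template.Rmd
echo "rendered report" > PA1_template.md

git add PA1_template.Rmd   # stage one named file only
git status --short         # staged files are marked "A", untracked ones "??"

git add .                  # stage everything new or changed in this directory
git status --short         # both files are now marked "A"
```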

Typing "git commit -m" followed by a message in quotes will commit the additions. Rather than asking "Are you sure you wish to continue?", this step records the staged files as a snapshot in the local repo. The "first commit" is my choice of message. I could write anything and it will display on GitHub. After hitting Enter, the files that are added/committed are listed.
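A minimal sketch of the commit step, again in a scratch repository. The name and email are placeholder values set only for this scratch repo, since Git refuses to commit without an identity.

```shell
# Scratch repository with one staged file (placeholder contents).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity
echo "report source" > PA1_template.Rmd
git add .

git commit -m "first commit"   # record the staged files with a message
git log --oneline              # the message appears next to the commit ID
```

The commit lives only in the local repo at this point; nothing has been sent to GitHub yet.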


Typing "git push -u origin master" then hitting Enter will prompt for your GitHub username and password. After (correctly) keying these in, stuff (hopefully) appears to indicate that the files have been added to the repo on GitHub.
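The push can be rehearsed offline with the sketch below. A local bare repository is a hypothetical stand-in for GitHub, so no username or password is requested. The branch is renamed to "master" because newer Git installations may use a different default branch name.

```shell
# Stand-in for GitHub (a local bare repository) and a clone of it.
demo=$(mktemp -d)
git init --quiet --bare "$demo/github.git"
git clone --quiet "$demo/github.git" "$demo/RepData_PeerAssessment1"
cd "$demo/RepData_PeerAssessment1"
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity

# One committed file (placeholder contents), on a branch called "master".
echo "report source" > PA1_template.Rmd
git add .
git commit --quiet -m "first commit"
git branch -M master

# Send the commit to the remote called "origin". The -u flag remembers
# origin/master as the upstream, so a plain "git push" works next time.
git push -u origin master
```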


Refreshing your GitHub repo list, you will see that the new/changed files have appeared (those with "first commit").


That's how we fork, clone, add, commit, push with Git. GitHub is a fantastic (free) tool for version control of your work. In the event you made a blunder, you can roll back to prior versions of files. For my purposes, students can hopefully award me full grades for my assignment.
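Rolling back is not something I needed for the assignment, but one way to restore a prior version of a file is sketched here. It assumes a scratch repository with two commits; the file contents and identity are placeholders.

```shell
# Scratch repository with two versions of the same file.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"   # placeholder identity

echo "version 1" > PA1_template.Rmd
git add .
git commit --quiet -m "first commit"

echo "version 2 with a blunder" > PA1_template.Rmd
git add .
git commit --quiet -m "second commit"

git log --oneline                         # list the commits, newest first
git checkout HEAD~1 -- PA1_template.Rmd   # restore the file as of the previous commit
cat PA1_template.Rmd                      # prints "version 1"
```

The restored file is staged but not yet committed, so a further "git commit" would be needed to record the rollback.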


References and notes
1. I Googled this.
2. I haven’t really assumed this. I'm quite aware people use Macs. I'm a PC, and I wear glasses.
3. Help from: https://help.github.com/articles/fork-a-repo/
4. Allow me a moment to get all colloquial on your ass. Git Bash is an interface that lets me type in single line commands that get the computer to do stuff, OK? OK!
5. As guided by: https://help.github.com/articles/fork-a-repo/
6. My name is Karim. Nice to meet you.
7. I was guided by: http://guides.railsgirls.com/github/

Sunday, 12 October 2014

Screenshots – Snipping Tool

The Programming for Everybody course is already paying off – I learnt a new way to take screenshots.

In one of the lectures, the Snipping Tool is used to take screenshots (for Windows). I've been using it for years to box out a region and capture the image. Sure beats using the PrntScr key exclusively.

About a year ago a colleague pointed out the Screenshot option in MS Word 2010 in the Insert tab. This allows you to get the image of an open window without having to use the Snipping Tool by drawing the box around the window.


I use the Snipping Tool to grab images that are not encased in a window per se. I draw my box and readjust until it looks right. I get really picky about it – I like my screen grabs to be neat. Same border width all 'round I say. However I did not know that I had options with the Snipping Tool. The default is "Rectangular Snip". Selecting "Window Snip" allows you to select any open window to take the screenshot, including (it turns out) an image on my VLC media player.


If there are two overlapping windows, you can select the background window and the Snipping Tool will capture everything inside the background window's border, including whatever part of the foreground window overlaps it. Did that make sense? Here's a picture of two overlapping Word documents.

Using Window Snip and selecting the background, we see that part of the foreground is included.


It goes to show that even when you've been using something for a long time, there's still more that can be achieved.

Sunday, 13 July 2014

Online R programming course

If I had my time again, I would have taken a gap year before commencing University. Rather than travel the world whilst supporting myself working data entry jobs in windowless offices, I would be particularly nerdy and enrol in a whole bunch of free online courses.

There were very few structured online courses back at the turn of the century. Google was barely a thing.

I didn't know what I wanted to do at University at the age of 17. Yet whilst completing my final year exams, I filled out my list of preferred degrees informed by short blurbs from a phone book-thick course guide. I ended up in a molecular biology/genetics/biochemistry degree and proceeded to hate it for the three year duration. Good choice me [1].

The availability of massive open online courses (MOOCs) allows me to say, "Hey, what's a course in Marine Biology like?", then find one and enrol. In the comfort of my home, I watch videos, attempt quizzes and submit assignments. If I don't understand any of the course content, I can engage with other students on discussion forums [2].

I have enrolled in and completed two Coursera courses to date: The Data Scientist's Toolbox and R Programming. The courses are really well-structured and the range of topics covered has been broad. R Programming included the following:
  • Data types
  • Subsetting
  • Reading and writing data
  • Control structures
  • Functions
  • Scoping rules
  • Vectorized operations
  • Debugging.

Course assignments drove the requirement to learn programming in R. One assignment provided hospital mortality data from hundreds of hospitals across the United States. The assignment was to write a function whereby users could enter the name of a US State and request a list that ranked the best (lowest mortality) or worst (highest mortality) hospitals. Since I have programming experience, I could list the logical steps required to fulfil the task:
  • Read the spreadsheet data
  • Point to the State and mortality columns
  • Get the user's input parameters (State and rank request)
  • Organise the data as specified by the user's parameters (subsetting and sorting)
  • Return the result as a list.

I do know how to code the logical steps via VBA in Excel. However the assignment required coding in R. I had to learn the R syntax required for each step. That is, I had to correctly use the R syntax for a For loop, for an If Else statement, etc. The screen-grab shows a simple For loop with an If Else statement as written in VBA (background) and R (foreground using the RStudio IDE). One step at a time, putting it all together, I had a functioning function.



Different programming languages can all perform the same basic tasks (I assume). Efficiency, tools and packages, time, usability, cost, experience, popularity – such factors influence the choice of a programming language. I can see where I would use R for some projects at work over VBA. Then I can enter the debate of whether one should use SAS/SPSS/Python/R for business needs. I think the first two have a greater market share. However the last two are open source and have dedicated users with various Meetup groups. For now, I'm happy to sample different programming languages via MOOCs. But life still owes me a gap year.


References and notes
1. I believe that it is a tad bit unfair for a 17 or 18 year old to make a choice that significantly dictates the course of one's career for (most of) life. I had had very few major life experiences at 17; I was living at home, was not working, had barely travelled, and only knew how to make plain omelettes and cheese pizzas. 
2. A disadvantage compared with "real life" courses is the lack of direct access to lecturers and tutors. Recognition of qualifications upon completing an online course is under development.