Sunday 17 August 2014

Read in multiple files across different directories – R

Here is the scenario. I have a folder "Data folder" which contains two folders, "First files" and "Second files". Within "First files" are three text files, "File 1", "File 2" and "File 3" (seen below). Within "Second files" are three more text files, "File 4", "File 5" and "File 6".


The contents of the files do not really matter. My objective was to write code that reads in the contents of all six files for further data manipulation in R. Let's start moving away from Excel VBA and towards programming with R.

Here is the code. Lines that start with # are comments.


The directories for the Data folder, First files and Second files are established. Then, two for loops iterate across the directories and files to read data into the workspace. These are the steps of the for loops:
  • myDir specifies the two directories ("First files" and "Second files").
  • for (directory in myDir) starts the directory loop.
  • setwd(directory) sets the working directory to the current directory in the loop (first is "First files").
  • myFiles is the list of filenames in the current directory ("File 1", "File 2" and "File 3").
  • for (filename in myFiles) starts the filename loop.
  • filepath is the filepath that points to an individual file (first is "File 1").
  • dataTemp is the data frame that contains the data read in from the file. A "data frame" is used for storing data tables. Now you know.
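  • cbind(assign(filename, dataTemp)) stores dataTemp under the name held in filename. It is assign() that does the naming; the surrounding cbind() does not change the result. In this way, each file that is read in is automatically named after its file: when "File 1" is read into the workspace, the data frame is called "File 1.txt". Because that name contains a space, it has to be wrapped in backticks when typed in code.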
  • The next file in the loop is "File 2". Then "File 3".
  • Then the second directory is selected ("Second files"). Within this directory, the three files "File 4", "File 5" and "File 6" are read in.
The end result is six data frames with the same name as the files [1].
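
The steps above can be sketched roughly as follows. This is a minimal sketch and not necessarily the original script. It makes several assumptions: the files are whitespace-delimited text with a header row, read.table() is the reader, and the sample folders are created in a temporary directory (dataDir and demoData are names made up for the demo). It also builds full file paths with file.path() instead of calling setwd(), so the working directory is left alone.

```r
# Build a throwaway copy of the folder layout so the sketch runs anywhere.
# Assumption: whitespace-delimited text files with a header row.
dataDir <- file.path(tempdir(), "Data folder")   # hypothetical location
myDir <- file.path(dataDir, c("First files", "Second files"))

fileNumber <- 0
for (directory in myDir) {
  dir.create(directory, recursive = TRUE, showWarnings = FALSE)
  for (i in 1:3) {
    fileNumber <- fileNumber + 1
    demoData <- data.frame(id = 1:2, value = fileNumber * c(1, 10))
    write.table(demoData,
                file.path(directory, paste0("File ", fileNumber, ".txt")),
                row.names = FALSE)
  }
}

# The two loops: directories on the outside, files on the inside.
for (directory in myDir) {
  # Filenames in the current directory, e.g. "File 1.txt".
  myFiles <- list.files(directory, pattern = "\\.txt$")
  for (filename in myFiles) {
    filepath <- file.path(directory, filename)
    dataTemp <- read.table(filepath, header = TRUE)
    # Create a variable named after the file, e.g. `File 1.txt`.
    assign(filename, dataTemp)
  }
}

# Names that contain spaces need backticks when typed directly.
print(`File 1.txt`)
```

Run at the top level of a script, this leaves six data frames in the workspace, one per file. get("File 4.txt") is an alternative to backticks when the name is held in a string.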



This example pointed to two directories. Within each directory there are three files (six in total). This code can be scaled up and down – I could have a hundred directories with a hundred files in each. Cheers for automation!  
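
For larger folder trees, one possible variation (a different technique from the nested loops above) is to let list.files() search recursively and to collect everything in a single named list. The sketch below is only an illustration under assumptions: rootDir and the "Dir n" folders are invented for the demo, and the files are again assumed to be whitespace-delimited text with a header row.

```r
# Hypothetical demo tree: 4 directories with 5 files each.
rootDir <- file.path(tempdir(), "Scaled data")
for (d in 1:4) {
  directory <- file.path(rootDir, paste("Dir", d))
  dir.create(directory, recursive = TRUE, showWarnings = FALSE)
  for (f in 1:5) {
    write.table(data.frame(dir = d, file = f),
                file.path(directory, paste0("File ", f, ".txt")),
                row.names = FALSE)
  }
}

# One call finds every .txt file beneath rootDir, however deep.
allPaths <- list.files(rootDir, pattern = "\\.txt$",
                       recursive = TRUE, full.names = TRUE)

# Read everything into one list instead of separate variables.
allData <- lapply(allPaths, read.table, header = TRUE)

# Full paths are used as names because filenames repeat across directories.
names(allData) <- allPaths

length(allData)
```

A list keeps the workspace tidy when there are hundreds of files, and individual data frames can still be pulled out by path, for example allData[[allPaths[1]]].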


References and notes
1. I use RStudio to make life easier with R. I can see my R script on the left, the Console (which prints the results of code execution) on the top right, and the Environment (which shows the data and values created when the code runs).
