Sunday, 6 July 2014

Find word matches across two sheets – Excel VBA

We have an Incidents registry where staff members can note down any work, health and safety (WHS)/ occupational health and safety (OHS) events that occur on-site. These can be innocuous (paper cut) to the severe. Here is a simple list of incidents with dates on a worksheet called “Incidents”.


I was asked to provide a subset of this sheet containing any incidents that made mention of "WHS" (red text). I could also highlight any mention of "OHS" (magenta text), though this was secondary. The real list I worked off had many thousands of rows (and no coloured text). What did I do to find the targets "WHS" and "OHS" (including "WH&S" and "OH&S")? I used Ctrl F then marked the cell next to the incident with "1". At the end, I could sort by these numbers and remove rows that contained none – hence subsetting the data as required.

As I Ctrl F'ed and contemplated my life in general, I knew I would have to perform a similar task on a different spreadsheet in the future. Did someone say coding solution?

I built something! After running the code to search for "WHS", "WH&S", "OHS" and "OH&S", I get the below results.


First I created the Target sheet which contains the list of words/terms that I wish to find on the Incidents sheet. Column A lists the primary target words to search for. If either "WHS" or "WH&S" are found, the incident row will be marked with "1".Column B lists the secondary targets. If either "OHS" or "OH&S" are found on the incident row AND the row IS NOT marked with "1", then the row will be marked with "2" (I do not want to overwrite any existing marks).


I shall pause here. For those who wish to see the code [1], I have (poorly) screen-grabbed it at the end of this post [2]. For those still with me – I will attempt to translate the code steps into words. I'll do this is two parts - Part 1: Sections of code, and, Part 2: Details of code.

Part 1: Sections of code

The code is composed of three key sections: 
1. For c = 1 To colLast (red vertical line).
2. For r = 1 To rowLast (blue vertical line).
3. With Worksheets(“Incidents”).UsedRange (green vertical line).

For c = 1 To colLast iterates through each column on the Target sheet.

For r = 1 To rowLast iterates through each row within each column on the Target sheet.

With Worksheets(“Incidents”).UsedRange interrogates the cells that contain data on the Incidents sheet.

The order of what happens is:
1. First column on the Target sheet is selected (column A).
2. First row in the column on the Target sheet is selected (row 1, column A). A target word is selected ("WHS").
3. The target word is used to find all matches on the Incidents sheet (rows 4, 5, 9, 11, 12) and marks these with "1".
4. The next row in the column on the Target sheet is selected (row 2, column A). The next target word is selected (“WH&S”).
5. The target word is used to find all matches on the Incidents sheet (rows 5, 12) and marks these with "1".
6. The next column on the Target sheet is selected (column B). 
7. The next row in the column on the target sheet is selected (row 1, column B). A target word is selected ("OHS").
8. The target word is used to find all matches on the Incidents sheet and marks these with "2". However, if a row already contains a match value, then it will not overwrite this value (only row 7 will be marked with "2").
9. The next row in the column on the Target sheet is selected (row 2, column B). The next target word is selected ("OH&S").
10. The target word is used to find all matches on the Incidents sheet and marks these with "2" if there is no prior marked value (no additional rows will be marked with "2").

And that's it! All of the words on the Target sheet have been grabbed and used to find a match on the Incidents sheet.

Part 2: Details of code (I'm impressed you're still reading)

Remember: the Target sheet contains the words I am trying to find on the Incidents sheet. The below steps follow the lines of code from top to bottom.
  • Determine the last column with data on the Target sheet. Why? I need to know how many columns there are to search across for target words – otherwise I would search endlessly and freeze the program. There are 2 columns in the Target sheet, A and B.
  • For each column on the Target sheet, determine the last row with data. Starting with column A, the last row with data is row 2.
  • Since I now know the column number (1) and the last row with data (2) in the column (column A), I can get the range of the column. I will select my target words from this range to begin. If I used the whole worksheet range, I would inadvertently start selecting blank cells to find target words. Since each row of the Incidents data contains blank spaces, each row would be marked with "1", which I do not want. Setting ranges involve knowing exactly where the data you wish to use is located.
  • Activate the Incidents sheet. A user can activate a sheet simply by selecting that sheet. Once the code has activated the sheet, any action by the code will apply to the Incidents sheet (unless specified otherwise).
  • Get the first empty column of Incidents sheet. Why? Because this is where the mark values will be populated each time a target word is found. I do not wish to populate the values in a column that contains data.
  • Set initial values for variables. I have two – rowLoc and firstRowLoc. Before describing these, I should note the code is driven by a range called Loc (I have underlined in red where this appears in the code below). Loc gets the cell location (range) where the target word match was found. I can get the row number of where Loc exists via rowLoc = Loc.row. firstRowLoc is the row where the first instance of a target word match was found. Before I begin to use rowLoc and firstRowLoc, I have to set initial values (they cannot be blank). rowLoc = 0 and firstRowLoc = -1 and I'll explain why I used these values below.
  • Specify the range to use when searching for target words in the Incident sheet. The code will not search outside of this range.
  • Select a target word from the target range. Recall above that the range of column A listed words has been determined. The first target word “WHS” is chosen.
  • Set Loc to find the row where the target word exists on the Incidents sheet. The first instance of where “WHS” is found is on row 4. At this point, the values of rowLoc and firstRowLoc are updated to both equal "4" (no longer 0 and -1, respectively). 
  • Enter the Do Until loop. The Do Until loop will continue until the rowLoc = firstRowLoc. In other words, when the Loc returns to the first row where a target word match was found, the condition of rowLoc = firstRowLoc is met, and the loop ends. Had rowLoc and firstRowLoc both been initialised with 0 at the start, then this condition (rowLoc = firstRowLoc; 0 = 0) would have been met and the Do Until loop would not have commenced. Hence prior to the loop, rowLoc = 0 and firstRowLoc = -1 (unequal values). But wait! The previous step indicated that both rowLoc and firstRowLoc equalled 4. If this rowLoc = firstRowLoc condition is now met, shouldn't the loop end? No, because there is a code further on that will find the new target word match and update the rowLoc value (to "5"). Thus when entering the loop again, rowLoc (5) will not equal firstRowLoc (4) and the loop will continue. 
  • Still in the loop, when a match is found, mark with "1" if target words are from column A. Otherwise, mark with the column number for column B onwards. That is, if the word is "OHS" or "OH&S", this arises from column B, which is the second column. Hence, the mark value is "2". BUT – only mark with "2" if there is no prior value of "1".
  • Set Loc again to get the location of the cell where the target word is found next. Get the row number of where the next word will be found. This is where the rowLoc value is updated (to "5"). Upon entering the loop again, this next rowLoc value will be compared against the firstRowLoc value. If the values match, that means the FindNext code has pointed back to the row where the first match was found, and the loop will end. This allows the next target word to be selected and the code re-enters the loop to interrogate the Incidents data to find a match.
  • The process repeats until all words on the Target sheet have been selected and assessed for a match on the sheet.

I probably should have used more screen-grabs for Part 2. My bad.

The code



References and notes
1. I modified code I found on the stackoverflow thread Find all matches in workbook using Excel VBA by Nanashi.
2. The code in the Excel IDE ran over the page. I screen-grabbed two separate sections, pasted it in Word to recombine, made some annotations, then screen-grabbed the combined image. A copy of a copy made is a poor copy. I assume there is a better way to have produced this.

No comments:

Post a Comment