How To Remove Duplicate Rows in Google Spreadsheets
Tutorial Category: Google Apps Script
Share This:
Cleaning up duplicate data is a necessary evil. You wouldn’t want your counts or stats to be off because of duplicates. In this tutorial, we’ll learn how to remove duplicates from Google Spreadsheets, including some important points to keep in mind.
Many of the tutorials I’ve seen provide a function to remove duplicates by matching all columns of data. It usually looks something like this:
function removeDuplicates() { var sheet = SpreadsheetApp.getActiveSheet(), data = sheet.getDataRange().getValues(), newData = new Array(); for (i in data) { var row = data[i], duplicate = false; for (j in newData) { if (row.join() == newData[j].join()) { duplicate = true; } } if(!duplicate){ newData.push(row); } } sheet.clearContents(); sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData); } |
The above code works fine, but I think we have to step back and think about when removing duplicates is truly needed.
- In most cases, the spreadsheet is being populated by a form. That means there’s going to be a time stamp column.
- If a user can submit more than one time, the most recent response is at the bottom of the spreadsheet.
If the above two points are true for you, then the above function won’t work and here’s why.
- If you try to compare time stamps, they most often won’t be a match. So the duplicate will never be found, resulting in inaccurate counts and/or stats.
- If you allow a user to respond more than one time, you probably want to keep their most recent response. The above function will remove the most recent response leaving you with the oldest, again resulting in inaccurate data.
- This function shouldn’t be run while users can still respond. If it is run, it’s going to clear the current responses and replace them with no duplicates. It might be safer to use a new sheet and populate it with the cleaned up responses, leaving the original responses as a backup, especially if something goes wrong while the script is running.
Ok so how can we update the above function to take into account the above points? Here’s the new function:
function removeDuplicates() { var ss = SpreadsheetApp.getActiveSpreadsheet(), responses = ss.getSheetByName('Form Responses'), range = responses.getDataRange(), numRows = range.getNumRows()-1, data = range.getValues(), columnHeadings = [data[0]], newData = []; for (var i=numRows; i>0; i--) { var row = data[i], duplicate = false; for (var j in newData) { if (row[1] == newData[j][1]) { duplicate = true; } } if (!duplicate) { newData.push(row); } } var final = ss.getSheetByName('Final'); if (!final) { var final = ss.insertSheet('Final'); } final.clearContents(); final.getRange(1,1,1,columnHeadings[0].length).setFontWeight('bold').setValues(columnHeadings); final.getRange(2, 1, newData.length, newData[0].length).setValues(newData); } |
Let’s explain what’s going on in the code:
- When using a form, by default, there’s usually a sheet called “Form Responses” which collects the form responses. We reference that sheet by name and assign it to the variable responses.
- We get the range of data within the responses sheet and assign it to the variable range.
- We need to know the number of rows so we can start the loop at the bottom of the spreadsheet. So we get the number of rows and assign it to the variable numRows.
- We get the values in each cell within the range and assign it to the variable data.
- We get the column headings, which is row zero of data, so we can copy them to the new sheet.
- The loop starts at the last row and works backwards until it reaches row 1. We don’t want to include row zero because those are the column headings.
- Since a user should only have one response, let’s check for duplicates of their username by looking for the username column.
- Instead of clearing the responses sheet, let’s create a new sheet called Final and populate it with the column headings on row 1 and the responses containing no duplicates starting at row 2.
Hope the revised function helps you! To take it a step further and make it easier for the end user to run the function, learn how to create a custom menu!