13.4.14

Prepare SQL SELECT, INSERT INTO, DELETE query using OpenRefine

I know that a lot of us export data from MySQL / SQL databases to clean them in OpenRefine before loading them back in their original database. Before, I was exporting my project to csv and loading the csv using some command utilities for MySQL, it worked by that was a painful process with a lot of details to pay attention to (encoding, field separator ...). But all this was was before I found a new way to use the template option of OpenRefine to prepare large select, update, insert or delete SQL statement

So instead of exporting to csv and importing through an other interface / tool like phpmyadmin you can use the template function of OpenRefine to preparethat will iterate through all the row of your project.


Before invoking the template function check that your facet are set properly. Only rows currently displayed in the project will be included in the final export. The template option will iterate through all rows and process them through a predefinied template.

Let's get to the fact and see how we leverage each section of the template export.

The template export screen is divided into five different section:

  • the prefix is used to have
  • Row template is the core of this interface, this is where OpenRefine will iterate through the different value and insert them
  • Row separator is a clever interface that take into account the first and last row to separate the different records created through the row template. 
  • In the suffix section you can add any value that will close your statement
  • The preview pan let you preview your final query. 
(click on the image to enlarge it)

Now let's go through each section in details.


Prefix

In the Prefix section write the action and table selection and add the beginning of your where statement this could be
SELECT * FROM `table` WHERE
DELETE FROM `table` WHERE
INSTER INTO FROM `table` .... VALUES
The upate statement will is slightly different and will have his own article

Row Template
We will us the template functionality to build a large query where each project row will be a process through the template defined here.
The value is called using  {{jsonize(cells['COLUMN'].value)}}  all other content could be replaced by a nice `field` LIKE ' ' 

Row Separator,
Add here the separator between your different statement define in the Row Template section. For example you can use this section to  add a blank line, a coma to separate each statement or the aggregation value AND or OR.

The Row Template section take into account the first and last row so content added here will not be repeated when not needed (ie. no comma on your last statement).

Suffix
Add here any extra functionality you whish to add at the end of your query. This can be ORDER BY or SORT BY. In order to have a clean SQL statement this is also the place to add the closing semi column.

Preview
Display a preview using the 20 first row the final result of your export. You can use result here to check that your SQL statement have no syntax errors.

Export and rename your file .sql. Now you can run your query back on your database!

Just an other screenshot to show that you can build complex query:
(click on the image to enlarge it)