4
\$\begingroup\$

I must write updates in a database from a flat file (CSV). I want to do that in the shell, with tools such as AWK.

#!/bin/bashcat in.csv | sed -e '1d' | awk -F';' -v q=\' '{               # For each line.    print "DECLARE @v_trmID varchar(16) = " q $1 q    print "DECLARE @v_trmNom varchar(6) = " q $3 q    print "DECLARE @v_trmNbrTrav smallint = " $4    print "IF EXISTS (SELECT 1 FROM trimestre WHERE trmID = @v_trmID AND trmNom = @v_trmNom)"    print "    BEGIN"    print "        UPDATE trimestre"    print "        SET trmNbrTrav = @v_trmNbrTrav"    print "        WHERE trmID = @v_trmID AND trmNom = @v_trmNom"    print "    END"    print "ELSE"    print "    BEGIN"    print "        PRINT " q "The script execution FAILED for record " NR " (pfiID " q " + @v_trmID + " q ", trimestre " q " + @v_trmNom + " q ")." q    print "    END"    print "go"    print ""}'

Though, there a 2 things I don't like:

  • The way quotes are inserted; it becomes really difficult to follow, even if I choose for the simpler way I found to write quotes inside an AWK string (instead of multiple escape sequences). Still, not that readable.

  • The fact that every SQL line is not readable as is. Code is not highlighted as SQL. I'd like to find a "here doc" solution, where I wouldn't have to prefix lines withprintf.

Do you have any pieces of advice or better ideas on the way to write robust (because more readable / easily modifiable) code?

Jamal's user avatar
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
askedSep 30, 2016 at 21:25
user3341592's user avatar
\$\endgroup\$

2 Answers2

2
\$\begingroup\$

Why involvebash,cat,sed, andawk, when the whole solution could be done inawk alone? Not only does it simplify the execution, it also reduces your quoting headaches.

In addition, I recommend dropping thev_ Hungarian prefix.

Here, I've usedprintf instead ofprint, but it's your choice.

#!/usr/bin/awk -fBEGIN { FS = ";" }NR > 1 {              # Skip the header row    trmID = $1;    trmNom = $3;    trmNbrTrav = $4;    printf "IF EXISTS (SELECT 1 FROM trimestre WHERE trmID = '%s' AND trmNom = '%s')\n", trmID, trmNom;    print  "    BEGIN"    printf "        UPDATE trimestre SET trmNbrTrav = %d WHERE trmID = '%s' AND trmNom = '%s'\n", trmNbrTrav, trmID, trmNom;    print  "    END"    print  "ELSE"    print  "    BEGIN"    printf "        PRINT 'The script execution FAILED for record %d (pfiID ''%s'', trimestre ''%s'').'\n", NR - 1, trmID, trmNom;    print  "    END"    print  "go"}

Note that anyawk-based technique is susceptible to SQL injection. Presumably your CSV data isn't hostile.

Personally, I would do it a bit differently, such that the entire operation is atomic:

  1. Create a temporary table.

    CREATE TABLE #csv_upload( trmID VARCHAR(16), trmNom VARCHAR(6), trmNbrTrav SMALLINT);
  2. Copy all of the CSV data to the temporary table. You can useawk to generateINSERT statements or do aBULK INSERT #csv_upload FROM 'filename.csv' WITH ( FIELDTERMINATOR = ';', FIRSTROW = 2 ).

  3. Do aJOIN query to verify that every row in the temporary table corresponds to a row in thetrimestre table. If not, then report an error before anything gets modified in thetrimestre table.

  4. Perform one singleUPDATE for the entire batch, preferably within a transaction.

answeredSep 30, 2016 at 23:06
200_success's user avatar
\$\endgroup\$
1
  • \$\begingroup\$This code is the most simpler to read... And thanks a lot for your new way to handle the problem at hand! Did not know about BULK INSERT...\$\endgroup\$CommentedOct 4, 2016 at 19:25
2
\$\begingroup\$

You can turn this into "here doc" solution by using wrapping it in aprocess substitution<(...), like this:

awk -f <(cat << "EOF"{    ... // awk script, as if in a file}EOF)

This way, you can write' directly, as if in a.awk script file,because there' is no longer an enclosing character.

Note thatcat << "EOF" is necessary instead of simplycat << EOF to avoid variable expansion of$1,$3 and$4.

Also, to avoid multipleprint statements,you will have to embed newlines with\n, and end lines with\ to continue the sameprint statement on the next line.

Finally, you don't need thesed -e 1d, because awk can do this alone, using aNR > 1 filter. However, to account for the extra line in the input, whenever you useNR in the awk script, you would have to change that toNR - 1.

Putting the above together, this script is equivalent to yours (produces same output):

awk -F';' -f <(cat << "EOF"NR > 1 {print "\DECLARE @v_trmPfiID_fk varchar(16) = '"$1"'\n\DECLARE @v_trmNom varchar(6) = '"$3"'\n\DECLARE @v_trmNbrTrav smallint = "$4"\n\IF EXISTS (SELECT 1 FROM trimestre WHERE trmPfiID_fk = @v_trmPfiID_fk AND trmNom = @v_trmNom)\n\    BEGIN\n\        UPDATE trimestre\n\        SET trmNbrTrav = @v_trmNbrTrav\n\        WHERE trmPfiID_fk = @v_trmPfiID_fk AND trmNom = @v_trmNom\n\    END\n\ELSE\n\    BEGIN\n\        PRINT 'The script execution FAILED for record " (NR - 1) " (pfiID ' + @v_trmPfiID_fk + ', trimestre ' + @v_trmNom + ').'\n\    END\n\go\n\"}EOF) < in.csv
answeredSep 30, 2016 at 21:55
janos's user avatar
\$\endgroup\$
0

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.