Introduction
Loading small sets of data into SQL Server has always been easy on Windows. Moreover, if you like GUI-based tools, there are plenty of options: SSMS, Azure Data Studio, Visual Studio (SSDT), SSIS, Toad for SQL Server, and RapidSQL, to name a few. If you are interested in learning how to use Azure Data Studio, I recommend you look at my previous article.
Now, let's take a step back and imagine for a moment that you have no access to any of these GUI-based tools. To make things more complex, one of your clients is asking you to export/import data from/into a SQL Server instance running on Linux (VM or container).
Luckily for us, the BCP (Bulk Copy Program) utility is available for Linux as part of the command-line tools package called mssql-tools.
Installing mssql-tools
The mssql-tools package comes installed by default in the SQL Server Docker container image; unfortunately, this is not the case when SQL Server is manually installed on a Linux VM or bare-metal machine. We have to install mssql-tools separately.
I will be using Ubuntu for this article. This is important to note because the installation steps are slightly different if you want to use Red Hat.
The first step is to register the public repository key used by apt to authenticate the packages; then we simply register Microsoft's official repository for Ubuntu. Here are the commands:
[dba mastery] $ curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
[dba mastery] $ curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
Now that we have the repository ready, we have to update the source list and then proceed with the installation of the mssql-tools package as follows:
[dba mastery] $ sudo apt-get update
[dba mastery] $ sudo apt-get install mssql-tools
Once the installation is complete, we have sqlcmd and bcp ready to use. Both command-line utilities will be installed under this path: /opt/mssql-tools/bin/. As an additional step, you can add this folder to the $PATH environment variable, but I will skip that part for now.
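If you do want to call the tools by name instead of typing the full path, a minimal sketch (bash assumed) looks like this:

```shell
# Make the mssql-tools binaries reachable by name in the current session
export PATH="$PATH:/opt/mssql-tools/bin"

# To persist the change for future sessions, append the same line to ~/.bashrc:
# echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
```

After this, `sqlcmd` and `bcp` can be invoked directly from any directory.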
Exploring data
There are multiple sources for CSV files with sample data all over the internet; however, in this example I will use a file named "hw_25000.csv". The file in question contains biometric information for 25,000 individuals, split into three columns: ID, height, and weight.
Because I will be using command line-based tools, I want to make sure my data is clean, so the very first thing I want to do is learn more about my dataset. Checking the first and last ten rows of the file is the easiest way to do a quick check; I will use the head and tail commands to make that happen.
I will check the first ten rows using the "head -10" command followed by the filename:
[dba mastery] $ head -10 hw_25000.csv
1,65.78331,112.9925
2,71.51521,136.4873
3,69.39874,153.0269
4,68.2166,142.3354
5,67.78781,144.2971
6,68.69784,123.3024
7,69.80204,141.4947
8,70.01472,136.4623
9,67.90265,112.3723
10,66.78236,120.6672
At a glance, the first column corresponds to an ID or some kind of sequential number. The second column displays height data, while the third column holds the weight data. I can confirm I don't have dirty data in the first ten rows.
Let's check the last ten rows of the file now. The "tail" command works pretty much the same as "head", specifying the number of rows we want to see, but in reverse. That means we will see the last ten rows of the file:
[dba mastery] $ tail -10 hw_25000.csv
24991,69.97767,125.3672
24992,71.91656,128.2842
24993,70.96218,146.1936
24994,66.19462,118.7974
24995,67.21126,127.6603
24996,69.50215,118.0312
24997,64.54826,120.1932
24998,64.69855,118.2655
24999,67.52918,132.2682
25000,68.87761,124.8742
This confirms that the first and last ten rows of my file contain accurate and clean data. Most importantly, I was able to verify that the ID, height, and weight structure looks good.
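Beyond eyeballing head and tail, a quick sketch with awk can confirm that every row in the file has exactly three fields; the check_csv helper name is my own, not part of any tool:

```shell
# Count lines that do not have exactly three comma-separated fields
check_csv() {
    awk -F',' 'NF != 3 { bad++ } END { print bad+0 " malformed rows" }' "$1"
}

# check_csv hw_25000.csv   # a clean file reports "0 malformed rows"
```

This scans the whole file, not just the first and last ten rows, so it catches a truncated or extra-field row hiding in the middle.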
Let's move on and import some data, using the "hw_25000.csv" source file with BCP.
Pre-requisites
- Have a database created before the import process
- Have a target table created before the import process
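The article does not show the schema, so here is a hypothetical setup script that matches the CSV layout; the column names and types are my assumptions, not taken from the original:

```shell
# Write a hypothetical setup script to disk; run it later with sqlcmd
# against a live instance, for example:
#   /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P MyP@ssw0rd# -i create_hw.sql
cat > create_hw.sql <<'SQL'
CREATE DATABASE Biometrics;
GO
USE Biometrics;
GO
CREATE TABLE HW (
    Id     INT          NOT NULL,
    Height DECIMAL(9,5) NOT NULL,
    Weight DECIMAL(9,5) NOT NULL
);
GO
SQL
```

Any schema whose three columns accept an integer and two decimals would work equally well for this exercise.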
Importing data with BCP
I will show you a basic example of using BCP to import data on Linux. There are many more complex scenarios, for which I strongly recommend checking the BCP utility documentation at Microsoft Docs. The BCP utility requires a few arguments when importing data. Let's take a look at each one of them:
- -S: The server name or IP address to connect
- -U: SQL Server user name, this is the login we will use to connect
- -P: Password of the SQL Server login used for the connection
- -d: The target database
- -c: Specifies that the operation uses a character data type (optional)
- -t: Specifies the field terminator, which can be a comma or a tab
- in: Specifies that we are importing data into a database
Here is how the BCP command looks when importing data into a table called "HW" in the "Biometrics" database, using a comma-separated CSV file called hw_25000.csv:
bcp HW in hw_25000.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ','
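Two optional switches from the bcp documentation are worth knowing for messier files. The variant below is built as a string only, since actually running it requires a live instance:

```shell
# -F 2 starts copying at row 2 (skips a header line, if the CSV had one)
# -e   writes rows that fail conversion to an error file for inspection
BCP_IMPORT="bcp HW in hw_25000.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t , -F 2 -e bcp_errors.log"
echo "$BCP_IMPORT"
```

Our hw_25000.csv has no header row, so -F is not needed here, but an error file is cheap insurance on any sizeable import.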
Right after executing the command I can see the output on screen:
Starting copy...
1000 rows sent to SQL Server. Total sent: 1000
1000 rows sent to SQL Server. Total sent: 2000
1000 rows sent to SQL Server. Total sent: 3000
1000 rows sent to SQL Server. Total sent: 4000
1000 rows sent to SQL Server. Total sent: 5000
1000 rows sent to SQL Server. Total sent: 6000
1000 rows sent to SQL Server. Total sent: 7000
1000 rows sent to SQL Server. Total sent: 8000
1000 rows sent to SQL Server. Total sent: 9000
1000 rows sent to SQL Server. Total sent: 10000
1000 rows sent to SQL Server. Total sent: 11000
1000 rows sent to SQL Server. Total sent: 12000
1000 rows sent to SQL Server. Total sent: 13000
1000 rows sent to SQL Server. Total sent: 14000
1000 rows sent to SQL Server. Total sent: 15000
1000 rows sent to SQL Server. Total sent: 16000
1000 rows sent to SQL Server. Total sent: 17000
1000 rows sent to SQL Server. Total sent: 18000
1000 rows sent to SQL Server. Total sent: 19000
1000 rows sent to SQL Server. Total sent: 20000
1000 rows sent to SQL Server. Total sent: 21000
1000 rows sent to SQL Server. Total sent: 22000
1000 rows sent to SQL Server. Total sent: 23000
1000 rows sent to SQL Server. Total sent: 24000
1000 rows sent to SQL Server. Total sent: 25000
25000 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 693 Average : (36075.0 rows per sec.)
It took 693 milliseconds to import 25,000 records, not bad at all. I'm using a SQL Server 2017 container with one CPU and two gigabytes of RAM.
We can see from the output that there were no errors; however, I would like to verify the data by running a simple query to check the first ten rows of the HW table:
sqlcmd -S localhost -d Biometrics -U sa -P MyP@ssw0rd# -I -Q "SELECT TOP 10 * FROM HW;"
Id          Height           Weight
----------- ---------------- ----------------
          1         65.78331         112.9925
          2         71.51521         136.4873
          3         69.39874         153.0269
          4          68.2166         142.3354
          5         67.78781         144.2971
          6         68.69784         123.3024
          7         69.80204         141.4947
          8         70.01472         136.4623
          9         67.90265         112.3723
         10         66.78236         120.6672
(10 rows affected)
These first ten rows match perfectly with the ones I checked using the "head -10" command; this is a really good indicator that my data looks as expected.
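A fuller verification would compare total row counts, not just the first ten rows. As a sketch, this builds the sqlcmd call as a string, since running it requires the same instance as above:

```shell
# -h -1 suppresses column headers so only the count is printed
ROWCOUNT_CMD='sqlcmd -S localhost -d Biometrics -U sa -P MyP@ssw0rd# -h -1 -Q "SET NOCOUNT ON; SELECT COUNT(*) FROM HW;"'
echo "$ROWCOUNT_CMD"
```

Against the imported table, the query should return 25000, matching the BCP summary line.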
Exporting data with BCP
Exporting data is pretty straightforward. In this example, I will "dump" all the data from the table created in the previous section. The arguments are basically the same, with one slight change: we are exporting data (out):
- out: Specifies that we are exporting data from a database into a file
Once again, I will use the same database and table, this time exporting the data into a comma-separated file called hw_bcp_out.csv. Here is how the BCP command looks:
bcp HW out hw_bcp_out.csv -S localhost -U sa -P MyP@ssw0rd# -d Biometrics -c -t ','
Right after executing the command I can see the output on screen:
Starting copy...
1000 rows successfully bulk-copied to host-file. Total received: 1000
1000 rows successfully bulk-copied to host-file. Total received: 2000
1000 rows successfully bulk-copied to host-file. Total received: 3000
1000 rows successfully bulk-copied to host-file. Total received: 4000
1000 rows successfully bulk-copied to host-file. Total received: 5000
1000 rows successfully bulk-copied to host-file. Total received: 6000
1000 rows successfully bulk-copied to host-file. Total received: 7000
1000 rows successfully bulk-copied to host-file. Total received: 8000
1000 rows successfully bulk-copied to host-file. Total received: 9000
1000 rows successfully bulk-copied to host-file. Total received: 10000
1000 rows successfully bulk-copied to host-file. Total received: 11000
1000 rows successfully bulk-copied to host-file. Total received: 12000
1000 rows successfully bulk-copied to host-file. Total received: 13000
1000 rows successfully bulk-copied to host-file. Total received: 14000
1000 rows successfully bulk-copied to host-file. Total received: 15000
1000 rows successfully bulk-copied to host-file. Total received: 16000
1000 rows successfully bulk-copied to host-file. Total received: 17000
1000 rows successfully bulk-copied to host-file. Total received: 18000
1000 rows successfully bulk-copied to host-file. Total received: 19000
1000 rows successfully bulk-copied to host-file. Total received: 20000
1000 rows successfully bulk-copied to host-file. Total received: 21000
1000 rows successfully bulk-copied to host-file. Total received: 22000
1000 rows successfully bulk-copied to host-file. Total received: 23000
1000 rows successfully bulk-copied to host-file. Total received: 24000
1000 rows successfully bulk-copied to host-file. Total received: 25000
25000 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 123 Average : (203252.0 rows per sec.)
Nice! The 25,000 rows were copied into a comma-separated file that contains all the data from the HW table. Let's check my current folder to make sure the hw_bcp_out.csv file exists, using the list with details command (ls -l):
[dba mastery] $ ls -l
total 1212
-rw-r--r-- 1 root root 608306 Dec 13 04:38 hw_25000.csv
-rw-r--r-- 1 root root 608307 Dec 13 05:37 hw_bcp_out.csv
I can see two files: the one I used during the import example (hw_25000.csv) and the one created by the BCP out (hw_bcp_out.csv). In case you want to go the extra mile, you can check the first and last ten rows using the "head" and "tail" commands as before, but I will skip that for now.
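Another way to go that extra mile is a direct comparison of the two files. Note the one-byte size difference in the ls output, most likely a trailing newline, so a tiny diff would not necessarily mean lost data. The compare_files helper is my own sketch:

```shell
# Report whether two files have identical contents
compare_files() {
    if diff -q "$1" "$2" >/dev/null 2>&1; then
        echo "files are identical"
    else
        echo "files differ"
    fi
}

compare_files hw_25000.csv hw_bcp_out.csv
```

If the files differ, running diff without -q shows exactly which rows changed on the round trip.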
Conclusion
The BCP utility is a very powerful cross-platform command-line tool we can use to import/export data into/from a SQL Server instance running on any environment (Windows, Linux, Docker containers).
