Documentation Home
MySQL 9.1 Reference Manual
Related Documentation Download this Manual
PDF (US Ltr) - 40.4Mb
PDF (A4) - 40.5Mb
Man Pages (TGZ) - 259.5Kb
Man Pages (Zip) - 366.7Kb
Info (Gzip) - 4.1Mb
Info (Zip) - 4.1Mb


MySQL 9.1 Reference Manual  / ...  / MySQL NDB Cluster 9.1  / NDB Cluster Programs  /  ndb_import — Import CSV Data Into NDB

25.5.13 ndb_import — Import CSV Data Into NDB

ndb_import imports CSV-formatted data, such as that produced bymysqldump--tab, directly intoNDB using the NDB API.ndb_import requires a connection to an NDB management server (ndb_mgmd) to function; it does not require a connection to a MySQL Server.

Usage

ndb_importdb_namefile_nameoptions

ndb_import requires two arguments.db_name is the name of the database where the table into which to import the data is found;file_name is the name of the CSV file from which to read the data; this must include the path to this file if it is not in the current directory. The name of the file must match that of the table; the file's extension, if any, is not taken into consideration. Options supported byndb_import include those for specifying field separators, escapes, and line terminators, and are described later in this section.

ndb_import rejects any empty lines which it reads from the CSV file, except when importing a single column, in which case an empty value can be used as the column value.ndb_import handles this in the same manner as aLOAD DATA statement does.

ndb_import must be able to connect to an NDB Cluster management server; for this reason, there must be an unused[api] slot in the clusterconfig.ini file.

To duplicate an existing table that uses a different storage engine, such asInnoDB, as anNDB table, use themysql client to perform aSELECT INTO OUTFILE statement to export the existing table to a CSV file, then to execute aCREATE TABLE LIKE statement to create a new table having the same structure as the existing table, then performALTER TABLE ... ENGINE=NDB on the new table; after this, from the system shell, invokendb_import to load the data into the newNDB table. For example, an existingInnoDB table namedmyinnodb_table in a database namedmyinnodb can be exported into anNDB table namedmyndb_table in a database namedmyndb as shown here, assuming that you are already logged in as a MySQL user with the appropriate privileges:

  1. In themysql client:

    mysql> USE myinnodb;mysql> SELECT * INTO OUTFILE '/tmp/myndb_table.csv'     >  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'     >  LINES TERMINATED BY '\n'     >  FROM myinnodbtable;mysql> CREATE DATABASE myndb;mysql> USE myndb;mysql> CREATE TABLE myndb_table LIKE myinnodb.myinnodb_table;mysql> ALTER TABLE myndb_table ENGINE=NDB;mysql> EXIT;Bye$>

    Once the target database and table have been created, a runningmysqld is no longer required. You can stop it usingmysqladmin shutdown or another method before proceeding, if you wish.

  2. In the system shell:

    # if you are not already in the MySQL bin directory:$> cdpath-to-mysql-bin-dir$> ndb_import myndb /tmp/myndb_table.csv --fields-optionally-enclosed-by='"' \    --fields-terminated-by="," --fields-escaped-by='\\'

    The output should resemble what is shown here:

    job-1 import myndb.myndb_table from /tmp/myndb_table.csvjob-1 [running] import myndb.myndb_table from /tmp/myndb_table.csvjob-1 [success] import myndb.myndb_table from /tmp/myndb_table.csvjob-1 imported 19984 rows in 0h0m9s at 2277 rows/sjobs summary: defined: 1 run: 1 with success: 1 with failure: 0$>

All options that can be used withndb_import are shown in the following table. Additional descriptions follow the table.

  • --abort-on-error

    Command-Line Format--abort-on-error

    Dump core on any fatal error; used for debugging only.

  • --ai-increment=#

    Command-Line Format--ai-increment=#
    TypeInteger
    Default Value1
    Minimum Value1
    Maximum Value4294967295

    For a table with a hidden primary key, specify the autoincrement increment, like theauto_increment_increment system variable does in the MySQL Server.

  • --ai-offset=#

    Command-Line Format--ai-offset=#
    TypeInteger
    Default Value1
    Minimum Value1
    Maximum Value4294967295

    For a table with hidden primary key, specify the autoincrement offset. Similar to theauto_increment_offset system variable.

  • --ai-prefetch-sz=#

    Command-Line Format--ai-prefetch-sz=#
    TypeInteger
    Default Value1024
    Minimum Value1
    Maximum Value4294967295

    For a table with a hidden primary key, specify the number of autoincrement values that are prefetched. Behaves like thendb_autoincrement_prefetch_sz system variable does in the MySQL Server.

  • --character-sets-dir

    Command-Line Format--character-sets-dir=path

    Directory containing character sets.

  • --connections=#

    Command-Line Format--connections=#
    TypeInteger
    Default Value1
    Minimum Value1
    Maximum Value4294967295

    Number of cluster connections to create.

  • --connect-retries

    Command-Line Format--connect-retries=#
    TypeInteger
    Default Value12
    Minimum Value0
    Maximum Value12

    Number of times to retry connection before giving up.

  • --connect-retry-delay

    Command-Line Format--connect-retry-delay=#
    TypeInteger
    Default Value5
    Minimum Value0
    Maximum Value5

    Number of seconds to wait between attempts to contact management server.

  • --connect-string

    Command-Line Format--connect-string=connection_string
    TypeString
    Default Value[none]

    Same as--ndb-connectstring.

  • --continue

    Command-Line Format--continue

    When a job fails, continue to the next job.

  • --core-file

    Command-Line Format--core-file

    Write core file on error; used in debugging.

  • --csvopt=string

    Command-Line Format--csvopt=opts
    TypeString
    Default Value[none]

    Provides a shortcut method for setting typical CSV import options. The argument to this option is a string consisting of one or more of the following parameters:

    • c: Fields terminated by comma

    • d: Use defaults, except where overridden by another parameter

    • n: Lines terminated by\n

    • q: Fields optionally enclosed by double quote characters (")

    • r: Line terminated by\r

    The order of parameters used in the argument to this option is handled such that the rightmost parameter always takes precedence over any potentially conflicting parameters which have already been used in the same argument value. This also applies to any duplicate instances of a given parameter.

    This option is intended for use in testing under conditions in which it is difficult to transmit escapes or quotation marks.

  • --db-workers=#

    Command-Line Format--db-workers=#
    TypeInteger
    Default Value4
    Minimum Value1
    Maximum Value4294967295

    Number of threads, per data node, executing database operations.

  • --defaults-file

    Command-Line Format--defaults-file=path
    TypeString
    Default Value[none]

    Read default options from given file only.

  • --defaults-extra-file

    Command-Line Format--defaults-extra-file=path
    TypeString
    Default Value[none]

    Read given file after global files are read.

  • --defaults-group-suffix

    Command-Line Format--defaults-group-suffix=string
    TypeString
    Default Value[none]

    Also read groups with concat(group, suffix).

  • --errins-type=name

    Command-Line Format--errins-type=name
    TypeEnumeration
    Default Value[none]
    Valid Values

    stopjob

    stopall

    sighup

    sigint

    list

    Error insert type; uselist as thename value to obtain all possible values. This option is used for testing purposes only.

  • --errins-delay=#

    Command-Line Format--errins-delay=#
    TypeInteger
    Default Value1000
    Minimum Value0
    Maximum Value4294967295
    Unitms

    Error insert delay in milliseconds; random variation is added. This option is used for testing purposes only.

  • --fields-enclosed-by=char

    Command-Line Format--fields-enclosed-by=char
    TypeString
    Default Value[none]

    This works in the same way as theFIELDS ENCLOSED BY option does for theLOAD DATA statement, specifying a character to be interpreted as quoting field values. For CSV input, this is the same as--fields-optionally-enclosed-by.

  • --fields-escaped-by=name

    Command-Line Format--fields-escaped-by=char
    TypeString
    Default Value\

    Specify an escape character in the same way as theFIELDS ESCAPED BY option does for the SQLLOAD DATA statement.

  • --fields-optionally-enclosed-by=char

    Command-Line Format--fields-optionally-enclosed-by=char
    TypeString
    Default Value[none]

    This works in the same way as theFIELDS OPTIONALLY ENCLOSED BY option does for theLOAD DATA statement, specifying a character to be interpreted as optionally quoting field values. For CSV input, this is the same as--fields-enclosed-by.

  • --fields-terminated-by=char

    Command-Line Format--fields-terminated-by=char
    TypeString
    Default Value\t

    This works in the same way as theFIELDS TERMINATED BY option does for theLOAD DATA statement, specifying a character to be interpreted as the field separator.

  • --help

    Command-Line Format--help

    Display help text and exit.

  • --idlesleep=#

    Command-Line Format--idlesleep=#
    TypeInteger
    Default Value1
    Minimum Value1
    Maximum Value4294967295
    Unitms

    Number of milliseconds to sleep waiting for more work to perform.

  • --idlespin=#

    Command-Line Format--idlespin=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value4294967295

    Number of times to retry before sleeping.

  • --ignore-lines=#

    Command-Line Format--ignore-lines=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value4294967295

    Cause ndb_import to ignore the first# lines of the input file. This can be employed to skip a file header that does not contain any data.

  • --input-type=name

    Command-Line Format--input-type=name
    TypeEnumeration
    Default Valuecsv
    Valid Values

    random

    csv

    Set the type of input type. The default iscsv;random is intended for testing purposes only. .

  • --input-workers=#

    Command-Line Format--input-workers=#
    TypeInteger
    Default Value4
    Minimum Value1
    Maximum Value4294967295

    Set the number of threads processing input.

  • --keep-state

    Command-Line Format--keep-state

    By default, ndb_import removes all state files (except non-empty*.rej files) when it completes a job. Specify this option (nor argument is required) to force the program to retain all state files instead.

  • --lines-terminated-by=name

    Command-Line Format--lines-terminated-by=char
    TypeString
    Default Value\n

    This works in the same way as theLINES TERMINATED BY option does for theLOAD DATA statement, specifying a character to be interpreted as end-of-line.

  • --log-level=#

    Command-Line Format--log-level=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value2

    Performs internal logging at the given level. This option is intended primarily for internal and development use.

    In debug builds of NDB only, the logging level can be set using this option to a maximum of 4.

  • --login-path

    Command-Line Format--login-path=path
    TypeString
    Default Value[none]

    Read given path from login file.

  • --no-login-paths

    Command-Line Format--no-login-paths

    Skips reading options from the login path file.

  • --max-rows=#

    Command-Line Format--max-rows=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value4294967295
    Unitbytes

    Import only this number of input data rows; the default is 0, which imports all rows.

  • --missing-ai-column

    Command-Line Format--missing-ai-column='name'
    TypeBoolean
    Default ValueFALSE

    This option can be employed when importing a single table, or multiple tables. When used, it indicates that the CSV file being imported does not contain any values for anAUTO_INCREMENT column, and thatndb_import should supply them; if the option is used and theAUTO_INCREMENT column contains any values, the import operation cannot proceed.

  • --monitor=#

    Command-Line Format--monitor=#
    TypeInteger
    Default Value2
    Minimum Value0
    Maximum Value4294967295
    Unitbytes

    Periodically print the status of a running job if something has changed (status, rejected rows, temporary errors). Set to 0 to disable this reporting. Setting to 1 prints any change that is seen. Higher values reduce the frequency of this status reporting.

  • --ndb-connectstring

    Command-Line Format--ndb-connectstring=connection_string
    TypeString
    Default Value[none]

    Set connection string for connecting tondb_mgmd. Syntax:[nodeid=id;][host=]hostname[:port]. Overrides entries inNDB_CONNECTSTRING andmy.cnf.

  • --ndb-mgm-tls

    Command-Line Format--ndb-mgm-tls=level
    TypeEnumeration
    Default Valuerelaxed
    Valid Values

    relaxed

    strict

    Sets the level of TLS support required to connect to the management server; one ofrelaxed orstrict.relaxed (the default) means that a TLS connection is attempted, but success is not required;strict means that TLS is required to connect.

  • --ndb-mgmd-host

    Command-Line Format--ndb-mgmd-host=connection_string
    TypeString
    Default Value[none]

    Same as--ndb-connectstring.

  • --ndb-nodeid

    Command-Line Format--ndb-nodeid=#
    TypeInteger
    Default Value[none]

    Set node ID for this node, overriding any ID set by--ndb-connectstring.

  • --ndb-optimized-node-selection

    Command-Line Format--ndb-optimized-node-selection

    Enable optimizations for selection of nodes for transactions. Enabled by default; use--skip-ndb-optimized-node-selection to disable.

  • --ndb-tls-search-path

    Command-Line Format--ndb-tls-search-path=list
    TypePath name
    Default Value (Unix)$HOME/ndb-tls
    Default Value (Windows)$HOMEDIR/ndb-tls

    Specify a list of directories to search for a CA file. On Unix platforms, the directory names are separated by colons (:); on Windows systems, the semicolon character (;) is used as the separator. A directory reference may be relative or absolute; it may contain one or more environment variables, each denoted by a prefixed dollar sign ($), and expanded prior to use.

    Searching begins with the leftmost named directory and proceeds from left to right until a file is found. An empty string denotes an empty search path, which causes all searches to fail. A string consisting of a single dot (.) indicates that the search path limited to the current working directory.

    If no search path is supplied, the compiled-in default value is used. This value depends on the platform used: On Windows, this is\ndb-tls; on other platforms (including Linux), it is$HOME/ndb-tls. This can be overridden by compiling NDB Cluster using-DWITH_NDB_TLS_SEARCH_PATH.

  • --no-asynch

    Command-Line Format--no-asynch

    Run database operations as batches, in single transactions.

  • --no-defaults

    Command-Line Format--no-defaults

    Do not read default options from any option file other than login file.

  • --no-hint

    Command-Line Format--no-hint

    Do not use distribution key hinting to select a data node.

  • --opbatch=#

    Command-Line Format--opbatch=#
    TypeInteger
    Default Value256
    Minimum Value1
    Maximum Value4294967295
    Unitbytes

    Set a limit on the number of operations (including blob operations), and thus the number of asynchronous transactions, per execution batch.

  • --opbytes=#

    Command-Line Format--opbytes=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value4294967295
    Unitbytes

    Set a limit on the number of bytes per execution batch. Use 0 for no limit.

  • --output-type=name

    Command-Line Format--output-type=name
    TypeEnumeration
    Default Valuendb
    Valid Valuesnull

    Set the output type.ndb is the default.null is used only for testing.

  • --output-workers=#

    Command-Line Format--output-workers=#
    TypeInteger
    Default Value2
    Minimum Value1
    Maximum Value4294967295

    Set the number of threads processing output or relaying database operations.

  • --pagesize=#

    Command-Line Format--pagesize=#
    TypeInteger
    Default Value4096
    Minimum Value1
    Maximum Value4294967295
    Unitbytes

    Align I/O buffers to the given size.

  • --pagecnt=#

    Command-Line Format--pagecnt=#
    TypeInteger
    Default Value64
    Minimum Value1
    Maximum Value4294967295

    Set the size of I/O buffers as multiple of page size. The CSV input worker allocates buffer that is doubled in size.

  • --polltimeout=#

    Command-Line Format--polltimeout=#
    TypeInteger
    Default Value1000
    Minimum Value1
    Maximum Value4294967295
    Unitms

    Set a timeout per poll for completed asynchronous transactions; polling continues until all polls are completed, or until an error occurs.

  • --print-defaults

    Command-Line Format--print-defaults

    Print program argument list and exit.

  • --rejects=#

    Command-Line Format--rejects=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value4294967295

    Limit the number of rejected rows (rows with permanent errors) in the data load. The default is 0, which means that any rejected row causes a fatal error. Any rows causing the limit to be exceeded are added to the.rej file.

    The limit imposed by this option is effective for the duration of the current run. A run restarted using--resume is considered anew run for this purpose.

  • --resume

    Command-Line Format--resume

    If a job is aborted (due to a temporary db error or when interrupted by the user), resume with any rows not yet processed.

  • --rowbatch=#

    Command-Line Format--rowbatch=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value4294967295
    Unitrows

    Set a limit on the number of rows per row queue. Use 0 for no limit.

  • --rowbytes=#

    Command-Line Format--rowbytes=#
    TypeInteger
    Default Value262144
    Minimum Value0
    Maximum Value4294967295
    Unitbytes

    Set a limit on the number of bytes per row queue. Use 0 for no limit.

  • --stats

    Command-Line Format--stats

    Save information about options related to performance and other internal statistics in files named*.sto and*.stt. These files are always kept on successful completion (even if--keep-state is not also specified).

  • --state-dir=name

    Command-Line Format--state-dir=path
    TypeString
    Default Value.

    Where to write the state files (tbl_name.map,tbl_name.rej,tbl_name.res, andtbl_name.stt) produced by a run of the program; the default is the current directory.

  • --table=name

    Command-Line Format--table=name
    TypeString
    Default Value[input file base name]

    By default,ndb_import attempts to import data into a table whose name is the base name of the CSV file from which the data is being read. You can override the choice of table name by specifying it with the--table option (short form-t).

  • --tempdelay=#

    Command-Line Format--tempdelay=#
    TypeInteger
    Default Value10
    Minimum Value0
    Maximum Value4294967295
    Unitms

    Number of milliseconds to sleep between temporary errors.

  • --temperrors=#

    Command-Line Format--temperrors=#
    TypeInteger
    Default Value0
    Minimum Value0
    Maximum Value4294967295

    Number of times a transaction can fail due to a temporary error, per execution batch. The default is 0, which means that any temporary error is fatal. Temporary errors do not cause any rows to be added to the.rej file.

  • --verbose,-v

    Command-Line Format--verbose[=#]
    TypeBoolean
    Default Valuefalse

    Enable verbose output.

  • --usage

    Command-Line Format--usage

    Display help text and exit; same as--help.

  • --version

    Command-Line Format--version

    Display version information and exit.

As withLOAD DATA, options for field and line formatting much match those used to create the CSV file, whether this was done usingSELECT INTO ... OUTFILE, or by some other means. There is no equivalent to theLOAD DATA statementSTARTING WITH option.