- Notifications
You must be signed in to change notification settings - Fork49
A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
License
thombashi/sqlitebiter
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
sqlitebiter is a CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
- Create a SQLite database file from:
- File(s):
- CSV / Tab separated values (TSV) / Space separated values (SSV)
- Microsoft ExcelTM
- HTML
- JSON
- See also:Limitations of JSON conversion
- Jupyter Notebook
- Labeled Tab-separated Values (LTSV)
- Line-delimited JSON(LDJSON) / NDJSON / JSON Lines
- Markdown
- Mediawiki
- SQLite
- Google Sheets
- URL (scrape tabular data from web pages)
- Multi-byte character support
- Automatic file encoding detection
Following is an example that converts HTML table tags within a web page to SQLite tables by the web page URL.
Example: | $sqlitebiter url"https://en.wikipedia.org/wiki/Comparison_of_firewalls"[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html1' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html2' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html3' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html4' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html5' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html6' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html7' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html8' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html9' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html10' table[INFO] sqlitebiter url: convert 'Comparison_of_firewalls' to 'Comparison_of_firewalls_Wikipedia_html11' table[INFO] sqlitebiter url: converted results: source=1, success=11, created-table=11[INFO] sqlitebiter url: database path: out.sqlite |
---|---|
Output: | $ sqlite3out.sqlite .schemaCREATETABLEIF NOT EXISTS'_source_info_' ("source_id"INTEGERNOT NULL,"dir_name"TEXT,"base_name"TEXTNOT NULL,"format_name"TEXTNOT NULL,"dst_table"TEXTNOT NULL, sizeINTEGER, mtimeINTEGER);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html1' (FirewallTEXT, LicenseTEXT, [Costand usage limits]TEXT, OSTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html2' (FirewallTEXT, LicenseTEXT, CostTEXT, OSTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html3' ([Can Target:]TEXT, [Changing default policy to accept/reject (by issuing a single rule)]TEXT, [IP destination address(es)]TEXT, [IP source address(es)]TEXT, [TCP/UDP destination port(s)]TEXT, [TCP/UDP source port(s)]TEXT, [Ethernet MAC destination address]TEXT, [Ethernet MAC source address]TEXT, [Inbound firewall (ingress)]TEXT, [Outbound firewall (egress)]TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html4' ([Can:]TEXT, [work at OSI Layer4 (stateful firewall)]TEXT, [work at OSI Layer7 (application inspection)]TEXT, [Change TTL? (Transparent to traceroute)]TEXT, [Configure REJECT-with answer]TEXT, [DMZ (de-militarized zone)- allows for single/several hosts not to be firewalled.]TEXT, [Filter according totime of day]TEXT, [Redirect TCP/UDP ports (port forwarding)]TEXT, [Redirect IP addresses (forwarding)]TEXT, [Filter according to User Authorization]TEXT, [Traffic rate-limit/ QoS]TEXT, TarpitTEXT, LogTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html5' ([Features:]TEXT,"Configuration: GUI_ text or both modes?"TEXT,"Remote Access: Web (HTTP)_ Telnet_ SSH_ RDP_ Serial COM RS232_ ..."TEXT, [Change rules without requiring restart?]TEXT, [Ability to centrally manage all firewalls together]TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html6' ([Features:]TEXT, [Modularity: supports third-party modules to extend functionality?]TEXT, [IPS : Intrusion prevention system]TEXT, [Open-Source License?]TEXT, [supports IPv6 ?]TEXT, [Class: Home/ Professional]TEXT, [Operating Systemson which it runs?]TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html7' ([Can:]TEXT,"NAT44 (static_ dynamic w/o ports_ PAT)"TEXT,"NAT64_ NPTv6"TEXT, [IDS (Intrusion Detection System)]TEXT, [VPN (Virtual Private Network)]TEXT, [AV (Anti-Virus)]TEXT, SnifferTEXT, [Profile selection]TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html8' ([vteFirewall software]TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html9' (ATEXT, BTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html10' (ATEXT, BTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html11' (ATEXT, BTEXT); |
The attributes within the converted SQLite database may include symbols as the above.Symbols within attributes can be replaced by using--replace-symbol
option.The following example shows replacing symbols with underscores.
Example: | $sqlitebiter --replace-symbol _ -q url"https://en.wikipedia.org/wiki/Comparison_of_firewalls" |
---|---|
Output: | $ sqlite3out.sqlite .schemaCREATETABLEIF NOT EXISTS'_source_info_' ("source_id"INTEGERNOT NULL,"dir_name"TEXT,"base_name"TEXTNOT NULL,"format_name"TEXTNOT NULL,"dst_table"TEXTNOT NULL, sizeINTEGER, mtimeINTEGER);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html1' (FirewallTEXT, LicenseTEXT,"Cost_and_usage_limits"TEXT, OSTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html2' (FirewallTEXT, LicenseTEXT, CostTEXT, OSTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html3' ("Can_Target"TEXT,"Changing_default_policy_to_accept_reject_by_issuing_a_single_rule"TEXT,"IP_destination_address_es"TEXT,"IP_source_address_es"TEXT,"TCP_UDP_destination_port_s"TEXT,"TCP_UDP_source_port_s"TEXT,"Ethernet_MAC_destination_address"TEXT,"Ethernet_MAC_source_address"TEXT,"Inbound_firewall_ingress"TEXT,"Outbound_firewall_egress"TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html4' (CanTEXT,"work_at_OSI_Layer_4_stateful_firewall"TEXT,"work_at_OSI_Layer_7_application_inspection"TEXT,"Change_TTL_Transparent_to_traceroute"TEXT,"Configure_REJECT_with_answer"TEXT,"DMZ_de_militarized_zone_allows_for_single_several_hosts_not_to_be_firewalled"TEXT,"Filter_according_to_time_of_day"TEXT,"Redirect_TCP_UDP_ports_port_forwarding"TEXT,"Redirect_IP_addresses_forwarding"TEXT,"Filter_according_to_User_Authorization"TEXT,"Traffic_rate_limit_QoS"TEXT, TarpitTEXT, LogTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html5' (FeaturesTEXT,"Configuration_GUI_text_or_both_modes"TEXT,"Remote_Access_Web_HTTP_Telnet_SSH_RDP_Serial_COM_RS232"TEXT,"Change_rules_without_requiring_restart"TEXT,"Ability_to_centrally_manage_all_firewalls_together"TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html6' (FeaturesTEXT,"Modularity_supports_third_party_modules_to_extend_functionality"TEXT,"IPS _Intrusion_prevention_system"TEXT,"Open_Source_License"TEXT,"supports_IPv6"TEXT,"Class_Home_Professional"TEXT,"Operating_Systems_on_which_it_runs"TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html7' (CanTEXT,"NAT44_static_dynamic_w_o_ports_PAT"TEXT,"NAT64_NPTv6"TEXT,"IDS_Intrusion_Detection_System"TEXT,"VPN_Virtual_Private_Network"TEXT,"AV_Anti_Virus"TEXT, SnifferTEXT,"Profile_selection"TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html8' ("vteFirewall_software"TEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html9' (ATEXT, BTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html10' (ATEXT, BTEXT);CREATETABLEIF NOT EXISTS'Comparison_of_firewalls_Wikipedia_html11' (ATEXT, BTEXT); |
A data format is a mandatory argument for sqlitebiterstdin subcommand
:
Example: | $echo'[{"hoge": 4, "foo": "abc"}, {"hoge": 12, "foo": "xyz"}]'| sqlitebiter stdin json[INFO] convert 'stdin' to 'json1' table[INFO] converted results: source=1, success=1, created-table=1[INFO] database path: out.sqlite |
---|
Usage: sqlitebiter [OPTIONS] COMMAND [ARGS]...Options: -o, --output-path PATH Output path of the SQLite database file. [default: out.sqlite] -a, --append Append table(s) to existing database. --add-primary-key PRIMARY_KEY_NAME Add 'PRIMARY KEY AUTOINCREMENT' column to a converted table with the specified name. --convert-config TEXT [experimental] Configurations for data conversion. The option can be used only for url subcommand. -i, --index INDEX_ATTR Comma separated attribute names to create indices. --no-type-inference All of the columns assume as TEXT data type in creating tables. --type-hint-header Use header suffixes as type hints. If there are type hints, convert columns by datatype corresponding with type hints. The following suffixes can be recognized as type hints (case insensitive): "text": TEXT datatype. "integer": INTEGER datatype. "real": REAL datatype. --matrix-formatting [header_aligned|trim] header_aligned: fitting table data to header columns. trim: fitting table data to minimum column size. [default: header_aligned] --replace-symbol TEXT Replace symbols in attributes. -v, --verbose Verbosity level. [default: 0] --max-workers WORKERS Specify the maximum number of workers that the command may use. [default: 1] --debug For debug print. -q, --quiet Suppress execution log messages. -h, --help Show this message and exit.Commands: completion A helper command to setup command completion. configure Configure the following application settings: file Convert tabular data within CSV/Excel/HTML/JSON/Jupyter... gs Convert a spreadsheet in Google Sheets to a SQLite database... stdin Convert tabular data within CSV/HTML/JSON/Jupyter... url Scrape tabular data from a URL and convert data to a SQLite... version Show version information
More examples are available athttps://sqlitebiter.rtfd.io/en/latest/pages/usage/index.html
pip install sqlitebiter
You can install the package byapt
via a Personal Package Archive (PPA):
sudo add-apt-repository ppa:thombashi/ppasudo apt updatesudo apt install sqlitebiter
The following commands will download the latest.deb
package from therelease page, and install it.
curl -sSL https://raw.githubusercontent.com/thombashi/sqlitebiter/master/scripts/installer.sh | sudo bash
sqlitebiter
can be used in Windows environments without Python installation as follows:
- Navigate tohttps://github.com/thombashi/sqlitebiter/releases
- Download the latest version of the
sqlitebiter_windows_amd64.zip
- Unzip the file
- Execute
sqlitebiter.exe
in either Command Prompt or PowerShell
The following commands will download the latest execution binary from therelease page to the current directory.
wget https://github.com/thombashi/sqlitebiter/raw/master/scripts/get-sqlitebiter.ps1 -OutFile get-sqlitebiter.ps1Set-ExecutionPolicy Unrestricted -Scope Process -Force; .\get-sqlitebiter.ps1
$brew tap thombashi/sqlitebiter$brew install sqlitebiter
setup command completion for bash: sqlitebiter completion bash >> ~/.bashrcsetup command completion for zsh: sqlitebiter completion zsh >> ~/.zshrc
Python 3.7+
Extra Python packages are required to install to use the Google Sheets feature (gs subcommand):
The extra packages can be installed with the following pip command;
$pip install sqlitebiter[gs]
note: binary packages include these dependencies
libxml2
(faster HTML/Markdown conversion)- pandoc (required when converting MediaWiki files)
About
A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.