- Notifications
You must be signed in to change notification settings - Fork5
A fusion of SQL and awk: Applying SQL to text-based data files
License
jgarzik/sqawk
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Sqawk is an SQL-based command-line tool for processing delimiter-separated files (CSV, TSV, etc.), inspired by the classicawk command. It loads data into in-memory tables, executes SQL queries against these tables, and writes the results back to the console or files.
Powerful SQL Query Engine
- Support for SELECT, INSERT, UPDATE, and DELETE operations
- WHERE clause filtering with comparison operators
- DISTINCT keyword for removing duplicate rows
- ORDER BY for sorting results (ASC/DESC)
- Column aliases with the AS keyword
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
- GROUP BY for data aggregation
Multi-Table Operations
- Cross joins between tables
- INNER JOIN with ON conditions for precise join criteria
- Support for joining multiple tables
- Table-qualified column names
Smart Data Handling
- Automatic type inference (Integer, Float, Boolean, String)
- Type coercion for comparisons
- Null value support
File Format Support
- Process CSV, TSV, and custom-delimited files
- Custom field separator support with -F option (like awk)
- Fast in-memory execution
- Process multiple files in a single command
- Table name customization
- Chain multiple SQL statements
Safe Operation
- Doesn't modify files without explicit request (--write flag)
- Only writes back tables that were modified
- Verbose mode for operation transparency
cargo install sqawk
sqawk -s"SELECT * FROM data" data.csvThis loadsdata.csv into an in-memory table called "data" and performs a SELECT query.
sqawk -s"SELECT * FROM employees WHERE salary > 50000" employees.csvsqawk -s"UPDATE data SET status = 'active' WHERE id = 5" data.csv --writeThis updates the status field to 'active' for rows with id = 5 and saves the changes back to data.csv.
sqawk -s"DELETE FROM data WHERE id = 5" data.csv --writeThis removes rows with id = 5 and saves the changes back to data.csv.
sqawk -s"UPDATE data SET status = 'inactive' WHERE last_login < '2023-01-01'" -s"DELETE FROM data WHERE status = 'inactive' AND last_login < '2022-01-01'" -s"SELECT * FROM data" data.csv --write
This executes multiple SQL statements in sequence: first marking recent inactive accounts, then removing very old inactive accounts, and finally showing the results.
sqawk -s"SELECT * FROM users" -s"SELECT * FROM orders" users.csv orders.csv
# Get unique values from a single columnsqawk -s"SELECT DISTINCT category FROM products" products.csv# Get unique combinations of columnssqawk -s"SELECT DISTINCT department, role FROM employees" employees.csv# Use DISTINCT with ORDER BY for sorted unique valuessqawk -s"SELECT DISTINCT region FROM customers ORDER BY region" customers.csv
# Join users and orders using INNER JOIN with ON conditionsqawk -s"SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id" users.csv orders.csv# Join with additional WHERE filteringsqawk -s"SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id WHERE orders.product_id > 100" users.csv orders.csv# Using DISTINCT with JOINs to find unique customer-product pairssqawk -s"SELECT DISTINCT users.name, products.name FROM users INNER JOIN orders ON users.id = orders.user_id INNER JOIN products ON orders.product_id = products.product_id" users.csv orders.csv products.csv
# Process a tab-delimited file (TSV)sqawk -F'\t' -s"SELECT * FROM employees WHERE salary > 70000" employees.tsv# Process a colon-delimited filesqawk -F':' -s"SELECT id, name, email FROM contacts" contacts.txt
sqawk -s"SELECT * FROM data WHERE value > 100" data.csv -vsqawk -s"DELETE FROM data WHERE status = 'expired'" data.csv --writeBy default, sqawk doesn't modify input files. Use the--write flag to save changes back to the original files.
For more detailed information, see:
- User Guide - Complete guide to installing and using Sqawk
- SQL Language Reference - Comprehensive guide to Sqawk's SQL dialect
- In-Memory Database Architecture - Technical details about the database implementation (for developers)
Licensed under the MIT License (LICENSE orhttp://opensource.org/licenses/MIT).
Unless you explicitly state otherwise, any contribution intentionally submittedfor inclusion in the work by you shall be licensed as MIT, without any additionalterms or conditions.
About
A fusion of SQL and awk: Applying SQL to text-based data files
Topics
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Uh oh!
There was an error while loading.Please reload this page.