jgarzik/sqawkPublic

NotificationsYou must be signed in to change notification settings
Fork5
Star235

A fusion of SQL and awk: Applying SQL to text-based data files

License

MIT license

235 stars 5 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 361 Commits
.github/workflows		.github/workflows
doc		doc
src		src
tests		tests
.gitignore		.gitignore
.replit		.replit
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
generated-icon.png		generated-icon.png

Repository files navigation

Sqawk

Sqawk is an SQL-based command-line tool for processing delimiter-separated files (CSV, TSV, etc.), inspired by the classicawk command. It loads data into in-memory tables, executes SQL queries against these tables, and writes the results back to the console or files.

Features

Powerful SQL Query Engine
- Support for SELECT, INSERT, UPDATE, and DELETE operations
- WHERE clause filtering with comparison operators
- DISTINCT keyword for removing duplicate rows
- ORDER BY for sorting results (ASC/DESC)
- Column aliases with the AS keyword
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
- GROUP BY for data aggregation
Multi-Table Operations
- Cross joins between tables
- INNER JOIN with ON conditions for precise join criteria
- Support for joining multiple tables
- Table-qualified column names
Smart Data Handling
- Automatic type inference (Integer, Float, Boolean, String)
- Type coercion for comparisons
- Null value support
File Format Support
- Process CSV, TSV, and custom-delimited files
- Custom field separator support with -F option (like awk)
- Fast in-memory execution
- Process multiple files in a single command
- Table name customization
- Chain multiple SQL statements
Safe Operation
- Doesn't modify files without explicit request (--write flag)
- Only writes back tables that were modified
- Verbose mode for operation transparency

Installation

cargo install sqawk

Usage

Basic SELECT query

sqawk -s"SELECT * FROM data" data.csv

This loadsdata.csv into an in-memory table called "data" and performs a SELECT query.

Filtering data with WHERE clause

sqawk -s"SELECT * FROM employees WHERE salary > 50000" employees.csv

Updating rows

sqawk -s"UPDATE data SET status = 'active' WHERE id = 5" data.csv --write

This updates the status field to 'active' for rows with id = 5 and saves the changes back to data.csv.

Deleting rows

sqawk -s"DELETE FROM data WHERE id = 5" data.csv --write

This removes rows with id = 5 and saves the changes back to data.csv.

Multiple operations

sqawk -s"UPDATE data SET status = 'inactive' WHERE last_login < '2023-01-01'" -s"DELETE FROM data WHERE status = 'inactive' AND last_login < '2022-01-01'" -s"SELECT * FROM data" data.csv --write

This executes multiple SQL statements in sequence: first marking recent inactive accounts, then removing very old inactive accounts, and finally showing the results.

Multiple files

sqawk -s"SELECT * FROM users" -s"SELECT * FROM orders" users.csv orders.csv

Finding unique values with DISTINCT

# Get unique values from a single columnsqawk -s"SELECT DISTINCT category FROM products" products.csv# Get unique combinations of columnssqawk -s"SELECT DISTINCT department, role FROM employees" employees.csv# Use DISTINCT with ORDER BY for sorted unique valuessqawk -s"SELECT DISTINCT region FROM customers ORDER BY region" customers.csv

Join tables with INNER JOIN

# Join users and orders using INNER JOIN with ON conditionsqawk -s"SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id" users.csv orders.csv# Join with additional WHERE filteringsqawk -s"SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id WHERE orders.product_id > 100" users.csv orders.csv# Using DISTINCT with JOINs to find unique customer-product pairssqawk -s"SELECT DISTINCT users.name, products.name FROM users INNER JOIN orders ON users.id = orders.user_id INNER JOIN products ON orders.product_id = products.product_id" users.csv orders.csv products.csv

Custom field separators

# Process a tab-delimited file (TSV)sqawk -F'\t' -s"SELECT * FROM employees WHERE salary > 70000" employees.tsv# Process a colon-delimited filesqawk -F':' -s"SELECT id, name, email FROM contacts" contacts.txt

Verbose mode

sqawk -s"SELECT * FROM data WHERE value > 100" data.csv -v

Write mode

sqawk -s"DELETE FROM data WHERE status = 'expired'" data.csv --write

By default, sqawk doesn't modify input files. Use the--write flag to save changes back to the original files.

Documentation

For more detailed information, see:

User Guide - Complete guide to installing and using Sqawk
SQL Language Reference - Comprehensive guide to Sqawk's SQL dialect
In-Memory Database Architecture - Technical details about the database implementation (for developers)

License

Licensed under the MIT License (LICENSE orhttp://opensource.org/licenses/MIT).

Contribution

Unless you explicitly state otherwise, any contribution intentionally submittedfor inclusion in the work by you shall be licensed as MIT, without any additionalterms or conditions.

About

A fusion of SQL and awk: Applying SQL to text-based data files

Releases1

Initial release Latest

May 6, 2025

Movatterモバイル変換

License

jgarzik/sqawk

Folders and files

Latest commit

History

Repository files navigation

Sqawk

Features

Installation

Usage

Basic SELECT query

Filtering data with WHERE clause

Updating rows

Deleting rows

Multiple operations

Multiple files

Finding unique values with DISTINCT

Join tables with INNER JOIN

Custom field separators

Verbose mode

Write mode

Documentation

License

Contribution

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases1

Uh oh!

Languages