DuckDB is a fast database system
Query and transform your data anywhere
using DuckDB's feature-rich SQL dialect
- SQL
- Python
- R
- Java
- Node.js
-- Get the top-3 busiest train stations
SELECT station_name, count(*) AS num_services
FROM train_services
GROUP BY ALL
ORDER BY num_services DESC
LIMIT 3;
DuckDB at a glance
Simple
DuckDB was designed to be simple to use. It has zero dependencies, so it can be installed in seconds and deployed in milliseconds.
Portable
DuckDB runs on all popular operating systems and hardware architectures. It has idiomatic client APIs for major programming languages.
Feature-rich
DuckDB supports multiple file formats (CSV, Parquet, and JSON) as well as data lake formats. It can also connect to network and cloud storage.
Fast
DuckDB runs queries at blazing speed. It uses parallel execution and can process larger-than-memory workloads.
Extensible
DuckDB can be extended with third-party features. User contributions are available as community extensions.
Free
DuckDB, its core extensions and the DuckLake format are open-source under the permissive MIT License.
Installation
DuckDB is seamlessly integrated with major programming languages and can be installed in seconds on most platforms.
- Command line
- Python
- Go
- Java
- Node.js
- R
- Rust
- ODBC
curl https://install.duckdb.org | sh
Latest release: DuckDB 1.4.4
pip install duckdb

# install from source using CRAN
install.packages("duckdb")
# install from binaries using the Posit Public Package Manager
options(HTTPUserAgent = sprintf("R/%s R (%s)", getRversion(), paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])))
install.packages("duckdb", repos = "https://p3m.dev/cran/__linux__/manylinux_2_28/latest/")

npm install @duckdb/node-api

cargo add duckdb --features bundled

go get github.com/duckdb/duckdb-go/v2

-- Load CSV file to a table. DuckDB auto-detects
-- the CSV's format, column name and types
CREATE TABLE stations AS
    FROM 'https://blobs.duckdb.org/stations.csv';

-- Directly query Parquet file over HTTPS
SELECT station_name, count(*) AS num_services
FROM 'https://blobs.duckdb.org/train_services.parquet'
GROUP BY ALL
ORDER BY num_services DESC
LIMIT 10;

-- Find the top-3 longest domestic train routes
SELECT s1.name_short, s2.name_short, d.distance
FROM distances d
JOIN stations s1 ON d.station1 = s1.code
JOIN stations s2 ON d.station2 = s2.code
WHERE s1.country = s2.country
  AND s1.code < s2.code
ORDER BY distance DESC
LIMIT 3;

-- List the closest IC stations (as the crow flies)
SELECT
    s1.name_long AS station1,
    s2.name_long AS station2,
    ST_Distance(
        ST_Point(s1.geo_lng, s1.geo_lat),
        ST_Point(s2.geo_lng, s2.geo_lat)
    ) * 111_139 AS distance
FROM stations s1, stations s2
WHERE s1.type LIKE '%Intercity%'
  AND s2.type LIKE '%Intercity%'
  AND s1.id < s2.id
ORDER BY distance ASC
LIMIT 3;

# Get the top-3 busiest train stations
import duckdb

duckdb.sql("""
    SELECT station, count(*) AS num_services
    FROM train_services
    GROUP BY ALL
    ORDER BY num_services DESC
    LIMIT 3;
    """)

# Reading and writing Pandas dataframes
import pandas as pd
import duckdb

df_in = pd.DataFrame({
    'station': ['Delft', 'Delft', 'Gouda', 'Gouda'],
    'day': ['Mon', 'Tue', 'Mon', 'Tue'],
    'num_services': [22, 20, 27, 25]})

# Run query on a dataframe and return a dataframe
df_out = duckdb.sql("""
    SELECT station, sum(num_services)
    FROM df_in
    GROUP BY station
    """).to_df()

# Create custom user-defined function
import duckdb

def plus_one(x):
    return x + 1

con = duckdb.connect()
con.create_function('plus_one', plus_one, ['BIGINT'], 'BIGINT', type='native')
con.sql("""
    SELECT sum(plus_one(i)) FROM range(10) tbl(i);
    """)

# Find the largest sepals/petals in the Iris data set
library(duckdb)

con <- dbConnect(duckdb())
duckdb_register(con, "iris", iris)
query <- r'(
    SELECT count(*) AS num_observations,
        max("Sepal.Width") AS max_width,
        max("Petal.Length") AS max_petal_length
    FROM iris
    WHERE "Sepal.Length" > 5
    GROUP BY ALL
    )'
dbGetQuery(con, query)

# Find the largest sepals/petals in the Iris data set
# using duckplyr
library("duckplyr")

iris |>
    filter(Sepal.Length > 5) |>
    group_by(Species) |>
    summarize(
        num_observations = n(),
        max_width = max(Sepal.Width),
        max_petal_length = max(Petal.Length),
        na.rm = TRUE) |>
    collect()

# Find the largest sepals/petals in the Iris data set
# using dplyr
library("duckdb")
library("dplyr")

con <- dbConnect(duckdb())
duckdb_register(con, "iris", iris)
tbl(con, "iris") |>
    filter(Sepal.Length > 5) |>
    group_by(Species) |>
    summarize(
        num_observations = count(),
        max_width = max(Sepal.Width),
        max_petal_length = max(Petal.Length),
        na.rm = TRUE) |>
    collect()

// Get a list of train stations by traffic
Connection conn = DriverManager.getConnection("jdbc:duckdb:");
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("""
    SELECT station_name, count(*) AS num_services
    FROM train_services
    GROUP BY ALL
    ORDER BY num_services DESC;
    """);
System.out.println(rs.next());

// Perform bulk inserts using the Appender API
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
Statement st = conn.createStatement();
st.execute("CREATE TABLE person (name VARCHAR, age INT)");
var appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "person");
appender.beginRow();
appender.append("MC Ducky");
appender.append(49);
appender.endRow();
appender.close();

// Get the top-3 busiest train stations in May
import { DuckDBInstance } from '@duckdb/node-api';

const instance = await DuckDBInstance.create();
const connection = await instance.connect();
const reader = await connection.runAndReadAll(
    `SELECT station_name, count(*) AS num_services
     FROM 'https://blobs.duckdb.org/train_services.parquet'
     WHERE monthname(date) = 'May'
     GROUP BY ALL
     ORDER BY num_services DESC
     LIMIT 3;`
);
console.table(reader.getRows());

// Web Service Integration:
// Create endpoint to generate numbers
import express from "express";
import { DuckDBInstance } from '@duckdb/node-api';

const app = express();
const instance = await DuckDBInstance.create();
const connection = await instance.connect();
app.get("/getnumbers", async (req, res) => {
    const reader = await connection.runAndReadAll("SELECT random() AS num FROM range(10)");
    res.end(JSON.stringify(reader.getRows()));
});
app.listen(8082, () => console.log("Go to: http://localhost:8082/getnumbers"));