Movatterモバイル変換


[0]ホーム

URL:


Usage Guide

This guide will teach you how to assemble SQL queries using FunSQL.

Test Database

To demonstrate database queries, we need a test database. The database we use here is a tiny 10 person sample of simulated patient data extracted from a much largerCMS DE-SynPuf dataset. For a database engine, we pickedSQLite. Using SQLite in a guide is convenient because it does not require a database server to run and allows us to distribute the whole database as a single file. FunSQL supports SQLite and many other database engines. The techniques discussed here are not specific to SQLite and once you learn them, you will be able to apply them to any SQL database.

If you wish to follow along with the guide and run the examples, download the database file:

const URL = "https://github.com/MechanicalRabbit/ohdsi-synpuf-demo/releases/download/20210412/synpuf-10p.sqlite"const DB = download(URL)

All examples in this guide are tested on each update using theNarrativeTest package. To avoid downloading the database file all the time, we registered the download URL as anartifact and usePkg.Artifacts API to fetch it:

using Pkg.Artifacts, LazyArtifactsconst DB = joinpath(artifact"synpuf-10p", "synpuf-10p.sqlite")#-> ⋮

Database Connection

To interact with a SQLite database from Julia code, we need to install theSQLite package:

using PkgPkg.add("SQLite")

Once the package is installed, we can use it to connect to the database:

using SQLiteconst conn = SQLite.DB(DB)

Later we will use theconn object to execute database queries.

Database Schema

The data in the test database is stored in the format of theOMOP Common Data Model, an open source database schema for observational healthcare data. In this guide, we will only use a small fragment of the Common Data Model.

Fragment of the OMOP Common Data Model

Before we can start assembling queries with FunSQL, we need to make FunSQL aware of the database schema. For each table in the database, we need to create a correspondingSQLTable object, which encapsulates the name of the table together with the names of the columns.

using FunSQL: SQLTable

The patient data, including basic demographic information, is stored in the tableperson:

const person =    SQLTable(:person,             columns = [:person_id, :year_of_birth, :location_id])

Patient addresses are stored in a separate tablelocation, linked toperson by the key columnlocation_id:

const location =    SQLTable(:location,             columns = [:location_id, :city, :state])

The bulk of patient data consists of clinical events: visits to healthcare providers, recorded observations, diagnosed conditions, prescribed medications, etc. In this guide we only use two types of events, visits and conditions:

const visit_occurrence =    SQLTable(:visit_occurrence,             columns = [:visit_occurrence_id, :person_id,                        :visit_concept_id,                        :visit_start_date, :visit_end_date])const condition_occurrence =    SQLTable(:condition_occurrence,             columns = [:condition_occurrence_id, :person_id,                        :condition_concept_id,                        :condition_start_date, :condition_end_date])

The specific type of the event (e.g.,Inpatient visit orEssential hypertension condition) is indicated using aconcept id column, which refers to theconcept table:

const concept =    SQLTable(:concept,             columns = [:concept_id, :concept_name])

Different concepts may be related to each other. For instance,Essential hypertensionis aHypertensive disorder, which itselfis aDisorder of cardiovascular system. Concept relationships are recorded in the corresponding table:

const concept_relationship =    SQLTable(:concept_relationship,             columns = [:concept_id_1, :concept_id_2, :relationship_id])

Why FunSQL?

Let us start with clarifying the purpose of FunSQL. Consider a problem:

Find all patients born between 1930 and 1940 and living in Illinois, and for each patient show their current age (by the end of 2020).

The answer can be obtained with the following SQL query:

SELECT p.person_id, 2020 - p.year_of_birth AS ageFROM person pJOIN location l ON (p.location_id = l.location_id)WHERE (p.year_of_birth BETWEEN 1930 AND 1940) AND (l.state = 'IL')

The simplest way to incorporate this query into Julia code is to embed it as a string literal:

sql = """SELECT p.person_id, 2020 - p.year_of_birth AS ageFROM person pJOIN location l ON (p.location_id = l.location_id)WHERE (p.year_of_birth BETWEEN 1930 AND 1940) AND (l.state = 'IL')"""

Using the appropriatedatabase engine API and the connection object createdearlier, we can execute this query and get back the answer:

res = DBInterface.execute(conn, sql)#-> SQLite.Query( … )

As an aside, it is convenient to use theDataFrame interface to show the output of a query in tabular form:

using DataFramesDataFrame(res)#=>1×2 DataFrame Row │ person_id  age     │ Int64      Int64─────┼──────────────────   1 │     72120     83=#

FunSQL introduces an extra step to this workflow. Instead of embedding the SQL query directly into Julia code, we construct aquery object:

using FunSQL: As, From, Fun, Get, Join, Select, Whereq = From(person) |>    Where(Fun.between(Get.year_of_birth, 1930, 1940)) |>    Join(From(location) |> Where(Get.state .== "IL") |> As(:location),         on = Get.location_id .== Get.location.location_id) |>    Select(Get.person_id, :age => 2020 .- Get.year_of_birth)

The value ofq is a composite object of typeSQLNode. "Composite" means thatq is assembled from components (also of typeSQLNode), which themselves are either atomic or assembled from smaller components. Different kinds of components are created bySQLNode constructors such asFrom,Where,Fun,Get, etc.

The actual SQL query is generated byrendering the query object:

using FunSQL: rendersql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "person_2"."person_id",  (2020 - "person_2"."year_of_birth") AS "age"FROM (  SELECT    "person_1"."location_id",    "person_1"."person_id",    "person_1"."year_of_birth"  FROM "person" AS "person_1"  WHERE ("person_1"."year_of_birth" BETWEEN 1930 AND 1940)) AS "person_2"JOIN (  SELECT "location_1"."location_id"  FROM "location" AS "location_1"  WHERE ("location_1"."state" = 'IL')) AS "location_2" ON ("person_2"."location_id" = "location_2"."location_id")=#

Notice that therender function takes a parameter calleddialect. Although the SQL language is standardized, different implementations of SQL tend to deviate from the standard far enough to make them mutually incompatible. For this reason, FunSQL lets us select the target SQL dialect.

At this point, the job of FunSQL is done and, just as before, we can execute the query and display the result:

res = DBInterface.execute(conn, sql)DataFrame(res)#=>1×2 DataFrame Row │ person_id  age     │ Int64      Int64─────┼──────────────────   1 │     72120     83=#

Why, instead of embedding a complete SQL query, we prefer to generate it through a query object? To justify this extra step, consider that in a real Julia program, any query is likely going to be parameterized:

Find all patients born between$start_year and$end_year and living in$states, and for each patient show the$output_columns.

If this is the case, the SQL query cannot be prepared in advance and must be assembled on the fly. While it is possible to assemble a SQL query from string fragments, it is tedious, error-prone and definitely not fun. FunSQL provides a more robust and effective approach: build the query as a composite data structure.

Here is how a parameterized query may be constructed with FunSQL:

function FindPatients(; start_year = nothing,                        end_year = nothing,                        states = String[])    q = From(person) |>        Where(BirthRange(start_year, end_year))    if !isempty(states)        q = q |>            Join(:location => From(location) |>                              Where(Fun.in(Get.state, states...)),                 on = Get.location_id .== Get.location.location_id)    end    qendfunction BirthRange(start_year, end_year)    p = true    if start_year !== nothing        p = Fun.and(p, Get.year_of_birth .>= start_year)    end    if end_year !== nothing        p = Fun.and(p, Get.year_of_birth .<= end_year)    end    pend

The functionFindPatients effectively becomes a newSQLNode constructor, which can be used directly or as a component of a larger query.

Show all patients.

q = FindPatients()print(render(q, dialect = :sqlite))#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"=#

Show all patients born in or after 1930.

q = FindPatients(start_year = 1930) |>    Select(Get.person_id)print(render(q, dialect = :sqlite))#=>SELECT "person_1"."person_id"FROM "person" AS "person_1"WHERE ("person_1"."year_of_birth" >= 1930)=#

Find all patients born between 1930 and 1940 and living in Illinois, and for each patient show their current age.

q = FindPatients(start_year = 1930, end_year = 1940, states = ["IL"]) |>    Select(Get.person_id, :age => 2020 .- Get.year_of_birth)print(render(q, dialect = :sqlite))#=>SELECT  "person_2"."person_id",  (2020 - "person_2"."year_of_birth") AS "age"FROM (  SELECT    "person_1"."location_id",    "person_1"."person_id",    "person_1"."year_of_birth"  FROM "person" AS "person_1"  WHERE    ("person_1"."year_of_birth" >= 1930) AND    ("person_1"."year_of_birth" <= 1940)) AS "person_2"JOIN (  SELECT "location_1"."location_id"  FROM "location" AS "location_1"  WHERE ("location_1"."state" IN ('IL'))) AS "location_2" ON ("person_2"."location_id" = "location_2"."location_id")=#

Tabular Operations

Recall the query from theprevious section:

Find all patients born between 1930 and 1940 and living in Illinois, and for each patient show their current age.

From(person) |>Where(Fun.between(Get.year_of_birth, 1930, 1940)) |>Join(From(location) |> Where(Get.state .== "IL") |> As(:location),     on = Get.location_id .== Get.location.location_id) |>Select(Get.person_id, :age => 2020 .- Get.year_of_birth)

At the outer level, this query is constructed from tabular operationsFrom,Where,Join, andSelect arranged in a pipeline by the pipe (|>) operator. In SQL, atabular operation takes a certain number of input datasets and produces an output dataset. It is helpful to visualize a tabular operation as a node with a certain number of input arrows and one output arrow.

From, Where, Select, and Join nodes

Then the whole query can be visualized as a pipeline diagram. Each arrow in this diagram represents a dataset, and each node represents an elementary data processing operation.

Query pipeline

The following tabular operations are available in FunSQL.

ConstructorFunction
Appendconcatenate datasets
Aswrap all columns in a nested record
Bindcorrelate a subquery in ajoin expression
Defineadd an output column
Fromproduce the content of a database table
Grouppartition the dataset into disjoint groups
Iterateiterate a query
Joincorrelate two datasets
Limittruncate the dataset
Ordersort the dataset
Partitionadd a window to the dataset
Selectspecify output columns
Wherefilter the dataset by the given condition
Withassign a name to a temporary dataset

From,Select, andDefine

TheFrom node outputs the content of a database table. The constructor takes one argument, aSQLTable object (see the sectionDatabase Schema). In a query, a bareSQLTable object is automatically converted to aFrom node, so one could write more compactly:

For each patient, show their current age.

person |>Select(Get.person_id, :age => 2020 .- Get.year_of_birth)

It is possible for a query not to have aFrom node:

Show the current date and time.

q = Select(Fun.current_timestamp())sql = render(q)print(sql)#-> SELECT CURRENT_TIMESTAMP AS "current_timestamp"

In this query, theSelect node is not connected to any source of data. In such a case, it is supplied with aunit dataset containing one row and no columns. Hence this query will generate one row of output.

The same effect could be achieved with theFrom(nothing) node.

q = From(nothing) |>    Select(Fun.current_timestamp())sql = render(q)print(sql)#-> SELECT CURRENT_TIMESTAMP AS "current_timestamp"

In general, theSelect node is used to specify the output columns. The name of the column is either derived from the expression or set explicitly withAs (or its shorthand, the arrow (=>) operator).

As opposed to SQL, FunSQL does not demand that all queries have an explicitSelect. The following query will produce all columns of the table:

Show all patients.

q = From(person)sql = render(q)print(sql)#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"=#

To add a new column while preserving existing output columns, we use theDefine node.

Show the patient data together with their current age.

using FunSQL: Defineq = From(person) |>    Define(:age => 2020 .- Get.year_of_birth)sql = render(q)print(sql)#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id",  (2020 - "person_1"."year_of_birth") AS "age"FROM "person" AS "person_1"=#

Define could also be used to replace an existing column.

Hide the year of birth of patients born before 1930.

q = From(person) |>    Define(:year_of_birth => Fun.case(Get.year_of_birth .>= 1930,                                      Get.year_of_birth,                                      missing))sql = render(q)print(sql)#=>SELECT  "person_1"."person_id",  (CASE WHEN ("person_1"."year_of_birth" >= 1930) THEN "person_1"."year_of_birth" ELSE NULL END) AS "year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"=#

Join

TheJoin node correlates the rows of two input datasets. Predominantly,Join is used for looking up table records by key. In the following example,Join associates eachperson record with theirlocation using the key columnlocation_id that uniquely identifies alocation record:

Show all patients together with their state of residence.

person |>Join(:location => location,     Get.location_id .== Get.location.location_id,     left = true) |>Select(Get.person_id, Get.location.state)

The modifierleft = true tellsJoin that it must outputallperson records including those without the correspondinglocation. Since this is a very common requirement, FunSQL provides an alias:

using FunSQL: LeftJoinperson |>LeftJoin(:location => location,         Get.location_id .== Get.location.location_id) |>Select(Get.person_id, Get.location.state)

SinceJoin needs two input datasets, it must be attached to two input pipelines. The first pipeline is attached using the|> operator and the second one is provided as an argument to theJoin constructor. Alternatively, both input pipelines can be specified as keyword arguments:

Join(over = person,     joinee = :location => location,     on = Get.location_id .== Get.location.location_id,     left = true) |>Select(Get.person_id, Get.location.state)

The output ofJoin combines columns of both input datasets, which will cause ambiguity if both datasets have a column with the same name. Such is the case in the previous example since both tables,person andlocation, have a column calledlocation_id. To disambiguate them, we can place all columns of one of the datasets into a nested record. This is the action of the arrow (=>) operator or its full form, theAs node:

using FunSQL: AsFrom(person) |>LeftJoin(From(location) |> As(:location),         on = Get.location_id .== Get.location.location_id) |>Select(Get.person_id, Get.location.state)

Alternatively, we could usebound column references, which are described in alater section.

Scalar Operations

Many tabular operations includingJoin,Select andWhere are parameterized with scalar operations. Ascalar operation acts on an individual row of a dataset and produces a scalar value. Scalar operations are assembled from literal values, column references, and applications of SQL functions and operators. Below is a list of scalar operations available in FunSQL.

ConstructorFunction
Aggapply an aggregate function
Asassign a column alias
Bindcorrelate an inner subquery
Funapply a scalar function or a scalar operator
Getproduce the value of a column
Litproduce a constant value
Sortindicate the sort order
Varproduce the value of a query parameter

Lit

TheLit node creates a literal value, although we could usually omit the constructor:

using FunSQL: LitSelect(Lit(42))Select(42)

The SQL valueNULL is represented by the Julia constantmissing:

q = Select(missing)sql = render(q, dialect = :sqlite)print(sql)#-> SELECT NULL AS "_"

Get

TheGet node creates a column reference. TheGet constructor admits several equivalent forms:

Get.year_of_birthGet(:year_of_birth)Get."year_of_birth"Get("year_of_birth")

Such column references are resolved at the place of use against the input dataset. As we mentioned earlier, sometimes column references cannot be resolved unambiguously. To alleviate this problem, we can bind the column reference to the node that produces it:

Show all patients with their state of residence.

qₚ = From(person)qₗ = From(location)q = qₚ |>    LeftJoin(qₗ, on = qₚ.location_id .== qₗ.location_id) |>    Select(qₚ.person_id, qₗ.state)

The notationqₚ.location_id andqₗ.location_id is a syntax sugar for

Get(:location_id, over = qₚ)Get(:location_id, over = qₗ)

Fun

SQL functions and operators are represented using theFun node, which also has several equivalent forms:

Fun.between(Get.year_of_birth, 1930, 1940)Fun(:between, Get.year_of_birth, 1930, 1940)Fun."between"(Get.year_of_birth, 1930, 1940)Fun("between", Get.year_of_birth, 1930, 1940)

Certain SQL operators, notably comparison operators, can be represented using Julia broadcasting notation:

Fun.">="(Get.year_of_birth, 1930)Get.year_of_birth .>= 1930

We should note that FunSQL does not verify if a SQL function or an operator is used correctly or even whether it exists or not. In such a case, FunSQL will generate a SQL query that fails to execute:

q = From(person) |>    Select(Fun.frobnicate(Get.year_of_birth))sql = render(q, dialect = :sqlite)print(sql)#=>SELECT FROBNICATE("person_1"."year_of_birth") AS "frobnicate"FROM "person" AS "person_1"=#DBInterface.execute(conn, sql)#-> ERROR: SQLite.SQLiteException("no such function: FROBNICATE")

On the other hand, FunSQL will correctly serialize many SQL functions and operators that have irregular syntax includingAND,OR,NOT,IN,EXISTS,CASE, and others:

Show the demographic cohort of each patient.

q = From(person) |>    Select(Fun.case(Get.year_of_birth .<= 1960, "boomer", "millenial"))sql = render(q, dialect = :sqlite)print(sql)#=>SELECT (CASE WHEN ("person_1"."year_of_birth" <= 1960) THEN 'boomer' ELSE 'millenial' END) AS "case"FROM "person" AS "person_1"=#

Group and Aggregate Functions

Group and aggregate functions are used for summarizing data to report totals, averages and so on. We start by applying theGroup node to partition the input rows into disjoint groups. We can then use aggregate functions to calculate summary values from the rows of each group. In FunSQL, aggregate functions are created using theAgg node. In the following example, we use the aggregate functionAgg.count, which simply counts the number of rows in each group.

Show the number of patients by the year of birth.

using FunSQL: Agg, Groupq = From(person) |>    Group(Get.year_of_birth) |>    Select(Get.year_of_birth, Agg.count())sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "person_1"."year_of_birth",  COUNT(*) AS "count"FROM "person" AS "person_1"GROUP BY "person_1"."year_of_birth"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>10×2 DataFrame Row │ year_of_birth  count     │ Int64          Int64─────┼──────────────────────   1 │          1911      1   2 │          1913      1   3 │          1922      1⋮=#

To indicate that aggregate functions must be applied to the dataset as a whole, we create aGroup node without arguments. This is the case where FunSQL notation deviates from SQL, where we would omit theGROUP BY clause to achieve the same effect.

Show the average year of birth.

q = From(person) |>    Group() |>    Select(Agg.avg(Get.year_of_birth))sql = render(q, dialect = :sqlite)print(sql)#=>SELECT AVG("person_1"."year_of_birth") AS "avg"FROM "person" AS "person_1"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>1×1 DataFrame Row │ avg     │ Float64─────┼─────────   1 │  1935.4=#

In general, the arguments of theGroup node form thegrouping key so that two rows of the input dataset belongs to the same group when they have the same value of the grouping key. The output ofGroup contains all distinct values of the grouping key.

Show the US states that are present in the location records.

q = From(location) |>    Group(Get.state)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT DISTINCT "location_1"."state"FROM "location" AS "location_1"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>10×1 DataFrame Row │ state     │ String─────┼────────   1 │ MI   2 │ WA   3 │ FL⋮=#

FunSQL has no lexical limitations on the use of aggregate functions. While in SQL aggregate functions can only be used in theSELECT orHAVING clauses, there is no such restriction in FunSQL: they could be used in any context where an ordinary expression is permitted. The only requirement is that for each aggregate function, FunSQL can determine the correspondingGroup node. It is convenient to imagine that the output ofGroup contains the grouped rows, which cannot be observed directly, but whose presence in the output allows us to apply aggregate functions.

In particular, we use a regularWhere node where SQL would require aHAVING clause.

Show patients who saw a doctor within the last year.

q = From(visit_occurrence) |>    Group(Get.person_id) |>    Where(Agg.max(Get.visit_end_date) .>= Fun.date("now", "-1 year"))sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "visit_occurrence_1"."person_id"FROM "visit_occurrence" AS "visit_occurrence_1"GROUP BY "visit_occurrence_1"."person_id"HAVING (MAX("visit_occurrence_1"."visit_end_date") >= DATE('now', '-1 year'))=#

When the output ofGroup is blocked by anAs node, we need to traverse it withGet in order to use an aggregate function.

For each patient, show the date of their latest visit to a doctor.

q = From(person) |>    LeftJoin(:visit_group => From(visit_occurrence) |> Group(Get.person_id),             on = Get.person_id .== Get.visit_group.person_id) |>    Select(Get.person_id,           Get.visit_group |> Agg.max(Get.visit_start_date))sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "person_1"."person_id",  "visit_group_1"."max"FROM "person" AS "person_1"LEFT JOIN (  SELECT    "visit_occurrence_1"."person_id",    MAX("visit_occurrence_1"."visit_start_date") AS "max"  FROM "visit_occurrence" AS "visit_occurrence_1"  GROUP BY "visit_occurrence_1"."person_id") AS "visit_group_1" ON ("person_1"."person_id" = "visit_group_1"."person_id")=#

Partition and Window Functions

We can relate each row to other rows in the same dataset using thePartition node andwindow functions. We start by applying thePartition node to partition the input rows into disjoint groups. The rows in each group are reordered according to the given sort order. UnlikeGroup, which collapses each row group into a single row, thePartition node preserves the original rows, but allows us to relate each row to adjacent rows in the same partition. In particular, we can apply regular aggregate functions, which calculate the summary value of a subset of rows related to the current row.

In the following example, the rowsvisit_occurrence are partitioned per patient and ordered by the starting date of the visit. Theframe clause specifies the subset of rows relative to the current row (thewindow frame) to be used by aggregate functions. In this example, the frame contains all rows prior to the current row.

For each visit, show the time passed since the previous visit.

using FunSQL: Partitionq = From(visit_occurrence) |>    Partition(Get.person_id,              order_by = [Get.visit_start_date],              frame = (mode = :rows, start = -Inf, finish = -1)) |>    Select(Get.person_id,           Get.visit_start_date,           Get.visit_end_date,           :gap => Fun.julianday(Get.visit_start_date) .- Fun.julianday(Agg.max(Get.visit_end_date)))sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "visit_occurrence_1"."person_id",  "visit_occurrence_1"."visit_start_date",  "visit_occurrence_1"."visit_end_date",  (JULIANDAY("visit_occurrence_1"."visit_start_date") - JULIANDAY((MAX("visit_occurrence_1"."visit_end_date") OVER (PARTITION BY "visit_occurrence_1"."person_id" ORDER BY "visit_occurrence_1"."visit_start_date" ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)))) AS "gap"FROM "visit_occurrence" AS "visit_occurrence_1"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>27×4 DataFrame Row │ person_id  visit_start_date  visit_end_date  gap     │ Int64      String            String          Float64?─────┼────────────────────────────────────────────────────────   1 │      1780  2008-04-09        2008-04-13      missing   2 │      1780  2008-04-10        2008-04-10           -3.0   3 │      1780  2008-11-22        2008-11-22          223.0   4 │      1780  2009-05-22        2009-05-22          181.0⋮=#

Query Parameters

A SQL query may include a reference to aquery parameter. When we execute such a query, we must supply the actual values for all parameters used in the query. This is a restricted form of dynamic query construction directly supported by SQL syntax.

Show all patients born between$start_year and$end_year.

sql = """SELECT p.person_idFROM person pWHERE p.year_of_birth BETWEEN ? AND ?"""res = DBInterface.execute(conn, sql, (1930, 1940))DataFrame(res)#=>3×1 DataFrame Row │ person_id     │ Int64─────┼───────────   1 │      1780   2 │     30091   3 │     72120=#

FunSQL can be used to construct a query with parameters. Similar toGet, parameter references are created using theVar node.

using FunSQL: Varq = From(person) |>    Where(Fun.between(Get.year_of_birth, Var.start_year, Var.end_year)) |>    Select(Get.person_id)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "person_1"."person_id"FROM "person" AS "person_1"WHERE ("person_1"."year_of_birth" BETWEEN ?1 AND ?2)=#

While we specified parameters by name, in the generated SQL query the same parameters are numbered. If we know the values of the parameters and we wish to execute the query with them, we need to pack the values in the order in which parameters appear in the SQL query:

using FunSQL: packparams = pack(sql, (start_year = 1930, end_year = 1940))#-> Any[1930, 1940]res = DBInterface.execute(conn, sql, params)DataFrame(res)#=>3×1 DataFrame Row │ person_id     │ Int64─────┼───────────   1 │      1780   2 │     30091   3 │     72120=#

Correlated Queries

Aninner query is a SQL query that is included into theouter query a part of a scalar expression. An inner query must either produce a single value or be used as an argument of a query operator, such asIN orEXISTS, which transforms the query output to a scalar value.

It is easy to assemble an inner query with FunSQL.

Find the oldest patients.

qᵢ = From(person) |>     Group() |>     Select(Agg.min(Get.year_of_birth))qₒ = From(person) |>     Where(Get.year_of_birth .== qᵢ)sql = render(qₒ, dialect = :sqlite)print(sql)#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"WHERE ("person_1"."year_of_birth" = (  SELECT MIN("person_2"."year_of_birth") AS "min"  FROM "person" AS "person_2"))=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>1×3 DataFrame Row │ person_id  year_of_birth  location_id     │ Int64      Int64          Int64─────┼───────────────────────────────────────   1 │    110862           1911          436=#

Find patients with no visits to a healthcare provider.

qᵢ = From(visit_occurrence) |>     Select(Get.person_id)qₒ = From(person) |>     Where(Fun."not in"(Get.person_id, qᵢ))sql = render(qₒ, dialect = :sqlite)print(sql)#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"WHERE ("person_1"."person_id" NOT IN (  SELECT "visit_occurrence_1"."person_id"  FROM "visit_occurrence" AS "visit_occurrence_1"))=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>0×3 DataFrame=#

An inner query may depend on the data from the outer query. Such inner queries are calledcorrelated. In FunSQL, correlated queries are created using theBind node. Specifically, in the body of a correlated query we usequery parameters to refer to the external data. TheBind node, which wrap the correlated query, binds each parameter to an expression evaluated in the context of the outer query.

Find all visits where at least one condition was diagnosed.

using FunSQL: BindCorrelatedCondition(person_id, start_date, end_date) =    From(condition_occurrence) |>    Where(Fun.and(Get.person_id .== Var.person_id,                  Fun.between(Get.condition_start_date, Var.start_date, Var.end_date))) |>    Bind(:person_id => person_id,         :start_date => start_date,         :end_date => end_date)q = From(visit_occurrence) |>    Where(Fun.exists(CorrelatedCondition(Get.person_id, Get.visit_start_date, Get.visit_end_date)))sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "visit_occurrence_1"."visit_occurrence_id",  "visit_occurrence_1"."person_id",  "visit_occurrence_1"."visit_concept_id",  "visit_occurrence_1"."visit_start_date",  "visit_occurrence_1"."visit_end_date"FROM "visit_occurrence" AS "visit_occurrence_1"WHERE (EXISTS (  SELECT NULL  FROM "condition_occurrence" AS "condition_occurrence_1"  WHERE    ("condition_occurrence_1"."person_id" = "visit_occurrence_1"."person_id") AND    ("condition_occurrence_1"."condition_start_date" BETWEEN "visit_occurrence_1"."visit_start_date" AND "visit_occurrence_1"."visit_end_date")))=#

Order andLimit

The database server emits the output rows in an arbitrary order. In fact, different runs of the same query may produce rows in a different order. To specify a particular order of output rows, we use theOrder node.

Show patients ordered by the year of birth.

using FunSQL: Orderq = From(person) |>    Order(Get.year_of_birth)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"ORDER BY "person_1"."year_of_birth"=#

TheAsc and theDesc modifiers specify whether to sort the rows in an ascending or in a descending order.

Show patients ordered by the year of birth in the descending order.

using FunSQL: Descq = From(person) |>    Order(Get.year_of_birth |> Desc())sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"ORDER BY "person_1"."year_of_birth" DESC=#

TheLimit node lets us take a slice of the input dataset. To make the output deterministic,Limit must be applied right afterOrder.

Show the top three oldest patients.

using FunSQL: Limitq = From(person) |>    Order(Get.year_of_birth) |>    Limit(1:3)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "person_1"."person_id",  "person_1"."year_of_birth",  "person_1"."location_id"FROM "person" AS "person_1"ORDER BY "person_1"."year_of_birth"LIMIT 3OFFSET 0=#

Append andIterate

TheAppend node concatenates two or more input datasets. Only the columns that are present in every input dataset will be included to the output.

Show all clinical events (visits and conditions) associated with each patient.

using FunSQL: Appendq = From(visit_occurrence) |>    Define(:type => "visit", :date => Get.visit_start_date) |>    Append(From(condition_occurrence) |>           Define(:type => "condition", :date => Get.condition_start_date)) |>    Order(Get.person_id, Get.date)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT  "union_1"."person_id",  "union_1"."type",  "union_1"."date"FROM (  SELECT    "visit_occurrence_1"."person_id",    'visit' AS "type",    "visit_occurrence_1"."visit_start_date" AS "date"  FROM "visit_occurrence" AS "visit_occurrence_1"  UNION ALL  SELECT    "condition_occurrence_1"."person_id",    'condition' AS "type",    "condition_occurrence_1"."condition_start_date" AS "date"  FROM "condition_occurrence" AS "condition_occurrence_1") AS "union_1"ORDER BY  "union_1"."person_id",  "union_1"."date"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>53×3 DataFrame Row │ person_id  type       date     │ Int64      String     String─────┼──────────────────────────────────   1 │      1780  visit      2008-04-09   2 │      1780  visit      2008-04-10   3 │      1780  condition  2008-04-10   4 │      1780  visit      2008-11-22   5 │      1780  condition  2008-11-22   6 │      1780  visit      2009-05-22   7 │      1780  condition  2009-05-22   8 │     30091  visit      2008-11-12  ⋮  │     ⋮          ⋮          ⋮  47 │    110862  condition  2008-09-07  48 │    110862  visit      2009-06-30  49 │    110862  condition  2009-06-30  50 │    110862  visit      2009-09-30  51 │    110862  condition  2009-09-30  52 │    110862  visit      2010-06-07  53 │    110862  condition  2010-06-07                         38 rows omitted=#

For a second example, consider the tableconcept, which contains the vocabulary of medical concepts (such asMyocardial Infarction). These concepts may be related to each other (Myocardial Infarction has a subtypeAcute Myocardial Infarction), and their relationships are stored in the tableconcept_relationship. We can encapsulate construction of a query that finds immediate subtypes as the function:

SubtypesOf(base) =    From(concept) |>    Join(From(concept_relationship) |>         Where(Get.relationship_id .== "Is a"),         on = Get.concept_id .== Get.concept_id_1) |>    Join(:base => base,         on = Get.concept_id_2 .== Get.base.concept_id)

Show the concept "Myocardial Infarction" and its immediate subtypes.

base = From(concept) |>       Where(Get.concept_name .== "Myocardial infarction")q = base |> Append(SubtypesOf(base))sql = render(q, dialect = :sqlite)res = DBInterface.execute(conn, sql)DataFrame(res)#=>2×2 DataFrame Row │ concept_id  concept_name     │ Int64       String─────┼─────────────────────────────────────────   1 │    4329847  Myocardial infarction   2 │     312327  Acute myocardial infarction=#

But how can we fetch not just immediate, but all of the subtypes of a concept?

Show the concept "Myocardial Infarction" and all of its subtypes.

A good start is to repeatedly applySubtypesOf and concatenate all the outputs:

base |>Append(SubtypesOf(base),       SubtypesOf(SubtypesOf(base)),       SubtypesOf(SubtypesOf(SubtypesOf(base))),       SubtypesOf(SubtypesOf(SubtypesOf(SubtypesOf(base)))))

However we do not know if 4 iterations ofSubtypesOf is enough to fully traverse the concept hierarchy. Ideally, we should continue applyingSubtypesOf until the last iteration produces an empty output. This is exactly the action of theIterate node.

using FunSQL: Iterateq = base |>    Iterate(:subtype => SubtypesOf(From(:subtype)))sql = render(q, dialect = :sqlite)print(sql)#=>WITH RECURSIVE "subtype_1" ("concept_id", "concept_name") AS (  SELECT    "concept_1"."concept_id",    "concept_1"."concept_name"  FROM "concept" AS "concept_1"  WHERE ("concept_1"."concept_name" = 'Myocardial infarction')  UNION ALL  SELECT    "concept_2"."concept_id",    "concept_2"."concept_name"  FROM "concept" AS "concept_2"  JOIN (    SELECT      "concept_relationship_1"."concept_id_1",      "concept_relationship_1"."concept_id_2"    FROM "concept_relationship" AS "concept_relationship_1"    WHERE ("concept_relationship_1"."relationship_id" = 'Is a')  ) AS "concept_relationship_2" ON ("concept_2"."concept_id" = "concept_relationship_2"."concept_id_1")  JOIN "subtype_1" ON ("concept_relationship_2"."concept_id_2" = "subtype_1"."concept_id"))SELECT  "subtype_1"."concept_id",  "subtype_1"."concept_name"FROM "subtype_1"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>6×2 DataFrame Row │ concept_id  concept_name     │ Int64       String─────┼───────────────────────────────────────────────   1 │    4329847  Myocardial infarction   2 │     312327  Acute myocardial infarction   3 │     434376  Acute myocardial infarction of a…   4 │     438170  Acute myocardial infarction of i…   5 │     438438  Acute myocardial infarction of a…   6 │     444406  Acute subendocardial infarction=#

Settings


This document was generated withDocumenter.jl version 0.27.10 onFriday 14 January 2022. Using Julia version 1.6.0.


[8]ページ先頭

©2009-2025 Movatter.jp