We use FunSQL to assemble SQL queries. To actually run these queries, we need a regular database library such asSQLite.jl,LibPQ.jl,MySQL.jl, orODBC.jl.
In the following examples, we use a SQLite database containing a tiny sample of theCMS DE-SynPuf dataset. See theUsage Guide for the description of the database schema.
Download the database file.
const URL = "https://github.com/MechanicalRabbit/ohdsi-synpuf-demo/releases/download/20210412/synpuf-10p.sqlite"const DB = download(URL)Download the database file as anartifact.
using Pkg.Artifacts, LazyArtifactsconst DB = joinpath(artifact"synpuf-10p", "synpuf-10p.sqlite")#-> ⋮Create a SQLite connection object.
using SQLiteconst conn = SQLite.DB(DB)FunSQL does not export any symbols by default. The following statement imports all available query constructors, aSQLTable constructor, and the functionrender.
using FunSQL: Agg, Append, As, Asc, Bind, Define, Desc, Fun, From, Get, Group, Highlight, Join, LeftJoin, Limit, Lit, Order, Partition, SQLTable, Select, Sort, Var, Where, renderFor each database table referenced in a query, we need to create aSQLTable object encapsulating the name of the table and the list of the table columns.
SQLTable(:person, columns = [:person_id, :year_of_birth, :location_id])Instead of creatingSQLTable objects manually, we could create them automatically by extracting the information about the available tables from the database itself. For SQLite, this could be done as follows.
using Tablesconst introspect_sqlite_sql = """ SELECT NULL AS schema, sm.name, pti.name AS column FROM sqlite_master sm, pragma_table_info(sm.name) pti WHERE sm.type IN ('table', 'view') AND sm.name NOT LIKE 'sqlite_%' ORDER BY sm.name """introspect_sqlite(conn) = DBInterface.execute(conn, introspect_sqlite_sql) |> make_tablesfunction make_tables(res) tables = SQLTable[] schema = name = nothing columns = Symbol[] for (s, n, c) in Tables.rows(res) s = s !== missing ? Symbol(s) : nothing n = Symbol(n) c = Symbol(c) if s === schema && n === name push!(columns, c) else if !isempty(columns) t = SQLTable(schema = schema, name = name, columns = columns) push!(tables, t) end schema = s name = n columns = [c] end end if !isempty(columns) t = SQLTable(schema = schema, name = name, columns = columns) push!(tables, t) end return tablesendconst tables = introspect_sqlite(conn)The vectortables contains all the tables available in the database.
display(tables)#=>44-element Vector{SQLTable}: SQLTable(:attribute_definition, …) SQLTable(:care_site, …) SQLTable(:cdm_source, …) SQLTable(:cohort, …) SQLTable(:cohort_ace, …) SQLTable(:cohort_all, …) SQLTable(:cohort_ami, …) SQLTable(:cohort_ang, …) SQLTable(:cohort_attribute, …) SQLTable(:cohort_definition, …) ⋮ SQLTable(:procedure_cost, …) SQLTable(:procedure_occurrence, …) SQLTable(:provider, …) SQLTable(:relationship, …) SQLTable(:source_to_concept_map, …) SQLTable(:specimen, …) SQLTable(:visit_cost, …) SQLTable(:visit_occurrence, …) SQLTable(:vocabulary, …)=#It is convenient to add theSQLTable objects to the global scope.
for t in tables @eval const $(t.name) = $tenddisplay(person)#=>SQLTable(:person, columns = [:person_id, :gender_concept_id, :year_of_birth, :month_of_birth, :day_of_birth, :time_of_birth, :race_concept_id, :ethnicity_concept_id, :location_id, :provider_id, :care_site_id, :person_source_value, :gender_source_value, :gender_source_concept_id, :race_source_value, :race_source_concept_id, :ethnicity_source_value, :ethnicity_source_concept_id])=#Alternatively, we could encapsulate allSQLTable objects in aNamedTuple.
const db = NamedTuple([t.name => t for t in tables])display(db.person)#=>SQLTable(:person, columns = [:person_id, :gender_concept_id, :year_of_birth, :month_of_birth, :day_of_birth, :time_of_birth, :race_concept_id, :ethnicity_concept_id, :location_id, :provider_id, :care_site_id, :person_source_value, :gender_source_value, :gender_source_concept_id, :race_source_value, :race_source_concept_id, :ethnicity_source_value, :ethnicity_source_concept_id])=#The following code generatesSQLTable objects for a PostgreSQL database. See the sectionDatabase introspection (SQLite) for the definition of themake_tables() function and instructions on how to bring the generatedSQLTable objects into the global scope.
const introspect_postgresql_sql = """ SELECT n.nspname AS schema, c.relname AS name, a.attname AS column FROM pg_catalog.pg_namespace AS n JOIN pg_catalog.pg_class AS c ON (n.oid = c.relnamespace) JOIN pg_catalog.pg_attribute AS a ON (c.oid = a.attrelid) WHERE n.nspname = \$1 AND c.relkind IN ('r', 'v') AND HAS_TABLE_PRIVILEGE(c.oid, 'SELECT') AND a.attnum > 0 AND NOT a.attisdropped ORDER BY n.nspname, c.relname, a.attnum """introspect_postgresql(conn, schema = :public) = execute(conn, introspect_postgresql_sql, (String(schema),)) |> make_tablesAlternatively, we could generate the introspection query using FunSQL.
const pg_namespace = SQLTable(schema = :pg_catalog, name = :pg_namespace, columns = [:oid, :nspname])const pg_class = SQLTable(schema = :pg_catalog, name = :pg_class, columns = [:oid, :relname, :relnamespace, :relkind])const pg_attribute = SQLTable(schema = :pg_catalog, name = :pg_attribute, columns = [:attrelid, :attname, :attnum, :attisdropped])const IntrospectPostgreSQL = From(pg_class) |> Where(Fun.in(Get.relkind, "r", "v")) |> Where(Fun.has_table_privilege(Get.oid, "SELECT")) |> Join(From(pg_namespace) |> Where(Get.nspname .== Var.schema) |> As(:nsp), on = Get.relnamespace .== Get.nsp.oid) |> Join(From(pg_attribute) |> Where(Fun.and(Get.attnum .> 0, Fun.not(Get.attisdropped))) |> As(:att), on = Get.oid .== Get.att.attrelid) |> Order(Get.nsp.nspname, Get.relname, Get.att.attnum) |> Select(Get.nsp.nspname, Get.relname, Get.att.attname)const introspect_postgresql_sql = render(IntrospectPostgreSQL, dialect = :postgresql)The following code generatesSQLTable objects for a MySQL database. See the sectionDatabase introspection (SQLite) for the definition of themake_tables() function and instructions on how to bring the generatedSQLTable objects into the global scope.
const introspect_mysql_sql = """ SELECT table_schema AS `schema`, table_name AS `name`, column_name AS `column` FROM information_schema.columns WHERE table_schema = COALESCE(?, DATABASE()) ORDER BY table_schema, table_name, ordinal_position """introspect_mysql(conn, schema = nothing) = DBInterface.execute( DBInterface.prepare(conn, introspect_mysql_sql), (schema !== nothing ? String(schema) : missing,)) |> make_tablesAlternatively, we could generate the introspection query using FunSQL.
const information_schema_columns = SQLTable(schema = :information_schema, name = :columns, columns = [:table_schema, :table_name, :column_name, :ordinal_position])const IntrospectMySQL = From(information_schema_columns) |> Where(Get.table_schema .== Fun.coalesce(Var.schema, Fun.database())) |> Order(Get.table_schema, Get.table_name, Get.ordinal_position) |> Select(Get.table_schema, Get.table_name, Get.column_name)const introspect_mysql_sql = render(IntrospectMySQL, dialect = :mysql) |> StringThe following code generatesSQLTable objects for a Microsoft SQL Server database. See the sectionDatabase introspection (SQLite) for the definition of themake_tables() function and instructions on how to bring the generatedSQLTable objects into the global scope.
const introspect_sqlserver_sql = """ SELECT s.name AS [schema], o.name AS [name], c.name AS [column] FROM sys.schemas AS s JOIN sys.objects AS o ON (s.schema_id = o.schema_id) JOIN sys.columns AS c ON (o.object_id = c.object_id) WHERE s.name = ? AND o.type IN ('U', 'V') ORDER BY s.name, o.name, c.column_id """introspect_sqlserver(conn, schema = :dbo) = DBInterface.execute(conn, introspect_sqlserver_sql, (String(schema),)) |> make_tablesAlternatively, we could generate the introspection query using FunSQL.
const sys_schemas = SQLTable(schema = :sys, name = :schemas, columns = [:schema_id, :name])const sys_tables = SQLTable(schema = :sys, name = :tables, columns = [:schema_id, :object_id, :name, :type])const sys_columns = SQLTable(schema = :sys, name = :columns, columns = [:object_id, :column_id, :name])const IntrospectSQLServer = From(sys_tables) |> Where(Fun.in(Get.type, "U", "V")) |> Join(From(sys_schemas) |> Where(Get.name .== Var.schema) |> As(:schema), on = Get.schema_id .== Get.schema.schema_id) |> Join(From(sys_columns) |> As(:column), on = Get.object_id .== Get.column.object_id) |> Order(Get.schema.name, Get.name, Get.column.column_id) |> Select(:schema => Get.schema.name, Get.name, :column => Get.column.name)const introspect_sqlserver_sql = render(IntrospectSQLServer, dialect = :sqlserver)SeeDatabase introspection (PostgreSQL).
SELECT * FROM tableFunSQL does not require that a query object containsSelect, so a minimal FunSQL query consists of a singleFrom node.
Show all patient records.
q = From(person)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "person_1"."person_id", ⋮ "person_1"."ethnicity_source_concept_id"FROM "person" AS "person_1"=#res = DBInterface.execute(conn, sql)To display the output of a query, it is convenient to use theDataFrame interface.
using DataFramesDataFrame(res)#=>10×18 DataFrame Row │ person_id gender_concept_id year_of_birth month_of_birth day_of_bir ⋯ │ Int64 Int64 Int64 Int64 Int64 ⋯─────┼────────────────────────────────────────────────────────────────────────── 1 │ 1780 8532 1940 2 ⋯ 2 │ 30091 8532 1932 8 3 │ 37455 8532 1913 7 4 │ 42383 8507 1922 2 5 │ 69985 8532 1956 7 ⋯ 6 │ 72120 8507 1937 10 7 │ 82328 8532 1957 9 8 │ 95538 8507 1923 11 9 │ 107680 8532 1963 12 ⋯ 10 │ 110862 8507 1911 4 14 columns omitted=#WHERE,ORDER,LIMITTabular operations such asWhere,Order, andLimit are available in FunSQL. Unlike SQL, FunSQL lets you apply them in any order.
Show the top 3 oldest male patients.
q = From(person) |> Where(Get.gender_concept_id .== 8507) |> Order(Get.year_of_birth) |> Limit(3)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "person_1"."person_id", ⋮ "person_1"."ethnicity_source_concept_id"FROM "person" AS "person_1"WHERE ("person_1"."gender_concept_id" = 8507)ORDER BY "person_1"."year_of_birth"LIMIT 3=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>3×18 DataFrame Row │ person_id gender_concept_id year_of_birth month_of_birth day_of_bir ⋯ │ Int64 Int64 Int64 Int64 Int64 ⋯─────┼────────────────────────────────────────────────────────────────────────── 1 │ 110862 8507 1911 4 ⋯ 2 │ 42383 8507 1922 2 3 │ 95538 8507 1923 11 14 columns omitted=#Show all males among the top 3 oldest patients.
q = From(person) |> Order(Get.year_of_birth) |> Limit(3) |> Where(Get.gender_concept_id .== 8507)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "person_2"."person_id", ⋮ "person_2"."ethnicity_source_concept_id"FROM ( SELECT "person_1"."person_id", ⋮ "person_1"."ethnicity_source_concept_id" FROM "person" AS "person_1" ORDER BY "person_1"."year_of_birth" LIMIT 3) AS "person_2"WHERE ("person_2"."gender_concept_id" = 8507)=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>2×18 DataFrame Row │ person_id gender_concept_id year_of_birth month_of_birth day_of_bir ⋯ │ Int64 Int64 Int64 Int64 Int64 ⋯─────┼────────────────────────────────────────────────────────────────────────── 1 │ 110862 8507 1911 4 ⋯ 2 │ 42383 8507 1922 2 14 columns omitted=#SELECT COUNT(*) FROM tableTo apply an aggregate function to the dataset as a whole, we use aGroup node without arguments.
Show the number of patient records.
q = From(person) |> Group() |> Select(Agg.count())sql = render(q, dialect = :sqlite)print(sql)#=>SELECT COUNT(*) AS "count"FROM "person" AS "person_1"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>1×1 DataFrame Row │ count │ Int64─────┼─────── 1 │ 10=#SELECT DISTINCTIf we use aGroup node, but do not apply any aggregate functions, FunSQL will render it as aSELECT DISTINCT clause.
Show all US states present in the location records.
q = From(location) |> Group(Get.state)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT DISTINCT "location_1"."state"FROM "location" AS "location_1"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>10×1 DataFrame Row │ state │ String─────┼──────── 1 │ MI 2 │ WA 3 │ FL 4 │ MD 5 │ NY 6 │ MS 7 │ CO 8 │ GA 9 │ MA 10 │ IL=#Either broadcasting or vector comprehension could be used to filter the list of output columns.
Filter out all "source" columns from patient records.
is_not_source_column(c::Symbol) = !contains(String(c), "source")q = From(person) |> Select(Get.(filter(is_not_source_column, person.columns))...)# q = From(person) |># Select(args = [Get(c) for c in person.columns if is_not_source_column(c)])display(q)#=>let person = SQLTable(:person, …), q1 = From(person), q2 = q1 |> Select(Get.person_id, Get.gender_concept_id, Get.year_of_birth, Get.month_of_birth, Get.day_of_birth, Get.time_of_birth, Get.race_concept_id, Get.ethnicity_concept_id, Get.location_id, Get.provider_id, Get.care_site_id) q2end=#sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "person_1"."person_id", "person_1"."gender_concept_id", "person_1"."year_of_birth", "person_1"."month_of_birth", "person_1"."day_of_birth", "person_1"."time_of_birth", "person_1"."race_concept_id", "person_1"."ethnicity_concept_id", "person_1"."location_id", "person_1"."provider_id", "person_1"."care_site_id"FROM "person" AS "person_1"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>10×11 DataFrame Row │ person_id gender_concept_id year_of_birth month_of_birth day_of_bir ⋯ │ Int64 Int64 Int64 Int64 Int64 ⋯─────┼────────────────────────────────────────────────────────────────────────── 1 │ 1780 8532 1940 2 ⋯ 2 │ 30091 8532 1932 8 3 │ 37455 8532 1913 7 4 │ 42383 8507 1922 2 5 │ 69985 8532 1956 7 ⋯ 6 │ 72120 8507 1937 10 7 │ 82328 8532 1957 9 8 │ 95538 8507 1923 11 9 │ 107680 8532 1963 12 ⋯ 10 │ 110862 8507 1911 4 7 columns omitted=#JoinAs is often used to disambiguate the columns of the two input branches of theJoin node. By default, columns fenced byAs are not present in the output.
q = From(person) |> Join(From(visit_occurrence) |> As(:visit), on = Get.person_id .== Get.visit.person_id)print(render(q, dialect = :sqlite))#=>SELECT "person_1"."person_id", ⋮ "person_1"."ethnicity_source_concept_id"FROM "person" AS "person_1"JOIN "visit_occurrence" AS "visit_occurrence_1" ON ("person_1"."person_id" = "visit_occurrence_1"."person_id")=#q′ = From(person) |> As(:person) |> Join(From(visit_occurrence), on = Get.person.person_id .== Get.person_id)print(render(q′, dialect = :sqlite))#=>SELECT "visit_occurrence_1"."visit_occurrence_id", ⋮ "visit_occurrence_1"."visit_source_concept_id"FROM "person" AS "person_1"JOIN "visit_occurrence" AS "visit_occurrence_1" ON ("person_1"."person_id" = "visit_occurrence_1"."person_id")=#We could use aSelect node to output the columns of both branches, however we must ensure that all column names are unique.
q = q |> Select(Get.(person.columns)..., Get.(visit_occurrence.columns, over = Get.visit)...)#=>ERROR: FunSQL.DuplicateLabelError: person_id is used more than once in:⋮=#q = q |> Select(Get.(person.columns)..., Get.(filter(!in(person.columns), visit_occurrence.columns), over = Get.visit)...)print(render(q, dialect = :sqlite))#=>SELECT "person_1"."person_id", ⋮ "person_1"."ethnicity_source_concept_id", "visit_occurrence_1"."visit_occurrence_id", ⋮ "visit_occurrence_1"."visit_source_concept_id"FROM "person" AS "person_1"JOIN "visit_occurrence" AS "visit_occurrence_1" ON ("person_1"."person_id" = "visit_occurrence_1"."person_id")=#Show the number of patients diagnosed with myocardial infarction stratified by the age group at the time of diagnosis.
PersonAgeAt(date) = Fun.strftime("%Y", date) .- Get.year_of_birthAgeGroup(age) = Fun.case(Iterators.flatten([(age .< y, "$(y-5) - $(y-1)") for y = 5:5:100])..., "100 +")ConceptByName(name) = From(concept) |> Where(Fun.like(Get.concept_name, "%$(name)%"))MyocardialInfarctionConcept() = ConceptByName("myocardial infarction")MyocardialInfarctionOccurrence() = From(condition_occurrence) |> Join(:concept => MyocardialInfarctionConcept(), on = Get.condition_concept_id .== Get.concept.concept_id)q = From(person) |> Join(:condition => MyocardialInfarctionOccurrence(), on = Get.person_id .== Get.condition.person_id) |> Group(:age_group => AgeGroup(PersonAgeAt(Get.condition.condition_start_date))) |> Select(Get.age_group, Agg.count())sql = render(q, dialect = :sqlite)print(sql)#=>SELECT (CASE WHEN ((STRFTIME('%Y', "condition_1"."condition_start_date") - "person_1"."year_of_birth") < 5) THEN '0 - 4' … ELSE '100 +' END) AS "age_group", COUNT(*) AS "count"FROM "person" AS "person_1"JOIN ( SELECT "condition_occurrence_1"."person_id", "condition_occurrence_1"."condition_start_date" FROM "condition_occurrence" AS "condition_occurrence_1" JOIN ( SELECT "concept_1"."concept_id" FROM "concept" AS "concept_1" WHERE ("concept_1"."concept_name" LIKE '%myocardial infarction%') ) AS "concept_2" ON ("condition_occurrence_1"."condition_concept_id" = "concept_2"."concept_id")) AS "condition_1" ON ("person_1"."person_id" = "condition_1"."person_id")GROUP BY (CASE WHEN ((STRFTIME('%Y', "condition_1"."condition_start_date") - "person_1"."year_of_birth") < 5) THEN '0 - 4' … ELSE '100 +' END)=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>3×2 DataFrame Row │ age_group count │ String Int64─────┼────────────────── 1 │ 50 - 54 1 2 │ 65 - 69 1 3 │ 95 - 99 4=#It is often convenient to build a query incrementally, one component at a time. This allows us to validate individual components, inspect their output, and possibly reuse them in other queries. Note that FunSQL allows to encapsulate not just intermediate datasets, but also dataset operations such asFilterByGap().
Find all occurrences of myocardial infarction that was diagnosed during an inpatient visit. Filter out repeating occurrences by requiring a 180-day gap between consecutive events.
using DatesConceptByName(name) = From(concept) |> Where(Fun.like(Get.concept_name, "%$(name)%"))MyocardialInfarctionConcept() = ConceptByName("myocardial infarction")MyocardialInfarctionOccurrence() = From(condition_occurrence) |> Join(:concept => MyocardialInfarctionConcept(), on = Get.condition_concept_id .== Get.concept.concept_id)InpatientVisitConcept() = ConceptByName("inpatient")InpatientVisitOccurrence() = From(visit_occurrence) |> Join(:concept => InpatientVisitConcept(), on = Get.visit_concept_id .== Get.concept.concept_id)CorrelatedInpatientVisit(person_id, date) = InpatientVisitOccurrence() |> Where(Fun.and(Get.person_id .== Var.person_id, Fun.between(Var.date, Get.visit_start_date, Get.visit_end_date))) |> Bind(:person_id => person_id, :date => date)MyocardialInfarctionDuringInpatientVisit() = MyocardialInfarctionOccurrence() |> Where(Fun.exists(CorrelatedInpatientVisit(Get.person_id, Get.condition_start_date)))FilterByGap(date, gap) = Partition(Get.person_id, order_by = [date]) |> Define(:boundary => Agg.lag(Fun.date(date, gap))) |> Where(Fun.or(Fun."is null"(Get.boundary), Get.boundary .< date))FilteredMyocardialInfarctionDuringInpatientVisit() = MyocardialInfarctionDuringInpatientVisit() |> FilterByGap(Get.condition_start_date, Day(180))q = FilteredMyocardialInfarctionDuringInpatientVisit() |> Select(Get.person_id, Get.condition_start_date)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "condition_occurrence_2"."person_id", "condition_occurrence_2"."condition_start_date"FROM ( SELECT "condition_occurrence_1"."person_id", "condition_occurrence_1"."condition_start_date", (LAG(DATE("condition_occurrence_1"."condition_start_date", '180 days')) OVER (PARTITION BY "condition_occurrence_1"."person_id" ORDER BY "condition_occurrence_1"."condition_start_date")) AS "boundary" FROM "condition_occurrence" AS "condition_occurrence_1" JOIN ( SELECT "concept_1"."concept_id" FROM "concept" AS "concept_1" WHERE ("concept_1"."concept_name" LIKE '%myocardial infarction%') ) AS "concept_2" ON ("condition_occurrence_1"."condition_concept_id" = "concept_2"."concept_id") WHERE (EXISTS ( SELECT NULL FROM "visit_occurrence" AS "visit_occurrence_1" JOIN ( SELECT "concept_3"."concept_id" FROM "concept" AS "concept_3" WHERE ("concept_3"."concept_name" LIKE '%inpatient%') ) AS "concept_4" ON ("visit_occurrence_1"."visit_concept_id" = "concept_4"."concept_id") WHERE ("visit_occurrence_1"."person_id" = "condition_occurrence_1"."person_id") AND ("condition_occurrence_1"."condition_start_date" BETWEEN "visit_occurrence_1"."visit_start_date" AND "visit_occurrence_1"."visit_end_date") ))) AS "condition_occurrence_2"WHERE (("condition_occurrence_2"."boundary" IS NULL) OR ("condition_occurrence_2"."boundary" < "condition_occurrence_2"."condition_start_date"))=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>1×2 DataFrame Row │ person_id condition_start_date │ Int64 String─────┼───────────────────────────────── 1 │ 1780 2008-04-10=#Merging overlapping intervals into a single encompassing period could be done in three steps:
FunSQL lets us encapsulate and reuse this rather complex sequence of transformations.
Merge overlapping visits.
MergeOverlappingIntervals(start_date, end_date) = Partition(Get.person_id, order_by = [start_date], frame = (mode = :rows, start = -Inf, finish = -1)) |> Define(:new => Fun.case(start_date .<= Agg.max(end_date), 0, 1)) |> Partition(Get.person_id, order_by = [start_date, .- Get.new], frame = :rows) |> Define(:period => Agg.sum(Get.new)) |> Group(Get.person_id, Get.period) |> Define(:start_date => Agg.min(start_date), :end_date => Agg.max(end_date))q = From(visit_occurrence) |> MergeOverlappingIntervals(Get.visit_start_date, Get.visit_end_date) |> Select(Get.person_id, Get.start_date, Get.end_date)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "visit_occurrence_3"."person_id", MIN("visit_occurrence_3"."visit_start_date") AS "start_date", MAX("visit_occurrence_3"."visit_end_date") AS "end_date"FROM ( SELECT "visit_occurrence_2"."person_id", (SUM("visit_occurrence_2"."new") OVER (PARTITION BY "visit_occurrence_2"."person_id" ORDER BY "visit_occurrence_2"."visit_start_date", (- "visit_occurrence_2"."new") ROWS UNBOUNDED PRECEDING)) AS "period", "visit_occurrence_2"."visit_start_date", "visit_occurrence_2"."visit_end_date" FROM ( SELECT "visit_occurrence_1"."person_id", (CASE WHEN ("visit_occurrence_1"."visit_start_date" <= (MAX("visit_occurrence_1"."visit_end_date") OVER (PARTITION BY "visit_occurrence_1"."person_id" ORDER BY "visit_occurrence_1"."visit_start_date" ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING))) THEN 0 ELSE 1 END) AS "new", "visit_occurrence_1"."visit_start_date", "visit_occurrence_1"."visit_end_date" FROM "visit_occurrence" AS "visit_occurrence_1" ) AS "visit_occurrence_2") AS "visit_occurrence_3"GROUP BY "visit_occurrence_3"."person_id", "visit_occurrence_3"."period"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>25×3 DataFrame Row │ person_id start_date end_date │ Int64 String String─────┼─────────────────────────────────── 1 │ 1780 2008-04-09 2008-04-13 2 │ 1780 2008-11-22 2008-11-22 3 │ 1780 2009-05-22 2009-05-22 4 │ 30091 2008-11-12 2008-11-12 5 │ 30091 2009-07-30 2009-08-07 6 │ 37455 2008-03-18 2008-03-18 7 │ 37455 2008-10-30 2008-10-30 8 │ 37455 2010-08-12 2010-08-12 ⋮ │ ⋮ ⋮ ⋮ 19 │ 95538 2009-09-02 2009-09-02 20 │ 107680 2009-06-07 2009-06-07 21 │ 107680 2009-07-20 2009-07-30 22 │ 110862 2008-09-07 2008-09-16 23 │ 110862 2009-06-30 2009-06-30 24 │ 110862 2009-09-30 2009-10-01 25 │ 110862 2010-06-07 2010-06-07 10 rows omitted=#Derive a patient's observation periods by merging visits with less than one year gap between them.
MergeIntervalsByGap(start_date, end_date, gap) = MergeOverlappingIntervals(start_date, Fun.date(end_date, gap)) |> Define(:end_date => Fun.date(Get.end_date, -gap))q = From(visit_occurrence) |> MergeIntervalsByGap(Get.visit_start_date, Get.visit_end_date, Day(365)) |> Select(Get.person_id, Get.start_date, Get.end_date)sql = render(q, dialect = :sqlite)print(sql)#=>SELECT "visit_occurrence_3"."person_id", MIN("visit_occurrence_3"."visit_start_date") AS "start_date", DATE(MAX(DATE("visit_occurrence_3"."visit_end_date", '365 days')), '-365 days') AS "end_date"FROM ( SELECT "visit_occurrence_2"."person_id", (SUM("visit_occurrence_2"."new") OVER (PARTITION BY "visit_occurrence_2"."person_id" ORDER BY "visit_occurrence_2"."visit_start_date", (- "visit_occurrence_2"."new") ROWS UNBOUNDED PRECEDING)) AS "period", "visit_occurrence_2"."visit_start_date", "visit_occurrence_2"."visit_end_date" FROM ( SELECT "visit_occurrence_1"."person_id", (CASE WHEN ("visit_occurrence_1"."visit_start_date" <= (MAX(DATE("visit_occurrence_1"."visit_end_date", '365 days')) OVER (PARTITION BY "visit_occurrence_1"."person_id" ORDER BY "visit_occurrence_1"."visit_start_date" ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING))) THEN 0 ELSE 1 END) AS "new", "visit_occurrence_1"."visit_start_date", "visit_occurrence_1"."visit_end_date" FROM "visit_occurrence" AS "visit_occurrence_1" ) AS "visit_occurrence_2") AS "visit_occurrence_3"GROUP BY "visit_occurrence_3"."person_id", "visit_occurrence_3"."period"=#res = DBInterface.execute(conn, sql)DataFrame(res)#=>12×3 DataFrame Row │ person_id start_date end_date │ Int64 String String─────┼─────────────────────────────────── 1 │ 1780 2008-04-09 2009-05-22 2 │ 30091 2008-11-12 2009-08-07 3 │ 37455 2008-03-18 2008-10-30 4 │ 37455 2010-08-12 2010-08-12 5 │ 42383 2009-06-29 2010-04-15 6 │ 69985 2009-01-09 2009-01-09 7 │ 69985 2010-04-17 2010-07-30 8 │ 72120 2008-12-15 2008-12-15 9 │ 82328 2008-10-20 2009-01-25 10 │ 95538 2009-03-30 2009-09-02 11 │ 107680 2009-06-07 2009-07-30 12 │ 110862 2008-09-07 2010-06-07=#Settings
This document was generated withDocumenter.jl version 0.27.10 onFriday 14 January 2022. Using Julia version 1.6.0.