Movatterモバイル変換


[0]ホーム

URL:


Title:An Abstracted System for Easily Working with Databases withLarge Datasets
Version:2025.7.30
Maintainer:Richard Aubrey White <hello@rwhite.no>
Description:Provides object-oriented database management tools for working with large datasets across multiple database systems. Features include robust connection management for SQL Server and PostgreSQL databases, advanced table operations with bulk data loading and upsert functionality, comprehensive data validation through customizable field type and content validators, efficient index management, and cross-database compatibility. Designed for high-performance data operations in surveillance systems and large-scale data processing workflows.
Depends:R (≥ 4.1.0)
License:MIT + file LICENSE
URL:https://www.csids.no/csdb/,https://github.com/csids/csdb
BugReports:https://github.com/csids/csdb/issues
Encoding:UTF-8
LazyData:true
Imports:csutil, data.table, DBI, dplyr, fs, ggplot2, glue, methods,odbc, R6, S7, stringr, uuid
Suggests:testthat, knitr, rmarkdown, rstudioapi, digest, crayon
RoxygenNote:7.3.2
VignetteBuilder:knitr
NeedsCompilation:no
Packaged:2025-07-30 08:54:31 UTC; raw996
Author:Richard Aubrey WhiteORCID iD [aut, cre], August Sørli Mathisen [aut], CSIDS [cph]
Repository:CRAN
Date/Publication:2025-08-18 14:00:02 UTC

R6 Class representing a database connection

Description

A robust database connection manager that handles connections to various database systemsincluding Microsoft SQL Server and PostgreSQL. This class provides connection management,authentication, and automatic reconnection capabilities.

Details

The DBConnection_v9 class encapsulates database connection logic and provides a consistentinterface for connecting to different database systems. It supports both trusted connectionsand user/password authentication, handles connection failures gracefully, and providesautomatic reconnection functionality.

Key features:

Public fields

config

Configuration details of the database.

Active bindings

connection

Database connection.

autoconnection

Database connection that automatically connects if possible.

Methods

Public methods


Methodnew()

Create a new DBConnection_v9 object.

Usage
DBConnection_v9$new(  driver = NULL,  server = NULL,  port = NULL,  db = NULL,  schema = NULL,  user = NULL,  password = NULL,  trusted_connection = NULL,  sslmode = NULL,  role_create_table = NULL)
Arguments
driver

Driver

server

Server

port

Port

db

DB

schema

Schema (e.g. "dbo")

user

User

password

Password

trusted_connection

NULL or "yes"

sslmode

NULL or "require"

role_create_table

NULL or the role to take when creating tables.

Returns

A new 'DBConnection_v9' object.


Methodis_connected()

Is the DB schema connected?

Usage
DBConnection_v9$is_connected()
Returns

TRUE/FALSE


Methodprint()

Class-specific print function.

Usage
DBConnection_v9$print(...)
Arguments
...

Not used.


Methodconnect()

Connect to the database

Usage
DBConnection_v9$connect(attempts = 2)
Arguments
attempts

Number of attempts to be made to try to connect


Methoddisconnect()

Disconnect from the database

Usage
DBConnection_v9$disconnect()

Methodclone()

The objects of this class are cloneable with this method.

Usage
DBConnection_v9$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

## Not run: # Create a SQL Server connectiondb_config <- DBConnection_v9$new(  driver = "ODBC Driver 17 for SQL Server",  server = "localhost",  port = 1433,  db = "mydb",  user = "myuser",  password = "mypass")# Connect to the databasedb_config$connect()# Check connection statusdb_config$is_connected()# Use the connectiontables <- DBI::dbListTables(db_config$connection)# Disconnect when donedb_config$disconnect()# PostgreSQL examplepg_config <- DBConnection_v9$new(  driver = "PostgreSQL",  server = "localhost",  port = 5432,  db = "mydb",  user = "myuser",  password = "mypass")pg_config$connect()# ... use connection ...pg_config$disconnect()## End(Not run)

R6 Class representing a database table with advanced data management capabilities

Description

A comprehensive database table management class that provides high-level operationsfor data manipulation, schema validation, and table administration. This classcombines database connectivity with data validation and efficient bulk operations.

Details

The DBTable_v9 class is a sophisticated database table abstraction that provides:

Core functionality:

Advanced features:

Data validation:The class supports custom validation functions for both field types and data contents,ensuring data integrity and schema compliance.

Public fields

dbconnection

Database connection.

dbconfig

Configuration details of the database.

table_name

Name of the table in the database.

table_name_short_for_mssql_fully_specified_for_postgres

Fully specified name of the table in the database (e.g. \[db\].\[dbo\].\[table_name\]).

table_name_short_for_mssql_fully_specified_for_postgres_text

Fully specified name of the table in the database (e.g. \[db\].\[dbo\].\[table_name\]).

table_name_fully_specified

Fully specified name of the table in the database (e.g. \[db\].\[dbo\].\[table_name\]).

table_name_fully_specified_text

Fully specified name of the table in the database (e.g. \[db\].\[dbo\].\[table_name\]) as a text string.

field_types

The types of each column in the database table (INTEGER, DOUBLE, TEXT, BOOLEAN, DATE, DATETIME).

field_types_with_length

The same asfield_types but with(100) added to the end of all TEXT fields.

keys

The combination of variables that uniquely identify each row in the database.

keys_with_length

The same askeys but with(100) added to the end of all TEXT fields.

indexes

A named list of vectors (generally "ind1", "ind2", etc.) that improves the speed of data retrieval operations on a database table.

validator_field_contents

A function that validates the data before it is inserted into the database.

load_folder

A temporary folder that is used to write data to before inserting into the database.

censors

A named list of censors.

Methods

Public methods


Methodnew()

Create a new DBTable_v9 object.

Usage
DBTable_v9$new(  dbconfig,  table_name,  field_types,  keys,  indexes = NULL,  validator_field_types = validator_field_types_blank,  validator_field_contents = validator_field_contents_blank)
Arguments
dbconfig

Configuration details of the database (driver, server, port, db, schema, user, password, trusted_connection, sslmode, role_create_table).

table_name

Name of the table in the database.

field_types

The types of each column in the database table (INTEGER, DOUBLE, TEXT, BOOLEAN, DATE, DATETIME).

keys

The combination of these variables uniquely identifies each row of data in the table.

indexes

A named list of vectors (generally "ind1", "ind2", etc.) that improves the speed of data retrieval operations on a database table.

validator_field_types

A function that validates thefield_types before the DB schema is created.

validator_field_contents

A function that validates the data before it is inserted into the database.

Returns

A new 'DBTable_v9' object.


Methodprint()

Class-specific print function.

Usage
DBTable_v9$print(...)
Arguments
...

Not in use.


Methodconnect()

Connect from the database

Usage
DBTable_v9$connect()

Methoddisconnect()

Disconnect from the database

Usage
DBTable_v9$disconnect()

Methodtable_exists()

Does the table exist

Usage
DBTable_v9$table_exists()

Methodcreate_table()

Create the database table

Usage
DBTable_v9$create_table()

Methodremove_table()

Drop the database table

Usage
DBTable_v9$remove_table()

Methodinsert_data()

Inserts data

Usage
DBTable_v9$insert_data(  newdata,  confirm_insert_via_nrow = FALSE,  verbose = TRUE)
Arguments
newdata

The data to insert.

confirm_insert_via_nrow

Checks nrow() before insert and after insert. If nrow() has not increased sufficiently, then attempt an upsert.

verbose

Boolean.Inserts data into the database table


Methodupsert_data()

Upserts data into the database table

Usage
DBTable_v9$upsert_data(  newdata,  drop_indexes = names(self$indexes),  verbose = TRUE)
Arguments
newdata

The data to insert.

drop_indexes

A vector containing the indexes to be dropped before upserting (can increase performance).

verbose

Boolean.


Methoddrop_all_rows()

Drops all rows in the database table

Usage
DBTable_v9$drop_all_rows()

Methoddrop_rows_where()

Drops rows in the database table according to the SQL condition.

Usage
DBTable_v9$drop_rows_where(condition)
Arguments
condition

SQL text condition.


Methodkeep_rows_where()

Keeps rows in the database table according to the SQL condition.

Usage
DBTable_v9$keep_rows_where(condition)
Arguments
condition

SQL text condition.


Methoddrop_all_rows_and_then_upsert_data()

Drops all rows in the database table and then upserts data.

Usage
DBTable_v9$drop_all_rows_and_then_upsert_data(  newdata,  drop_indexes = names(self$indexes),  verbose = TRUE)
Arguments
newdata

The data to insert.

drop_indexes

A vector containing the indexes to be dropped before upserting (can increase performance).

verbose

Boolean.


Methoddrop_all_rows_and_then_insert_data()

Drops all rows in the database table and then inserts data.

Usage
DBTable_v9$drop_all_rows_and_then_insert_data(  newdata,  confirm_insert_via_nrow = FALSE,  verbose = TRUE)
Arguments
newdata

The data to insert.

confirm_insert_via_nrow

Checks nrow() before insert and after insert. If nrow() has not increased sufficiently, then attempt an upsert.

verbose

Boolean.


Methodtbl()

Provides access to the database table via dplyr::tbl.

Usage
DBTable_v9$tbl()

Methodprint_dplyr_select()

Prints a template dplyr::select call that you can easily copy/paste for all your variables.

Usage
DBTable_v9$print_dplyr_select()

Methodadd_indexes()

Adds indexes to the database table from 'self$indexes'

Usage
DBTable_v9$add_indexes()

Methoddrop_indexes()

Drops all indees from the database table

Usage
DBTable_v9$drop_indexes()

Methodconfirm_indexes()

Confirms that the names and number of indexes in the database are the same as in the R code.Does not confirm the contents of the indexes!

Usage
DBTable_v9$confirm_indexes()

Methodnrow()

Gets the number of rows in the database table

Usage
DBTable_v9$nrow(use_count = FALSE)
Arguments
use_count

If true, then uses the count command, which is slow but accurate. If false, then uses summary statistics, which is fast but inaccurate.


Methodinfo()

Gets the information about the database table

Usage
DBTable_v9$info()

Methodclone()

The objects of this class are cloneable with this method.

Usage
DBTable_v9$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

## Not run: # Create database connectiondb_config <- list(  driver = "ODBC Driver 17 for SQL Server",  server = "localhost",  db = "mydb",  user = "myuser",  password = "mypass")# Define table schemafield_types <- c(  "id" = "INTEGER",  "name" = "TEXT",  "value" = "DOUBLE",  "date_created" = "DATE")# Create table objectmy_table <- DBTable_v9$new(  dbconfig = db_config,  table_name = "my_data_table",  field_types = field_types,  keys = c("id"),  validator_field_types = validator_field_types_blank,  validator_field_contents = validator_field_contents_blank)# Create table in databasemy_table$create_table()# Insert datasample_data <- data.frame(  id = 1:3,  name = c("Alice", "Bob", "Charlie"),  value = c(10.5, 20.3, 15.7),  date_created = as.Date("2023-01-01"))my_table$insert_data(sample_data)# Query data using dplyrresult <- my_table$tbl() |>  dplyr::filter(value > 15) |>  dplyr::collect()# Add indexes for performancemy_table$add_indexes(c("name", "date_created"))# Upsert (insert or update) datanew_data <- data.frame(  id = 2:4,  name = c("Bob_Updated", "Charlie", "David"),  value = c(25.0, 15.7, 30.2),  date_created = as.Date("2023-01-02"))my_table$upsert_data(new_data)## End(Not run)

Get table names, number of rows, and size information

Description

Retrieves comprehensive information about database tables including their names,row counts, and storage size metrics. This function provides database-specificimplementations for different database systems.

Usage

get_table_names_and_info(connection)

Arguments

connection

A database connection object (e.g., fromdbConnect)

Value

A data.table containing table information with columns:

table_name

Character. Name of the table

nrow

Numeric. Number of rows in the table

size_total_gb

Numeric. Total size of the table in gigabytes

size_data_gb

Numeric. Size of data in gigabytes

size_index_gb

Numeric. Size of indexes in gigabytes

Examples

## Not run: # Microsoft SQL Server examplecon <- DBI::dbConnect(odbc::odbc(),                       driver = "ODBC Driver 17 for SQL Server",                      server = "localhost",                       database = "mydb")table_info <- get_table_names_and_info(con)print(table_info)DBI::dbDisconnect(con)# PostgreSQL example  con <- DBI::dbConnect(RPostgres::Postgres(),                      host = "localhost",                      dbname = "mydb",                      user = "user")table_info <- get_table_names_and_info(con)print(table_info)DBI::dbDisconnect(con)## End(Not run)

Covid-19 data for PCR-confirmed cases in Norway (nation and county)

Description

This data comes from the Norwegian Surveillance System for Communicable Diseases (MSIS).The date corresponds to when the PCR-test was taken.

Usage

nor_covid19_cases_by_time_location

Format

A csfmt_rts_data_v1 with 11028 rows and 18 variables:

granularity_time

day/isoweek

granularity_geo

nation, county

country_iso3

nor

location_code

norge, 11 counties

border

2020

age

total

isoyear

Isoyear of event

isoweek

Isoweek of event

isoyearweek

Isoyearweek of event

season

Season of event

seasonweek

Seasonweek of event

calyear

Calyear of event

calmonth

Calmonth of event

calyearmonth

Calyearmonth of event

date

Date of event

covid19_cases_testdate_n

Number of confirmed covid19 cases

covid19_cases_testdate_pr100000

Number of confirmed covid19 cases per 100.000 population

Details

The raw number of cases and cases per 100.000 population are recorded.

This data was extracted on 2022-05-04.

Source

https://github.com/folkehelseinstituttet/surveillance_data/blob/master/covid19/_DOCUMENTATION_data_covid19_msis_by_time_location.txt


Blank data contents validator

Description

A pass-through validator that accepts any data without validation.This is useful as a placeholder when no specific data content validation is needed.

Usage

validator_field_contents_blank(data)

Arguments

data

A data.frame or data.table containing the data to validate

Value

Always returns TRUE

Examples

# This validator always returns TRUE regardless of inputtest_data <- data.frame(id = 1:3, name = c("A", "B", "C"), value = c(10, 20, 30))validator_field_contents_blank(test_data)# Works with any data structureempty_data <- data.frame()validator_field_contents_blank(empty_data)

Field contents validator for csfmt_rts_data_v1 schema

Description

Validates that data contents conform to the csfmt_rts_data_v1 schema specification.This validator checks that granularity_time and granularity_geo fields containvalid values according to the surveillance data format requirements.

Usage

validator_field_contents_csfmt_rts_data_v1(data)

Arguments

data

A data.frame or data.table containing the data to validate

Value

TRUE if data is valid for csfmt_rts_data_v1, FALSE otherwise (with error attribute)

Examples

# Valid data for csfmt_rts_data_v1valid_data <- data.frame(  granularity_time = c("date", "isoyearweek", "total"),  granularity_geo = c("nation", "county", "municip"),  stringsAsFactors = FALSE)validator_field_contents_csfmt_rts_data_v1(valid_data)# Invalid data (wrong granularity_geo value)invalid_data <- data.frame(  granularity_time = "date",  granularity_geo = "invalid_geo",  stringsAsFactors = FALSE)validator_field_contents_csfmt_rts_data_v1(invalid_data)

Field contents validator for csfmt_rts_data_v2 schema

Description

Validates that data contents conform to the csfmt_rts_data_v2 schema specification.This validator checks that granularity_time and granularity_geo fields containvalid values according to the surveillance data format requirements for version 2.

Usage

validator_field_contents_csfmt_rts_data_v2(data)

Arguments

data

A data.frame or data.table containing the data to validate

Value

TRUE if data is valid for csfmt_rts_data_v2, FALSE otherwise (with error attribute)

Examples

# Valid data for csfmt_rts_data_v2valid_data_v2 <- data.frame(  granularity_time = c("date", "isoyearweek", "total"),  granularity_geo = c("nation", "county", "municip"),  stringsAsFactors = FALSE)validator_field_contents_csfmt_rts_data_v2(valid_data_v2)

Blank field types validator

Description

A pass-through validator that accepts any field types without validation.This is useful as a placeholder when no specific field type validation is needed.

Usage

validator_field_types_blank(db_field_types)

Arguments

db_field_types

A named character vector of database field types

Value

Always returns TRUE

Examples

# This validator always returns TRUE regardless of inputfield_types <- c("id" = "INTEGER", "name" = "TEXT", "date" = "DATE")validator_field_types_blank(field_types)# Works with any field typesother_types <- c("value" = "DOUBLE", "status" = "BOOLEAN")validator_field_types_blank(other_types)

Field types validator for csfmt_rts_data_v1 schema

Description

Validates that field types conform to the csfmt_rts_data_v1 schema specification.This validator ensures that the first 16 fields match the expected structurefor real-time surveillance data format version 1.

Usage

validator_field_types_csfmt_rts_data_v1(db_field_types)

Arguments

db_field_types

A named character vector of database field types

Value

TRUE if field types are valid for csfmt_rts_data_v1, FALSE otherwise

Examples

# Valid field types for csfmt_rts_data_v1valid_fields <- c(  "granularity_time" = "TEXT",  "granularity_geo" = "TEXT",   "country_iso3" = "TEXT",  "location_code" = "TEXT",  "border" = "INTEGER",  "age" = "TEXT",  "sex" = "TEXT",  "isoyear" = "INTEGER",  "isoweek" = "INTEGER",  "isoyearweek" = "TEXT",  "season" = "TEXT",  "seasonweek" = "DOUBLE",  "calyear" = "INTEGER",  "calmonth" = "INTEGER",  "calyearmonth" = "TEXT",  "date" = "DATE",  "cases_n" = "INTEGER")validator_field_types_csfmt_rts_data_v1(valid_fields)# Invalid field types (wrong structure)invalid_fields <- c("id" = "INTEGER", "name" = "TEXT")validator_field_types_csfmt_rts_data_v1(invalid_fields)

Field types validator for csfmt_rts_data_v2 schema

Description

Validates that field types conform to the csfmt_rts_data_v2 schema specification.This validator ensures that the first 18 fields match the expected structurefor real-time surveillance data format version 2.

Usage

validator_field_types_csfmt_rts_data_v2(db_field_types)

Arguments

db_field_types

A named character vector of database field types

Value

TRUE if field types are valid for csfmt_rts_data_v2, FALSE otherwise

Examples

# Valid field types for csfmt_rts_data_v2 (includes additional fields)valid_fields_v2 <- c(  "granularity_time" = "TEXT",  "granularity_geo" = "TEXT",   "country_iso3" = "TEXT",  "location_code" = "TEXT",  "border" = "INTEGER",  "age" = "TEXT",  "sex" = "TEXT",  "isoyear" = "INTEGER",  "isoweek" = "INTEGER",  "isoyearweek" = "TEXT",  "season" = "TEXT",  "seasonweek" = "DOUBLE",  "calyear" = "INTEGER",  "calmonth" = "INTEGER",  "calyearmonth" = "TEXT",  "date" = "DATE",  "tag_outcome" = "TEXT",  "tag_type" = "TEXT",  "cases_n" = "INTEGER")validator_field_types_csfmt_rts_data_v2(valid_fields_v2)

[8]ページ先頭

©2009-2025 Movatter.jp