Movatterモバイル変換


[0]ホーム

URL:


Getting Started with unexcel

Hercules Freitas

2025-10-04

Introduction

Spreadsheets such as Microsoft Excel and LibreOffice Calc oftenauto-convert entries like 30.3 into dates. When imported into R, thesevalues appear as Excel date serials (integers) instead of the intendednumeric values.

For example: - In Excel, 30.3 might be stored as 45812, whichrepresents 2025-03-30 in the 1900 date system. - What the useroriginally wanted was simply 30.3.

The unexcel package provides tools to safely detect these cases andreconstruct the originally intended day.month numerics. Non-serialvalues are preserved, and both the 1900 and 1904 date systems aresupported.

Excel date systems

Excel represents dates as the number of days since an origin date. -1900 system: Day 1 is 1900-01-01. Excel mistakenly treats 1900 as a leapyear. To compensate, R and most packages use “1899-12-30” as the origin,so calculations align with Excel’s behavior. - 1904 system: Day 0 is1904-01-01, historically used by older Mac versions of Excel.

Example:

as.Date(45812,origin ="1899-12-30")
## [1] "2025-06-04"
#> "2025-03-30"

Here, 45812 is the serial for March 30, 2025.

The 1900 leap-year bug

Excel was originally designed to be compatible with Lotus 1-2-3,which treated 1900 as a leap year (incorrectly). This means Excelincludes the non-existent date 1900-02-29 in its calendar.

To correct for this, R uses an origin of “1899-12-30” rather than“1900-01-01”. This offset ensures that serial numbers map correctly tothe dates displayed inside Excel.

Reconstructing day.month numerics

The function restore_day_month() converts Excel serials back intonumerics of the form day.month.

library(unexcel)# Mixed vector: two serials + one real numberx<-c(45812,12.5,44730)restore_day_month(x,origin_mode ="1900")
## [1]  4.6 12.5 18.6
#> [1] 30.3 12.5 15.6

Explanation: - 45812 (2025-03-30) → 30.3 - 44730 (2022-06-15) → 15.6- 12.5 (not a serial) → preserved as 12.5

Guardrails against false conversions

To avoid changing ordinary numbers, unexcel applies guardrails: -Only integer-like values are considered potential serials. - Values mustfall in a plausible range (20000–65000, roughly 1954–2078). - Theconverted year must lie inside a configurable year_window (default:1990–2035).

Anything outside these conditions is left unchanged.

Difference from formatted strings

By design, unexcel reconstructs numeric day.month values, notcharacter strings: - Numeric output: 30.3 (easily compared or plotted).- Character output: “30/03” (typical formatted date).

If you prefer a string, you can use R’s base functions:

format(as.Date(45812,origin ="1899-12-30"),"%d/%m")
## [1] "04/06"
#> "30/03"

But the goal of unexcel is to give you back the numeric decimal youoriginally typed in Excel.

Working with data frames

The helper fix_serial_columns() scans an entire data frame andcorrects only the columns that appear dominated by Excel serials.

df<-data.frame(a =c(45812,44730,45900),b =c(1.2,3.4,5.6))fix_serial_columns(df,origin_mode ="1900")
##      a   b## 1  4.6 1.2## 2 18.6 3.4## 3 31.8 5.6
#>      a   b#> 1 30.3 1.2#> 2 15.6 3.4#> 3 19.4 5.6

Column a was fixed; column b was left untouched.

Summary

Session information

sessionInfo()
## R version 4.5.1 (2025-06-13)## Platform: aarch64-apple-darwin20## Running under: macOS Tahoe 26.0.1## ## Matrix products: default## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib ## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1## ## locale:## [1] C/C.UTF-8/C.UTF-8/C/C.UTF-8/C.UTF-8## ## time zone: America/Sao_Paulo## tzcode source: internal## ## attached base packages:## [1] stats     graphics  grDevices utils     datasets  methods   base     ## ## other attached packages:## [1] unexcel_0.1.0## ## loaded via a namespace (and not attached):##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.52        ##  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   ##  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  ## [13] compiler_4.5.1    tools_4.5.1       evaluate_1.0.4    bslib_0.9.0      ## [17] yaml_2.3.10       rlang_1.1.6       jsonlite_2.0.0

[8]ページ先頭

©2009-2025 Movatter.jp