Read Data

The following line of code reads in your data when using the package and selects the reading function based on the file extension.

spectra <- read_any("path/to/your/data")

Open Specy allows for upload of native Open Specy .csv, .y(a)ml, .json, or .rds files. Open Specy and .csv files should always load correctly; the other file types are still in development, though most of the time these files work perfectly. In addition, .csv, .asp, .jdx, .0, .spa, .spc, and .zip files can be imported. A .zip file can either contain multiple files of the non-zip formats, each holding individual spectra, or a .hdr and a .dat file that together form an ENVI file for a spectral map.

If uploading a .csv file, it is ideal to name the column with the wavenumbers wavenumber and the column with the intensities intensity. Columns besides wavenumber will be interpreted as unique spectra. If any column names are numbers, the csv will be interpreted in wide format: the number columns are the wavenumbers, each row holds the intensities of one spectrum, and metadata is contained in the non-number columns. Wavenumber units should be cm^-1; use adj_wave() to correct from wavelength. Always keep a copy of the original file before alteration to preserve metadata and raw data for your records.
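As a rough illustration of the two .csv layouts described above, the sketch below writes a long-format and a wide-format file using base R only. The file contents, the sample_name metadata column, and the temporary paths are made up for the example; it does not call read_any(), so how the package handles edge cases is not shown here.

```r
# Long format: one wavenumber column plus one column per spectrum.
long_csv <- data.frame(
  wavenumber = seq(1000, 1020, by = 5),          # cm^-1
  intensity  = c(0.10, 0.35, 0.80, 0.40, 0.12)
)

# Wide format: numeric column names are the wavenumbers, each row is one
# spectrum, and non-numeric columns hold metadata.
wide_csv <- data.frame(
  sample_name = c("particle_a", "particle_b"),   # hypothetical metadata
  `1000` = c(0.10, 0.20),
  `1005` = c(0.35, 0.25),
  `1010` = c(0.80, 0.60),
  check.names = FALSE
)

long_path <- tempfile(fileext = ".csv")
wide_path <- tempfile(fileext = ".csv")
write.csv(long_csv, long_path, row.names = FALSE)
write.csv(wide_csv, wide_path, row.names = FALSE)

# Read the headers back to confirm the layout on disk.
long_names <- names(read.csv(long_path, check.names = FALSE))
wide_names <- names(read.csv(wide_path, check.names = FALSE))
```

Either temporary file could then be passed to read_any() to see how the package interprets it.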

It is best practice to cross check files in the proprietary software they came from and Open Specy before use in Open Specy. Due to the complexity of some proprietary file types, we haven’t been able to make them fully compatible yet. If your file is not working, please contact the administrator and share the file so that we can work on integrating it.

The specific steps for converting your instrument’s native files to .csv can be found in its software manual, or you can check out Spectragryph, which supports many spectral file conversions. For instructions, see the Spectragryph Tutorial. Unfortunately the maintainer of Spectragryph passed away and it is unclear how much longer this will be supported.

If you don’t have your own data, you can use a test dataset.

data("raman_hdpe")

We also have many onboard files that you can call to test different formats:

spectral_map <- read_extdata("CA_tiny_map.zip") |> read_any() # preserves some metadata
asp_example <- read_extdata("ftir_ldpe_soil.asp") |> read_any()
ps_example <- read_extdata("ftir_ps.0") |> read_any() # preserves some metadata
csv_example <- read_extdata("raman_hdpe.csv") |> read_any()
json_example <- read_extdata("raman_hdpe.json") |> read_any() # read in exactly as an OpenSpecy object

You will notice now that the R package reads files into an object with class OpenSpecy. This is a class we created for high throughput spectral analysis which now also preserves spectral metadata. You can even create these from scratch if you’d like.

scratch_OpenSpecy <- as_OpenSpecy(x = seq(1000, 2000, by = 5),
                                  spectra = data.frame(runif(n = 201)),
                                  metadata = list(file_name = "fake_noise"))

Open Specy objects are lists with three components. wavenumber is a vector of the wavenumber values for the spectra and corresponds to the rows in spectra. spectra is a data.table where each column is a set of spectral intensities. metadata is a data.table which holds additional information about the spectra. Each row in metadata corresponds to a column in spectra.

# Access the wavenumbers
scratch_OpenSpecy$wavenumber
#>   [1] 1000 1005 1010 1015 1020 1025 1030 1035 1040 1045 1050 1055 1060 1065 1070
#>  [16] 1075 1080 1085 1090 1095 1100 1105 1110 1115 1120 1125 1130 1135 1140 1145
#>  [31] 1150 1155 1160 1165 1170 1175 1180 1185 1190 1195 1200 1205 1210 1215 1220
#>  [46] 1225 1230 1235 1240 1245 1250 1255 1260 1265 1270 1275 1280 1285 1290 1295
#>  [61] 1300 1305 1310 1315 1320 1325 1330 1335 1340 1345 1350 1355 1360 1365 1370
#>  [76] 1375 1380 1385 1390 1395 1400 1405 1410 1415 1420 1425 1430 1435 1440 1445
#>  [91] 1450 1455 1460 1465 1470 1475 1480 1485 1490 1495 1500 1505 1510 1515 1520
#> [106] 1525 1530 1535 1540 1545 1550 1555 1560 1565 1570 1575 1580 1585 1590 1595
#> [121] 1600 1605 1610 1615 1620 1625 1630 1635 1640 1645 1650 1655 1660 1665 1670
#> [136] 1675 1680 1685 1690 1695 1700 1705 1710 1715 1720 1725 1730 1735 1740 1745
#> [151] 1750 1755 1760 1765 1770 1775 1780 1785 1790 1795 1800 1805 1810 1815 1820
#> [166] 1825 1830 1835 1840 1845 1850 1855 1860 1865 1870 1875 1880 1885 1890 1895
#> [181] 1900 1905 1910 1915 1920 1925 1930 1935 1940 1945 1950 1955 1960 1965 1970
#> [196] 1975 1980 1985 1990 1995 2000

# Access the spectra
scratch_OpenSpecy$spectra
#>      runif.n...201.
#>               <num>
#>   1:      0.9160465
#>   2:      0.4805940
#>   3:      0.7589889
#>   4:      0.6137815
#>   5:      0.1459633
#>  ---
#> 197:      0.6221519
#> 198:      0.9713817
#> 199:      0.8041799
#> 200:      0.4148931
#> 201:      0.5145928

# Access the metadata
scratch_OpenSpecy$metadata
#>        x     y  file_name         col_id                          file_id
#>    <int> <int>     <char>         <char>                           <char>
#> 1:     1     1 fake_noise runif.n...201. 92d0fd48034fa33597459e3d58d1d5fb

# Performs checks to ensure that OpenSpecy objects are adhering to our standards;
# returns TRUE if it passes.
check_OpenSpecy(scratch_OpenSpecy)
#> [1] TRUE
# Checks only the object type to make sure it has OpenSpecy type
is_OpenSpecy(scratch_OpenSpecy)
#> [1] TRUE
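The row and column correspondence between the three components can be made concrete with a small stand-in object. This is only a sketch using a plain base R list and data frames, not the OpenSpecy class; dims_agree() is a hypothetical helper and is not the logic of check_OpenSpecy().

```r
# Stand-in for an OpenSpecy-like object built from base R types only.
obj <- list(
  wavenumber = seq(1000, 2000, by = 5),
  spectra    = data.frame(a = runif(201), b = runif(201)),
  metadata   = data.frame(col_id = c("a", "b"),
                          file_name = c("fake_a", "fake_b"))
)

# Hypothetical helper: one wavenumber per spectra row, and one metadata row
# per spectra column.
dims_agree <- function(x) {
  length(x$wavenumber) == nrow(x$spectra) &&
    nrow(x$metadata) == ncol(x$spectra)
}

dims_agree(obj)
```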

We have some generic functions built for inspecting the spectra:

print(scratch_OpenSpecy) # shows the raw object
#>      wavenumber runif.n...201.
#>           <num>          <num>
#>   1:       1000      0.9160465
#>   2:       1005      0.4805940
#>   3:       1010      0.7589889
#>   4:       1015      0.6137815
#>   5:       1020      0.1459633
#>  ---
#> 197:       1980      0.6221519
#> 198:       1985      0.9713817
#> 199:       1990      0.8041799
#> 200:       1995      0.4148931
#> 201:       2000      0.5145928
#>
#> $metadata
#>        x     y  file_name         col_id                          file_id
#>    <int> <int>     <char>         <char>                           <char>
#> 1:     1     1 fake_noise runif.n...201. 92d0fd48034fa33597459e3d58d1d5fb

summary(scratch_OpenSpecy) # summarizes the contents of the spectra
#> $wavenumber
#>  Length Min. Max.     Res.
#>     201 1000 2000 4.975124
#>
#> $spectra
#>  Number Min. Intensity Max. Intensity
#>       1     0.01498346      0.9973631
#>
#> $metadata
#>   Min. Max.
#> x    1    1
#> y    1    1
#> [1] "x"         "y"         "file_name" "col_id"    "file_id"

head(scratch_OpenSpecy) # shows the top wavenumbers and intensities
#>    wavenumber runif.n...201.
#>         <num>          <num>
#> 1:       1000      0.9160465
#> 2:       1005      0.4805940
#> 3:       1010      0.7589889
#> 4:       1015      0.6137815
#> 5:       1020      0.1459633
#> 6:       1025      0.6050478

Processing

The goal of processing is to increase the signal to noise ratio (S/N) of the spectra and remove unwanted artifacts without distorting the shape, position, or relative size of the peaks. After loading data, you can process the data using intensity adjustment, baseline subtraction, smoothing, flattening, and range selection. The default setting is an absolute first derivative transformation, which is powerful for many data issues: it does something similar to smoothing, baseline subtraction, and intensity correction simultaneously and quickly.

The process_spec() function is a monolithic function for all processing procedures. By default it is optimized to result in high signal to noise in most cases, same as the app.

processed <- process_spec(raman_hdpe)

You can compare the processed and unprocessed data in an overlay plot.

plotly_spec(raman_hdpe, processed)

We want people to use the process_spec() function for most processing operations. All other processing functions can be tuned using its parameters in the single function; see ?process_spec() for details. However, we recognize that nesting of functions and order of operations can be useful for users to control, so you can also use individual functions for each operation if you’d like. See explanations of each processing sub-function below.

Threshold Signal and Noise

Considering whether you have enough signal to analyze spectra in the first place is important, because if you don’t have enough signal you should recollect the spectrum. Classical spectroscopy would recommend your highest peak to be at least 10 times greater than the baseline of your processed spectra before you begin analysis. Setting a threshold can assist in standardized differentiation between low and high quality spectra. If your spectrum is below that threshold even after processing, you may want to consider recollecting it. In practice, we are rarely able to collect spectra of that quality and more often use 4 as a lower bound, with anything below 2 being completely unusable.

The “run_sig_over_noise” metric searches your spectra for high and low regions and divides one by the other to derive the signal to noise ratio. In the example below you can see that our signal to noise ratio is increased by the processing; the goal of processing is generally to maximize the signal to noise ratio. If you know where your signal and noise regions are, you can specify them with sig_min, sig_max, noise_min, and noise_max.

# Automatic signal to noise ratio comparison
sig_noise(processed, metric = "run_sig_over_noise") >
  sig_noise(raman_hdpe, metric = "run_sig_over_noise")
# Manual signal to noise ratio calculation
sig_noise(processed, metric = "sig_over_noise", sig_min = 2700, sig_max = 3000,
          noise_min = 1500, noise_max = 2500) >
  sig_noise(raman_hdpe, metric = "sig_over_noise", sig_min = 2700,
            sig_max = 3000, noise_min = 1500, noise_max = 2500)
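To make the manual calculation less of a black box, here is a minimal base R sketch of a region-based ratio on a synthetic spectrum. It assumes signal is summarized by the mean of the signal window and noise by the standard deviation of the noise window; manual_snr() is a hypothetical helper, and the exact definitions used inside sig_noise() may differ.

```r
set.seed(1)
wavenumber <- seq(1000, 3200, by = 5)
# Synthetic spectrum: one Gaussian peak near 2850 cm^-1 on top of noise.
intensity <- exp(-((wavenumber - 2850)^2) / (2 * 20^2)) +
  rnorm(length(wavenumber), sd = 0.02)

# Hypothetical helper: mean of the signal window over sd of the noise window.
manual_snr <- function(wn, int, sig_min, sig_max, noise_min, noise_max) {
  sig   <- int[wn >= sig_min & wn <= sig_max]
  noise <- int[wn >= noise_min & wn <= noise_max]
  abs(mean(sig) / sd(noise))
}

snr <- manual_snr(wavenumber, intensity,
                  sig_min = 2700, sig_max = 3000,
                  noise_min = 1500, noise_max = 2500)
snr > 4 # compare against the practical lower bound mentioned above
```

Widening the signal window lowers the mean and therefore the ratio, which is one reason the window bounds are worth choosing deliberately.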

If analyzing spectra in batch, we recommend looking at the heatmap and optimizing the percent of spectra that are above your signal to noise threshold to determine the correct settings instead of looking through spectra individually. Setting min_sn will threshold the heatmap image to only color spectra which have a sn value over the threshold.

# Remove CO2 region
spectral_map_p <- spectral_map |>
  process_spec(flatten_range = TRUE)
# Calculate signal to noise ratio
spectral_map_p$metadata$sig_noise <- sig_noise(spectral_map_p,
                                               metric = "run_sig_over_noise")
# Plot result
heatmap_spec(spectral_map_p, sn = spectral_map_p$metadata$sig_noise,
             min_sn = 5)

Intensity Adjustment

Most functions in Open Specy assume that intensity units are in absorbance units, and Open Specy can adjust reflectance or transmittance spectra to absorbance units. The transmittance adjustment uses the \(\log_{10} 1/T\) calculation, which does not correct for system or particle characteristics. The reflectance adjustment uses the Kubelka-Munk equation \(\frac{(1-R)^2}{2R}\).

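This is the respective R code for a scenario where the spectrum needs a transmittance adjustment. Here a mock transmittance-like spectrum is derived from raman_hdpe for illustration and then adjusted: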

trans_raman_hdpe <- raman_hdpe
trans_raman_hdpe$spectra <- 2 - trans_raman_hdpe$spectra^2
rev_trans_raman_hdpe <- trans_raman_hdpe |>
  adj_intens(type = "transmittance")
plotly_spec(trans_raman_hdpe, rev_trans_raman_hdpe)
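The two formulas quoted above can be checked by hand on a few values. The sketch below writes them as plain base R functions; trans_to_abs() and refl_to_km() are hypothetical names for this example, inputs are assumed to be on a 0 to 1 scale, and this is not the adj_intens() implementation.

```r
# Transmittance (0-1 scale) to absorbance: A = log10(1 / T).
trans_to_abs <- function(trans) log10(1 / trans)

# Reflectance (0-1 scale) to Kubelka-Munk units: (1 - R)^2 / (2 * R).
refl_to_km <- function(refl) (1 - refl)^2 / (2 * refl)

abs_vals <- trans_to_abs(c(1, 0.1, 0.01)) # full, 10 %, and 1 % transmission
km_vals  <- refl_to_km(c(1, 0.5))         # full and 50 % reflectance
```

Both functions diverge as the input approaches zero, so spectra reported in percent should be rescaled to fractions and checked for zeros first.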

Conforming

Conforming spectra is essential before comparing to a reference library and can be useful for summarizing data when you don’t need it to be highly resolved spectrally. We set the default spectral resolution to 5 because it works well for many applications and sits between 4 and 8, which are commonly used wavenumber resolutions.

conform_spec(raman_hdpe, res = 8) |> # Convert res to 8 wavenumbers.
  summary()
# Force one spectrum to have the exact same wavenumbers as another
conform_spec(asp_example, range = ps_example$wavenumber, res = NULL) |>
  summary()
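Conceptually, conforming means resampling intensities onto a shared, regular wavenumber grid. The sketch below does this with linear interpolation via approx() from base R; the choice of linear interpolation and the way the grid is rounded to multiples of the resolution are assumptions made for illustration and may not match what conform_spec() does internally.

```r
wavenumber <- c(1000, 1003, 1009, 1012, 1020) # irregular spacing
intensity  <- c(0.1, 0.4, 1.0, 0.7, 0.2)

# Build a regular grid at the target resolution inside the measured range.
res <- 5
new_wn <- seq(ceiling(min(wavenumber) / res) * res,
              floor(max(wavenumber) / res) * res,
              by = res)

# Linear interpolation of the intensities onto the new grid.
new_int <- approx(x = wavenumber, y = intensity, xout = new_wn)$y

conformed <- data.frame(wavenumber = new_wn, intensity = new_int)
```

Two spectra resampled onto the same new_wn can then be compared point by point, which is what library matching relies on.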

Smoothing

The Savitzky-Golay filter is used for smoothing. Higher polynomial numbers lead to more wiggly fits and thus less smoothing; lower numbers lead to smoother fits. The SG filter is fit to a moving window of 11 data points by default, where the center point in the window is replaced with the polynomial estimate. Larger windows will produce smoother fits. The derivative order is set to 1 by default, which transforms the spectra to their first derivative. A zero order derivative will have no derivative transformation and only apply smoothing. When smoothing is done well, peak shapes and relative heights should not change. The absolute value is primarily useful for first derivative spectra, where the result looks similar to absorbance units and is easy to have intuition about.

Examples of smoothing:

none <- make_rel(raman_hdpe)
p1 <- smooth_intens(raman_hdpe, polynomial = 1, derivative = 0, abs = FALSE)
p4 <- smooth_intens(raman_hdpe, polynomial = 4, derivative = 0, abs = FALSE)
c_spec(list(none, p1, p4)) |>
  plot()

Sample raman_hdpe spectrum with different smoothing polynomials.

Derivative transformation can be done with the same function.

none <- make_rel(raman_hdpe)
window <- calc_window_points(raman_hdpe, 100) # Calculate the number of points needed for a 100 wavenumber window.
d1 <- smooth_intens(raman_hdpe, derivative = 1, window = window, abs = TRUE)
d2 <- smooth_intens(raman_hdpe, derivative = 2, window = window, abs = TRUE)
c_spec(list(none, d1, d2)) |>
  plot()

Sample raman_hdpe spectrum with different derivatives.
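For readers who want to see what the window, polynomial, and derivative settings do mechanically, here is a compact base R sketch of a Savitzky-Golay filter built from a least-squares polynomial fit. sg_coefs() and sg_filter() are hypothetical helpers written for this example, the polynomial order of 3 is an arbitrary choice, and the derivative is taken with respect to the point index (unit spacing); this is not the code behind smooth_intens().

```r
# Savitzky-Golay coefficients from a least-squares polynomial fit.
# window: odd number of points; polynomial: fit order; derivative: 0, 1, ...
sg_coefs <- function(window = 11, polynomial = 3, derivative = 0) {
  stopifnot(window %% 2 == 1, polynomial < window, derivative <= polynomial)
  m <- (window - 1) / 2
  X <- outer(-m:m, 0:polynomial, `^`) # design matrix of powers of the offset
  H <- solve(crossprod(X), t(X))      # maps window values to fit coefficients
  factorial(derivative) * H[derivative + 1, ]
}

sg_filter <- function(y, window = 11, polynomial = 3, derivative = 0) {
  coefs <- sg_coefs(window, polynomial, derivative)
  # stats::filter() convolves, so reverse the coefficients to slide them
  # across the data in their natural order.
  as.numeric(stats::filter(y, rev(coefs), sides = 2))
}

x <- 1:50
y <- x^2
smoothed <- sg_filter(y, derivative = 0) # a quadratic passes through unchanged
slope    <- sg_filter(y, derivative = 1) # first derivative per unit spacing
```

The first and last (window - 1) / 2 points come back as NA because the window does not fit there. Taking abs(slope) would mimic the absolute first derivative described above, and for real spectra the derivative would need dividing by the wavenumber spacing raised to the derivative order.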

Baseline Correction

The goal of baseline correction is to get all non-peak regions of the spectra to zero absorbance. The higher the polynomial order, the more wiggly the fit to the baseline. If the baseline is not very wiggly, a more wiggly fit could remove peaks, which is not desired. The baseline correction algorithm used in Open Specy is called “iModPolyfit” (Zhao et al. 2007). This algorithm iteratively fits polynomial equations of the specified order to the whole spectrum. During the first fit iteration, peak regions will often be above the baseline fit. The data in the peak region is removed from the fit to make sure that the baseline is less likely to fit to the peaks. The iterative fitting terminates once the difference between the new and previous fit is small. An example of a good baseline fit is shown below. Manual baseline correction can also be specified by providing a baseline OpenSpecy object. There are many fine tuning options that can be chosen; see ?subtr_baseline() for more details.

alternative_baseline <- smooth_intens(raman_hdpe, polynomial = 1, window = 51,
                                      derivative = 0, abs = FALSE,
                                      make_rel = FALSE) |>
  flatten_range(min = 2700, max = 3200, make_rel = FALSE) # Manual baseline with heavily smoothed spectra
none <- make_rel(raman_hdpe) # raw
d <- subtr_baseline(raman_hdpe, type = "manual",
                    baseline = alternative_baseline) # manual subtraction
d8 <- subtr_baseline(raman_hdpe, degree = 8) # standard imodpolyfit
dr <- subtr_baseline(raman_hdpe, refit_at_end = TRUE) # optionally retain baseline noise with refitting
c_spec(list(none, d, d8, dr)) |>
  plot(offset = 0.25)

Sample raman_hdpe spectrum with different degrees of background subtraction (Cowger et al., 2020).
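The iterative idea described above (fit, drop the points that sit above the fit, refit, stop when the fit stabilizes) can be sketched in a few lines of base R. This is a deliberately simplified stand-in, not the published iModPolyfit algorithm and not the subtr_baseline() code: iter_poly_baseline() is a hypothetical helper, and both the peak rule (residual larger than one standard deviation) and the stopping tolerance are assumptions.

```r
iter_poly_baseline <- function(x, y, degree = 3, max_iter = 50, tol = 1e-3) {
  d <- data.frame(x = x, y = y)
  keep <- rep(TRUE, length(y))
  prev <- rep(Inf, length(y))
  for (i in seq_len(max_iter)) {
    fit <- lm(y ~ poly(x, degree), data = d[keep, ])
    baseline <- as.numeric(predict(fit, newdata = d))
    # Stop once successive fits agree to within tol of the intensity range.
    if (max(abs(baseline - prev)) < tol * diff(range(y))) break
    prev <- baseline
    res <- y - baseline
    # Drop points sitting well above the current fit (likely peaks).
    keep <- keep & (res <= sd(res[keep]))
    if (sum(keep) <= degree + 1) break
  }
  baseline
}

# Synthetic spectrum: sloped baseline plus one Gaussian peak of height 5.
x <- seq(0, 1000, by = 5)
y <- (1 + 0.002 * x) + 5 * exp(-((x - 500)^2) / (2 * 15^2))

corrected <- y - iter_poly_baseline(x, y)
```

On this noise-free example the corrected spectrum sits near zero away from the peak while the peak height is preserved; noisy real spectra would need the threshold and degree tuned.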

Range Selection

Sometimes an instrument operates with high noise at the ends of the spectrum, a baseline fit produces distortions, or only certain regions are of interest for analysis. Range selection addresses each of these cases. Many of these issues can be resolved during spectral collection by specifying a wavenumber range which is well characterized by the instrument. Multiple ranges can be specified simultaneously.

none <- make_rel(raman_hdpe)
# Specify one range
r1 <- restrict_range(raman_hdpe, min = 1000, max = 2000) |>
  conform_spec(range = none$wavenumber, res = NULL, allow_na = TRUE)
# Specify multiple ranges
r2 <- restrict_range(raman_hdpe, min = c(1000, 1800), max = c(1200, 2000)) |>
  conform_spec(range = none$wavenumber, res = NULL, allow_na = TRUE)
compare_ranges <- c_spec(list(none, r1, r2), range = "common")
# Common argument crops the ranges to the most common range between the spectra
# when joining.
plot(compare_ranges)

Sample raman_hdpe spectrum with different degrees of range restriction.
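Selecting several ranges at once amounts to keeping every wavenumber that falls inside any of the min/max pairs. A minimal base R sketch of that logic follows; in_ranges() is a hypothetical helper and the data are made up, so this only illustrates the paired min and max vectors, not restrict_range() itself.

```r
wavenumber <- seq(500, 2500, by = 100)
intensity  <- seq_along(wavenumber)

# TRUE where a wavenumber falls inside any [min, max] pair.
in_ranges <- function(wn, min, max) {
  stopifnot(length(min) == length(max))
  Reduce(`|`, Map(function(lo, hi) wn >= lo & wn <= hi, min, max))
}

keep <- in_ranges(wavenumber, min = c(1000, 1800), max = c(1200, 2000))
kept_wn  <- wavenumber[keep]
kept_int <- intensity[keep]
```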

Flattening Ranges

Sometimes there are peaks that really shouldn’t be in your spectra and can distort your interpretation, but you don’t necessarily want to remove those regions from the analysis because you believe they should exist and be flat instead of having a peak. One way to deal with this is to replace the peak values with the mean of the values around the peak. This is the purpose of the flatten_range function. By default it is set to flatten the CO2 region for FTIR spectra because that region often needs to be flattened when atmospheric artifacts occur in spectra. Like restrict_range, the R function can accept multiple ranges.

single <- filter_spec(spectral_map, 120) # Function to filter spectra by index
# number or name or a logical vector.
none <- make_rel(single)
f1 <- flatten_range(single) # default flattening the CO2 region.
f2 <- flatten_range(single, min = c(1000, 2500),
                    max = c(1200, 3000)) # multiple range example
compare_flats <- c_spec(list(none, f1, f2), range = "common")
plot(compare_flats, offset = 0.25)

Sample raman_hdpe spectrum with different degrees of background subtraction (Cowger et al., 2020).
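The flattening idea can be illustrated on a toy spectrum with an unwanted peak. In this base R sketch the "values around the peak" are taken to be the first and last points of the flattened region, which is an assumption; flatten_region() is a hypothetical helper, the region bounds are chosen only for the example, and flatten_range() may define the replacement value differently.

```r
flatten_region <- function(wn, int, min, max) {
  inside <- wn >= min & wn <= max
  if (!any(inside)) return(int)
  idx <- which(inside)
  # Use the first and last points of the region as the surrounding values.
  int[inside] <- mean(int[c(idx[1], idx[length(idx)])])
  int
}

wavenumber <- seq(2100, 2500, by = 50)
intensity  <- c(0.2, 0.2, 0.2, 0.9, 1.5, 0.8, 0.2, 0.2, 0.2)

flat <- flatten_region(wavenumber, intensity, min = 2200, max = 2400)
```

The wavenumber axis keeps its full length, which is the difference from range selection: the region is still present, it just no longer carries the artifact.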

Min-Max Normalization

Often we regard spectral intensities as arbitrary, and min-max normalization allows us to view spectra on the same scale without drastically distorting their shapes or relative peak intensities. In the package, most of the processing functions will min-max transform your spectra by default if you do not specify otherwise with make_rel = FALSE.

raman_hdpe |>
  plot()
make_rel(raman_hdpe) |>
  plot()
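Min-max normalization itself is a one-line rescaling to the interval from 0 to 1. The base R sketch below shows the arithmetic on three made-up intensities; min_max() is a hypothetical helper, not the make_rel() source.

```r
# Rescale so the smallest value maps to 0 and the largest to 1.
min_max <- function(x) {
  (x - min(x, na.rm = TRUE)) / diff(range(x, na.rm = TRUE))
}

scaled <- min_max(c(10, 15, 30))
```

A perfectly flat spectrum has zero range and would return NaN with this formula, so constant inputs need separate handling.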

Identifying Spectra

Reading Libraries

Reference libraries are spectra with known identities. The Open Specy library now has over 30,000 spectra in it and is getting so large that we cannot fit it within the R package size limit of 5 MB. We host the reference libraries on OSF and have a function to pull the libraries down automatically. Running get_lib by itself will download all libraries to your package directory, or you can specify which libraries you want and where you want them.

get_lib(type = "derivative")

After download you can load the libraries into your active environment one at a time as OpenSpecy objects. You can use any Open Specy object as a library, which makes it easy to work with and create libraries because everything we explained earlier applies to them.

lib <- load_lib(type = "derivative")

Matches

Before attempting to use a reference library to identify spectra, it is really important to understand what format the reference library is in. All the OpenSpecy reference libraries are in absorbance units. derivative has been absolute first derivative transformed, nobaseline has been baseline corrected, and raw is the rawest form of the reference spectra (not recommended except for advanced uses). The previously mentioned libraries all have Raman and FTIR spectra in them. mediod_{derivative or nobaseline} is the medoid compressed version of the libraries and has only critical spectra in it. model_{derivative or nobaseline} is an exception because it is a multinomial regression approach for identification. In this example we use data("test_lib"), which is a subsampled version of the derivative library, and data("raman_hdpe"), which is an unprocessed Raman spectrum in absorbance units of HDPE plastic.

data("test_lib")
data("raman_hdpe")
processed <- process_spec(x = raman_hdpe,
                          conform_spec = FALSE, # We will conform during matching.
                          smooth_intens = TRUE # Conducts the default derivative transformation.
                          )
# Check to make sure that the signal to noise ratio of the processed spectra is
# greater than 10.
print(sig_noise(processed) > 10)
# Plot to assess the accuracy of the processing visually
plotly_spec(raman_hdpe, processed)

After your spectra are processed similarly to the library specifications, you can identify them using match_spec(). Whichever library you choose, you need to get your spectra into a similar enough format to use for comparison. The add_library_metadata and add_object_metadata options specify the column name in the metadata that you want to add metadata from, and top_n specifies how many matches you want. In this example we just identified a single spectrum with the library, but you can also send an OpenSpecy object with multiple spectra.

The output matches is a data.table with at least 3 columns. object_id tells you the column names of the spectra in x, and library_id tells you the column names from the library that it matched to. match_val is the value of the Pearson correlation coefficient (default) or other correlation if specified in ...; if using the model identification option, match_val will be the model confidence.

The output in this example returned the correct material type, HDPE, as the top match. If using Pearson correlation, 0.7 is a good threshold to use for a positive ID. In this example, only our top match is greater than the threshold, so we would disregard the other matches. If no matches were above our threshold, we would conclude that the spectrum is of an unknown identity. You’ll also notice in this example that we matched to a library with both Raman and FTIR spectra but the Raman spectra had the highest hits; this is the rationale for lazily matching to a library with both. If you want to match only to FTIR or only to Raman spectra, you can first filter the library using filter_spec() on SpectrumType.

matches <- match_spec(x = processed, library = test_lib, conform = TRUE,
                      add_library_metadata = "sample_name",
                      top_n = 5)[order(match_val, decreasing = TRUE)]
print(matches[, c("object_id", "library_id", "match_val", "SpectrumType",
                  "SpectrumIdentity")])
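The default matching metric, Pearson correlation with a 0.7 threshold, can be illustrated on a toy library with base R's cor(). Everything in this sketch is synthetic: the reference names, the peak positions, and the noise level are invented, and it assumes the unknown and the library already share a wavenumber grid. It is not the match_spec() implementation.

```r
set.seed(42)
wavenumber <- seq(1000, 2000, by = 5)
peak <- function(center, width = 15) {
  exp(-((wavenumber - center)^2) / (2 * width^2))
}

# Toy library: each column is a reference spectrum (names are hypothetical).
library_spectra <- cbind(
  ref_a = peak(1200) + 0.5 * peak(1700),
  ref_b = peak(1450),
  ref_c = peak(1100) + peak(1900)
)

# Unknown: a noisy copy of ref_a.
unknown <- library_spectra[, "ref_a"] + rnorm(length(wavenumber), sd = 0.02)

# Pearson correlation of the unknown against every library column.
match_val <- cor(unknown, library_spectra)[1, ]
top <- sort(match_val, decreasing = TRUE)

# Apply the 0.7 threshold; anything below it is reported as unknown.
top_match <- if (top[1] > 0.7) names(top)[1] else "unknown"
```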

Library Metadata

The libraries we have created have over 100 variables of metadata in them, and this can be onerous to read through, especially given that many of the variables are NA values. We created get_metadata() to remedy this by removing columns from the metadata which are all blank values. The function below will return the metadata for the top match in matches. Remember, similar to filter_spec(), you can specify logic for more than one thing at a time.

get_metadata(x = test_lib, logic = matches[[1, "library_id"]], rm_empty = TRUE)
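Dropping all-blank columns is straightforward to reproduce on an ordinary data frame, which may help when working with metadata outside the package. The sketch below uses base R on made-up metadata and treats only NA as blank, which is an assumption about what rm_empty covers.

```r
metadata <- data.frame(
  sample_name = c("a", "b"),
  color       = c(NA, NA),     # entirely blank, should be dropped
  polymer     = c("HDPE", NA)  # partly filled, should be kept
)

# Keep columns that contain at least one non-NA value.
non_empty <- metadata[, colSums(!is.na(metadata)) > 0, drop = FALSE]
kept_cols <- names(non_empty)
```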

Plot Matches

Overlaying unknown spectra and the best matches can be extremely useful to identify peaks that don’t fit the reference library and may need further investigation. The example below shows great correspondence between the best match and the unknown spectrum. All major peaks are accounted for and have the correct relative height. There are two small peaks in the unknown spectrum near 500 that are not accounted for and could be investigated further, but we would call this a positive ID for HDPE.

plotly_spec(processed, filter_spec(test_lib, logic = matches[[1, "library_id"]]))

Sharing Reference Data

If you have reference data or AI models that you think would be useful for sharing with the spectroscopy community through OpenSpecy, please contact the package administrator to discuss options for collaborating.

Characterizing Particles

Sometimes the spectroscopy task we want to perform is to identify particles in a spectral map. This is especially common for microplastic analysis, where a spectral map is used to image a sample and spectral information is used to differentiate microplastic particles from nonplastic particles. In addition to the material ID, one often wants to measure the shape and size of the particles. In a brute force technique, one could first identify every spectrum in the map, then use thresholding and image analysis to measure the particles. However, more often than not, particles are well separated on the image surface and background spectra are quite different from particle spectra. We can therefore use thresholding a priori to identify and measure the particles, then pass an exemplary spectrum for each particle to the identification routine. It is important to note here that this is at the bleeding edge of theory and technique, so we may be updating these functions in the near future.

Brute Force

# Test library
data("test_lib")
# Example hyperspectral image with one cellulose acetate particle in the middle of it.
test_map <- read_any(read_extdata("CA_tiny_map.zip"))
# Process the map to conform to the library.
test_map_processed <- process_spec(test_map,
  conform_spec_args = list(range = test_lib$wavenumber, res = NULL)
  )
# Identify every spectrum in the map.
identities <- match_spec(test_map_processed, test_lib, order = test_map,
                         add_library_metadata = "sample_name", top_n = 1)
# Relabel any spectra with low correlation coefficients.
features <- ifelse(identities$match_val > 0.7,
                   tolower(identities$polymer_class), "unknown")
# Use spectra identities to identify particle regions as those that have the
# same material type and are touching.
id_map <- def_features(x = test_map_processed, features = features)
id_map$metadata$identities <- features
# Also should probably be implemented automatically in the function when a
# character value is provided.
heatmap_spec(id_map, z = id_map$metadata$identities)
# Collapses spectra to their median for each particle
test_collapsed <- collapse_spec(id_map)
# Plot spectra for each identified particle
plot(test_collapsed, offset = 1, legend_var = "feature_id")

A Priori Particle Thresholding

# Read in test library
data("test_lib")
# Example dataset with one cellulose acetate particle.
# Conduct spatial smoothing to average each spectrum using adjacent spectra.
test_map <- read_any(read_extdata("CA_tiny_map.zip"), spectral_smooth = TRUE,
                     sigma = c(1, 1, 1))
# Characterize the signal times noise to determine where particle regions are.
snr <- sig_noise(test_map, metric = "sig_times_noise")
# Use this to find your particles and the ideal signal times noise value to use
# for thresholding.
heatmap_spec(test_map, z = snr)
# Define the feature regions based on the threshold. Pixels from the background
# in the heatmap above were below 0.05 while my particle's pixels were above so
# I set snr > 0.05.
id_map <- def_features(x = test_map, features = snr > 0.05)
# Check that the thresholding worked as expected. Here we see a single particle
# region identified separate from the background.
heatmap_spec(id_map, z = id_map$metadata$feature_id)
# Collapse the spectra to their medians based on the threshold. Important to
# note here that the particles with id -88 are anything from the FALSE values
# so they should be your background.
collapsed_id_map <- id_map |>
  collapse_spec()
# Process the collapsed spectra to have the same transformation and units as
# the library.
id_map_processed <- process_spec(collapsed_id_map,
  conform_spec_args = list(range = test_lib$wavenumber, res = NULL)
  )
# Check the spectra for the background and particle. Background has considerable
# signal in it too suggesting double bounce along the edges of the particle.
plot(id_map_processed, offset = 1, legend_var = "feature_id")
# Get the matches of the collapsed spectra for the particles.
matches <- match_spec(id_map_processed, test_lib,
                      add_library_metadata = "sample_name", top_n = 1)
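The threshold-then-collapse step can be pictured on a tiny made-up map. The base R sketch below thresholds a hypothetical per-pixel metric and takes the median spectrum of each group; it deliberately skips the spatial step of grouping touching pixels into separate particles that def_features() performs, so it only covers the single-particle case.

```r
# Toy map: 4 wavenumbers (rows) by 6 pixels (columns).
spectra <- matrix(c(
  0, 0, 5, 6, 7, 0,
  1, 0, 5, 6, 7, 1,
  0, 1, 9, 8, 7, 0,
  0, 0, 5, 6, 7, 0
), nrow = 4, byrow = TRUE)
snr <- c(0.01, 0.02, 0.60, 0.70, 0.65, 0.01) # hypothetical per-pixel metric

# Threshold into background (FALSE) and particle (TRUE) pixels.
is_particle <- snr > 0.05

# Median spectrum for each group of pixels, one column per group.
collapsed <- sapply(split(seq_along(is_particle), is_particle), function(idx) {
  apply(spectra[, idx, drop = FALSE], 1, median)
})
```

The resulting matrix has one column per group, so each particle contributes a single representative spectrum to the identification step instead of one spectrum per pixel.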


Open Specy Package Tutorial

Win Cowger, Zacharias Steinmetz, Rachel Kozloski,Aleksandra Karapetrova

2025-04-26

Document Overview

Installation

Running the App

Read Data

Save Data

Format Conversions

Visualization

Spectra

Maps

Combining OpenSpecy Objects

Filtering OpenSpecy Objects

Sampling OpenSpecy Objects