Main aim of this tutorial is to load the full content of an MTZ filein the workspace, to analyse and modify data, and to write the modifiedcontent into a new MTZ file.
Some sample files are stored as external data in this package. Theseare the MTZ files available with the current release. To access thefile, first load thecry package.
Next, have a look at what is included in the external-data directoryofcry.
datadir<-system.file("extdata",package="cry")all_files<-list.files(datadir)print(all_files)#> [1] "1dei-sf.cif" "1dei_phases.mtz"#> [3] "2ol9_phases.mtz" "3syu.cif"#> [5] "6vww_xds_ascii_merged.hkl" "AMS_DATA.cif"#> [7] "e-65-00i60-Isup2.rtv" "shelxc.log"#> [9] "shelxd.log" "shelxe_i.log"#> [11] "shelxe_o.log" "xds00_ascii.hkl"We are interested in the MTZ file “1dei_phases.mtz”. This file can beloaded using functionreadMTZ.
Let’s load “1dei_phases.mtz” in memory. The created object is a namedlist.
filename<-file.path(datadir,"1dei_phases.mtz")objMTZ<-readMTZ(filename)class(objMTZ)#> [1] "list"names(objMTZ)#> [1] "reflections" "header" "batch_header"Each component of is either a named list, or a data frame. Morespecifically:
reflections
It’s a data frame. It contains the actual x-ray diffraction data froma given crystal.
header
It’s a named list. It contains information on the crystal used toobtain the x-ray diffraction data and on the diffraction experiment, ingeneral.
batch_header It’s a named list. It gets filled onlywhen the MTZ file
includes so-called “unmerged” data.
For the merged MTZ file we are exploring in this tutorial, thebatch_header isNULL.
headerThis is a named list. The names hint at the content of the specificcomponents.
hdr<- objMTZ$headerclass(hdr)#> [1] "list"names(hdr)#> [1] "TITLE" "NCOL" "CELL" "SORT" "SYMINF" "RESO" "NDIF"#> [8] "SYMM" "PROJECT" "CRYSTAL" "DATASET" "DCELL" "DWAVEL" "COLUMN"#> [15] "COLSRC"For example,NCOL is the number of columns of thereflections data frame,CELL contains thecrystal unit cell parameters andSYMM includes informationon the crystal symmetry.
For the file loaded we find:
print(objMTZ$header$PROJECT)#> id pname#> 1 0 HKL_base#> 2 1 sf_convertprint(objMTZ$header$CRYSTAL)#> id cname#> 1 0 HKL_base#> 2 1 cryst_1print(objMTZ$header$DATASET)#> id dname#> 1 0 HKL_base#> 2 1 data_1The only dataset in this MTZ file is calleddata_1. It wasobtained from x-ray diffraction of a crystal calledcryst_1.This crystal is one of the samples belonging to a project calledsf_convert. All MTZ files contain, by default, a datasetHKL_base which comes from the crystalHKL_base, whichis part of the projectHKL_base; this peculiar project isalways present to make sure that data observations made only of theMiller indices, are always included. More information on the header’scontent can be read in the documentation forreadMTZ orreadMTZHeader.
reflectionsData are actually contained in this component. It is a data framewhose columns have the names found in the header componentCOLUMN[,1]:
objMTZ$header$COLUMN[,1]#> [1] "H" "K" "L" "FP" "SIGFP"#> [6] "FC" "PHIC" "FC_ALL" "PHIC_ALL" "FWT"#> [11] "PHWT" "DELFWT" "PHDELWT" "FOM" "FC_ALL_LS"#> [16] "PHIC_ALL_LS"objMTZ$reflections[1:5,]#> H K L FP SIGFP FC PHIC FC_ALL PHIC_ALL FWT PHWT#> 1 0 0 4 420.23508 48.025589 411.46274 0 441.54556 0 398.92459 0#> 2 0 0 6 784.55963 53.490089 719.30823 180 780.32288 180 788.79639 180#> 3 0 0 8 266.88480 18.889168 156.35133 0 145.58990 360 388.17963 360#> 4 0 0 10 60.62275 21.459837 79.47448 0 82.45022 0 30.81676 0#> 5 0 0 14 94.31614 5.330165 42.39147 0 46.47320 360 142.01843 360#> DELFWT PHDELWT FOM FC_ALL_LS PHIC_ALL_LS#> 1 42.620972 180 1.0000000 460.96564 0#> 2 8.473511 180 1.0000000 743.42645 180#> 3 242.589722 360 0.9999999 145.39508 360#> 4 51.633461 180 0.9341952 81.84018 0#> 5 95.545235 360 0.9992544 42.30125 360A data frame is the appropriate class for observations like thecrystallographic x-ray data. One reason for this is, for instance, thatgrouping or selection according to a specific criterion can be carriedout very easily with a data frame. Let us, as an example, select part ofthe data using a condition on the Miller indices. We can be interestedto select reflections for which\(h\)has a specific value; this is shown in the following chunk of code.
# List the different values of Hunique(objMTZ$reflections$H)#> [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24#> [26] 25 26 27 28 29 30 31 32 33 34 35# Select all reflections with H=1idx<-which(objMTZ$reflections$H==1)length(idx)# 373 reflections have H=0#> [1] 373# Save these reflections in a different objectrefs<- objMTZ$reflections[idx,]# Show the first 10 selected reflectionsrefs[1:10,]#> H K L FP SIGFP FC PHIC FC_ALL PHIC_ALL#> 338 1 0 4 69.46197 7.137606 67.48492 1.800000e+02 38.35452 1.800000e+02#> 339 1 0 5 55.82940 1.932214 137.18262 9.000000e+01 120.74961 8.999999e+01#> 340 1 0 6 296.25644 22.226982 283.05109 3.617508e-06 302.89392 2.897944e-06#> 341 1 0 7 294.59216 23.033159 276.94324 2.700000e+02 294.35089 2.700000e+02#> 342 1 0 8 77.35909 3.981911 69.41461 2.083899e-06 64.56831 2.083780e-06#> 343 1 0 9 46.07975 4.606244 32.10196 2.700001e+02 30.71046 2.700001e+02#> 344 1 0 12 118.85085 2.718317 90.03316 0.000000e+00 94.01305 0.000000e+00#> 345 1 0 13 56.22640 4.300251 56.44312 8.999997e+01 53.37141 8.999997e+01#> 346 1 0 14 52.02279 4.389729 70.51323 1.800000e+02 77.82328 1.800000e+02#> 347 1 1 3 292.26788 26.617142 374.87027 1.705684e+02 279.07269 1.674537e+02#> FWT PHWT DELFWT PHDELWT FOM FC_ALL_LS#> 338 39.99242 1.800000e+02 1.6379013 1.800000e+02 0.5639557 50.18671#> 339 12.13563 2.700000e+02 132.8852386 2.700000e+02 0.9727311 127.82761#> 340 289.61896 2.897945e-06 13.2749634 1.800000e+02 1.0000000 290.93329#> 341 294.83344 2.700000e+02 0.4825439 2.700000e+02 1.0000000 268.81738#> 342 87.53786 2.083780e-06 22.9695587 2.083780e-06 0.9831176 73.09277#> 343 32.66603 2.700001e+02 1.9555779 2.700001e+02 0.6876827 31.74592#> 344 143.68866 0.000000e+00 49.6756134 0.000000e+00 1.0000000 90.22993#> 345 59.07624 8.999997e+01 5.7048264 8.999997e+01 0.9999542 56.48866#> 346 26.22218 1.800000e+02 51.6010971 8.662659e-06 0.9999989 70.44583#> 347 294.21536 1.674537e+02 15.1426640 1.674536e+02 0.9807579 297.09183#> PHIC_ALL_LS#> 338 1.800000e+02#> 339 8.999999e+01#> 340 3.346423e-06#> 341 2.700000e+02#> 342 1.979033e-06#> 343 2.700001e+02#> 344 0.000000e+00#> 345 8.999997e+01#> 346 1.800000e+02#> 347 1.674240e+02# Find out the range of FP/sigFP for the selected reflectionsrange(refs$FP/refs$SIGFP)#> [1] 2.374856 56.249997Clearly, a lot of different operations and investigations can becarried out on the selected data. A second example involves data forwhich the signal-to-noise ratio (FP/sigFP) is greater than 1, as shownin the following snippet.
idx<-which(objMTZ$reflections$FP/objMTZ$reflections$SIGFP>=1)length(idx)# 8377 reflections have FP/sigFP >= 1#> [1] 8377# The reflections can be extractedrefs<- objMTZ$reflections[idx,]# Histogram of I/sigI with ggplot2ggplot(data = refs,aes(FP/SIGFP))+geom_histogram(fill ="#5494AB",colour ="#125CD2",binwidth =3)+xlab("FP/sigFP")+ylab("Number of Reflections")+labs(title ="Signal-to-noise ratio FP/sigFP greater than 1")+theme_cry()A third example is related to the reflections’ resolution. Once ithas been calculated (cell parameters are needed to do that), data can beselected within a given resolution range (a so-calledresolutionshell). There is a function incry, calledhkl_to_reso, which calculates the resolution in angstromsfor a reflection of Miller indices\(h, k,l\).
# Extract cell parameters from headercpars<- objMTZ$header$CELLprint(cpars)#> [1] 57.4 56.3 23.0 90.0 90.0 90.0a<- cpars[1]; b<- cpars[2]; c<- cpars[3]aa<- cpars[4]; bb<- cpars[5]; cc<- cpars[6]# Resolution of reflection (1,0,0) (0,1,0) (0,0,1)hkl_to_reso(1,0,0,a,b,c,aa,bb,cc)#> [1] 57.4hkl_to_reso(0,1,0,a,b,c,aa,bb,cc)#> [1] 56.3hkl_to_reso(0,0,1,a,b,c,aa,bb,cc)#> [1] 23# Reflections with higher Miller indices have higher resolutionshkl_to_reso(10,0,0,a,b,c,aa,bb,cc)#> [1] 5.74hkl_to_reso(10,0,-20,a,b,c,aa,bb,cc)#> [1] 1.127592Once resolutions are calculated (and the calculation is done on datahaving the same order as the original reflections), data within a givenresolution shell can be selected.
In order to investigate specific ideas it might be worth to modifythe observations in a specific way and to write them out to a new MTZfile to be later handled by the tools proper of theCCP4 family of programs forcrystallography. Thecry function to write reflectionscontent to an MTZ file is calledwriteMTZ. In order to makethis a flexible function, it has been deemed appropriate to take asinput the three named lists returned byreadMTZ. These willhave been changed by the user, prior to use withwriteMTZ.By default, the MTZ file title is left unchanged and thebatch_header list is set toNULL; thus bydefault the MTZ file is assumed to be a merged-reflections file. Specialattention will have to be devoted to parts of theheaderlist, which store information on the observations. These are, forinstance,NCOL,SORT,RESO,COLUMN, etc. An example can help to clarify thisconcept.
A typical modifications of the observed data is when high-resolutionreflections are eliminated. This means that the number of reflectionsand other quantities in theheader will have to bechanged.
# Copy original data to 2 separate listsrefs<- objMTZ$reflectionshdr<- objMTZ$headerlength(refs[,1])# Number of reflections before cut#> [1] 8377print(hdr$NCOL[2])#> [1] 8377# Cut data to 5 angstroms resolution# (see previous chunk of code for resos)idx<-which(resos>=5)refs<- refs[idx,]length(refs[,1])# Now there are only 333 observations#> [1] 333# Modify specific parts of headerhdr$NCOL[2]<-length(refs[,1])# Number of reflectionshdr$RESO<-sort(1/range(resos[idx])^2)# Resolution rangehdr$DATASET[2,2]<-"Data cut to 5A"# Datasetobsmin<-apply(refs,2,min,na.rm=TRUE)obsmax<-apply(refs,2,max,na.rm=TRUE)hdr$COLUMN[,3]<- obsmin# COLUMNhdr$COLUMN[,4]<- obsmaxThe data frameCOLSRC contains the date and time atwhich the specific data columns have been generated. This can be changedwith thecry functionchange_COLSRC; the newdate and time is the current one.
# Original COLSRCprint(hdr$COLSRC)#> labels created id#> 1 H CREATED_11/10/2017_01:55:14 0#> 2 K CREATED_11/10/2017_01:55:14 0#> 3 L CREATED_11/10/2017_01:55:14 0#> 4 FP CREATED_11/10/2017_01:55:14 1#> 5 SIGFP CREATED_11/10/2017_01:55:14 1#> 6 FC CREATED_11/10/2017_01:55:14 1#> 7 PHIC CREATED_11/10/2017_01:55:14 1#> 8 FC_ALL CREATED_11/10/2017_01:55:14 1#> 9 PHIC_ALL CREATED_11/10/2017_01:55:14 1#> 10 FWT CREATED_11/10/2017_01:55:14 1#> 11 PHWT CREATED_11/10/2017_01:55:14 1#> 12 DELFWT CREATED_11/10/2017_01:55:14 1#> 13 PHDELWT CREATED_11/10/2017_01:55:14 1#> 14 FOM CREATED_11/10/2017_01:55:14 1#> 15 FC_ALL_LS CREATED_11/10/2017_01:55:14 1#> 16 PHIC_ALL_LS CREATED_11/10/2017_01:55:14 1# Change date and timehdr<-change_COLSRC(hdr)# New COLSRCprint(hdr$COLSRC)#> labels created id#> 1 H CREATED_19/06/2025_18:08:13 0#> 2 K CREATED_19/06/2025_18:08:13 0#> 3 L CREATED_19/06/2025_18:08:13 0#> 4 FP CREATED_19/06/2025_18:08:13 1#> 5 SIGFP CREATED_19/06/2025_18:08:13 1#> 6 FC CREATED_19/06/2025_18:08:13 1#> 7 PHIC CREATED_19/06/2025_18:08:13 1#> 8 FC_ALL CREATED_19/06/2025_18:08:13 1#> 9 PHIC_ALL CREATED_19/06/2025_18:08:13 1#> 10 FWT CREATED_19/06/2025_18:08:13 1#> 11 PHWT CREATED_19/06/2025_18:08:13 1#> 12 DELFWT CREATED_19/06/2025_18:08:13 1#> 13 PHDELWT CREATED_19/06/2025_18:08:13 1#> 14 FOM CREATED_19/06/2025_18:08:13 1#> 15 FC_ALL_LS CREATED_19/06/2025_18:08:13 1#> 16 PHIC_ALL_LS CREATED_19/06/2025_18:08:13 1Once all changes have been done, the list components are ready to besaved out to a new MTZ file called, in this specific instance,“new.mtz”.
# Temporary directory for outputtdir<-tempdir()fname<-file.path(tdir,"new.mtz")# Write changed data to the new MTZ filewriteMTZ(refs,hdr,fname,title="New truncated dataset")Data from the new MTZ file can either be explored using CCP4programs, or read back into R using againreadMTZ.
# Read data from "new.mtz"newMTZ<-readMTZ(fname,TRUE)#> [1] "COLUMN H H 0 11 0"#> [1] "COLUMN K H 0 11 0"#> [1] "COLUMN L H 0 4 0"#> [1] "COLUMN FP F 12.105 1095.345 1"#> [1] "COLUMN SIGFP Q 1.365 96.585 1"#> [1] "COLUMN FC F 5.035 1055.738 1"#> [1] "COLUMN PHIC P 0 360 1"#> [1] "COLUMN FC_ALL F 1.210 1136.601 1"#> [1] "COLUMN PHIC_ALL P 0 360 1"#> [1] "COLUMN FWT F 0.133 1054.090 1"#> [1] "COLUMN PHWT P 0 360 1"#> [1] "COLUMN DELFWT F 0.070 393.415 1"#> [1] "COLUMN PHDELWT P 0 360 1"#> [1] "COLUMN FOM W 0.023 1.000 1"#> [1] "COLUMN FC_ALL_LS F 1.537 1171.841 1"#> [1] "COLUMN PHIC_ALL_LS P 0 360 1"