Daany - .NET DAta ANalYtics library with implementations of DataFrame, time series decompositions, and the BLAS and LAPACK linear algebra routines.
bhrnjica/daany
Daany v2.0 - .NET DAta ANalYtics library with implementations of DataFrame, time series decompositions, and the LAPACK and BLAS linear algebra routines.
Daany Developer Guide - complete guide for developers.
The latest version of the library targets .NET 7 and above. To use it on .NET Framework or .NET Standard 2.0, use a version older than v2.0, or try building the library from the source code.
The main components of the Daany library, each available as a separate NuGet package, are:
- Daany.DataFrame - data frame implementation in pure C#.
- Daany.DataFrame.Ext - data frame extensions that add plotting, data scaling, encoding, and similar features.
- Daany.Stat - time series decompositions, e.g. SSA, STL, etc.
- Daany.LinA - .NET wrapper of the Intel MKL LAPACK and BLAS routines.
The Daany.DataFrame implementation follows the .NET coding paradigm rather than the Pandas look and feel. The DataFrame implementation tries to fill the gap in the ML.NET data preparation phase, and it can be easily passed to an ML.NET pipeline. The DataFrame does not require any class type implementation prior to data loading and transformation.
Once the DataFrame completes the data transformation, extension methods provide an easy way to pass the data into an MLContext object.
The following example shows Daany.DataFrame in action:
We are going to use the iris data file, which can be found in many places on the internet. The file contains 5 tab-separated columns: sepal_length, sepal_width, petal_length, petal_width, and species. The Daany.DataFrame class has predefined static methods to load data from a txt or csv file. The following code loads the data and creates a DataFrame object:

    //read the iris data and create DataFrame object.
    var df = DataFrame.FromCsv(orgdataPath, sep: '\t');

Now that we have a data frame, we can perform any of the many supported data transformations. For this example we are going to create two new calculated columns:

    //calculate two new columns into dataset
    df.AddCalculatedColumns(new string[] { "SepalArea", "PetalArea" },
        (r, i) =>
        {
            var aRow = new object[2];
            aRow[0] = Convert.ToSingle(r["sepal_width"]) * Convert.ToSingle(r["sepal_length"]);
            aRow[1] = Convert.ToSingle(r["petal_width"]) * Convert.ToSingle(r["petal_length"]);
            return aRow;
        });

Now the df object has two new columns: SepalArea and PetalArea.
As the next step we are going to create a new data frame containing only three columns: SepalArea, PetalArea and species:

    //create new data-frame by selecting only three columns
    var derivedDF = df["SepalArea", "PetalArea", "species"];

For this purpose, we could use the Create method by passing tuples of the old and new column names. In our case, we simply use the indexer with column names to get a new data frame.
We transformed the data and created the final data frame, which will be passed to ML.NET. Since the data is already in memory, we should use the mlContext.Data.LoadFromEnumerable ML.NET method. Here we need to provide the type for the loaded data.
So let's create the Iris class with only three properties, since we want to use two columns as the features and one as the label.

    class Iris
    {
        public float PetalArea { get; set; }
        public float SepalArea { get; set; }
        public string Species { get; set; }
    }

Once we have the class type implemented we can load the data frame into ML.NET:

    //Load Data Frame into ML.NET data pipeline
    IDataView dataView = mlContext.Data.LoadFromEnumerable<Iris>(derivedDF.GetEnumerator<Iris>((oRow) =>
    {
        //convert row object array into Iris row
        var prRow = new Iris();
        prRow.SepalArea = Convert.ToSingle(oRow["SepalArea"]);
        prRow.PetalArea = Convert.ToSingle(oRow["PetalArea"]);
        prRow.Species = Convert.ToString(oRow["species"]);
        //
        return prRow;
    }));

All the data has now been loaded into the ML.NET pipeline, so the next step is to split it into train and test sets:

    //Split dataset in two parts: TrainingDataset (90%) and TestDataset (10%)
    var trainTestData = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.1);
    var trainData = trainTestData.TrainSet;
    var testData = trainTestData.TestSet;

Create the pipeline to prepare the train data for machine learning:

    //prepare data for ML
    //encode the output category column by defining key values for each category
    var dataPipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: nameof(Iris.Species))
        //define features columns
        .Append(mlContext.Transforms.Concatenate("Features", nameof(Iris.SepalArea), nameof(Iris.PetalArea)));

Use the data pipeline and the training set (trainData) to train and build the model.

    //train and build the model
    //create Trainer
    var lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm();
    //train the ML model
    var model = dataPipeline.Append(lightGbm).Fit(trainData);

Once we have a trained model, we can evaluate how well it predicts the Iris flower species on the test set (testData):

    //evaluate test set
    var testPrediction = model.Transform(testData);
    var metricsTest = mlContext.MulticlassClassification.Evaluate(testPrediction);
    ConsoleHelper.PrintMultiClassClassificationMetrics("TEST Iris Dataset", metricsTest);
    ConsoleHelper.ConsoleWriteHeader("Test Iris DataSet Confusion Matrix ");
    ConsoleHelper.ConsolePrintConfusionMatrix(metricsTest.ConfusionMatrix);

When the program runs, the output shows that the Iris model predicts the test set with 100% accuracy.
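To make the reported metrics more concrete, the following is a minimal sketch in plain C# of how a confusion matrix and overall accuracy can be computed from class labels. It does not use ML.NET or Daany; the helper names ConfusionMatrix and Accuracy, the class indices, and the label arrays are all made up for illustration and are not output of the example above.

```csharp
using System;

// Hypothetical helper: counts[actual, predicted] for class indices 0..classCount-1.
static int[,] ConfusionMatrix(int[] actual, int[] predicted, int classCount)
{
    if (actual.Length != predicted.Length)
        throw new ArgumentException("Label arrays must have the same length.");

    var counts = new int[classCount, classCount];
    for (int i = 0; i < actual.Length; i++)
        counts[actual[i], predicted[i]]++;
    return counts;
}

// Overall accuracy = correctly classified samples (the diagonal) / all samples.
static double Accuracy(int[,] matrix)
{
    int correct = 0, total = 0;
    for (int r = 0; r < matrix.GetLength(0); r++)
    {
        for (int c = 0; c < matrix.GetLength(1); c++)
        {
            total += matrix[r, c];
            if (r == c) correct += matrix[r, c];
        }
    }
    return total == 0 ? 0.0 : (double)correct / total;
}

// Made-up labels for illustration only (0, 1, 2 stand for the three species).
int[] actual    = { 0, 0, 1, 1, 2, 2 };
int[] predicted = { 0, 0, 1, 2, 2, 2 };

int[,] matrix = ConfusionMatrix(actual, predicted, 3);
Console.WriteLine($"Accuracy: {Accuracy(matrix)}");
```

An accuracy of 100% corresponds to a confusion matrix whose off-diagonal entries are all zero.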
Besides Daany.DataFrame, the library contains a set of implementations for working with time series data. The following list contains some of them:
- Conversion of time series into Daany.DataFrame and Series,
- Seasonal and Trend decomposition using Loess - STL time series decomposition,
- Singular Spectrum Analysis - SSA time series decomposition,
- Set of Time Series operations like moving average, etc.
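As an illustration of the simplest operation in the list, here is a minimal sketch of a simple moving average in plain C#. It is not Daany's implementation or API; the MovingAverage helper and the sample values are assumptions made for this example only.

```csharp
using System;

// Hypothetical helper (not part of Daany): simple moving average.
// Returns one value per full window, so the result has
// values.Length - window + 1 elements.
static double[] MovingAverage(double[] values, int window)
{
    if (window <= 0 || window > values.Length)
        throw new ArgumentOutOfRangeException(nameof(window));

    var result = new double[values.Length - window + 1];
    double sum = 0;
    for (int i = 0; i < values.Length; i++)
    {
        sum += values[i];                               // add the newest value
        if (i >= window) sum -= values[i - window];     // drop the value that left the window
        if (i >= window - 1) result[i - window + 1] = sum / window;
    }
    return result;
}

// Illustrative sample values only.
double[] sample = { 112, 118, 132, 129, 121, 135 };
double[] smoothed = MovingAverage(sample, 3);
Console.WriteLine(string.Join(", ", smoothed));
```

The running sum keeps the cost linear in the length of the series, instead of re-summing every window.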
With SSA, you can decompose the time series into any number of components (signals). The following code loads the famous AirPassengers time series data:

    var strPath = $"{root}/AirPassengers.csv";
    var mlDF = DataFrame.FromCsv(strPath, sep: ',');
    //create time series from data frame
    var ts = mlDF["#Passengers"].Select(f => Convert.ToDouble(f));

Now that we have the AirPassengers time series object ts, we can create the SSA object by passing ts into it:

    //create Singular Spectrum Analysis object
    var ssa = new SSA(ts);
    //perform analysis
    ssa.Fit(36);

We created the ssa object from the time series and then called the Fit method, passing the number of components that we are going to create, to start the SSA analysis of the time series.
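For readers unfamiliar with SSA, the sketch below shows the embedding step that the method conventionally starts with: a series of length N is arranged into an L x K trajectory matrix, where L is the window length and K = N - L + 1. This is plain C# written for illustration, not Daany's code; the Embed helper is hypothetical, and this README does not confirm how the SSA class builds the matrix internally or how the argument passed to Fit relates to the window length. The later steps (singular value decomposition, grouping, and diagonal averaging) are omitted.

```csharp
using System;

// Hypothetical helper (not Daany's API): builds the L x K trajectory
// (Hankel) matrix used in the embedding step of SSA, where K = N - L + 1.
static double[,] Embed(double[] series, int windowLength)
{
    int n = series.Length;
    if (windowLength < 1 || windowLength > n)
        throw new ArgumentOutOfRangeException(nameof(windowLength));

    int k = n - windowLength + 1;
    var x = new double[windowLength, k];
    for (int i = 0; i < windowLength; i++)
        for (int j = 0; j < k; j++)
            x[i, j] = series[i + j];   // entries are constant along anti-diagonals
    return x;
}

double[] series = { 1, 2, 3, 4, 5 };
double[,] trajectory = Embed(series, 3);
Console.WriteLine($"{trajectory.GetLength(0)} x {trajectory.GetLength(1)}"); // prints "3 x 3"
```

Each column of the matrix is a lagged window of the series, which is what allows the decomposition to separate trend, periodic, and noise components.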
Once we have analyzed the time series, we can plot its components. The following plot shows the first 4 components:
The following plot shows how the previous 4 components approximate the actual AirPassengers data:
At the end we can plot the ssa predicted and actual values of the time series:
The Daany.LinA package provides the ability to use Intel MKL, a native and very fast math library, to perform linear algebra calculations. Combined with the previous packages (DataFrame and Daany.Stat), it lets you transform and analyze very complex data, solve systems of linear equations, find eigenvalues and eigenvectors, use the least squares method, etc.
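As a reminder of what solving a system of linear equations involves, here is a minimal plain C# sketch that solves A x = b with Gaussian elimination and partial pivoting. It only illustrates the kind of problem mentioned above; it does not call Intel MKL or Daany.LinA, and the Solve helper is a hypothetical name introduced for this example. An MKL-backed routine would be the better choice for large or ill-conditioned systems.

```csharp
using System;

// Hypothetical helper (not Daany.LinA): solves A x = b by Gaussian
// elimination with partial pivoting. Works on copies of the inputs.
static double[] Solve(double[,] a, double[] b)
{
    int n = b.Length;
    var m = (double[,])a.Clone();
    var x = (double[])b.Clone();

    for (int col = 0; col < n; col++)
    {
        // Pick the row with the largest absolute value in this column.
        int pivot = col;
        for (int row = col + 1; row < n; row++)
            if (Math.Abs(m[row, col]) > Math.Abs(m[pivot, col])) pivot = row;
        if (Math.Abs(m[pivot, col]) < 1e-12)
            throw new InvalidOperationException("Matrix is singular or nearly singular.");

        // Swap the pivot row into place.
        if (pivot != col)
        {
            for (int k = 0; k < n; k++)
                (m[col, k], m[pivot, k]) = (m[pivot, k], m[col, k]);
            (x[col], x[pivot]) = (x[pivot], x[col]);
        }

        // Eliminate the entries below the pivot.
        for (int row = col + 1; row < n; row++)
        {
            double factor = m[row, col] / m[col, col];
            for (int k = col; k < n; k++)
                m[row, k] -= factor * m[col, k];
            x[row] -= factor * x[col];
        }
    }

    // Back substitution on the upper triangular system.
    for (int row = n - 1; row >= 0; row--)
    {
        for (int k = row + 1; k < n; k++)
            x[row] -= m[row, k] * x[k];
        x[row] /= m[row, row];
    }
    return x;
}

// 2x + y = 5 and x + 3y = 10 have the solution x = 1, y = 3.
double[,] a = { { 2, 1 }, { 1, 3 } };
double[] b = { 5, 10 };
double[] solution = Solve(a, b);
Console.WriteLine(string.Join(", ", solution));
```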
For more information on how to use any of the implemented methods, please see the Daany Developer Guide, the test application implemented in the library, or the unit test methods, which cover almost all implementations in the library.