bhrnjica/daany


Daany v2.0 - .NET DAta ANalYtics: a .NET library implementing DataFrame, time series decompositions, and Linear Algebra routines (LAPACK and BLAS).


Daany Developer Guide: a complete guide for developers.

Software Requirements

The latest version of the library is built on .NET 7 and requires .NET 7 or later.

If you want to use it on .NET Framework or .NET Standard 2.0, use a version older than v2.0, or build the library from source.

The main components of the Daany library, each with a separate NuGet package, are:

  • Daany.DataFrame - data frame implementation in pure C#.
  • Daany.DataFrame.Ext - data frame extensions for plotting, data scaling, encoding, and similar tasks.
  • Daany.Stat - time series decompositions such as SSA and STL.
  • Daany.LinA - .NET wrapper of the Intel MKL LAPACK and BLAS routines.

Data Frame (Daany.DataFrame) and extensions (Daany.DataFrame.Ext)

The Daany.DataFrame implementation follows the .NET coding paradigm rather than the look and feel of Pandas. The DataFrame implementation aims to fill the gap in the ML.NET data preparation phase, and a data frame can easily be passed to an ML.NET pipeline. The DataFrame does not require any class type to be implemented before data loading and transformation.

Once the DataFrame completes the data transformation, extension methods provide an easy way to pass the data into an MLContext object.

The following example shows Daany.DataFrame in action:

Data Loading

We are going to use the iris data file, which can be found in many places on the internet. The file contains 5 tab-separated columns: sepal_length, sepal_width, petal_length, petal_width, and species. The Daany.DataFrame class has predefined static methods to load data from a txt or csv file. The following code loads the data and creates a DataFrame object:

```csharp
using Daany;

// read the tab-separated iris data and create a DataFrame object;
// orgdataPath is the path to the iris data file
var df = DataFrame.FromCsv(orgdataPath, sep: '\t');
```

Now that we have a data frame, we can perform any of the many supported data transformations. For this example, we create two new calculated columns:

```csharp
// calculate two new columns and add them to the data set
df.AddCalculatedColumns(new string[] { "SepalArea", "PetalArea" }, (r, i) =>
{
    var aRow = new object[2];
    aRow[0] = Convert.ToSingle(r["sepal_width"]) * Convert.ToSingle(r["sepal_length"]);
    aRow[1] = Convert.ToSingle(r["petal_width"]) * Convert.ToSingle(r["petal_length"]);
    return aRow;
});
```

Now the df object has two new columns: SepalArea and PetalArea.
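To make the callback contract concrete, here is a stand-alone sketch that does not use Daany. It assumes each row is represented as an `IDictionary<string, object>` keyed by column name (as the `r["sepal_width"]` access above suggests) and computes the two area values for a single row. The `IrisAreas` class and its `Compute` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating what the AddCalculatedColumns callback computes for one row.
public static class IrisAreas
{
    // Returns { SepalArea, PetalArea } for a row whose values are keyed by column name.
    public static object[] Compute(IDictionary<string, object> row)
    {
        float sepalArea = Convert.ToSingle(row["sepal_width"]) * Convert.ToSingle(row["sepal_length"]);
        float petalArea = Convert.ToSingle(row["petal_width"]) * Convert.ToSingle(row["petal_length"]);
        return new object[] { sepalArea, petalArea };
    }
}
```

Returning an `object[]` whose order matches the new column names is what lets one callback fill several columns at once.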

As the next step, we create a new data frame containing only three columns: SepalArea, PetalArea and species:

```csharp
// create a new data frame by selecting only three columns
var derivedDF = df["SepalArea", "PetalArea", "species"];
```

Alternatively, we could use the Create method, passing tuples of the old and new column names. Here we simply use the indexer with column names to get a new data frame.
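Conceptually, selecting columns by name is a projection: every row keeps only the requested columns. The sketch below is independent of Daany and is not how Daany implements its indexer; it assumes rows are `IDictionary<string, object>` instances, and the `Projection.Project` method is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Projection
{
    // Returns new rows that contain only the requested columns.
    public static List<Dictionary<string, object>> Project(
        IEnumerable<IDictionary<string, object>> rows, params string[] columns)
    {
        return rows
            .Select(r => columns.ToDictionary(c => c, c => r[c]))
            .ToList();
    }
}
```

A missing column name throws `KeyNotFoundException`, which is usually preferable to silently producing an incomplete row.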

Building a model by using ML.NET

We have transformed the data and created the final data frame, which will be passed to ML.NET. Since the data is already in memory, we should use the ML.NET method mlContext.Data.LoadFromEnumerable. This method requires a type for the loaded data.

So let's create the Iris class with three properties, since we use two columns as the features and one as the label.

```csharp
class Iris
{
    public float PetalArea { get; set; }
    public float SepalArea { get; set; }
    public string Species { get; set; }
}
```

Once the class type is implemented, we can load the data frame into ML.NET:

```csharp
using Microsoft.ML;

// create the ML.NET context used by all following steps
var mlContext = new MLContext();

// load the data frame into the ML.NET data pipeline
IDataView dataView = mlContext.Data.LoadFromEnumerable<Iris>(derivedDF.GetEnumerator<Iris>((oRow) =>
{
    // convert the row into an Iris object
    var prRow = new Iris();
    prRow.SepalArea = Convert.ToSingle(oRow["SepalArea"]);
    prRow.PetalArea = Convert.ToSingle(oRow["PetalArea"]);
    prRow.Species = Convert.ToString(oRow["species"]);
    return prRow;
}));
```

The whole data set has now been loaded into the ML.NET pipeline, so we split it into training and test sets:

```csharp
// split the data set into two parts: training (90%) and test (10%)
var trainTestData = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.1);
var trainData = trainTestData.TrainSet;
var testData = trainTestData.TestSet;
```
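As far as we understand, ML.NET's `TrainTestSplit` assigns rows to the test set randomly, so the test set is only approximately `testFraction` of the data. The sketch below shows a simpler, exact shuffle-and-cut split (a seeded Fisher-Yates shuffle followed by a cut) on an in-memory list. It is not ML.NET's algorithm, and the `Splitter` class is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class Splitter
{
    // Shuffles a copy of the items with a seeded RNG and puts round(count * testFraction) of them in the test set.
    public static (List<T> Train, List<T> Test) Split<T>(IReadOnlyList<T> items, double testFraction, int seed = 1)
    {
        if (testFraction < 0 || testFraction > 1)
            throw new ArgumentOutOfRangeException(nameof(testFraction));

        var rng = new Random(seed);
        var shuffled = new List<T>(items);

        // Fisher-Yates shuffle: swap each position with a random earlier-or-same position.
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }

        int testCount = (int)Math.Round(items.Count * testFraction);
        var test = shuffled.GetRange(0, testCount);
        var train = shuffled.GetRange(testCount, shuffled.Count - testCount);
        return (train, test);
    }
}
```

With the 150 rows of the iris data and `testFraction = 0.1`, this sketch puts exactly 15 rows in the test set and 135 in the training set.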

Create the pipeline that prepares the training data for machine learning:

```csharp
// prepare data for ML:
// encode the output category column as keys, one key value per species
var dataPipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: nameof(Iris.Species))
    // define the features column
    .Append(mlContext.Transforms.Concatenate("Features", nameof(Iris.SepalArea), nameof(Iris.PetalArea)));
```
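`MapValueToKey` replaces each distinct `Species` string with a numeric key. To our understanding, ML.NET by default numbers keys in order of first occurrence starting at 1, with 0 reserved for missing values. The stand-alone sketch below mimics that mapping with a dictionary; the `KeyMapper` class is hypothetical and is not ML.NET code.

```csharp
using System.Collections.Generic;

public static class KeyMapper
{
    // Builds a value-to-key dictionary, numbering distinct values 1, 2, 3, ... in order of first occurrence.
    public static Dictionary<string, uint> BuildKeys(IEnumerable<string> values)
    {
        var keys = new Dictionary<string, uint>();
        foreach (var v in values)
        {
            if (!keys.ContainsKey(v))
                keys[v] = (uint)keys.Count + 1; // key 0 is left unused, mirroring "missing"
        }
        return keys;
    }
}
```

For the iris data, the three species map to keys 1, 2, and 3, which is the label form the multiclass trainer expects.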

Use the data pipeline and trainData to train and build the model.

```csharp
// create the trainer (requires the Microsoft.ML.LightGbm package)
var lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm();
// train the ML model on the training set
var model = dataPipeline.Append(lightGbm).Fit(trainData);
```

Model Evaluation

Once we have a trained model, we can evaluate how well it predicts the Iris species on the test set (testData):

```csharp
// evaluate the model on the test set
var testPrediction = model.Transform(testData);
var metricsTest = mlContext.MulticlassClassification.Evaluate(testPrediction);

// ConsoleHelper is a helper class for printing results; it is not part of ML.NET
ConsoleHelper.PrintMultiClassClassificationMetrics("TEST Iris Dataset", metricsTest);
ConsoleHelper.ConsoleWriteHeader("Test Iris DataSet Confusion Matrix ");
ConsoleHelper.ConsolePrintConfusionMatrix(metricsTest.ConfusionMatrix);
```

Once the program is run, the output shows that the Iris model predicts the test set with 100% accuracy:

[Figure: Iris Model Evaluation]
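A 100% accuracy means every test row lies on the diagonal of the confusion matrix. The sketch below computes micro- and macro-accuracy from a plain `double[,]` confusion matrix under their common definitions; it assumes rows are actual classes and columns are predicted classes. We have not checked every detail against ML.NET's implementation, and the `ConfusionMetrics` class is a hypothetical name.

```csharp
using System;

public static class ConfusionMetrics
{
    // Micro-accuracy: correctly classified rows (the diagonal) divided by all rows.
    public static double MicroAccuracy(double[,] counts)
    {
        double correct = 0, total = 0;
        for (int i = 0; i < counts.GetLength(0); i++)
            for (int j = 0; j < counts.GetLength(1); j++)
            {
                total += counts[i, j];
                if (i == j) correct += counts[i, j];
            }
        return total == 0 ? 0 : correct / total;
    }

    // Macro-accuracy: recall of each class (diagonal / row sum), averaged over classes that occur.
    // Assumes a square matrix with rows = actual classes, columns = predicted classes.
    public static double MacroAccuracy(double[,] counts)
    {
        double sumRecall = 0;
        int classes = 0;
        for (int i = 0; i < counts.GetLength(0); i++)
        {
            double rowSum = 0;
            for (int j = 0; j < counts.GetLength(1); j++) rowSum += counts[i, j];
            if (rowSum == 0) continue;
            sumRecall += counts[i, i] / rowSum;
            classes++;
        }
        return classes == 0 ? 0 : sumRecall / classes;
    }
}
```

Micro-accuracy weights every row equally, while macro-accuracy weights every class equally, so the two differ when classes are imbalanced.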

Daany Statistics (Daany.Stat)

Besides Daany.DataFrame, the library contains a set of implementations for working with time series data. Some of them are:

  • Conversion of time series into Daany.DataFrame and Series,
  • Seasonal and Trend decomposition using Loess (STL),
  • Singular Spectrum Analysis (SSA) time series decomposition,
  • A set of time series operations such as moving average.
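As an illustration of the kind of operation listed above, here is a trailing simple moving average written in plain C#. It is a sketch, not Daany's API: Daany's own time series operations may align the window or handle missing values differently, and the `TimeSeriesOps` class is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class TimeSeriesOps
{
    // Trailing simple moving average: output[k] = mean(x[k .. k + window - 1]), one value per full window.
    public static double[] MovingAverage(IReadOnlyList<double> x, int window)
    {
        if (window < 1 || window > x.Count)
            throw new ArgumentOutOfRangeException(nameof(window));

        var result = new double[x.Count - window + 1];
        double sum = 0;
        for (int i = 0; i < x.Count; i++)
        {
            sum += x[i];                                  // add the newest value
            if (i >= window) sum -= x[i - window];        // drop the value that left the window
            if (i >= window - 1) result[i - window + 1] = sum / window;
        }
        return result;
    }
}
```

Keeping a running sum makes the cost linear in the series length rather than proportional to length times window size.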

Singular Spectrum Analysis, SSA

With SSA, you can decompose a time series into any number of components (signals). The following code loads the well-known AirPassengers time series data:

```csharp
// root is the folder containing AirPassengers.csv
var strPath = $"{root}/AirPassengers.csv";
var mlDF = DataFrame.FromCsv(strPath, sep: ',');
// create the time series from the data frame column
var ts = mlDF["#Passengers"].Select(f => Convert.ToDouble(f));
```

Now that we have the AirPassengers time series object ts, we can create an SSA object by passing ts to it:

```csharp
// create the Singular Spectrum Analysis object
var ssa = new SSA(ts);
// perform the analysis
ssa.Fit(36);
```

We created the ssa object by passing the time series to its constructor. We then call the Fit method with the number of components to create (36 here) to run the SSA analysis of the time series.
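In the textbook formulation, SSA first embeds the series x_0 … x_{N-1} into an L × K trajectory (Hankel) matrix with K = N - L + 1, whose columns are lagged windows of length L; a singular value decomposition of that matrix then yields up to L components. The sketch below shows only this embedding step, assuming the argument of Fit plays the role of L; we have not verified how Daany's SSA uses it internally, and the `SsaEmbedding` class is hypothetical. For the 144 monthly AirPassengers values and L = 36, the matrix would be 36 × 109.

```csharp
using System;
using System.Collections.Generic;

public static class SsaEmbedding
{
    // Builds the L x K trajectory matrix X[i, j] = x[i + j], where L = windowLength and K = N - L + 1.
    public static double[,] Trajectory(IReadOnlyList<double> x, int windowLength)
    {
        int n = x.Count;
        if (windowLength < 2 || windowLength > n - 1)
            throw new ArgumentOutOfRangeException(nameof(windowLength));

        int k = n - windowLength + 1;
        var m = new double[windowLength, k];
        for (int i = 0; i < windowLength; i++)
            for (int j = 0; j < k; j++)
                m[i, j] = x[i + j]; // every anti-diagonal (i + j constant) holds the same value
        return m;
    }
}
```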

Once the time series has been analyzed, we can plot its components. The following plot shows the first 4 components:

[Figure: first 4 SSA components of the AirPassengers time series]

The following plot shows how the previous 4 components approximate the actual AirPassengers data:

[Figure: approximation of the AirPassengers data by the first 4 SSA components]
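In standard SSA, each component is turned back into a time series by diagonal averaging (Hankelization): element t of the series is the mean of all matrix entries with i + j = t. Summing the reconstructed series of the first few components gives an approximation like the one plotted above. The sketch shows only the diagonal-averaging step in plain C#; the `SsaReconstruction` class is hypothetical and not part of Daany.Stat.

```csharp
using System;

public static class SsaReconstruction
{
    // Diagonal averaging: y[t] = mean of M[i, j] over all i + j == t, for t = 0 .. rows + cols - 2.
    public static double[] DiagonalAverage(double[,] m)
    {
        int rows = m.GetLength(0), cols = m.GetLength(1);
        var sum = new double[rows + cols - 1];
        var count = new int[rows + cols - 1];

        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
            {
                sum[i + j] += m[i, j];
                count[i + j]++;
            }

        for (int t = 0; t < sum.Length; t++) sum[t] /= count[t];
        return sum;
    }
}
```

Applied to an exact trajectory matrix, diagonal averaging returns the original series; applied to a rank-one component matrix, it yields that component's contribution to the series.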

Finally, we can plot the SSA predicted and actual values of the time series:

[Figure: SSA predicted vs. actual AirPassengers values]
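To put a number on how close the predicted values are to the actual ones, a common choice is the root-mean-square error (RMSE). The helper below is a generic sketch, not part of Daany.Stat; the `ForecastError` class is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class ForecastError
{
    // Root-mean-square error between two non-empty series of equal length.
    public static double Rmse(IReadOnlyList<double> actual, IReadOnlyList<double> predicted)
    {
        if (actual.Count != predicted.Count || actual.Count == 0)
            throw new ArgumentException("Series must be non-empty and of equal length.");

        double sumSq = 0;
        for (int i = 0; i < actual.Count; i++)
        {
            double e = actual[i] - predicted[i];
            sumSq += e * e;
        }
        return Math.Sqrt(sumSq / actual.Count);
    }
}
```

RMSE is expressed in the units of the series (passengers here), which makes it easier to interpret than the raw sum of squared errors.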

Daany Linear Algebra (Daany.LinA)

Daany.LinA provides the ability to use Intel MKL, a native and very fast math library, to perform linear algebra calculations. Combined with the previous packages (Daany.DataFrame and Daany.Stat), it lets you transform and analyze complex data, solve systems of linear equations, find eigenvalues and eigenvectors, apply the least squares method, and more.
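LAPACK's general solver for systems of linear equations (`?gesv`) is based on LU factorization with partial pivoting. The plain C# sketch below shows the same idea, Gaussian elimination with partial pivoting, for small dense systems. It is not the Daany.LinA API and does not call MKL; the `LinearSolver` class is a hypothetical name, and for real workloads the native LAPACK routines are the better choice.

```csharp
using System;

public static class LinearSolver
{
    // Solves A x = b by Gaussian elimination with partial pivoting. A and b are copied, not modified.
    public static double[] Solve(double[,] a, double[] b)
    {
        int n = b.Length;
        if (a.GetLength(0) != n || a.GetLength(1) != n)
            throw new ArgumentException("A must be n x n and match the length of b.");

        var m = (double[,])a.Clone();
        var x = (double[])b.Clone();

        for (int col = 0; col < n; col++)
        {
            // Choose the row with the largest absolute value in this column as the pivot.
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.Abs(m[r, col]) > Math.Abs(m[pivot, col])) pivot = r;
            if (Math.Abs(m[pivot, col]) < 1e-12)
                throw new InvalidOperationException("Matrix is singular or nearly singular.");

            if (pivot != col)
            {
                for (int c = 0; c < n; c++) (m[col, c], m[pivot, c]) = (m[pivot, c], m[col, c]);
                (x[col], x[pivot]) = (x[pivot], x[col]);
            }

            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < n; r++)
            {
                double f = m[r, col] / m[col, col];
                for (int c = col; c < n; c++) m[r, c] -= f * m[col, c];
                x[r] -= f * x[col];
            }
        }

        // Back substitution on the upper-triangular system.
        for (int r = n - 1; r >= 0; r--)
        {
            double s = x[r];
            for (int c = r + 1; c < n; c++) s -= m[r, c] * x[c];
            x[r] = s / m[r, r];
        }
        return x;
    }
}
```

For example, solving 2x + y = 5 and x + 3y = 10 with this sketch gives x = 1 and y = 3 (up to floating-point rounding).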

For more information on how to use any of the implemented methods, see the Daany Developer Guide, the test application included in the library, or the unit tests, which cover almost all of the library's implementation.
