Type: Package
Title: Protocol Inspection and State Machine Analysis
Version: 0.2-7
Date: 2018-05-26
Depends: R (≥ 2.10), Matrix, gplots, methods, ggplot2
Suggests: tm (≥ 0.6)
Author: Tammo Krueger, Nicole Kraemer
Maintainer: Tammo Krueger <tammokrueger@googlemail.com>
Description: Loads and processes huge text corpora processed with the sally toolbox (http://www.mlsec.org/sally/). sally acts as a very fast preprocessor which splits the text files into tokens or n-grams. These output files can then be read with the PRISMA package which applies testing-based token selection and has some replicate-aware, highly tuned non-negative matrix factorization and principal component analysis implementation which allows the processing of very big data sets even on desktop machines.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2.0)]
NeedsCompilation: no
Packaged: 2018-05-26 15:51:57 UTC; tammok
Repository: CRAN
Date/Publication: 2018-05-26 22:01:47 UTC

Protocol Inspection and State Machine Analysis

Description

Loads and processes huge text corpora processed with the sally toolbox (<http://www.mlsec.org/sally/>). sally acts as a very fast preprocessor which splits the text files into tokens or n-grams. These output files can then be read with the PRISMA package which applies testing-based token selection and has some replicate-aware, highly tuned non-negative matrix factorization and principal component analysis implementation which allows the processing of very big data sets even on desktop machines.

Details

Package: PRISMA
Type: Package
Title: Protocol Inspection and State Machine Analysis
Version: 0.2-7
Date: 2018-05-26
Depends: Matrix, gplots, methods, ggplot2
Suggests: tm (>= 0.6)
Author: Tammo Krueger, Nicole Kraemer
Maintainer: Tammo Krueger <tammokrueger@googlemail.com>
Description: Loads and processes huge text corpora processed with the sally toolbox (<http://www.mlsec.org/sally/>). sally acts as a very fast preprocessor which splits the text files into tokens or n-grams. These output files can then be read with the PRISMA package which applies testing-based token selection and has some replicate-aware, highly tuned non-negative matrix factorization and principal component analysis implementation which allows the processing of very big data sets even on desktop machines.
License: GPL (>=2.0)

Index of help topics:

PRISMA-package                 Protocol Inspection and State Machine Analysis
asap                           The ASAP Data Set
corpusToPrisma                 Convert tm corpus to PRISMA
estimateDimension              Estimate Inner Dimension
getDuplicateData               Restores Data with Duplicates
getMatrixFactorizationLabels   Convert Coordinates of Matrix Factorization to Labels
loadPrismaData                 Load PRISMA Data Files
plot.prisma                    Generics For PRISMA Objects
plot.prismaDimension           Generics For PRISMA Objects
plot.prismaMF                  Generics For PRISMA Objects
prismaDuplicatePCA             Matrix Factorization Based on Replicate-Aware PCA
prismaHclust                   Matrix Factorization Based on Hierarchical Clustering
prismaNMF                      Matrix Factorization Based on Replicate-Aware NMF
thesis                         The Thesis Data Set

Further information is available in the following vignettes:

PRISMA Quick introduction (source)

Author(s)

Tammo Krueger, Nicole Kraemer

Maintainer: Tammo Krueger <tammokrueger@googlemail.com>

References

Krueger, T., Gascon, H., Kraemer, N., Rieck, K. (2012) Learning Stateful Models for Network Honeypots. 5th ACM Workshop on Artificial Intelligence and Security (AISEC 2012), accepted.

Krueger, T., Kraemer, N., Rieck, K. (2011) ASAP: Automatic Semantics-Aware Analysis of Network Payloads. Privacy and Security Issues in Data Mining and Machine Learning - International ECML/PKDD Workshop. Lecture Notes in Computer Science 6549, Springer. 50 - 63.

Examples

# please see the vignette for examples

The ASAP Data Set

Description

Toy data set to show the capabilities of the PRISMA package.

Usage

asap

Format

A prisma object.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

References

Krueger, T., Kraemer, N., Rieck, K. (2011) ASAP: Automatic Semantics-Aware Analysis of Network Payloads. Privacy and Security Issues in Data Mining and Machine Learning - International ECML/PKDD Workshop. Lecture Notes in Computer Science 6549, Springer. 50 - 63.


Convert tm corpus to PRISMA

Description

Converts a tm corpus object to a PRISMA object.

Usage

corpusToPrisma(corpus, alpha = 0.05, skipFeatureCorrelation = FALSE)

Arguments

corpus

a tm corpus

alpha

significance level for the feature tests. If NULL, all features are kept.

skipFeatureCorrelation

should the grouping of features based on correlation analysis be skipped.

Value

prismaData

data object representing the tokenized documents as features x samples matrix.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

Examples

if (require("tm") && packageVersion("tm") >= '0.6') {
  data(thesis)
  thesis
  thesis = corpusToPrisma(thesis, NULL, TRUE)
  thesis
}

Estimate Inner Dimension

Description

Matrix factorization methods compress the original data matrix $A \in R^{f \times N}$ with $f$ features and $N$ samples into two parts, namely $A = B C$ with $B \in R^{f \times k}$, $C \in R^{k \times N}$. The function estimateDimension estimates $k$ based on a noise model estimated from a scrambled version of the original data matrix.

Usage

estimateDimension(prismaData, alpha = 0.05, nScrambleSamples = NULL)

Arguments

prismaData

A prismaData object loaded via loadPrismaData

alpha

Error probability for confidence intervals

nScrambleSamples

The number of scrambled samples that should be used to estimate the noise model. NULL means to use the complete data set.

Value

estDim

prismaDimension object that can be printed and plotted.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

References

R. Schmidt. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3):276 – 280, 1986.

Examples

# please see the vignette for examples
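
The idea of comparing the data against a scrambled copy can be illustrated with base R alone. The following is only a simplified sketch, not the PRISMA implementation: it assumes that the noise model is obtained by permuting every feature row independently, and it uses a plain "larger than the largest noise singular value" rule instead of the confidence intervals controlled by alpha. All names (scramble, centered_sv, k_est) are hypothetical.

```r
set.seed(1)
f <- 20; N <- 200; k_true <- 3

# Synthetic data A = B C + noise with a block-structured base matrix B:
# every feature belongs to exactly one of the k_true components.
B <- matrix(0, nrow = f, ncol = k_true)
B[cbind(seq_len(f), rep(seq_len(k_true), length.out = f))] <- 1
C <- matrix(runif(k_true * N), nrow = k_true)
A <- B %*% C + matrix(rnorm(f * N, sd = 0.05), nrow = f)

# Scrambling: permute each feature row independently, which keeps the
# per-feature distribution but destroys the correlation between features.
scramble <- function(M) t(apply(M, 1, sample))

# Singular values of the row-centered matrix.
centered_sv <- function(M) svd(M - rowMeans(M), nu = 0, nv = 0)$d

sv_data  <- centered_sv(A)
sv_noise <- centered_sv(scramble(A))

# Simplified rule (an assumption of this sketch): count the singular values
# of the data that exceed everything seen in the scrambled data.
k_est <- sum(sv_data > max(sv_noise))
print(k_est)
```

In the package itself the result of estimateDimension is a prismaDimension object, which can be passed directly as ncomp to prismaNMF.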

Restores Data with Duplicates

Description

The loadPrismaData function triggers feature selection and data combination methods which subsequently remove duplicate entries for an efficient representation of the data. The getDuplicateData function rebuilds the data matrix with an explicit representation of all duplicate entries.

Usage

getDuplicateData(prismaData)

Arguments

prismaData

prisma data loaded via loadPrismaData

Value

dataWithDuplicates

Data matrix containing explicit copies of all duplicates.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

Examples

data(asap)
dataWithDuplicates = getDuplicateData(asap)
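
To make the relation between the compact and the expanded representation concrete, here is a minimal base R sketch. It does not use the internal layout of a prisma object; the names compact, remapper and counts are hypothetical and only illustrate the principle of storing each distinct document once together with a mapping back to the original positions.

```r
# Five documents (columns) over three features; columns 1, 3, 4 and
# columns 2, 5 are duplicates of each other.
full <- cbind(c(1, 0, 1), c(0, 1, 1), c(1, 0, 1), c(1, 0, 1), c(0, 1, 1))

# Compact representation: unique columns plus a document -> column mapping.
keys     <- apply(full, 2, paste, collapse = "")
remapper <- match(keys, unique(keys))
compact  <- full[, !duplicated(keys), drop = FALSE]
counts   <- tabulate(remapper)   # 3 2: how often each unique column occurs

# Restoring the explicit duplicates is a plain column lookup.
restored <- compact[, remapper, drop = FALSE]
stopifnot(identical(restored, full))
```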

Convert Coordinates of Matrix Factorization to Labels

Description

Given a matrix factorization object A = B C, this function returns for each document the index of the inner dimension which has the maximal coordinate. Thus, it converts the fuzzy clustering found in the columns of the C matrix into a hard clustering by returning the position with the maximal coordinate value.

Usage

getMatrixFactorizationLabels(prismaMF)

Arguments

prismaMF

a matrix factorization object.

Value

labels

vector containing the label assignment for each document.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

See Also

prismaNMF
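
The conversion from fuzzy to hard clustering described above amounts to a column-wise argmax. A minimal sketch on a hand-made coordinate matrix (the matrix C below is made-up illustration data, not taken from a prismaMF object):

```r
# Coordinate matrix C: 3 inner dimensions (rows) x 4 documents (columns).
C <- rbind(c(0.9, 0.1, 0.2, 0.0),
           c(0.1, 0.7, 0.3, 0.1),
           c(0.0, 0.2, 0.5, 0.9))

# For each document, take the row index with the largest coordinate.
labels <- apply(C, 2, which.max)
print(labels)   # 1 2 3 3
```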


Load PRISMA Data Files

Description

Loads files generated by the sally tool (see http://www.mlsec.org/sally/) and represents the data as a binary token/n-gram x documents matrix. After loading, statistical tests are applied to find features which are neither volatile nor constant. Co-occurring features are grouped to further compact the data. See system.file("extdata", "sallyPreprocessing.py", package="PRISMA") for a Python script which generates the corresponding .fsally file from a .sally file, which reduces the loading time via loadPrismaData considerably.

Usage

loadPrismaData(path, maxLines = -1, fastSally = TRUE,
               alpha = 0.05, skipFeatureCorrelation = FALSE)

Arguments

path

path of the data file without the .sally extension. loadPrismaData loads path.sally or path.fsally depending on the fastSally switch.

maxLines

maximal number of lines to read from the data file. -1 means to read all lines.

fastSally

should the fsally file be used, which drastically decreases loading time.

alpha

significance level for the feature tests. If NULL, all features are kept.

skipFeatureCorrelation

should the grouping of features based on correlation analysis be skipped.

Value

prismaData

data object representing the tokenized documents as features x samples matrix.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

References

See http://www.mlsec.org/sally/ for the sally utility.

Examples

# please see the vignette for examples
# please see system.file("extdata","asap.tar.gz", package="PRISMA") for
# an example sally output
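
The data representation described for loadPrismaData can be mimicked on in-memory token lists. The sketch below is a simplified illustration under stated assumptions: it does not read .sally or .fsally files, it replaces the statistical feature tests by simply dropping constant features, and it groups only features with exactly identical occurrence patterns. The token lists and all variable names are hypothetical.

```r
# Toy token lists standing in for already tokenized documents.
docs <- list(c("GET", "index", "HTTP"),
             c("GET", "login", "user", "HTTP"),
             c("POST", "login", "user", "HTTP"))

# Binary features x documents matrix: 1 if the token occurs in the document.
vocab <- sort(unique(unlist(docs)))
X <- sapply(docs, function(d) as.integer(vocab %in% d))
rownames(X) <- vocab

# Simplification: drop features that are constant over all documents
# (PRISMA applies statistical tests at significance level alpha instead).
keep <- rowSums(X) > 0 & rowSums(X) < ncol(X)
X <- X[keep, , drop = FALSE]

# Group co-occurring features, here: features with identical rows.
pattern <- apply(X, 1, paste, collapse = "")
groups  <- split(rownames(X), pattern)
print(groups)
```

In this toy example "HTTP" occurs in every document and is removed, while "login" and "user" always occur together and end up in the same group.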

Generics For PRISMA Objects

Description

Print and plot generics for PRISMA objects.

Usage

## S3 method for class 'prisma'
print(x, ...)
## S3 method for class 'prisma'
plot(x, ...)

Arguments

x

PRISMA data loaded via loadPrismaData

...

not used

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

See Also

estimateDimension, prismaHclust, prismaDuplicatePCA, prismaNMF

Examples

data(asap)
print(asap)
plot(asap)

Generics For PRISMA Objects

Description

Print and plot generics for PRISMA dimension objects.

Usage

## S3 method for class 'prismaDimension'
print(x, ...)
## S3 method for class 'prismaDimension'
plot(x, ...)

Arguments

x

PRISMA dimension object generated via estimateDimension

...

not used

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

See Also

estimateDimension, prismaHclust, prismaDuplicatePCA, prismaNMF

Examples

# please see the vignette for examples

Generics For PRISMA Objects

Description

Print and plot generics for PRISMA matrix factorization objects.

Usage

## S3 method for class 'prismaMF'
plot(x, nLines = NULL, baseIndex = NULL, sampleIndex = NULL,
     minValue = NULL, noRowClustering = FALSE, noColClustering = FALSE,
     type = c("base", "coordinates"), ...)

Arguments

x

PRISMA matrix factorization object

nLines

number of lines that should be plotted

baseIndex

which bases should be plotted

sampleIndex

which samples should be plotted

minValue

cut-off value, i.e., every value smaller than minValue won't be shown

noRowClustering

don't cluster the rows

noColClustering

don't cluster the columns

type

show the base (type = "base", i.e. the B matrix) or show the coordinates (type = "coordinates", i.e. the C matrix).

...

not used

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

See Also

estimateDimension, prismaHclust, prismaDuplicatePCA, prismaNMF

Examples

# please see the vignette for examples

Matrix Factorization Based on Replicate-Aware PCA

Description

Efficient implementation of a replicate-aware principal component analysis (PCA).

Usage

prismaDuplicatePCA(prismaData)

Arguments

prismaData

PRISMA data for which a PCA should be calculated

Value

prismaPCA

Matrix factorization object $A = B C$, in which the factors are calculated by a replicate-aware PCA

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

Examples

# please see the vignette for examples
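
One way to read "replicate-aware" is that the PCA works on the unique documents weighted by their multiplicity instead of on the fully expanded data. The base R sketch below checks this reading on random data: the weighted covariance from the unique columns equals the covariance of the expanded matrix. This is an illustration of the principle under that assumption, not the code used by prismaDuplicatePCA; U, counts and the other names are hypothetical.

```r
set.seed(42)
U <- matrix(runif(5 * 4), nrow = 5)   # 5 features x 4 unique documents
counts <- c(3, 1, 2, 4)               # multiplicity of each unique document
n <- sum(counts)

# Reference: covariance computed on the fully expanded data.
full <- U[, rep(seq_len(ncol(U)), times = counts)]
Fc <- full - rowMeans(full)
cov_full <- tcrossprod(Fc) / n

# Replicate-aware: weighted mean and covariance from the unique columns only.
mu <- as.vector(U %*% counts) / n
Uc <- U - mu
cov_weighted <- Uc %*% (counts * t(Uc)) / n

stopifnot(isTRUE(all.equal(cov_weighted, cov_full)))

# Principal directions (B) and coordinates of the unique documents (C).
k <- 2
B <- eigen(cov_weighted, symmetric = TRUE)$vectors[, seq_len(k), drop = FALSE]
C <- crossprod(B, Uc)
```

The saving comes from never materializing the expanded matrix: all products involve only as many columns as there are distinct documents.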

Matrix Factorization Based on Hierarchical Clustering

Description

A matrix factorization A = B C based on the results of hclust is constructed, which holds the mean feature values for each cluster in the matrix B and the indication of the cluster in the matrix C for each data point (i.e. each data point is represented by its assigned cluster center).

Usage

prismaHclust(prismaData, ncomp, method = "single")

Arguments

prismaData

PRISMA data for which a clustering should be calculated.

ncomp

the number of components that should be extracted.

method

the method used for clustering.

Value

prismaHclust

Matrix factorization object containing B and C resulting from the hierarchical clustering of the data.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

See Also

hclust

Examples

# please see the vignette for examples
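
The construction described above can be reproduced with stats::hclust on a plain matrix. This is a minimal sketch on synthetic data, assuming Euclidean distances between documents; it does not operate on a prisma object and is not the package's own code.

```r
set.seed(7)
# 4 features x 10 documents forming two well separated groups.
A <- cbind(matrix(rnorm(4 * 5, mean = 0, sd = 0.1), nrow = 4),
           matrix(rnorm(4 * 5, mean = 5, sd = 0.1), nrow = 4))
ncomp <- 2

# Cluster the documents (columns) and cut the tree into ncomp clusters.
cl <- cutree(hclust(dist(t(A)), method = "single"), k = ncomp)

# B: mean feature values per cluster (features x ncomp).
B <- sapply(seq_len(ncomp), function(j) rowMeans(A[, cl == j, drop = FALSE]))

# C: cluster indicator per document (ncomp x documents), one 1 per column.
C <- outer(seq_len(ncomp), cl, "==") * 1

# B %*% C replaces every document by its assigned cluster center.
approx <- B %*% C
print(max(abs(A - approx)))
```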

Matrix Factorization Based on Replicate-Aware NMF

Description

Matrix factorization $A = B C$ with non-negative matrices $B$, $C$ which minimize the reconstruction error $\|A - B C\|$. This replicate-aware version of the non-negative matrix factorization (NMF) is based on the alternating least squares approach and exploits the replicate information to speed up the calculation.

Usage

prismaNMF(prismaData, ncomp, time = 60, pca.init = TRUE, doNorm = TRUE, oldResult = NULL)

Arguments

prismaData

PRISMA data for which a NMF should be calculated.

ncomp

either an integer or a prismaDimension object specifying the inner dimension of the matrix factorization.

time

seconds after which the calculation should end.

pca.init

should the B matrix be initialized by a PCA.

doNorm

should the B matrix be normalized (i.e. all columns have the Euclidean length of 1).

oldResult

re-use results of a previous run, i.e. B and C are pre-initialized with the values of this previous matrix factorization object.

Value

prismaNMF

Matrix factorization object containing the B and C matrix.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

References

Krueger, T., Gascon, H., Kraemer, N., Rieck, K. (2012) Learning Stateful Models for Network Honeypots. 5th ACM Workshop on Artificial Intelligence and Security (AISEC 2012), accepted.

R. Albright, J. Cox, D. Duling, A. Langville, and C. Meyer. (2006) Algorithms, initializations, and convergence for the nonnegative matrix factorization. Technical Report 81706, North Carolina State University.

Examples

# please see the vignette for examples
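
For orientation, a plain alternating least squares NMF can be written in a few lines of base R. The sketch below is a generic textbook variant and not the replicate-aware implementation of prismaNMF: it uses a random start instead of the PCA initialization, a fixed number of iterations instead of the time budget, and a small ridge term (an assumption added here for numerical safety) in each least squares step. The helper ls_step is hypothetical.

```r
set.seed(3)
f <- 6; N <- 30; k <- 2
A <- matrix(runif(f * k), f) %*% matrix(runif(k * N), k)   # exact rank-k data

# Ridge-stabilized least squares solution X of M X = Y (helper of this sketch).
ls_step <- function(M, Y, eps = 1e-9) {
  solve(crossprod(M) + eps * diag(ncol(M)), crossprod(M, Y))
}

B <- matrix(runif(f * k), f)          # random non-negative start
for (it in seq_len(50)) {
  C <- pmax(ls_step(B, A), 0)             # solve for C, clip negatives to 0
  B <- pmax(t(ls_step(t(C), t(A))), 0)    # solve for B, clip negatives to 0
}

# Optional normalization as in doNorm: unit-length columns of B,
# with the scale moved into the rows of C so that B %*% C is unchanged.
nrm <- sqrt(colSums(B^2))
nrm[nrm == 0] <- 1
B <- sweep(B, 2, nrm, "/")
C <- C * nrm

err <- norm(A - B %*% C, type = "F")   # reconstruction error ||A - B C||
print(err)
```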

The Thesis Data Set

Description

The 15 sections of a thesis (see references) as a tm-corpus.

Usage

thesis

Format

A tm-corpus.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

References

Tammo Krueger. Probabilistic Methods for Network Security. From Analysis to Response. PhD thesis, TU Berlin, 2013. http://opus.kobv.de/tuberlin/volltexte/2013/3881/

