Supporting website for Generalized Random Shapelet Forest

1 Code

The implementation of the Generalized Random Shapelet Forest is available on GitHub (here). The core implementation can be found in the files ShapeletTree.java and RandomShapeletForest.java. Running the code requires compiling several layers of dependencies; to simplify the process, a command line utility for running the algorithm over datasets from the UCR Time Series Repository is provided here, and a pre-built binary can be downloaded here.

1.1 Running the command line utility (uni- and multivariate data [UPDATED])

The command line utility can load and run univariate time series datasets downloaded from the UCR Time Series Repository (a sample dataset can be found and downloaded here).
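In this layout, each row of a train/test file holds one time series: a class label followed by the observed values (comma-separated in the sample files linked above; note that the delimiter has varied between UCR archive releases). A toy dataset in this assumed format can be generated as follows; the file names and values are invented for illustration:

```shell
# Create a toy two-class dataset in the assumed UCR layout:
# one series per row, class label first, then the values.
mkdir -p toy_ucr
printf '1,0.1,0.2,0.3,0.4\n2,0.9,0.8,0.7,0.6\n' > toy_ucr/toy_TRAIN
printf '1,0.2,0.2,0.3,0.5\n2,0.8,0.8,0.6,0.6\n' > toy_ucr/toy_TEST
wc -l < toy_ucr/toy_TRAIN   # two training series
```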

The tool is simple to use and provides a few options for tweaking the algorithm:

usage: rsfcmd.jar [OPTIONS] trainFile testFile
 -l,--lower <arg>       Lower shapelet size (fraction of length, e.g.,
                        0.05)
 -m,--multivariate      The given dataset is in a multivariate format
 -n,--no-trees <arg>    Number of trees
 -p,--print-shapelets   Print the shapelets of the forest
 -r,--sample <arg>      Number of shapelets
 -u,--upper <arg>       Upper shapelet size (fraction of length, e.g., 0.8)

For example, to run the algorithm over the synthetic_control dataset, first download the required files and then run the following:

$ mkdir rsf_example
$ cd rsf_example
$ curl -O https://raw.githubusercontent.com/isakkarlsson/rsf-cmdline/master/binary/rsfcmd.jar
$ curl -O https://raw.githubusercontent.com/isakkarlsson/rsf-cmdline/master/dataset/synthetic_control/synthetic_control_TRAIN
$ curl -O https://raw.githubusercontent.com/isakkarlsson/rsf-cmdline/master/dataset/synthetic_control/synthetic_control_TEST
$ java -jar rsfcmd.jar -n 100 synthetic_control_TRAIN synthetic_control_TEST

[UPDATED] For multivariate datasets, the file format is slightly different. A dataset is a directory containing a file named "classes.dat", which lists the class labels separated by commas (","), and a subdirectory named "data" with files named "1.dat" to "n.dat" (where n is the number of multivariate time series in the dataset). Each file in the "data" folder contains 1 to m rows of comma-separated numbers, one row per time series dimension. (See the CharacterTrajectories dataset found in the mts_data.zip file.)
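The layout described above can be sketched as follows; the class names, dimension count, and values here are invented for illustration (the real CharacterTrajectories data in mts_data.zip follows the same structure):

```shell
# Build a toy multivariate dataset with 3 series of 2 dimensions each.
mkdir -p toy_mts/data

# classes.dat: one comma-separated class label per series, in order.
printf 'a,b,a\n' > toy_mts/classes.dat

# 1.dat .. 3.dat: one row per time series dimension, comma-separated values.
printf '0.1,0.2,0.3\n1.1,1.2,1.3\n' > toy_mts/data/1.dat
printf '0.4,0.5,0.6\n1.4,1.5,1.6\n' > toy_mts/data/2.dat
printf '0.7,0.8,0.9\n1.7,1.8,1.9\n' > toy_mts/data/3.dat

ls toy_mts/data
```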

To try it out, run the following:

$ mkdir mts_example
$ cd mts_example
$ curl -O https://raw.githubusercontent.com/isakkarlsson/rsf-cmdline/master/binary/rsfcmd.jar
$ curl -LO http://people.dsv.su.se/~isak-kar/grsf/mts_data.zip
$ unzip mts_data.zip
$ java -jar rsfcmd.jar -n 100 -m CharacterTrajectories/train CharacterTrajectories/test

1.2 Multivariate data

For multivariate data, the process is slightly different, since the main framework does not (yet) implement a default way of reading such datasets. Instead, we have implemented the reading procedures separately and made them available for download here. They are implemented in Groovy and therefore require the Groovy compiler.

The datasets used in the experiments can be found in the file repository of Mustafa Baydogan. Since they are only available as .mat files, we have converted them to a format that the MatlabExtractLoader.groovy file can read. These files can be downloaded here.

To run the Generalized Shapelet Forest over one of these datasets, use the following Groovy code:

@Grab(group='org.briljantframework', module='mimir-core', version='0.1-SNAPSHOT')
import org.briljantframework.mimir.classification.*
import loaders.*;

def loader = new MatlabExtractLoader(dir:"/some/dataset/dir/for_example/ECG")
def validator = ClassifierValidator.crossValidator(5) // 5-folds
validator.add(ClassifierEvaluator.INSTANCE)

def rsf = new RandomShapeletForest.Configurator(100).configure();

def (y, x) = loader.load();

println "The loaded dataset has ${x.rows()} instances"
println "Running the experiment"
println validator.test(rsf, x, y).measures.mean()

An example can be downloaded here (see the description below).

To get the example to run, Maven and Groovy must be installed, and the code for the learning framework (i.e., Briljant and Mimir) must be cloned and installed.

First, install Maven, Groovy, and Git using a suitable method; e.g., on OS X use Homebrew:

brew install maven groovy git

Then create a new directory, clone the Briljant and Mimir GitHub repositories, download the required code and datasets, and edit/run the example:

$ mkdir mts_example
$ cd mts_example
$ git clone https://github.com/briljant/briljant.git
$ git clone https://github.com/briljant/mimir.git
$ cd briljant
$ mvn clean install -DskipTests=true -Dgpg.skip=true
$ cd ../mimir
$ mvn clean install -DskipTests=true -Dgpg.skip=true
$ cd ..
$ curl -LO http://people.dsv.su.se/~isak-kar/grsf/mts_code.zip
$ curl -LO http://people.dsv.su.se/~isak-kar/grsf/mts_data.zip
$ unzip mts_code.zip
$ unzip mts_data.zip
$ cd code
... edit mts-test.groovy to point to the correct dataset folder ...
$ groovy mts-test.groovy

1.3 Ultra-Fast Shapelets

Since the authors of Ultra-Fast Shapelets for Time Series Classification do not make an implementation available, we have implemented our own version, which can be found in the GitHub repository (in the file UltraFastShapeletTransform.java).

To run an experiment, the following (Groovy) code can be used:

def shapelets = 100

// define the transformation
def transformation = new UltraFastShapeletTransform(shapelets, 10000)

// we will here use the random forest algorithm
def rfc = new RandomForest.Configurator(500)
rfc.maximumFeatures = (int) (Math.sqrt(shapelets) + 1)
def rf = rfc.configure()

// split the dataset into training and testing
def part = new SplitPartitioner(0.33)

// for each split (e.g., if we opt for the FoldPartitioner)
for (def p : part.partition(x, y)) {

   // Fit the transformation using the training data
   def transformer = transformation.fit(p.trainingData)

   // transform the training data using the transformer and
   // fit a random forest
   def m = rf.fit(transformer.transform(p.trainingData), p.trainingTarget)

   // transform the validation data using the same transformer
   def testX = transformer.transform(p.validationData)

   // estimate the class probabilities and predict the class labels
   def estimates = m.estimate(testX)
   def predicted = m.predict(testX)

   // compute some common classifier measures (e.g., accuracy and AUC)
   def cm = new ClassifierMeasure(predicted, p.validationTarget, estimates, m.classes)
   println "${cm.accuracy},${cm.areaUnderRocCurve},${cm.brierScore}"
}

1.4 Contact

If you have any questions please contact isak-kar@dsv.su.se.

Author: Isak Karlsson

Created: 2016-12-14 Wed 11:47
