Dynse (Dynamic Selection Based Drift Handler) is a framework developed to deal with concept drift problems by means of dynamic classifier selection.
The framework was built in Java using the MOA and Weka frameworks.
If you are looking for the concept drift version of the PKLot benchmark extraction files and protocol, please follow this link.
The Dynse Framework is an open source project released under the GNU General Public License 3.
The project is hosted on GitHub. You may download it using one of the following three options:
Clone using an SSH key: git clone git@github.com:paulorla/dynse.git
Anonymous Clone: git clone https://github.com/paulorla/dynse.git
Download Zip: dynse.zip
The sizeofag-1.0.0.jar (used as a Java agent when running Dynse) can be found here.
Eclipse
Right click on your project in Eclipse -> Run As -> Run Configurations...
Go to the Arguments tab
In the VM Arguments box, insert:
-javaagent:[PATH_TO_SIZE_OF_FAG]/sizeofag-1.0.0.jar
Example
-javaagent:/home/user/.m2/repository/com/googlecode/sizeofag/sizeofag/1.0.0/sizeofag-1.0.0.jar
Command Line
If you are running the project from the command line (e.g., you built the project using Apache Maven), just execute the generated jar as follows:
java -javaagent:[PATH_TO_SIZE_OF_FAG]/sizeofag-1.0.0.jar -jar [DYNSE_JAR].jar
Example
java -javaagent:/home/user/.m2/repository/com/googlecode/sizeofag/sizeofag/1.0.0/sizeofag-1.0.0.jar -jar dynse.jar
Datasets
Nebraska-Norm - The normalized version of the Nebraska Weather Dataset (the original version can be found here).
Checkerboard - The arff converted files from the checkerboard datasets (the original version can be found here).
Gaussian - The arff converted files from the Gaussian datasets (the original version can be found here).
Forest Covertype - The normalized arff of the Forest Covertype dataset (note: this dataset is hosted in the MOA repository).
Digit-Norm - The normalized version of the Digit Dataset (the original version can be found here).
Digit - The non-normalized version of the Digit Dataset (in this version the data were just put in the arff format).
Letters-Norm - The normalized version of the Letters Dataset (the original version can be found here).
Letters - The non-normalized version of the Letters Dataset (in this version the data were just put in the arff format).
The project contains a default factory to build a ready-to-use Dynse configuration. To use the factory, follow the example:
AbstractDynseFactory realConceptDriftFactory = new RealConceptDriftDynseFactory();
StreamDynse dynse = realConceptDriftFactory.createDefaultDynseKE(100);
The above example creates a configuration of the Dynse framework prepared for a real concept drift, using KNORA-Eliminate as the classification engine, where a new classifier is trained for every 100 supervised instances received. To create a configuration for virtual concept drifts, follow the example:
AbstractDynseFactory virtualConceptDriftFactory = new VirtualConceptDriftDynseFactory();
StreamDynse dynse = virtualConceptDriftFactory.createDefaultDynseKE(100);
In both scenarios, the created dynse object can be used like any other MOA classifier.
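As an illustration, the snippet below sketches a prequential (test-then-train) loop that feeds a MOA stream to the created dynse object. It assumes the MOA 2016-04 stream API and uses RandomRBFGenerator only as an example stream; any MOA stream or your own data reader would work the same way:

import com.yahoo.labs.samoa.instances.Instance;
import moa.streams.generators.RandomRBFGenerator;
import br.ufpr.dynse.core.StreamDynse;

public class DynseUsageSketch {
    // 'dynse' is assumed to have been created by one of the factory calls shown above
    public static void run(StreamDynse dynse) {
        RandomRBFGenerator stream = new RandomRBFGenerator(); // any MOA stream works here
        stream.prepareForUse();
        int correct = 0, total = 0;
        while (total < 10000 && stream.hasMoreInstances()) {
            Instance instance = stream.nextInstance().getData(); // next instance from the stream
            if (dynse.correctlyClassifies(instance)) { // test first...
                correct++;
            }
            dynse.trainOnInstance(instance); // ...then train
            total++;
        }
        System.out.println("Prequential accuracy: " + ((double) correct / total));
    }
}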
The factories are prepared to build a dynse configuration using any of the implemented Classification Engines (OLA, KNORA-E, A Priori, etc.).
The default configuration is:
Nearest Neighbors: 9 for the KNORA-E (and a slack of 2), and 5 for the other classification engines.
Accuracy Estimation Window Size: 4 x the number of train instances for real concept drifts, and 32 x the number of train instances for virtual concept drifts.
Base Classifier: The pool of classifiers is built using Naive Bayes classifiers.
Pruning: Only the latest 25 classifiers are kept in the pool.
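For illustration only (the factory already wires this for you), the real concept drift defaults roughly correspond to the explicit configuration sketched below, reusing the constructors from the complete example shown later in this page; the package of AbstractClassifierFactory is assumed to be br.ufpr.dynse.classifier.factory, and the factory's internal wiring may differ:

import br.ufpr.dynse.classificationengine.IClassificationEngine;
import br.ufpr.dynse.classificationengine.KnoraEliminateClassificationEngine;
import br.ufpr.dynse.classifier.competence.IMultipleClassifiersCompetence;
import br.ufpr.dynse.classifier.factory.AbstractClassifierFactory; // assumed package
import br.ufpr.dynse.classifier.factory.NaiveBayesFactory;
import br.ufpr.dynse.core.StreamDynse;
import br.ufpr.dynse.pruningengine.AgeBasedPruningEngine;
import br.ufpr.dynse.pruningengine.DynseClassifierPruningMetrics;
import br.ufpr.dynse.pruningengine.IPruningEngine;

public class DefaultRealDriftConfigSketch {
    public static StreamDynse build(int trainSize) throws Exception {
        // keep only the latest 25 classifiers in the pool (default pruning)
        IPruningEngine<DynseClassifierPruningMetrics> pruningEngine = new AgeBasedPruningEngine(25);
        // Naive Bayes base learners (default base classifier)
        AbstractClassifierFactory classifierFactory = new NaiveBayesFactory();
        // KNORA-Eliminate with 9 neighbors and a slack of 2 (default for K-E)
        IClassificationEngine<IMultipleClassifiersCompetence> classificationEngine =
                new KnoraEliminateClassificationEngine(9, 2);
        // accuracy estimation window of 4 batches (4 x trainSize instances) for real drifts
        return new StreamDynse(classifierFactory, trainSize, 4, classificationEngine, pruningEngine);
    }
}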
The project includes a series of testbeds using different datasets as examples.
The testbeds are available in the package br.ufpr.dynse.testbed
You will need to manually select the test you want in the method executeTests of your testbed, and you may need to download the necessary dataset and set the path to it when the dataset is not artificially generated by MOA (e.g., the Digit testbed).
If you do not want to use a default configuration, you can create your own custom configuration of the Dynse framework.
First of all, you will need to define the pruning engine. The original Dynse framework comes with 3 distinct implementations in the package br.ufpr.dynse.pruningengine: AgeBasedPruningEngine (removes the oldest classifier), AccuracyBasedPruningEngine (removes the worst performing classifier according to the current accuracy validation window), and NoPrunePruningEngine (keeps all classifiers in the pool). If you want to implement your own pruning engine, check the Section Creating your own Pruning Engine.
It is also necessary to define a classifier factory, which is responsible for building the base classifiers that will be added to the pool. In the package br.ufpr.dynse.classifier.factory you can find a factory for Naive Bayes classifiers (NaiveBayesFactory) and one for HoeffdingTree classifiers (HoeffdingTreeFactory). Check the Section Creating a Base Classifier Factory to see how to build your own base classifier factory.
The next step is to instantiate a classification engine. The Dynse framework already has many classification engines implemented; just choose one in the package br.ufpr.dynse.classificationengine. When instantiating a classification engine, you will also need to specify the number of neighbors considered to estimate the classifiers' competence in the current accuracy estimation window. To create your own classification engine, check the Section Creating your own Classification Engine.
Finally, you will need to instantiate a StreamDynse (our implementation of the Dynse Framework), passing the required information to the constructor, including the number of samples used to train each classifier and the size of the accuracy estimation window in batches (note: the instances accumulated before creating a new classifier are considered a batch in our implementation).
See a complete example below:
import br.ufpr.dynse.classificationengine.IClassificationEngine;
import br.ufpr.dynse.classificationengine.KnoraEliminateClassificationEngine;
import br.ufpr.dynse.classifier.competence.IMultipleClassifiersCompetence;
import br.ufpr.dynse.classifier.factory.AbstractClassifierFactory;
import br.ufpr.dynse.classifier.factory.NaiveBayesFactory;
import br.ufpr.dynse.core.StreamDynse;
import br.ufpr.dynse.pruningengine.AgeBasedPruningEngine;
import br.ufpr.dynse.pruningengine.DynseClassifierPruningMetrics;
import br.ufpr.dynse.pruningengine.IPruningEngine;

private static final int POOL_SIZE = 50;
private static final int TRAIN_SIZE = 200;
private static final int ACC_WINDOW_SIZE = 2;
//...

public void myTestMethod() {
    // AgeBasedPruningEngine - removes the oldest classifiers; pool size set to 50
    IPruningEngine<DynseClassifierPruningMetrics> pruningEngine = new AgeBasedPruningEngine(POOL_SIZE);

    // Naive Bayes as the base learner
    AbstractClassifierFactory classifierFactory = new NaiveBayesFactory();

    // KNORA-Eliminate as the classification engine, 7 neighbors and slack variable = 1
    IClassificationEngine<IMultipleClassifiersCompetence> classificationEngine =
            new KnoraEliminateClassificationEngine(7, 1);

    // train a new classifier for every 200 supervised samples;
    // accuracy estimation window size set to 2 batches (i.e., the 2 x 200 = 400 latest samples are considered)
    StreamDynse dynse = new StreamDynse(classifierFactory, TRAIN_SIZE, ACC_WINDOW_SIZE,
            classificationEngine, pruningEngine);
    //...
}
The instantiated StreamDynse can be used like any MOA classifier.
You may extend the Dynse framework and create your own classification engines, pruning engines, base classifier factories, etc.
All you need to do is extend some classes and implement some interfaces, as explained in the next sections.
Create a fork of the GitHub project and share your implementation with us and other scientists! =)
In order to implement your own pruning engine, you will need to implement the IPruningEngine interface.
public interface IPruningEngine<T extends DynseClassifierPruningMetrics> {

    // returns the classifiers that must be removed from the pool
    public List<DynseClassifier<T>> pruneClassifiers(DynseClassifier<T> newClassifier,
            List<DynseClassifier<T>> currentPool,
            List<Instance> accuracyEstimationInstances) throws Exception;

    public void meassureClassifier(DynseClassifier<T> classifier) throws Exception;

    public void getPrunningEngineDescription(StringBuilder out);

    public void getPrunningEngineShortDescription(StringBuilder out);
}
Your class must implement IPruningEngine with a type parameter that extends DynseClassifierPruningMetrics, or with the class DynseClassifierPruningMetrics itself. This class defines the metrics used in your pruning process (e.g., the classifier's age).
The pruneClassifiers method takes the newly created classifier, the current pool of classifiers, and the current accuracy estimation window. The method must return a list containing the classifiers that must be pruned (this list can be empty). This method should not alter the current pool.
The meassureClassifier method takes a classifier (often it will be the newest classifier created by Dynse) and measures it according to the DynseClassifierPruningMetrics (e.g., populating the creation time of the classifier).
The getPrunningEngineDescription method just writes a description of the pruning engine to out.
The getPrunningEngineShortDescription method just writes a short description of the pruning engine to out.
To see implementation examples, check the classes AgeBasedPruningEngine and AccuracyBasedPruningEngine available in the Dynse framework.
Note that, if you are not using the original DynseClassifierPruningMetrics, it may be necessary to extend AbstractDynse in order to generate a Dynse version compatible with your pruning metrics, or you may use some casts, which may be unsafe. This is necessary since StreamDynse was implemented to deal only with DynseClassifierPruningMetrics due to type erasure in Java.
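As an implementation example, here is a minimal sketch of a pruning engine that discards a randomly chosen classifier whenever the pool grows beyond a fixed size. The DynseClassifier package and the samoa Instance type are assumptions based on the other examples in this page and may need to be adjusted for your Dynse version:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import com.yahoo.labs.samoa.instances.Instance; // assumed Instance type (MOA 2016-04)

import br.ufpr.dynse.classifier.DynseClassifier; // assumed package
import br.ufpr.dynse.pruningengine.DynseClassifierPruningMetrics;
import br.ufpr.dynse.pruningengine.IPruningEngine;

public class RandomPruningEngine implements IPruningEngine<DynseClassifierPruningMetrics> {

    private final int maxPoolSize;
    private final Random random = new Random(42);

    public RandomPruningEngine(int maxPoolSize) {
        this.maxPoolSize = maxPoolSize;
    }

    @Override
    public List<DynseClassifier<DynseClassifierPruningMetrics>> pruneClassifiers(
            DynseClassifier<DynseClassifierPruningMetrics> newClassifier,
            List<DynseClassifier<DynseClassifierPruningMetrics>> currentPool,
            List<Instance> accuracyEstimationInstances) throws Exception {
        // return (but never remove here) the classifiers that should leave the pool
        List<DynseClassifier<DynseClassifierPruningMetrics>> toPrune = new ArrayList<>();
        if (currentPool.size() > maxPoolSize) {
            toPrune.add(currentPool.get(random.nextInt(currentPool.size())));
        }
        return toPrune;
    }

    @Override
    public void meassureClassifier(DynseClassifier<DynseClassifierPruningMetrics> classifier) throws Exception {
        // no extra metrics are needed for random pruning
    }

    @Override
    public void getPrunningEngineDescription(StringBuilder out) {
        out.append("Random Pruning Engine, max pool size: ").append(maxPoolSize);
    }

    @Override
    public void getPrunningEngineShortDescription(StringBuilder out) {
        out.append("RandomPrune").append(maxPoolSize);
    }
}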
You may use any base classifier in the Dynse Framework (e.g., SVM, Naive Bayes, KNN, etc.).
For now, we have Naive Bayes (NaiveBayesFactory) and HoeffdingTree (HoeffdingTreeFactory) factories implemented. If you want a factory for a different base classifier, just extend the AbstractClassifierFactory class. Below is the implementation of this class used in the NaiveBayesFactory:
public class NaiveBayesFactory extends AbstractClassifierFactory {

    private static final long serialVersionUID = 1L;

    @Override
    public Classifier createClassifier() throws Exception {
        NaiveBayes classifier = new NaiveBayes();
        classifier.prepareForUse();
        return classifier;
    }

    @Override
    public void getDescription(StringBuilder out) {
        out.append("Naive Bayes Factory");
    }

    @Override
    public void getShortDescription(StringBuilder out) {
        out.append("NB");
    }
}
The most important overridden method is createClassifier, which must create and prepareForUse a base classifier (note that this method does not train the classifier).
The methods getDescription and getShortDescription just append a full and a short description of the factory, respectively.
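As a further example, here is a minimal sketch of a factory for MOA's kNN learner. It assumes that AbstractClassifierFactory lives in br.ufpr.dynse.classifier.factory and that createClassifier returns a MOA Classifier, as in the NaiveBayesFactory above:

import br.ufpr.dynse.classifier.factory.AbstractClassifierFactory; // assumed package
import moa.classifiers.Classifier;
import moa.classifiers.lazy.kNN;

public class KNNFactory extends AbstractClassifierFactory {

    private static final long serialVersionUID = 1L;

    @Override
    public Classifier createClassifier() throws Exception {
        kNN classifier = new kNN();     // MOA's k-nearest neighbors learner
        classifier.kOption.setValue(5); // e.g., 5 neighbors
        classifier.prepareForUse();     // prepare, but do not train, the classifier
        return classifier;
    }

    @Override
    public void getDescription(StringBuilder out) {
        out.append("kNN Factory");
    }

    @Override
    public void getShortDescription(StringBuilder out) {
        out.append("kNN");
    }
}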
You may find several classification engines pre-implemented in the package br.ufpr.dynse.classificationengine (OLA, KNORA-E, A Priori, etc.).
If you want to implement your own classification engine, you must implement the IClassificationEngine interface, described below:
public interface IClassificationEngine<U extends IMultipleClassifiersCompetence> extends Serializable {

    public double[] classify(Instance instance,
            List<DynseClassifier<DynseClassifierPruningMetrics>> availableClassifiers,
            Map<Instance, U> competenceMappings,
            NearestNeighbourSearch nnSearch) throws Exception;

    public NearestNeighbourSearch createNeighborSearchMethod();

    public boolean getMapOnlyCorrectClassifiers();

    public List<DynseClassifier<DynseClassifierPruningMetrics>> getClassifiersUsedInLastClassification();

    public void getClassificationEngineDescription(StringBuilder out);

    public void getClassificationEngineShortDescription(StringBuilder out);

    public void reset();
}
Your class must pass a type parameter that extends the IMultipleClassifiersCompetence interface, or IMultipleClassifiersCompetence itself.
The method classify takes the instance that must be classified, a list containing all available classifiers in the pool, a mapping of the classifiers' competences, and a nearest neighbor search object that can be used to find the k-nearest neighbors of the current instance. This method must return a double array representing the classification (i.e., the a posteriori probabilities) of the instance.
The method createNeighborSearchMethod must return a nearest neighbor search object. Usually it will be an instance of the moa.classifiers.lazy.neighboursearch.LinearNNSearch class.
The method getMapOnlyCorrectClassifiers must return true if only the classifiers that correctly classify the instances in the current accuracy estimation window should be mapped in the competence mapping (e.g., a KNORA-E based classification engine), or false if all classifiers should be mapped (e.g., an A Priori based classification engine).
The method getClassifiersUsedInLastClassification must return a list containing all classifiers used in the classification of the latest test instance received.
The methods getClassificationEngineDescription and getClassificationEngineShortDescription just append a full and a short description of the classification engine, respectively.
The method reset resets the classification engine to its initial state.
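As an implementation example, here is a minimal sketch of a classification engine that simply sums the votes of every classifier in the pool, ignoring the competence mapping and the neighbor search. The DynseClassifier package, the samoa Instance type, and the assumption that DynseClassifier exposes MOA's getVotesForInstance are not confirmed by this page and may need adjustments:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.yahoo.labs.samoa.instances.Instance; // assumed Instance type (MOA 2016-04)

import moa.classifiers.lazy.neighboursearch.LinearNNSearch;
import moa.classifiers.lazy.neighboursearch.NearestNeighbourSearch;

import br.ufpr.dynse.classificationengine.IClassificationEngine;
import br.ufpr.dynse.classifier.DynseClassifier; // assumed package
import br.ufpr.dynse.classifier.competence.IMultipleClassifiersCompetence;
import br.ufpr.dynse.pruningengine.DynseClassifierPruningMetrics;

public class SumOfVotesClassificationEngine
        implements IClassificationEngine<IMultipleClassifiersCompetence> {

    private static final long serialVersionUID = 1L;

    private List<DynseClassifier<DynseClassifierPruningMetrics>> lastUsedClassifiers = new ArrayList<>();

    @Override
    public double[] classify(Instance instance,
            List<DynseClassifier<DynseClassifierPruningMetrics>> availableClassifiers,
            Map<Instance, IMultipleClassifiersCompetence> competenceMappings,
            NearestNeighbourSearch nnSearch) throws Exception {
        double[] votes = new double[instance.numClasses()];
        lastUsedClassifiers = new ArrayList<>(availableClassifiers);
        for (DynseClassifier<DynseClassifierPruningMetrics> classifier : availableClassifiers) {
            // assumes DynseClassifier exposes MOA's getVotesForInstance
            double[] classifierVotes = classifier.getVotesForInstance(instance);
            for (int i = 0; i < classifierVotes.length && i < votes.length; i++) {
                votes[i] += classifierVotes[i];
            }
        }
        return votes;
    }

    @Override
    public NearestNeighbourSearch createNeighborSearchMethod() {
        return new LinearNNSearch(); // not actually used by this engine
    }

    @Override
    public boolean getMapOnlyCorrectClassifiers() {
        return false; // map all classifiers, since competence is ignored here
    }

    @Override
    public List<DynseClassifier<DynseClassifierPruningMetrics>> getClassifiersUsedInLastClassification() {
        return lastUsedClassifiers;
    }

    @Override
    public void getClassificationEngineDescription(StringBuilder out) {
        out.append("Sum of votes over the whole pool (competence ignored)");
    }

    @Override
    public void getClassificationEngineShortDescription(StringBuilder out) {
        out.append("SumVotes");
    }

    @Override
    public void reset() {
        lastUsedClassifiers = new ArrayList<>();
    }
}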
Question: I cannot change the location of my workspace in Eclipse
Answer: Check this link.
Question: The results generated by the current framework implementation are slightly different from the publications
Answer: In our publications we used an older version of the MOA framework (2014-11) and classifiers from Weka 3-7-13. The current implementation uses MOA 2016-04 and its classifiers, which may generate a small difference in the predictions.
This difference is caused mainly (but not only) by the normalization of data, since the Weka classifiers and datasets used in the previous experiments automatically normalized the information. If you normalize each batch before training/testing, the results come closer to the published ones.
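For reference, a per-batch normalization with Weka's Normalize filter could look like the sketch below; it assumes you are working with weka.core.Instances batches (converting MOA instances to Weka instances is not shown here):

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class BatchNormalizationSketch {

    // normalize all numeric attributes of a batch to [0, 1] before training/testing
    public static Instances normalizeBatch(Instances batch) throws Exception {
        Normalize normalize = new Normalize();
        normalize.setInputFormat(batch);           // learn the attribute ranges from this batch
        return Filter.useFilter(batch, normalize); // return a normalized copy of the batch
    }
}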
To overcome this, we define Naive Bayes as the base learner in the default DynseFactory, since this base learner is less sensitive to the normalization of the data. If you want to use Hoeffding Trees (or any other classifier) as the base learner, change the following line in the AbstractDynseFactory:
public final AbstractClassifierFactory classifierFactory = new NaiveBayesFactory();
To
public final AbstractClassifierFactory classifierFactory = new HoeffdingTreeFactory();
Question: Eclipse does not find the Main class
Answer: Just open the Main class in Eclipse (package br.ufpr.dynse) and then run the project.