
[ Test and train datasets with prediction models and TunePareto package ]

I'm trying to choose the best prediction/classification model for a specific problem. The methodology I've been asked to follow is this:

  • Split the data into a test set and a training set.
  • Fit one model on the training data and evaluate it with 10-fold cross-validation to find the best parameters for that model (judged by the cross-validation error).
  • Repeat with other models until I have the best configuration for each of the chosen classification models.
  • Finally, re-run each model with its best set of parameters, training on "trainingdata" and computing the reported error on "testdata" (please note that the test data hasn't been used until now, to avoid distorting the final comparison).
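
The steps above can be sketched in plain R, without TunePareto, to make clear what each stage should compute. This is only an illustration under assumptions that are not part of the question: the iris data set stands in for the real data, the split is 70/30, the model is kNN from the class package, and the candidate values of k are arbitrary.

```r
library(class)  # provides knn(); installed with standard R distributions

set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Assumption: iris stands in for the real data set.
x <- iris[, -ncol(iris)]
y <- iris[, ncol(iris)]

# Step 1: hold out 30% of the rows as test data.
test_idx <- sample(nrow(x), size = round(0.3 * nrow(x)))
train_x <- x[-test_idx, ]
train_y <- y[-test_idx]
test_x  <- x[test_idx, ]
test_y  <- y[test_idx]

# Step 2: 10-fold cross-validation on the training data only.
folds <- sample(rep(1:10, length.out = nrow(train_x)))

cv_error <- function(k) {
  fold_errors <- sapply(1:10, function(f) {
    pred <- knn(train_x[folds != f, ], train_x[folds == f, ],
                cl = train_y[folds != f], k = k)
    mean(pred != train_y[folds == f])
  })
  mean(fold_errors)
}

candidate_k <- c(1, 3, 5, 7, 9)  # hypothetical parameter grid
cv_errors <- sapply(candidate_k, cv_error)
best_k <- candidate_k[which.min(cv_errors)]

# Final step: train on all training data with the best parameter
# and compute the error once on the untouched test data.
final_pred <- knn(train_x, test_x, cl = train_y, k = best_k)
test_error <- mean(final_pred != test_y)

print(data.frame(k = candidate_k, cv_error = cv_errors))
cat("best k:", best_k, " test error:", test_error, "\n")
```

The same loop would be repeated for every other model (step 3), and only the final `test_error` values would be compared across models.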

I've been doing this with the TunePareto package, which has a convenient function for running some classification models (like Naive Bayes or kNN) along with 10-fold CV on a data frame. The problem arises with the last task I mentioned: I don't know how to use a specific data frame as the test set with TunePareto. Can anybody help me with this?

I've searched for examples but found nothing. If TunePareto doesn't allow this, I would be glad to hear about alternatives.

Thanks !!!

Answer 1


From the documentation, the function tuneParetoClassifier seems to have a parameter testDataName where you provide your independent testing dataset. Here is the definition of the function:

tuneParetoClassifier(name,
                     classifier,
                     classifierParamNames = NULL,
                     predefinedClassifierParams = NULL,
                     predictor = NULL,
                     predictorParamNames = NULL,
                     predefinedPredictorParams = NULL,
                     useFormula = FALSE,
                     formulaName = "formula",
                     trainDataName = "x",
                     trainLabelName = "y",
                     testDataName = "newdata",
                     modelName = "object",
                     requiredPackages = NULL)
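
As far as I can tell from the same documentation, testDataName is the name of the predictor function's argument that receives the test data, rather than a slot for the data frame itself. One possible way to score a tuned model on a held-out data frame is therefore to tune with tunePareto on the training data and then use trainTuneParetoClassifier together with predict. The sketch below assumes a recent TunePareto version that provides these functions, that the class package is installed for tunePareto.knn, and that the result object exposes bestCombinations; the iris data, the 70/30 split and the grid of k values are placeholders.

```r
library(TunePareto)

set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Assumption: iris stands in for the real data; 30% is held out as test data.
x <- iris[, -ncol(iris)]
y <- iris[, ncol(iris)]
test_idx <- sample(nrow(x), size = round(0.3 * nrow(x)))
train_x <- x[-test_idx, ]
train_y <- y[-test_idx]
test_x  <- x[test_idx, ]
test_y  <- y[test_idx]

# Tune k with 10-fold cross-validation on the training data only.
result <- tunePareto(data = train_x,
                     labels = train_y,
                     classifier = tunePareto.knn(),
                     k = c(1, 3, 5, 7, 9),
                     objectiveFunctions = list(cvError(nfold = 10, ntimes = 1)))
print(result)

# Assumption: the first Pareto-optimal configuration is taken as "best".
best_k <- result$bestCombinations[[1]]$k

# Train on the full training set with the chosen parameter,
# then predict the held-out test data.
model <- trainTuneParetoClassifier(tunePareto.knn(), train_x, train_y, k = best_k)
pred <- predict(model, test_x)

# Compare as character vectors so differing factor levels cannot cause an error.
test_error <- mean(as.character(pred) != as.character(test_y))
cat("best k:", best_k, " test error:", test_error, "\n")
```

With a single objective the Pareto set may still contain several tied configurations, so it is worth inspecting the printed result before settling on one.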