Sunday, January 11, 2015

Machine Learning - Prediction

A summary of what has worked well for me in one classification task and one regression task.
In short, Random Forest is the winner in both tasks.

1. First task
  a. Description: cloud detection. Satellite images of a particular region in the Arctic, taken from 5 different angles, are provided. The task is to classify each pixel as cloud or no cloud, so this is a classification problem with expert labels. Three images, each 340 by 340 pixels, are provided. For each pixel there are 5 raw inputs, each being the pixel intensity from one angle, plus 3 provided engineered features.
X dimension: 100000 by 8
Y dimension: 100000 by 1

Example of an image from one angle:
And the expert labels, where blue means cloud, red means no cloud, and green means not sure:
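To make the "one row per pixel" layout concrete, here is a minimal sketch of turning the image data into X and y. It assumes the 5 angle images and 3 engineered feature maps are equally sized 2-D lists, and that labels are coded 1 (cloud), -1 (no cloud) and 0 (not sure). That coding, the function name `pixels_to_rows`, and the choice to drop "not sure" pixels to get a binary target are all my assumptions, not something stated above.

```python
from typing import List, Tuple

Grid = List[List[float]]


def pixels_to_rows(angle_images: List[Grid],
                   engineered: List[Grid],
                   labels: List[List[int]]) -> Tuple[List[List[float]], List[int]]:
    """Flatten one image into (X, y) with one row per labelled pixel.

    Hypothetical label coding: 1 = cloud, -1 = no cloud, 0 = not sure.
    Pixels labelled 0 are skipped so that the target is binary.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for r in range(len(labels)):
        for c in range(len(labels[0])):
            if labels[r][c] == 0:
                continue  # drop "not sure" pixels (an assumption)
            raw = [img[r][c] for img in angle_images]   # one intensity per angle
            eng = [feat[r][c] for feat in engineered]   # engineered features
            X.append(raw + eng)
            y.append(1 if labels[r][c] == 1 else 0)      # 1 = cloud, 0 = no cloud
    return X, y


# Tiny 2 x 2 example with made-up values
angles = [[[a * 10 + r * 2 + c for c in range(2)] for r in range(2)] for a in range(5)]
engineered = [[[0.5 * k for c in range(2)] for r in range(2)] for k in range(3)]
labels = [[1, 0], [-1, 1]]
X, y = pixels_to_rows(angles, engineered, labels)
```

Stacking the rows from all three images in the same way gives an 8-column X, matching the dimensions listed above.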

  b. Result: Random Forest works best. The box plot shows the Area Under the Curve (AUC) of each classification algorithm over 200 different cross-validation runs.
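The AUC can be read as the probability that a randomly chosen cloud pixel receives a higher score than a randomly chosen non-cloud pixel. The sketch below computes it directly from that definition (ties count one half) and generates repeated random train/test splits of the kind that could yield 200 AUC values per method. The post does not say how its 200 cross validations were drawn, so the random 80/20 hold-out scheme and the names `auc` and `repeated_splits` are assumptions for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 1/2.

    This pairwise loop is O(n_pos * n_neg): fine for illustration,
    but a rank-based formula scales better to 100000 pixels.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_splits(n: int, n_repeats: int = 200, test_frac: float = 0.2,
                    seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs from independent random shuffles
    (repeated random hold-out, one possible reading of "200 cross validations")."""
    rng = random.Random(seed)
    n_test = max(1, int(n * test_frac))
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]
```

For each split one would fit a classifier on the training indices, score the test pixels, and record `auc(test_scores, test_labels)`; the 200 values per method are what a box plot like the one above summarizes.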

In terms of running time:

2. Second task
  a. Description: The input is 1750 images, each 128 by 128 pixels. A human subject in the experiment is shown these images while the fMRI signal from 20 different regions of his brain is recorded. In short, fMRI measures brain activity through changes in blood flow. The task is to use the input images to predict the fMRI response.
X dimension: 1750 by 16384
Y dimension: 1750 by 20
In our problem, the input X is transformed using Gabor wavelets.
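The post does not give the Gabor parameters it used, so this is only a sketch of what a single Gabor filter looks like: a cosine wave along orientation `theta` with wavelength `lambd`, windowed by a Gaussian of width `sigma` and aspect ratio `gamma`. It computes only the real (even) part; a full Gabor wavelet transform typically also uses the imaginary part and a bank of orientations and scales. All parameter values and the function names are placeholders I chose.

```python
import math
from typing import List


def gabor_kernel(size: int, sigma: float, theta: float, lambd: float,
                 gamma: float = 0.5, psi: float = 0.0) -> List[List[float]]:
    """Real part of a 2-D Gabor filter sampled on a size x size grid (size odd)."""
    if size % 2 == 0:
        raise ValueError("size should be odd so the kernel has a centre")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates so the wave runs along direction theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def response_at(image: List[List[float]], kernel: List[List[float]],
                r: int, c: int) -> float:
    """Correlate the kernel with the image patch centred at (r, c).

    Assumes the whole patch lies inside the image (no padding).
    """
    half = len(kernel) // 2
    total = 0.0
    for i, krow in enumerate(kernel):
        for j, k in enumerate(krow):
            total += k * image[r - half + i][c - half + j]
    return total
```

Evaluating `response_at` over a grid of positions for several kernels (different `theta` and `lambd`) gives one feature per position and filter, which is the general idea behind Gabor features for images.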
An example of an image shown to the subject:

  b. Result: Random Forest and Gradient Boosting Machine are the best-performing methods.
Here, Full means using the full set of inputs, and Part means using only the best 5% of inputs, selected with Lasso (similar to ranking by correlation).
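The Part inputs were chosen with Lasso. Since that is described as similar to correlation ranking, here is a sketch of the simpler correlation variant, not the Lasso procedure itself: rank each input column by the absolute Pearson correlation with one response and keep the top 5%. In the second task this would presumably be repeated for each of the 20 brain regions; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(va * vb)


def top_fraction_by_correlation(X: List[List[float]], y: Sequence[float],
                                fraction: float = 0.05) -> List[int]:
    """Indices of the columns of X most correlated (in absolute value) with y."""
    n_features = len(X[0])
    k = max(1, int(round(n_features * fraction)))
    abs_corr = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    # sorted() with reverse=True stays stable, so ties keep the lower index first
    ranked = sorted(range(n_features), key=lambda j: abs_corr[j], reverse=True)
    return sorted(ranked[:k])
```

Unlike Lasso, this scores each input on its own and ignores redundancy between inputs, which is one reason the two can pick somewhat different subsets.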

For ease of comparison, here is the result table:

Conclusion: Random Forest seems to work very well, and Gradient Boosting Machine is also good. SVM and Neural Networks did not work well for me.



