[ Advice needed! Cluster programs for Java? ]

I am looking for cluster software that lets me run Java jobs in parallel. I looked at Rockscluster and Hadoop. The problem with Rockscluster was that it required Unix scripts to run computations in parallel. What I want instead is to send jobs to workers from within Java itself, so that the workers compute them and return the values. This is because my jobs are determined by many different users, so the scripts cannot be written before the jobs are run. The problem with Hadoop was that it is built around MapReduce, and I do not think my Java jobs would benefit from the MapReduce scheme.

What I want is simple: to send jobs to workers (other compute nodes) and receive the results. All jobs sent to workers will be independent, so I do not have to worry about dependencies between jobs. I also want to implement the parallelization in Java itself. When I send multiple jobs to the scheduler, I would like it to queue them, automatically send them to available nodes, and return the results to the users. (I do not need fancy features such as choosing which nodes receive which jobs.)
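As a single-machine analogy for the queue-and-dispatch behaviour described above, here is a minimal sketch using `java.util.concurrent.ExecutorService` from the standard library. It is not a cluster scheduler: the "workers" are local threads, and the names `LocalJobQueue` and `runAll` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: local threads stand in for worker nodes.
class LocalJobQueue {

    // Submits independent jobs to a fixed-size pool. The pool queues the
    // jobs internally and hands each one to the next free thread.
    static List<Integer> runAll(List<Callable<Integer>> jobs, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every job has finished and returns the
            // futures in the same order as the submitted jobs.
            for (Future<Integer> f : pool.invokeAll(jobs)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> jobs = new ArrayList<>();
        jobs.add(() -> 1 + 2);
        jobs.add(() -> 3 + 4);
        jobs.add(() -> 5 + 6);
        // Three jobs, two workers: the third job waits in the queue.
        System.out.println(runAll(jobs, 2));
    }
}
```

The caller only submits jobs and collects results; which worker runs which job is left to the pool, which is the behaviour I am hoping to get across machines.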

To explain better, here is an example. Say user1 is working in Java and runs three computations in main() on his own computer. His code is below.

```java
public class Multiplecal {

    public static void main(String[] args) {
        Multiplecal calobj = new Multiplecal();
        int result1, result2, result3;
        result1 = calobj.addtwo(5);
        result2 = calobj.addthree(6);
        result3 = calobj.addfour(7);
    }

    public int addtwo(int n) {
        return (n + (n - 1));
    }

    public int addthree(int n) {
        return (n + (n - 1) + (n - 2));
    }

    public int addfour(int n) {
        return (n + (n - 1) + (n - 2) + (n - 3));
    }

}
```

However, user1 wants to obtain result1, result2, and result3 using a cluster. If there were an API called service, his main() might look like this:

```java
import service.*;

public class Multiplecal {

    public static void main(String[] args) {
        Multiplecal calobj = new Multiplecal();
        int result1, result2, result3;
        result1 = service.send("Multiplecal", "addtwo", 5);
        result2 = service.send("Multiplecal", "addthree", 6);
        result3 = service.send("Multiplecal", "addfour", 7);
    }

    ....
}
```

The service API sends each (class name, method name, input parameters) triple to the parallel program manager, which then distributes the jobs to the nodes (workers). Since the workers already have the Multiplecal class, they can compute the results by looking up the class and method named in the request. When the workers finish, they return the results to user1.
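The lookup a worker would do can be sketched with plain Java reflection. This is only an assumption about how such a service might work, not an existing library: `ReflectiveDispatch` and its `send` method are hypothetical, local daemon threads stand in for remote workers, and `Multiplecal` is reduced to the three methods from the example.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// The three methods from the example; assumed to exist on every worker.
class Multiplecal {
    public int addtwo(int n)   { return n + (n - 1); }
    public int addthree(int n) { return n + (n - 1) + (n - 2); }
    public int addfour(int n)  { return n + (n - 1) + (n - 2) + (n - 3); }
}

// Hypothetical stand-in for the imagined "service" API.
class ReflectiveDispatch {

    // Daemon threads play the role of workers, so the JVM can exit
    // without an explicit shutdown call.
    private static final ExecutorService WORKERS =
            Executors.newFixedThreadPool(3, r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    // Queues one job described by (class name, method name, int argument).
    // The worker resolves the class and method by name and invokes it.
    static Future<Integer> send(String className, String methodName, int arg) {
        return WORKERS.submit(() -> {
            Class<?> cls = Class.forName(className);
            Object target = cls.getDeclaredConstructor().newInstance();
            Method m = cls.getMethod(methodName, int.class);
            return (Integer) m.invoke(target, arg);
        });
    }

    public static void main(String[] args) throws Exception {
        // Submit all three jobs first so they can run concurrently.
        Future<Integer> r1 = send("Multiplecal", "addtwo", 5);
        Future<Integer> r2 = send("Multiplecal", "addthree", 6);
        Future<Integer> r3 = send("Multiplecal", "addfour", 7);
        // get() blocks until the corresponding job has finished.
        System.out.println(r1.get() + " " + r2.get() + " " + r3.get()); // prints "9 15 22"
    }
}
```

Unlike the blocking `service.send` above, this `send` returns a `Future`, so the three jobs can overlap. In a real cluster the lookup would happen on a remote node, and the arguments and results would have to be serialized and sent over the network; that part is omitted here.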

What I described above is just the big picture of what I am trying to do; the parameter format for the parallelization does not have to look like this. If you know of good software for setting up a cluster and parallelizing Java jobs, I would appreciate your advice.

Thanks

Answer 1


There are a number of grid computing / cluster middleware packages out there. You might want to take a look at JPPF (master/worker), GridGain (map/reduce), or HTCondor.