I have written a program that reads a file, processes its records to extract some data using a third-party library, and then dispatches the data to a remote server.
To speed up the process, I am creating Callable tasks that take the extracted data and perform the dispatch step, and finally using an ExecutorService to execute the tasks. The third-party library I am using does not seem to work well in a multi-threaded environment, so I have not put the data extraction step inside the Callable task.
The pseudo-code is as follows:
Iterator<Record> records = .....
List<Record> batch = ....
Data extractedData = ....
List<MyTask> tasks = ....

while (records.hasNext()) {
    Record record = records.next();
    batch.add(record);
    extractedData.add(extractDataUsing3rdPartyLibrary(record));
    if (batch.size() == BATCH_SIZE) {
        // give the task its own Data instance; reusing and clearing the same
        // extractedData object would also empty the data the task holds
        MyTask task = new MyTask(extractedData, ....);
        tasks.add(task);
        extractedData = ....   // start a fresh buffer instead of clear()
        batch.clear();         // reset so the next batch triggers the size check again
    }
}
executeTasks(executorService, tasks);
....
....
class MyTask implements Callable<Integer> {
    public Integer call() throws Exception {
        // dispatch extractedData
        // clear extractedData
    }
}
The problem is that the extracted data is memory-heavy, and as a result I am frequently running into out-of-memory errors.
I am thinking of an approach where I check the number of accumulated tasks intermittently, and if it exceeds a certain threshold, I process the tasks created so far, clean up their data, and then repeat the process.
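Something along these lines is what I have in mind, building on the pseudo-code above (TASK_THRESHOLD is just a made-up limit I would have to tune; invokeAll blocks until all the submitted tasks finish, so the data they hold can be garbage-collected before more batches pile up):

    while (records.hasNext()) {
        // ... same extraction and batching as above ...

        if (tasks.size() >= TASK_THRESHOLD) {
            // wait until the buffered tasks have dispatched their data,
            // then drop the references so that data can be garbage-collected
            executorService.invokeAll(tasks);
            tasks.clear();
        }
    }
    // flush whatever is left over after the loop ends
    executorService.invokeAll(tasks);
    tasks.clear();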
Is this a good approach? And if so, what would be a good way of finding the in-memory size of objects in Java, given that there is no C++-style sizeof? The Instrumentation API is something I came across, but it needs a Java agent to be set up.
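For reference, this is roughly what the agent route seems to involve (my own sketch, not something I have tested; the class name and jar name are made up, and the jar's manifest would need a Premain-Class: SizeAgent entry):

    import java.lang.instrument.Instrumentation;

    public class SizeAgent {
        private static volatile Instrumentation inst;

        // called by the JVM before main() when run with -javaagent:size-agent.jar
        public static void premain(String args, Instrumentation instrumentation) {
            inst = instrumentation;
        }

        // shallow size of a single object (header + fields), not the objects it references
        public static long sizeOf(Object o) {
            return inst.getObjectSize(o);
        }
    }

From what I have read, getObjectSize only reports the shallow size of one object rather than the whole object graph, so I am not sure it would even produce a meaningful number for my batches; simply capping the number of buffered tasks, or checking Runtime.getRuntime().freeMemory(), might be more practical.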