Google summer of code 2010: OpenMRS

As a pilot choice of caching framework, I picked Ehcache.

A few words about it.

Configuration is done via an XML file. The central objects are CacheManager, Cache and Element. When a CacheManager is created, it reads an XML file for its configuration. Several CacheManagers can exist, each with its own configuration file.

Every CacheManager has one or more caches. Each cache has many parameters, such as memoryStoreEvictionPolicy, the TTL (time to live) for objects and so on. Some of them can be changed programmatically.
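To make the eviction policy and TTL concrete, here is a minimal sketch of what one cache with an LRU memory store eviction policy and a per-entry time to live does. It uses only the JDK and is not Ehcache's API; SimpleCache, maxEntries and ttlMillis are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for one cache: a bounded LRU memory store plus a per-entry TTL. */
class SimpleCache<K, V> {
   /** A stored value together with its creation time, so expiry can be checked on read. */
   private static final class CacheEntry<V> {
      final V value;
      final long createdAt;

      CacheEntry(V value, long createdAt) {
         this.value = value;
         this.createdAt = createdAt;
      }
   }

   private final long ttlMillis;
   private final Map<K, CacheEntry<V>> store;

   SimpleCache(int maxEntries, long ttlMillis) {
      this.ttlMillis = ttlMillis;
      // accessOrder = true keeps entries ordered from least to most recently used,
      // so removeEldestEntry evicts the LRU entry once maxEntries is exceeded.
      this.store = new LinkedHashMap<K, CacheEntry<V>>(16, 0.75f, true) {
         @Override
         protected boolean removeEldestEntry(Map.Entry<K, CacheEntry<V>> eldest) {
            return size() > maxEntries;
         }
      };
   }

   void put(K key, V value) {
      store.put(key, new CacheEntry<>(value, System.currentTimeMillis()));
   }

   /** Returns the cached value, or null on a miss or once the entry has outlived its TTL. */
   V get(K key) {
      CacheEntry<V> e = store.get(key); // also marks the entry as recently used
      if (e == null) {
         return null;
      }
      if (System.currentTimeMillis() - e.createdAt > ttlMillis) {
         store.remove(key); // expired: behave like a miss
         return null;
      }
      return e.value;
   }

   int size() {
      return store.size();
   }
}
```

In Ehcache itself these values come from the cache's XML configuration (or are set programmatically) rather than from constructor arguments.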

Using a tool like Ehcache is not the hard part; the hard part is choosing what to cache, which keys to use, how much memory to devote to caching and so on. Let's analyze some methods of the logic module.

Based on this analysis, I identified several methods as candidates for caching:

LogicService.parse(String);
LogicContext.read(Patient, LogicDataSource, LogicCriteria);
LogicContext.eval(Patient, LogicCriteria, Map);

public LogicCriteria parse(String criteria) {
   // the criteria string itself is the cache key
   Element element = ehcache.get(criteria);
   if (element != null) {
      return (LogicCriteria) element.getValue();
   }

   // Create a scanner that reads from the input stream passed to us
   // Create a parser that reads from the scanner
   // Start parsing at the compilationUnit rule
   // Create Abstract Syntax Tree
   // Create LogicCriteria (logicCriteria)

   // cache miss: store the freshly parsed criteria for the next call
   element = new Element(criteria, logicCriteria);
   ehcache.put(element);

   return logicCriteria;
}

This case is simple: the key is the criteria string to be parsed and the value is the resulting LogicCriteria. The only open question for this method is whether to use a disk store or not; I expect tests to answer it.
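The look-up, parse, put flow of parse(String) above can be reduced to plain Java so it runs on its own. This is a sketch under assumptions: CachingParser is a hypothetical name, the Function passed in stands in for the scanner/parser/AST steps, and a HashMap replaces Ehcache, so there is no eviction or disk store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache-aside wrapper: the criteria string is the key, the parsed object the value. */
class CachingParser<T> {
   private final Map<String, T> cache = new HashMap<>(); // not thread-safe; fine for a sketch
   private final Function<String, T> parser;             // stands in for scanner, parser and AST steps
   private int parseCalls = 0;

   CachingParser(Function<String, T> parser) {
      this.parser = parser;
   }

   T parse(String criteria) {
      T cached = cache.get(criteria);
      if (cached != null) {
         return cached; // cache hit: no parsing at all
      }
      parseCalls++;
      T parsed = parser.apply(criteria); // cache miss: do the expensive work once
      cache.put(criteria, parsed);
      return parsed;
   }

   /** How many times the underlying parser actually ran. */
   int parseCalls() {
      return parseCalls;
   }
}
```

Replacing the HashMap with an Ehcache cache (with or without a disk store) would not change this flow, only where the entries live.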

public Result read(Patient patient, LogicDataSource dataSource, LogicCriteria criteria) throws LogicException {
   // try to get the result from the cache

   // if not cached: call dataSource.read(...)
   //                and put the result into the cache

   // return the result
}

Here we can cache the result of dataSource.read. This method uses Hibernate criteria to read data from the database. The result is a Map<Integer, Result>, where the Integer is a patient's id and the Result is that patient's result.
For the cache, the value is this resultMap and the key is SomeComplexKey, which consists of several objects:

public class SomeComplexKey implements Serializable {
   private LogicCriteria criteria;
   private LogicDataSource dataSource;
   private Date indexDate;
   ...
}

So for subsequent patients with the same criteria, indexDate and dataSource we get a cache hit, because the patient is not part of the key.
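For SomeComplexKey to produce those hits, two keys built from equal parts must be equal and have the same hash code; with the default Object.equals, every freshly built key would miss. Below is a minimal sketch, assuming the criteria and the data source can be represented by strings (a hypothetical simplification: the real key would hold LogicCriteria and LogicDataSource, which would then need sensible equals/hashCode themselves). ReadCacheKey is a hypothetical name.

```java
import java.io.Serializable;
import java.util.Date;
import java.util.Objects;

/** Hypothetical, simplified version of SomeComplexKey with value-based equality. */
final class ReadCacheKey implements Serializable {
   private static final long serialVersionUID = 1L;

   private final String criteria;   // stands in for LogicCriteria
   private final String dataSource; // stands in for LogicDataSource
   private final Date indexDate;

   ReadCacheKey(String criteria, String dataSource, Date indexDate) {
      this.criteria = criteria;
      this.dataSource = dataSource;
      // defensive copy: Date is mutable, and a key must not change after it is cached
      this.indexDate = new Date(indexDate.getTime());
   }

   @Override
   public boolean equals(Object o) {
      if (this == o) {
         return true;
      }
      if (!(o instanceof ReadCacheKey)) {
         return false;
      }
      ReadCacheKey k = (ReadCacheKey) o;
      return Objects.equals(criteria, k.criteria)
            && Objects.equals(dataSource, k.dataSource)
            && Objects.equals(indexDate, k.indexDate);
   }

   @Override
   public int hashCode() {
      return Objects.hash(criteria, dataSource, indexDate);
   }
}
```

Note that no patient appears in the key, which is exactly what lets every patient reuse the same cached resultMap.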

The third candidate is LogicContext.eval:
public Result eval(Patient patient, LogicCriteria criteria, Map parameters) throws LogicException {
   // try to get the result from the cache

   // if not cached:
   //    getRule
   //    for every patient in the cohort of patients:
   //       rule.eval, which can call dataSource.read
   //       put the result into a Map where the key is the patient's id
   //    put resultMap into the cache

   // return the result for the requested patient
}

This time the key is SomeAnotherComplexKey and the value is a Map<Integer, Result> resultMap holding the results for a whole cohort of patients. SomeAnotherComplexKey can be:

public class SomeAnotherComplexKey implements Serializable {
   private Map parameters;
   private LogicCriteria criteria;
   private LogicDataSource dataSource;
   private Date indexDate;
   ...
}
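The caching flow inside LogicContext.eval (evaluate the rule for the whole cohort once, cache the resultMap, then answer each patient from it) can be sketched in plain Java. CohortResultCache, the generic key type and the Function standing in for the rule are hypothetical; a HashMap replaces Ehcache.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of caching eval results for a whole cohort under one key. */
class CohortResultCache<K, R> {
   private final Map<K, Map<Integer, R>> cache = new HashMap<>();
   private int ruleEvaluations = 0;

   /**
    * Returns the result for one patient. On a miss, the rule is evaluated for every
    * patient in the cohort and the whole resultMap is cached under the given key.
    */
   R eval(int patientId, List<Integer> cohort, K key, Function<Integer, R> rule) {
      Map<Integer, R> resultMap = cache.get(key);
      if (resultMap == null) {
         resultMap = new HashMap<>();
         for (int id : cohort) {
            ruleEvaluations++;
            resultMap.put(id, rule.apply(id)); // keyed by patient id
         }
         cache.put(key, resultMap);
      }
      return resultMap.get(patientId);
   }

   /** How many times the rule actually ran. */
   int ruleEvaluations() {
      return ruleEvaluations;
   }
}
```

Because the cached value is the whole cohort's map, a later call with the same key for a patient outside that cohort would find the map and get null, which is why the key needs to carry information about the patients.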

LogicService.eval can be called for a single patient or for a cohort of patients. Let's go through these two scenarios:
1) LogicService.eval is called for a cohort of patients.
   LogicService.eval creates a logicContext and puts the cohort into it. For every patient in this cohort, LogicContext.eval is called with the same arguments except the Patient.
   After the first LogicContext.eval call, the resultMap for the whole cohort is cached, and so is the result of logicContext.read. Every subsequent patient in this cohort (every subsequent LogicContext.eval call) gets a cache hit. That's cool!

2) LogicService.eval is called for a single patient.
   LogicService.eval creates a logicContext and puts the patient into it; inside this logicContext a cohort containing only this one patient is created.
   LogicContext.eval is called only once.

   If we call LogicContext.eval for one patient or cohort with (parameters1, criteria1, dataSource1, indexDate1) and then call it for another patient or cohort with the same arguments, the key is identical but the cached resultMap belongs to the first patient or cohort. We effectively get a cache miss, and caching the new result overwrites the one stored for the previous patient or cohort. That's why the key must contain some information about the patients.
   Whether the logicContext holds a cohort of several patients or just one patient, we can use the patients' ids, so the key may include the set of patient identifiers.

  LogicContext.eval can also be called with indexDate set to the current time or to a time in the past. For example, if time1 was created as indexDate = new Date(), it carries the time down to milliseconds, so a later call will almost never build exactly the same key and we will not get a cache hit. That's why the key should store indexDate only with year, month and day, possibly plus the hour.

  Then SomeAnotherComplexKey may look like this:

public class SomeAnotherComplexKey implements Serializable {
   private Map parameters;
   private LogicCriteria criteria;
   private LogicDataSource dataSource;
   private Date indexDate; // e.g. 5.30.2010 16:00:00.000
   private Set membersIds;
   ...
}
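A sketch of such a key with value-based equality, assuming the criteria and data source can be represented by strings (a hypothetical simplification) and that truncating the date to the hour in UTC is acceptable; for time zones with non-whole-hour offsets the truncation would have to happen in the local zone instead. EvalCacheKey is a hypothetical name. Keys built within the same hour for the same cohort compare equal, regardless of the order in which patient ids were supplied.

```java
import java.io.Serializable;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified SomeAnotherComplexKey with an hour-truncated date and cohort ids. */
final class EvalCacheKey implements Serializable {
   private static final long serialVersionUID = 1L;

   private final Map<String, Object> parameters;
   private final String criteria;     // stands in for LogicCriteria
   private final String dataSource;   // stands in for LogicDataSource
   private final Date indexDate;      // truncated to the hour, in UTC
   private final Set<Integer> memberIds;

   EvalCacheKey(Map<String, Object> parameters, String criteria, String dataSource,
                Date indexDate, Set<Integer> memberIds) {
      // copy mutable inputs so the key cannot change after it is stored in a cache
      this.parameters = Collections.unmodifiableMap(new HashMap<>(parameters));
      this.criteria = criteria;
      this.dataSource = dataSource;
      this.indexDate = truncateToHour(indexDate);
      this.memberIds = Collections.unmodifiableSet(new TreeSet<>(memberIds));
   }

   /** Drops minutes, seconds and milliseconds so calls within the same hour share a key. */
   static Date truncateToHour(Date date) {
      return Date.from(date.toInstant().truncatedTo(ChronoUnit.HOURS));
   }

   @Override
   public boolean equals(Object o) {
      if (this == o) {
         return true;
      }
      if (!(o instanceof EvalCacheKey)) {
         return false;
      }
      EvalCacheKey k = (EvalCacheKey) o;
      return parameters.equals(k.parameters)
            && Objects.equals(criteria, k.criteria)
            && Objects.equals(dataSource, k.dataSource)
            && indexDate.equals(k.indexDate)
            && memberIds.equals(k.memberIds);
   }

   @Override
   public int hashCode() {
      return Objects.hash(parameters, criteria, dataSource, indexDate, memberIds);
   }
}
```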

The next step is to define the architecture of a caching service that is full-featured for the logic module and independent of the cache framework, but that is for the next article.

Sunday, May 30, 2010
