Google Summer of Code 2010: OpenMRS: May 2010

I made a pilot choice of caching framework: Ehcache.

A few words about it.

Configuration is done via an XML file. CacheManager, Cache and Element are the central objects. When a CacheManager is created, it reads its configuration from an XML file. Several CacheManagers can coexist, each created from a different configuration file.

Every CacheManager has one or several caches. Each cache has a lot of parameters such as memoryStoreEvictionPolicy, the TTL for objects and so on. Some of them can be changed programmatically.

The problem is not how to use a tool like Ehcache; the problem is how to choose what to cache, which keys to use, how much memory to spend on caching and so on. Let's try to analyze some methods of the logic module.

Based on this analysis, I found several methods that are candidates for caching. They are:

LogicService.parse(String);
LogicContext.read(Patient, LogicDataSource, LogicCriteria);
LogicContext.eval(Patient, LogicCriteria, Map);

LogicService.parse(String criteria) {
    // try the cache first: the key is the criteria string itself
    Element element = ehcache.get(criteria);
    if (null != element) {
        return (LogicCriteria) element.getValue();
    }

    // Create a scanner that reads from the input stream passed to us
    // Create a parser that reads from the scanner
    // Start parsing at the compilationUnit rule
    // Create Abstract Syntax Tree
    // Create LogicCriteria

    // cache the freshly parsed criteria for the next call
    element = new Element(criteria, logicCriteria);
    ehcache.put(element);
    return logicCriteria;
}

Here everything is simple. The key is the criteria string that we have to parse, and the value is the resulting logicCriteria. The only open question for this method is whether to use the disk store. I think tests will answer it.
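To make the look-up-then-parse flow concrete, here is a minimal sketch that uses only JDK classes. A ConcurrentHashMap stands in for the Ehcache cache, and ParsedCriteria, CachingParser and parseCount are hypothetical names invented for this illustration; they are not part of OpenMRS or Ehcache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the parsed LogicCriteria; not the OpenMRS class. */
final class ParsedCriteria {
    final String source;

    ParsedCriteria(String source) {
        this.source = source;
    }
}

/** Cache-aside parsing: look up by criteria string, parse only on a miss. */
final class CachingParser {
    // A plain map plays the role of the Ehcache cache in this sketch.
    private final Map<String, ParsedCriteria> cache = new ConcurrentHashMap<>();

    // Counts real parses so the effect of the cache can be observed.
    final AtomicInteger parseCount = new AtomicInteger();

    ParsedCriteria parse(String criteria) {
        // computeIfAbsent runs the parser only when the key is missing.
        return cache.computeIfAbsent(criteria, this::doParse);
    }

    private ParsedCriteria doParse(String criteria) {
        parseCount.incrementAndGet();
        // Placeholder for the scanner / parser / AST steps.
        return new ParsedCriteria(criteria.trim());
    }
}
```

Calling parse twice with the same string should run the placeholder parser once and return the same instance both times.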

public Result read(Patient patient, LogicDataSource dataSource, LogicCriteria criteria) throws LogicException {
    // trying to get result from cache

    // if not cached, dataSource.read(...)
    // put result to cache

    // return result
}

Here we can cache the result of the dataSource.read method. This method uses Hibernate criteria to read data from the database. The result of reading the database is a Map<Integer, Result>, where the Integer is the patient's id and the Result is the result related to that patient.
For the cache, the value is the Map resultMap and the key is SomeComplexKey, which consists of several objects.

public class SomeComplexKey implements Serializable {
   private LogicCriteria criteria;
   private LogicDataSource dataSource;
   private Date indexDate;
   ...
}

So for subsequent patients with the same criteria, indexDate and dataSource we will get a cache hit.
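A cache hit on a composite key only happens if two separately built keys compare as equal, so the key class needs equals and hashCode. The sketch below assumes the criteria and the data source can be reduced to strings, which is a simplification made for this example; ReadCacheKey is a hypothetical name.

```java
import java.io.Serializable;
import java.util.Date;
import java.util.Objects;

/** Sketch of a composite cache key; criteria and dataSource are reduced to strings here. */
final class ReadCacheKey implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String criteria;
    private final String dataSource;
    private final Date indexDate;

    ReadCacheKey(String criteria, String dataSource, Date indexDate) {
        this.criteria = criteria;
        this.dataSource = dataSource;
        // Defensive copy, because java.util.Date is mutable.
        this.indexDate = new Date(indexDate.getTime());
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ReadCacheKey)) {
            return false;
        }
        ReadCacheKey other = (ReadCacheKey) o;
        return criteria.equals(other.criteria)
                && dataSource.equals(other.dataSource)
                && indexDate.equals(other.indexDate);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals, otherwise hash-based lookups miss.
        return Objects.hash(criteria, dataSource, indexDate);
    }
}
```

Without these two methods every new key instance would be a different key, and the cache would never hit.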

3:
public Result eval(Patient patient, LogicCriteria criteria, Map parameters) throws LogicException {
    // trying to get result from cache

    // if not cached:
    //   getRule
    //   for every patient in the cohort of patients:
    //     rule.eval which can call dataSource.read
    //     put result into Map where key is patient's id

    // put resultMap into the cache

    // return result for certain patient
}

This time the key is SomeAnotherComplexKey and the value is the Map resultMap, which stores the results for a cohort of patients. SomeAnotherComplexKey can be:

public class SomeAnotherComplexKey implements Serializable {
   private Map parameters;
   private LogicCriteria criteria;
   private LogicDataSource dataSource;
   private Date indexDate;
   ...
}

LogicService.eval can be called for a single patient or for a cohort of patients. Let's go through these two scenarios:
1) LogicService.eval is called for a cohort of patients.
   LogicService.eval: a logicContext is created and the cohort is put into it. For every patient in this cohort, LogicContext.eval is called with the same parameters except for the Patient parameter.
   After the first call of LogicContext.eval, the resultMap for the whole cohort is cached, and the result of logicContext.read is cached too. We will get a cache hit for every subsequent patient (LogicContext.eval call) in this cohort. That's cool!
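The cohort scenario can be sketched with a plain map: the first call computes and caches results for every cohort member, and the following calls only read from the cached map. CohortEvalCache and ruleEvaluations are hypothetical names, results are plain strings, and the key is just the criteria string, which is a deliberate simplification for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: the first eval for a cohort computes results for all members and caches the map. */
final class CohortEvalCache {
    // key: criteria string (simplified key); value: patient id -> result
    private final Map<String, Map<Integer, String>> cache = new HashMap<>();

    // How many times results were actually computed for a whole cohort.
    int ruleEvaluations = 0;

    String eval(int patientId, List<Integer> cohort, String criteria) {
        Map<Integer, String> resultMap = cache.get(criteria);
        if (resultMap == null) {
            // Cache miss: evaluate once for every patient in the cohort.
            resultMap = new HashMap<>();
            for (Integer memberId : cohort) {
                // Placeholder for rule.eval(...), which may call dataSource.read.
                resultMap.put(memberId, criteria + "@" + memberId);
            }
            ruleEvaluations++;
            cache.put(criteria, resultMap);
        }
        // Cache hit, or freshly filled map: return this patient's result.
        return resultMap.get(patientId);
    }
}
```

Because this simplified key ignores who is in the cohort, a call for a patient outside the cached cohort finds the map but gets null back.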

2) LogicService.eval is called for a single patient.
   LogicService.eval: a logicContext is created, the patient is put into it, and a cohort containing only this one patient is created inside this logicContext.
   LogicContext.eval is called only once.

   If we call LogicContext.eval for one patient or cohort with (parameters1, criteria1, dataSource1, indexDate1) and then call it for another patient or cohort with the same parameters, the key is identical but the cached resultMap does not contain the new patients. We effectively get a cache miss, and storing the new resultMap under the same key loses the cached result for the previous patient or cohort. That's why the key must contain some information related to the patients.
   The logicContext always holds a cohort, with either several patients or just one, so we can use the patients' ids: the key may include a collection of patient identifiers.

  Also, LogicContext.eval can be called with an indexDate of now or of some past time. For example, if time1 is created as indexDate = new Date(), it holds the time down to the millisecond, so a later call with a freshly created date will never match it and we will not get a cache hit. That's why the key's indexDate should be stored only with year, month and day, and maybe the hour.
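One way to coarsen the indexDate is to zero out everything below the hour with java.util.Calendar before the date goes into the key. This is only a sketch: IndexDates and truncateToHour are hypothetical names, and the default time zone is assumed to be acceptable.

```java
import java.util.Calendar;
import java.util.Date;

/** Helper for coarsening index dates before they are used in a cache key. */
final class IndexDates {
    private IndexDates() {
    }

    /** Returns a new Date with minutes, seconds and milliseconds set to zero. */
    static Date truncateToHour(Date date) {
        Calendar cal = Calendar.getInstance(); // default time zone and locale
        cal.setTime(date);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }
}
```

Two dates created a minute apart within the same hour then truncate to equal values, so they produce the same key component.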

  Then the complex key (a LogicCacheComplexKey, shown here under the earlier name SomeAnotherComplexKey) may be the following class:

public class SomeAnotherComplexKey implements Serializable {
   private Map parameters;
   private LogicCriteria criteria;
   private LogicDataSource dataSource;
   private Date indexDate; // e.g. 5.30.2010 16:00:00.000 (truncated to the hour)
   private Set membersIds;
   ...
}
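The effect of putting the member ids into the key can be sketched as follows. EvalCacheKey is a hypothetical, reduced version of the key: the criteria is a string, the already truncated index date is a long, and the parameters and data source are left out for brevity.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Reduced key sketch: same criteria and date, but distinguished by cohort membership. */
final class EvalCacheKey {
    private final String criteria;        // stands in for LogicCriteria
    private final long indexDateHour;     // index date truncated to the hour, epoch millis
    private final Set<Integer> memberIds; // ids of all patients in the cohort

    EvalCacheKey(String criteria, long indexDateHour, Collection<Integer> memberIds) {
        this.criteria = criteria;
        this.indexDateHour = indexDateHour;
        // Copy into a set so that the order of ids does not matter.
        this.memberIds = Collections.unmodifiableSet(new HashSet<>(memberIds));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof EvalCacheKey)) {
            return false;
        }
        EvalCacheKey other = (EvalCacheKey) o;
        return indexDateHour == other.indexDateHour
                && criteria.equals(other.criteria)
                && memberIds.equals(other.memberIds);
    }

    @Override
    public int hashCode() {
        return Objects.hash(criteria, indexDateHour, memberIds);
    }
}
```

With this key, the same cohort hits the cache regardless of the order in which its ids are listed, while a different patient or cohort gets its own entry instead of overwriting the previous one.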

The next step is to define the architecture of a caching service that is fully featured for the logic module and independent of the cache framework, but that is for the next article.

Choosing a Java caching framework. Searching the Internet I found several widely used caching frameworks. I did not consider lightweight caching frameworks, only general-purpose ones. I also looked at each project's releases page to see how well it is maintained.

I also found the spring-modules project, whose goal is to extend the Spring Framework and to facilitate integration between Spring and other projects.

Requirements:

Memory cache
Disk cache
Expiration time for cached objects
Well maintained
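To illustrate what the expiration requirement means in practice, here is a minimal time-to-live cache built only on JDK classes. It is a sketch, not how any of the frameworks implement it: TtlCache is a hypothetical name, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal time-to-live cache: entries disappear once their TTL has passed. */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(K key, V value, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    /** Returns the value, or null if it is absent or has expired. */
    V get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            // Expired entries are removed lazily, on access.
            entries.remove(key);
            return null;
        }
        return entry.value;
    }
}
```

In real code the caller would pass System.currentTimeMillis() as nowMillis.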

So I decided to do a quick overview of the following caching frameworks, which meet my requirements:

JBoss cache
Apache JCS
Ehcache

All these caches support memory cache, disk cache, remote cache etc., have a lot of different features, and any of them could be chosen.

JBoss Cache is a replicated and transactional cache. It comes in two editions: Core and POJO. The Core edition is the basis of the library. The POJO edition is built atop the Core library and extends it.

The transactional model means we do not have to worry about in-memory locks when updating the cache.

It supports cache listeners for handling cache events. It is thread safe. It has six eviction policies (LRU - least recently used; LFU - least frequently used; MRU - most recently used; FIFO - first-in-first-out queue; ExpirationPolicy; ElementSizePolicy).
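As a reminder of what an LRU eviction policy does, here is a small sketch based on the JDK's LinkedHashMap in access order. It only illustrates the policy and says nothing about how JBoss Cache, JCS or Ehcache implement it; LruCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** LRU cache sketch: the least recently accessed entry is evicted when the cache is full. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}
```

With a capacity of two, putting a and b, reading a, and then putting c evicts b, because b is the entry that was used least recently.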

Apache JCS is a distributed caching system. According to its authors, it is most useful for high-read, low-put applications. It operates with cache regions. Each region can be configured as a different cache (memory, disk, lateral, remote) with different algorithms, and each has its own configuration.

It is not transactional. The JCS web site states that JCS is faster than Ehcache, but I do not think this is the most important thing. It has only the LRU eviction policy out of the box, but other policies can be plugged in. It has its own LRUMap implementation which, as they say, is faster than the Commons Collections LRUMap and the LinkedHashMap provided with the JDK.

Ehcache is a widely used caching framework. It was originally based on JCS. It is fast, simple and has minimal dependencies. Since version 2.0 it supports transactions. It is actively developed and has a lot of features. Modifying a cached value in place is inherently not thread safe; it is safer to retrieve the value, remove the cache element and then reinsert the value. A background expiry thread runs only for the disk store; expired elements in the memory store are removed when they are accessed or evicted. The memory store supports LRU (the default), LFU and FIFO eviction policies.

Let's discuss what to choose.


Sunday, May 30, 2010

Sunday, May 16, 2010

Quick overview of java caching frameworks. Pilot choice for the OpenMRS project.
