Thursday, August 5, 2010

"Soft" pencils down for coding. Beginning to write documentation



The logic caching system has reached the "soft pencils down" state. This means I expect only commits adding javadoc from now on.
My next step is writing documentation and adding it to the OpenMRS wiki.

The documentation will include the structure of the caching system, a detailed description of the classes and interfaces a user needs in order to use the caching system, code samples, and links to related resources.

If any other documentation is needed, please add a comment.

I have only one open issue: adding a patch to openmrs-core for some small changes, such as adding Serializable to some org.openmrs.logic.* classes and updating the ehcache library to the latest version. I plan to create a ticket for this.

Tuesday, July 13, 2010

Web page for configuring logic cache

Hi!
Last week I developed a web page for configuring the logic cache. Let's walk through the chain of requirements to understand what was done and why.
LogicCacheManager may hold many caches; each cache has its own configuration and some features, such as flushing cached data. If I can change a cache's configuration, I want the same configuration to be there after an application restart, i.e. settings must be stored and restored.
LogicCache is only a layer over the caching framework, and some frameworks do not allow changing certain settings at runtime; that is why we need the ability to "restart" a cache to activate an option. Restarting a cache simply means recreating it: I programmatically dispose of the cache and then create a new one with the same name.
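
For the current framework (ehcache) that dispose-and-recreate step could look roughly like the sketch below. The helper class is hypothetical; only the CacheManager calls are real ehcache API, and note that the cached entries are lost on such a restart.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

// Hypothetical helper that "restarts" a cache so that settings which
// ehcache cannot change at runtime take effect on the new instance.
public class CacheRestarter {

    public Cache restart(CacheManager manager, String cacheName) {
        // Dispose of the existing cache (its entries are discarded).
        manager.removeCache(cacheName);
        // Re-create a cache under the same name, built from the
        // defaultCache settings declared in ehcache.xml.
        manager.addCache(cacheName);
        return manager.getCache(cacheName);
    }
}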

So, the requirements are:
1) provide access to the cache's settings through a web page;
2) provide a "restart" feature for the current caching framework;
3) store and restore settings.

I think we need two web pages: the first with the list of all caches and the second with the settings of the selected cache.

Below you can see these pages.
The page with all caches used in the project:

[screenshot]

The page for a particular logic cache:

[screenshot]

That's all.

Ah, no, I have something else. Temporarily, to test the cache, I added to the logic module tester page the ability to run tests for a cohort of patients:

[screenshot]

And the result page:

[screenshot]

If we don't need this, I will change these pages back.

OK, now that's all.

Friday, June 25, 2010

Define Key for LogicCache

Hi, let's talk about the key for LogicCache. OK, not just for LogicCache, but for any cache.
A cache is like a Map: there is a key and a related value. When we ask the cache for a value we write code like:
cache.get(key);
Internally there is a comparison like keyInsideCache.equals(key), so the key's class must override equals and hashCode.
A key may also be stored in a disk store in serialized form, so the key's class and the classes of all its properties must implement the java.io.Serializable interface.
Finally, a key must contain the set of properties that influence the result and make the key unique.

LogicCacheKey was designed in accordance with the requirements above. Here is the class "diagram" :)

[class diagram of LogicCacheKey]

LogicCacheKey contains several properties (a rough code sketch follows the list):

  • patientId - marks the result as belonging to a certain patient;
  • indexDate - the date the LogicContext had when the result was cached. We can run LogicService.eval for a past date as well as for now. There is a method updateTime which zeroes out the milliseconds, seconds, minutes and hours, because when we pass a "past time" we write dd/MM/yyyy without HH:mm:ss.
  • dataSource - contains only the canonical class name of the LogicDataSource implementation. We do not need any of the data the dataSource itself holds.
  • criteria - the LogicCriteria parsed from the string representation of the Rule we run the eval method for. I am thinking about how to represent it as a string, the way dataSource is.
  • parameters - some additional parameters. If this map contains non-serializable objects, the only consequence is that the cache entry will not be stored in the disk store.
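
For illustration, here is a minimal sketch of what such a key class could look like. The field types, the string form of criteria and the null-safe helper are my assumptions based on the list above, not the exact module code; the important parts are Serializable, equals and hashCode.

import java.io.Serializable;
import java.util.Date;
import java.util.Map;

public class LogicCacheKey implements Serializable {

    private static final long serialVersionUID = 1L;

    private Integer patientId;              // result belongs to this patient
    private Date indexDate;                 // normalized index date of the LogicContext
    private String dataSource;              // canonical class name of the LogicDataSource
    private String criteria;                // string form of the LogicCriteria
    private Map<String, Object> parameters; // optional extra parameters

    // ... constructor and getters omitted ...

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LogicCacheKey)) return false;
        LogicCacheKey other = (LogicCacheKey) o;
        return eq(patientId, other.patientId)
                && eq(indexDate, other.indexDate)
                && eq(dataSource, other.dataSource)
                && eq(criteria, other.criteria)
                && eq(parameters, other.parameters);
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + (patientId == null ? 0 : patientId.hashCode());
        result = 31 * result + (indexDate == null ? 0 : indexDate.hashCode());
        result = 31 * result + (dataSource == null ? 0 : dataSource.hashCode());
        result = 31 * result + (criteria == null ? 0 : criteria.hashCode());
        result = 31 * result + (parameters == null ? 0 : parameters.hashCode());
        return result;
    }

    // Null-safe equality helper.
    private static boolean eq(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }
}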

Monday, June 21, 2010

Writing wrapper of caching frameworks, part2

Hi, this post is about the structure of the Caching System for the Logic Module. I have already presented my structure, listened to questions and comments, and changed a few things.

The main goal of this caching system is to wrap a caching framework and expose an interface with the features we need. Building on the previous post, where I reviewed the simplified structure of two caching frameworks, I designed the structure of the Caching System for the Logic Service.

Here it is:

[structure diagram of the caching system]

LogicCacheManager - a utility class that provides a cache by its name and keeps a Map of caches as well as a reference to the LogicCacheProvider.

LogicCacheProvider - an abstract class that hides the control of the specific caching framework's CacheManager. It contains a list of all created LogicCaches.

LogicCache - the interface that must be implemented. It hides a cache of the specific caching framework and provides all the necessary methods: get, put, getSize, etc. Some methods may throw IllegalOperationException, which means the caching framework behind it does not support that feature. It also has a getFeature method which takes an argument of the internal enumeration type LogicCache.Feature. Using "boolean getFeature(Feature)" we can ask our LogicCache which features are supported.
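
To make that more concrete, here is a rough sketch of what the interface could look like; the exact method set and the Feature constants are my assumptions based on the description above, not the final module API.

public interface LogicCache {

    // Assumed feature flags; the real enumeration may list different features.
    enum Feature {
        DISK_STORE,   // can entries overflow to a disk store?
        FLUSH,        // can cached data be flushed on demand?
        RESTART       // can the cache be disposed and re-created at runtime?
    }

    String getName();

    void put(Object key, Object value);

    Object get(Object key);

    int getSize();

    // May throw if the wrapped caching framework does not support it.
    void flush();

    // Ask whether the wrapped caching framework supports the given feature.
    boolean getFeature(Feature feature);
}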

LogicCacheConfig - an interface that provides access to some of the cache's parameters at runtime, for example the maximum size of the cache in memory, the default TTL for cached objects, etc. Like LogicCache, it has an internal enumeration LogicCacheConfig.Features and a getFeature method to ask which settings are supported.
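
A similar sketch for the configuration interface; again, the listed settings are only assumptions for illustration.

public interface LogicCacheConfig {

    // Assumed configurable settings.
    enum Features {
        MAX_ELEMENTS_IN_MEMORY,
        DEFAULT_TTL
    }

    Integer getMaxElementsInMemory();

    void setMaxElementsInMemory(int maxElements);

    Long getDefaultTTL();

    void setDefaultTTL(long seconds);

    // Ask whether the given setting can be read or changed for the wrapped framework.
    boolean getFeature(Features feature);
}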

In the next post I'll write about LogicCacheKey, serialization, and how to test it.

Monday, June 14, 2010

Writing wrapper of caching frameworks, part1

My presentation was last week, and I talked about the design of the caching system for the logic service. As you know, I'll use a widely used caching framework, which helps me cover a large part of the requirements, such as:

  • an in-memory cache
  • a disk cache
  • expiration time for cached objects, and so on.

One of the tasks was also to design an abstraction layer over the caching framework so that it can be swapped out. We do not need all the features of any particular caching framework (ehcache, JBoss Cache, JCS, ...); we will only use the framework's cache to get/put objects, its cache manager to create caches, and its configuration objects to change some runtime properties of the cache.

I tried ehcache, then I looked through the tutorials for JBoss Cache and JCS. I realized they are similar in use. All of them have configuration files that configure some central object acting as a cache manager; using it we can create caches and change their configurations.

Below are simplified structures of ehcache and JBoss Cache.

EhCache:

Using an ehcache.xml file we can create a CacheManager, and then we can get/create caches through that CacheManager. That's all! We can put/get objects into/from a cache. Each cache has its own configuration, and some properties may be changed at runtime.
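
A minimal usage sketch of that flow; the config path and the cache name "logicCache" are just examples and are assumed to be declared in ehcache.xml.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheExample {

    public static void main(String[] args) {
        // Create a CacheManager from an ehcache.xml configuration file.
        CacheManager manager = CacheManager.create("ehcache.xml");

        // Get a cache declared in ehcache.xml.
        Cache cache = manager.getCache("logicCache");

        // Put and get objects.
        cache.put(new Element("someKey", "someValue"));
        Element element = cache.get("someKey");
        if (element != null) {
            System.out.println(element.getObjectValue());
        }

        manager.shutdown();
    }
}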

JBoss-cache:

Using a CacheFactory and a config.xml file we can create a Cache. This Cache acts like a CacheManager: it has at least a ROOT node and may have many nodes, and each node can be used like a cache where we can put/get objects.
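
A comparable sketch for JBoss Cache, written from memory of the 2.x/3.x tutorials (treat it as an approximation); the config file name and the node path are just examples.

import org.jboss.cache.Cache;
import org.jboss.cache.CacheFactory;
import org.jboss.cache.DefaultCacheFactory;
import org.jboss.cache.Fqn;

public class JBossCacheExample {

    public static void main(String[] args) {
        // Create and start the Cache (the CacheManager-like object) from a config file.
        CacheFactory<String, Object> factory = new DefaultCacheFactory<String, Object>();
        Cache<String, Object> cache = factory.createCache("config.xml");

        // Each node under the ROOT can be used like a separate cache.
        Fqn node = Fqn.fromString("/logic");
        cache.put(node, "someKey", "someValue");
        Object value = cache.get(node, "someKey");
        System.out.println(value);

        cache.stop();
        cache.destroy();
    }
}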

The situation with JCS is similar.

In the next post I'll write about my design for wrapping the caching framework.

Sunday, May 30, 2010

I made a pilot choice of caching framework: ehcache.
A few words about it.
Configuration is via an XML file. CacheManager, Cache and Element are the central objects. When a CacheManager is created, it uses an XML file for its configuration; several CacheManagers can exist, each with its own configuration file.
Every CacheManager has one or several caches. Each cache has a lot of parameters, like memoryEvictionPolicy, TTL for objects and so on. Some of them can be changed programmatically.

Using a tool like ehcache is not the problem; the problem is choosing what to cache, which keys to use, how much memory to spend on caching, and so on. Let's try to analyze some methods of the logic module.

Based on this, I identified several methods that are candidates for caching. They are:

  • LogicService.parse(String);
  • LogicContext.read(Patient, LogicDataSource, LogicCriteria); 
  • LogicContext.eval(Patient, LogicCriteria, Map);
1:
// LogicService.parse
public LogicCriteria parse(String criteria) {

    // try the cache first
    Element element = ehcache.get(criteria);
    if (null != element) {
        return (LogicCriteria) element.getValue();
    }

    // Create a scanner that reads from the input stream passed to us
    // Create a parser that reads from the scanner
    // Start parsing at the compilationUnit rule
    // Create Abstract Syntax Tree
    // Create LogicCriteria

    // cache the parsed LogicCriteria and return it
    element = new Element(criteria, logicCriteria);
    ehcache.put(element);
    return logicCriteria;
}
Here everything is simple: the key is the criteria string we have to parse, and the value is the resulting LogicCriteria. The only question for this method is whether to use a disk store or not; I think tests will answer that.

2:
public Result read(Patient patient, LogicDataSource dataSource, LogicCriteria criteria) throws LogicException {
    // try to get the result from the cache

    // if not cached: dataSource.read(...)
    // put the result into the cache

    // return the result
}

Here we can cache the result of the dataSource.read method. That method uses a Hibernate criteria to read data from the database. The result of reading the database is a Map where the Integer key is the patient's id and the Result is the result for that patient.
For the cache, the value is this Map resultMap and the key is SomeComplexKey, which consists of several objects.


public class SomeComplexKey implements Serializable {
    private LogicCriteria criteria;
    private LogicDataSource dataSource;
    private Date indexDate;
    ...
}

So for the next patients with the same criteria, indexDate and dataSource we will get a cache hit.


3:
public Result eval(Patient patient, LogicCriteria criteria, Map parameters) throws LogicException {
    // try to get the result from the cache

    // if not cached:
    //   getRule
    //   for every patient in the cohort of patients:
    //     rule.eval, which can call dataSource.read
    //     put the result into a Map keyed by the patient's id

    // put resultMap into the cache

    // return the result for the given patient
}

This time the key is SomeAnotherComplexKey and the value is the Map resultMap, which stores the results for a cohort of patients. SomeAnotherComplexKey can be:


public class SomeAnotherComplexKey implements Serializable {
    private Map parameters;
    private LogicCriteria criteria;
    private LogicDataSource dataSource;
    private Date indexDate;
    ...
}


LogicService.eval can be called for a single patient or for a cohort of patients. Let's go through these two scenarios:
1) When LogicService.eval is called for a cohort of patients.
   LogicService.eval: a logicContext is created and the cohort is put into it. For every patient in this cohort LogicContext.eval is called with the same parameters except the Patient parameter.
   After the first call of LogicContext.eval the resultMap for the whole cohort is cached, and the result of logicContext.read is cached too. We will get a cache hit for every subsequent patient (LogicContext.eval call) in this cohort. That's cool!

2) When LogicService.eval is called for a single patient.
   LogicService.eval: a logicContext is created, the patient is put into it, and a cohort with only this one patient is created inside the logicContext.
   LogicContext.eval is called only once.

If we call LogicContext.eval for one patient or a cohort with (parameters1, criteria1, dataSource1, indexDate1) and then call it for another patient or cohort with the same parameters, we will get a cache miss and lose the cached result of the previous patient or cohort. That is why the key must contain some information related to the patients.
Each time the logicContext holds a cohort with several patients or just one patient, we can use the patients' ids. Based on this we may include an array of patient identifiers in the key.

Also, LogicContext.eval can be called with indexDate set to now or to a past time. For example, time1 was created as indexDate=new Date(), so time1 contains the time down to milliseconds; that is why we will not get a cache hit the next time we evaluate for that past time. That is why we need to store the key's indexDate with only the year, month and day, maybe the time in hours.
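
A sketch of that truncation using java.util.Calendar; the class and method names here are only for illustration.

import java.util.Calendar;
import java.util.Date;

public final class IndexDates {

    // Zero out hours, minutes, seconds and milliseconds so that evaluations
    // for the same day always produce the same key.
    public static Date truncateToDay(Date indexDate) {
        Calendar cal = Calendar.getInstance();
        cal.setTime(indexDate);
        cal.set(Calendar.HOUR_OF_DAY, 0);
        cal.set(Calendar.MINUTE, 0);
        cal.set(Calendar.SECOND, 0);
        cal.set(Calendar.MILLISECOND, 0);
        return cal.getTime();
    }

    private IndexDates() {
    }
}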

Then the key class may look like this:


public class SomeAnotherComplexKey implements Serializable {
    private Map parameters;
    private LogicCriteria criteria;
    private LogicDataSource dataSource;
    private Date indexDate; // e.g. 5.30.2010 16:00:00-000
    private Set membersIds;
    ...
}


The next step is to define the architecture of the caching service, which has to be full-featured for the logic module and independent of the caching framework, but that is for the next article.

Sunday, May 16, 2010

Quick overview of Java caching frameworks. Pilot choice for the OpenMRS project.

Choosing a Java caching framework. Searching the Internet, I found several widely used caching frameworks. I did not consider lightweight caching frameworks, only general-purpose ones. I also looked at the release pages to see how well the projects are maintained.
I found the spring-modules project, whose goal is to extend the Spring Framework and facilitate integration between Spring and other projects.

Requirements:
  • Memory cache
  • Disk cache
  • Expiration time for cached objects
  • Well maintained

So I decided to do a quick overview of the following caching frameworks that met my requirements:
1. JBoss Cache
2. Apache JCS
3. Ehcache
All of them support a memory cache, a disk cache, a remote cache, etc., have a lot of different features, and any of them could be chosen.

JBoss Cache is a replicated and transactional cache. It has two editions: Core and POJO. The Core edition is the basis of the library; the POJO edition is built on top of the Core library and extends it.
The transactional model means we do not have to worry about in-memory locks when updating the cache.
It supports cache listeners for handling cache events and is thread safe. It has six eviction policies (LRU - least recently used; LFU - least frequently used; MRU - most recently used; FIFO - first-in-first-out queue; ExpirationPolicy; ElementSizePolicy).

Apache JCS is a distributed caching system. As they write, it is most useful for high-read, low-put applications. It operates with cache regions; each region can be configured as a different kind of cache (memory, disk, lateral, remote) with different algorithms and its own configuration.
It is not transactional. They state on their website that JCS is faster than ehcache, but I don't think that is the most important thing. It only has an LRU eviction policy out of the box, though other policies can be plugged in. It has its own LRUMap implementation which, as they say, is faster than LRUMap and the LinkedHashMap provided with the JDK.

Ehcache is a widely used caching framework. In its beginnings it was based on JCS. It is fast, simple and has minimal dependencies. Since version 2.0 it supports transactions. It is actively developed and has a lot of features. It is inherently not thread safe to modify a cached value in place; it is safer to retrieve the value, delete the cache element and then reinsert the value. The expiration policy applies only to the disk cache, and the memory cache supports only the LRU eviction policy.

Let's discuss what to choose.

Tuesday, April 27, 2010

GSoC 2010: first steps.

Hi, everybody!

This is my first blog and my first blog entry. I'd like to tell you about the beginning of my open source life :)

It is 2010, and this year is special for me.
A guy from university told me how he had worked on an open source project during a summer; he made an interesting project, and that caught my attention, because I had only worked on commercial projects. So I wanted to know how open source projects "live" and where I could find a project that interested me. Then he said I could work the way he had, and he showed me the Google Summer of Code site (GSoC from now on) and the list of projects from previous years. Once I knew about GSoC, I liked the idea!

I began looking for organizations that interested me and found a lot of interesting open source projects. I tried several communities, but the projects were not simple, so I decided to choose only one. All the projects were interesting, and maybe I chose not a project but a community. The most exciting and fun one was the OpenMRS project's community.

I did not do anything special; I just followed the advice on their website and in the GSoC FAQ. I read the developer's guide, found their Trac with introductory tickets, and began to work. And it's great, because I began to understand what open source is!

I was accepted to GSoC this year. I was so happy to learn this, but today I'm not worried so much; now I see there is a lot of work, interesting work, and I like that!