The Artima Developer Community

Artima Developer Spotlight Forum
Dhananjay Nene on Cache Design

3 replies on 1 page. Most recent reply: Aug 16, 2008 9:48 AM by Will Pierce

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Dhananjay Nene on Cache Design Posted: Aug 7, 2008 8:58 PM

Writing that:

The most difficult part of caching is not necessarily learning or selecting the caching tool but the detailed analysis of the application, its architecture and its data flow, that actually precedes the caching tool selection...

Dhananjay Nene provides a methodical review of the most important design considerations when introducing caching into an application. In Factors influencing Cache Design, Nene says that understanding such principles will make it easier to select the right tools, techniques, and even languages for highly scalable applications.

In the blog post, Nene points out that the first thing to clarify about the application is the desired scope of caching:

Various data elements are scoped differently. As an example, highly transient ... data is often transactional scoped, whereas application configuration is application scoped. It is important to understand how the various data elements are scoped...

Cache speeds depend substantially ... on the scope of data. Distributed caches such as memcached (required for application scoped data) may run much faster than direct databases but they run 100s of times slower than in-process caches... which can be used for request, thread, transaction scoped data and also for session scoped data when session affinity without session failover is being maintained... Your cache needs to ensure that it does not mix up the scopes and end up returning incorrect data or end up corrupting the data.
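The scope-mixing hazard Nene warns about can be made concrete with a minimal sketch (not from the post; all names are illustrative): an application-scoped cache shared for the process lifetime, and a request-scoped cache that is discarded when the request ends, so per-user data can never leak across requests.

```python
app_cache = {}  # application scope: shared state, lives for the process lifetime

class RequestCache:
    """Request-scoped cache: created per request, dropped when the request ends."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def put(self, key, value):
        self._data[key] = value

def handle_request(user_id, request_cache):
    # Request-scoped: per-user data must NOT go into the shared cache,
    # or one user's cached data could be served to another user.
    profile = request_cache.get(("profile", user_id))
    if profile is None:
        profile = {"id": user_id}  # stand-in for a database fetch
        request_cache.put(("profile", user_id), profile)
    # Application-scoped: configuration is the same for everyone,
    # so it is safe to share across requests.
    config = app_cache.setdefault("config", {"feature_x": True})
    return profile, config

cache = RequestCache()
profile, config = handle_request(42, cache)
```

Keeping the two scopes in physically separate structures, rather than one dict with clever keys, is what makes "returning incorrect data" structurally impossible here.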

Instead of prematurely focusing on tools and APIs, the analysis of caching strategies should take into account the languages used, and even the environments an application will run in:

The language and the runtime environment can substantially influence and constrain the choices you have from a caching perspective. CGI based / shared nothing environments (perl / php) may not easily be able to support process level caches. Python with WSGI can support process level caches and can support multiple threads per process, however features such as the Global Interpreter Lock may make it difficult to run more than a handful of threads per process and may not allow you to easily leverage multi core CPUs without breaking out into multiple processes. Java on the other hand makes it easy to run a large number of threads in one process but will require you to pay greater attention to synchronisation, and the cost of serialising Java objects over the network can be rather high compared to the speed of Java runtime processing.

Another very important consideration is the data needs of the application, especially as it relates to the timeliness of the data:

[In some transactional applications], you are going to have to worry about issues like the small time window between the database record getting committed and the cache getting updated / invalidated. The more liberal the consistency requirements (i.e., the easier it is to show slightly old data), the easier it is going to be on you, your cache design, and your caches... In general if you need very high data currency, caching for a single process/single node is relatively easy but can get quite difficult as you move towards a multiple process / multiple nodes scenario.
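The commit-to-invalidation window Nene describes can be sketched in a few lines (a simplified single-process model, not his design; the dict-backed "database" and names are invented). Between the commit and the invalidation, a concurrent reader still sees the stale cached value; the invalidation only closes the window for subsequent reads.

```python
import threading

db = {"balance": 100}      # stand-in for a real database
cache = {"balance": 100}   # cache-aside copy of the committed value
lock = threading.Lock()

def read_balance():
    # Cache-aside read: serve from the cache, fall back to the database
    # and repopulate on a miss.
    if "balance" in cache:
        return cache["balance"]
    value = db["balance"]
    cache["balance"] = value
    return value

def write_balance(value):
    with lock:
        db["balance"] = value   # the "commit"
        # <-- the window: a concurrent read_balance() at this point still
        #     returns the old cached value even though the commit is done.
        del cache["balance"]    # invalidate; later reads refetch from db

write_balance(250)
```

With multiple processes or nodes, the invalidation itself becomes a network message, which is exactly why Nene says high data currency gets much harder in the distributed case.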

What do you think are the most important considerations when designing a caching solution for an application?


Cameron Purdy

Posts: 186
Nickname: cpurdy
Registered: Dec, 2004

Re: Dhananjay Nene on Cache Design Posted: Aug 13, 2008 10:19 AM
Unfortunately, the article focuses on what the "state of the art" was about seven years ago. Distributed caching today provides a far more elegant set of possible solutions than just dirty caches with remote access and invalidation. Transactionally consistent read/write in-memory caching technology across scores of servers is the norm for high-scale TP systems being built in Java and .NET today, and the same technology is used for many of the types of web applications that this article was focused on.

Peace,

Cameron Purdy | Oracle
http://www.oracle.com/technology/products/coherence/index.html

Dhananjay Nene

Posts: 132
Nickname: dnene
Registered: Jan, 2008

Re: Dhananjay Nene on Cache Design Posted: Aug 13, 2008 12:40 PM
Thanks, updated the original post to reflect your comments.

Will Pierce

Posts: 2
Nickname: willp
Registered: Jul, 2008

Re: Dhananjay Nene on Cache Design Posted: Aug 16, 2008 9:48 AM
Excellent discussion about caching and its many-layered complexities. Thank you, Dhananjay, for your thoughts!

Your comment about caches not needing to always be coarse-grained resonates with me. Specifically: "Sometimes caching results of operations ... can offer dramatically superior performance." This is the same as the concept of "Memoizing" - or caching the results of a function call based on recording the parameters passed in to the function, and using a cache to service the same request rather than perform the calculation repeatedly.

Personally, I've found this to be amazingly powerful in the equally amazingly boring area of time/date conversion functions. In particular, I had to deal with a somewhat unpredictable set of time/date strings that usually comprised about 50,000 specific instances, but only 50-200 unique values. I had to convert these strings to unix epochtimes, and noticed that my code was CPU-bound, performing the same dang mktime() conversion to epochtime over and over. By using the Memoize design pattern, and caching the results of a conversion from time/datestring->epochtime(integer), I eliminated 90% of the CPU time for this process, a factor of 10 improvement.

Of course, caching does this, which is why we use it so much. But what isn't really discussed much in the literature is how to apply caching methods and eviction strategies to some of these newer techniques, like memoized caching (caching operations based on operands). It's definitely more complex than hardware CPU caching of instructions and such; applications add another dimension of complexity. In my example, I had to pick a cache eviction strategy that would keep the cache size bounded (to avoid gobbling up too much memory for the cache), but still handle the "width" (cardinality) of the current set of date/time strings, to avoid cache-thrashing, i.e. having so low a cache-hit-rate that the cache becomes a liability rather than a gain.
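The memoized, bounded-size date conversion Will describes maps almost directly onto Python's built-in `functools.lru_cache` (a sketch under assumptions: the date format string and the `maxsize` of 256 are invented, chosen to cover his stated 50-200 unique values; note `mktime` interprets the struct in local time).

```python
import time
from functools import lru_cache

@lru_cache(maxsize=256)  # bounded cache; LRU eviction handles the "width"
def to_epoch(date_string):
    """Convert a date string to a Unix epoch time, memoized by argument."""
    return int(time.mktime(time.strptime(date_string, "%Y-%m-%d %H:%M:%S")))

# 50,000 lookups over only a couple of unique values:
# all but the first occurrence of each string are cache hits,
# skipping the repeated strptime/mktime work entirely.
values = ["2008-08-16 09:48:00", "2008-08-13 10:19:00"] * 25_000
epochs = [to_epoch(v) for v in values]
info = to_epoch.cache_info()  # exposes hits, misses, and current size
```

Picking `maxsize` is exactly the eviction-strategy judgment call Will raises: too small and the LRU list thrashes on the working set, too large and the cache gobbles memory for strings it will never see again.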

If nothing else, Memoize, or caching function calls based on their parameters (operations based on their operands) deserves much more attention in the software engineering community.

Again, thanks for such a detailed discussion of application level caching, a sorely neglected topic in s/w development, IMHO.

-W

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use