Wednesday, December 1, 2010

Tuning Java applications on Google AppEngine

Why bother using AppEngine?
Introduction to AppEngine
AppEngine is the Platform as a Service (PaaS) offering from Google. It allows you to deploy and run Java and Python applications on Google's infrastructure. Java applications run in a sandboxed Servlet container that scales automatically. To deploy an application you just push a button in your IDE, without having to set up or manage an application server.


AppEngine problems
From time to time you’ll find some very negative feedback about AppEngine. In most cases this is not because AppEngine is bad, but because the applications were not designed to run on AppEngine. This article gives you some hints on how to design an application that works and performs well on AppEngine.


Google AppEngine is one of the most interesting cloud platforms for Java developers. It allows you to deploy applications within seconds and offers a broad range of services to make development easier. And it’s cheap; for small applications you don’t pay anything at all. This makes it the perfect platform for small-scale applications that don’t justify a dedicated server. The lack of good shared hosting solutions pushes most developers away from Java into the PHP world for this kind of small application, but AppEngine solves that problem.
All this goodness comes at a price though. When you write applications the same way you do for a dedicated server you’ll soon run into performance problems, even with the smallest application. That doesn’t mean AppEngine offers bad performance, but its architecture is so fundamentally different that you have to design your applications to match it. When you write your applications specifically for AppEngine you’ll unleash its full power and it will be a great platform.


The two main problems
The problems that you’ll face when using AppEngine come in two flavors which will both be discussed in this article: 
  1. Instance startup times
  2. DataStore related problems
Application startup time
The first problem, instance startup time, is something you normally don’t care about. Most Java frameworks are designed to do as much processing as possible at application startup, because you don’t restart applications often anyway. That’s very different on AppEngine. The core feature of AppEngine is its scalability. An application that gets a lot of load will automatically get extra instances to spread that load over multiple machines. This doesn’t only happen under massive load; you’ll see new instances starting quite early. You’ll even run into this when your application doesn’t get any load at all. To avoid wasting resources on idle applications, AppEngine stops all instances of an application that hasn’t received any requests for about a minute. That means that no matter what application you have, you’ll have to deal with new instances starting. Each time an instance starts you get a cold start of your application, including loading and starting all the frameworks you use. Every second counts now, because the user will not receive any response while the application is starting.


Google announced that in AppEngine 1.4 there will be the possibility to pay for reserving instances and the availability of an API to “warm up” new instances. This solves part of the startup problem, but you’ll have to pay for it. This might be no problem for large applications, but is exactly the thing we were trying to avoid for smaller applications.


Performance gain 1 - Choose frameworks based on startup time
Many popular Java frameworks add up to 25 seconds of startup time. Users will not wait 25 seconds to see a web page, so we have to improve this. The most important step is to get rid of frameworks that take very long to start up, and to configure the frameworks you do use for fast startup. This means not every framework is a good fit for AppEngine. An unfortunate example of this is Grails. Although Grails is one of my favorite frameworks in other environments, its 20+ second startup time is simply unacceptable on AppEngine. So, would I advise getting rid of all frameworks and using Servlets/JSP directly? Not really. That would set you back too much on productivity and code maintainability, and it’s not necessary either.


The two stacks I used a lot on AppEngine are the following:
  • Weld, JSF2 and JAX-RS (more or less a stripped down Java EE 6 Web Profile)
  • Spring 3.0 including Spring Web MVC
There are many good alternatives to these stacks, but always test on startup time first!
At the moment Spring, after some tuning, still offers significantly better startup performance though. The Weld team is working hard on dramatically improving Weld’s startup time, which should make it a perfect fit for AppEngine in the upcoming version.


Performance gain 2 - Get rid of JPA/JDO
AppEngine offers two standard APIs to work with the DataStore: JPA and JDO. Remember that the DataStore is not a relational database. Because of that both JPA and JDO lose some of their power:
  • Relationship mappings are very limited.
  • Join queries are not supported.
  • Polymorphic queries are not supported.
  • Caching support works differently.
What’s left is just basic mapping between Java classes and DataStore entities, without the real power of JPA/JDO. We still have to deal with the complexity of the APIs however, and worse, with the overhead of the frameworks. Both JPA and JDO add seconds to the startup time of an application. This is bad, and because the frameworks can’t be used to their full potential it’s not really worth it. Instead we need something that more closely matches the possibilities of the DataStore. There’s a framework doing just that: Objectify. The framework uses JPA annotations to map your Java classes to entities. The API to persist and query entities is completely different however, and matches the low-level DataStore API much more closely. The programming model is a lot easier because the API doesn’t contain any features that the DataStore doesn’t support anyway. Even better, it only adds milliseconds to the application startup time.


Performance gain 3 - Don’t use classpath scanning
Whenever I use Spring I use annotations as much as possible to keep my XML configuration to a minimum. For declaring components I use @Controller/@Component instead of bean configuration in XML. This means however that the framework must scan for annotated classes at startup, which adds some startup time. On AppEngine it’s always better to reduce scanning for classes.
Another example is RestEasy. Normally I just let the framework scan for @Path annotations, but on AppEngine it’s better to use an explicit Application class instead. These are just two examples of frameworks I use a lot, but there are many other frameworks that give you this choice.
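
As a rough illustration of explicit registration, the sketch below lists resource classes by hand instead of discovering them by scanning. It uses only the JDK so it runs on its own: ExplicitResources, BookResource and AuthorResource are hypothetical names, and in a real JAX-RS application the class would extend javax.ws.rs.core.Application and override its getClasses() method.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical resource classes; in a real application these would carry @Path.
class BookResource {}
class AuthorResource {}

// Mirrors the shape of javax.ws.rs.core.Application without depending on JAX-RS.
class ExplicitResources {
    // Every resource is named by hand, so nothing has to be scanned at startup.
    Set<Class<?>> getClasses() {
        Set<Class<?>> classes = new LinkedHashSet<>();
        classes.add(BookResource.class);
        classes.add(AuthorResource.class);
        return Collections.unmodifiableSet(classes);
    }

    public static void main(String[] args) {
        for (Class<?> c : new ExplicitResources().getClasses()) {
            System.out.println(c.getSimpleName());
        }
    }
}
```

The trade-off is that every new resource class has to be added to this list by hand, in exchange for skipping the scan on each cold start.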


Performance gain 4 - Use memcache
Caching is useful for most web applications, and AppEngine gives you a great infrastructure for it. On AppEngine you can use MemCache, which is a highly scalable distributed cache.
From an API point of view MemCache is very similar to a HashMap, with methods such as put, get, delete and contains. Data in the cache can disappear at any moment (it’s not persistent), but will normally live until it expires. The expiration time is something you specify when you put something in the cache. The general idea is to put as much data in MemCache as is useful. Most web applications are read-mostly, which means there are many more users reading data than writing data.


Most people start by caching data from the DataStore. The DataStore is relatively slow (compared to a local RDBMS) so that’s a quick win. Objectify even supports this declaratively with annotations. You can go a step further though. For RESTful Web Services it’s useful to place JSON strings in the cache. Converting an object graph to a JSON string costs time, so why would you do that over and over again if the data didn’t change? The same is true for pages. You could create a Servlet filter that simply returns a cached page (the HTML) instead of re-rendering a page with data that didn’t change.
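
A minimal sketch of this read-through pattern for JSON strings could look as follows. ExpiringCache is a hypothetical in-memory stand-in for MemCache (it is not the AppEngine API), and the clock is passed in explicitly so the example is deterministic; on AppEngine you would call the MemCache service instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for MemCache: a map whose entries expire.
class ExpiringCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String key, String value, long ttlMillis, long nowMillis) {
        entries.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    // Returns null when the key is missing or its entry has expired.
    String get(String key, long nowMillis) {
        Entry entry = entries.get(key);
        if (entry == null || entry.expiresAtMillis <= nowMillis) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }
}

class JsonCacheSketch {
    static int renderCount = 0;

    // Stands in for an expensive object-graph-to-JSON conversion.
    static String expensiveRender() {
        renderCount++;
        return "{\"title\":\"Example\"}";
    }

    // Read-through: serve the cached string if present, otherwise render and cache it.
    static String cachedJson(ExpiringCache cache, String key, long ttlMillis,
                             long nowMillis, Supplier<String> renderer) {
        String cached = cache.get(key, nowMillis);
        if (cached != null) {
            return cached;
        }
        String fresh = renderer.get();
        cache.put(key, fresh, ttlMillis, nowMillis);
        return fresh;
    }

    public static void main(String[] args) {
        ExpiringCache cache = new ExpiringCache();
        cachedJson(cache, "books.json", 60_000, 0, JsonCacheSketch::expensiveRender);      // miss
        cachedJson(cache, "books.json", 60_000, 1_000, JsonCacheSketch::expensiveRender);  // hit
        cachedJson(cache, "books.json", 60_000, 61_000, JsonCacheSketch::expensiveRender); // expired
        System.out.println("renders: " + renderCount);
    }
}
```

Because cached data can disappear at any moment, the code always treats a miss as normal and falls back to rendering. When the underlying data changes, the entry should also be deleted or overwritten rather than left to expire.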


DataStore usage
AppEngine’s DataStore is a non-relational, schemaless data store. Wait, let me repeat that: the DataStore is NOT relational. This is probably the most important thing to keep in mind while developing AppEngine applications. "No problem" you might say, "those NoSQL data stores are ultra scalable, so who would ever worry about performance?" Yes, the DataStore is extremely scalable. It has to store data for a virtually infinite number of applications that all store a virtually infinite amount of data. To be able to do that the DataStore must be distributed, so yes, it's scalable. But that doesn't go well together with traditional relational data.


Performance gain 5 - Join in-memory
Because the DataStore is so fundamentally different from a relational database you must work with it in a different way too. First of all, there are no joins. The DataStore is basically one very large table, where each row can have its own set of columns. If there is only one table, a join doesn’t make much sense. Of course you still need relations between entities in your application, so we have to come up with something for that. Let’s take the following simple SQL query as an example:


SELECT emp.name, dep.name FROM employee emp
LEFT JOIN department dep ON dep.id = emp.dep_id


A first naive approach on AppEngine, here with books and their authors as the related entities, could be:
  1. select all books
  2. iterate over books
  3. iterate over authorKeys for each book
  4. get author for each key
// One query loads all books.
Objectify ofy = ObjectifyService.begin();
List<Book> books = ofy.query(Book.class).list();

StringBuilder sb = new StringBuilder();

for (Book book : books) {
    sb.append(book.getTitle());
    sb.append(": ");
    for (Key<Author> authorKey : book.getAuthorKeys()) {
        // One DataStore round trip per author key: this is the N + 1 problem.
        final Author author = ofy.get(authorKey);
        sb.append(author.getFirstname()).append(' ').append(author.getLastname()).append(", ");
    }

    sb.append("<br>");
}



For each book we simply go back to the DataStore for every related author, just as a naive version of the SQL example would query the department again for each employee. Now we have a performance problem. If we have 500 books with one author each, we need 500 + 1 round trips (the N + 1 problem). This approach wouldn’t perform on a relational database, and it doesn’t perform on AppEngine either.
One approach I use a lot in this case is an “in-memory join”:
  1. select all authors
  2. build in-memory map of authors (key=authorId, value=author)
  3. iterate over books
  4. get author for book from in-memory list of authors
Objectify ofy = ObjectifyService.begin();
List<Book> books = ofy.query(Book.class).list();

StringBuilder sb = new StringBuilder();

// Load all authors with a single query and index them by id.
final List<Author> authors = ofy.query(Author.class).list();
final Map<Long, Author> authorMap = new HashMap<Long, Author>();
for (Author author : authors) {
    authorMap.put(author.getId(), author);
}

for (Book book : books) {
    sb.append(book.getTitle());
    sb.append(": ");

    for (Key<Author> authorKey : book.getAuthorKeys()) {
        // In-memory lookup instead of a DataStore get.
        final Author author = authorMap.get(authorKey.getId());
        sb.append(author.getFirstname()).append(' ').append(author.getLastname()).append(", ");
    }

    sb.append("<br>");
}


That seems very counter-intuitive if you’re from the relational world. Why do something in code that the database can do for you? Well, that’s the thing: the database can’t in this case. CPU cycles are relatively cheap on AppEngine, so that’s not really a bottleneck either. And the result can be cached in MemCache: either the “joined” set of books/authors, or just the authors (e.g. if books change more often).


This doesn’t work if you have millions of authors. You don’t want to (and probably can’t) load millions of authors into memory just to link 500 books to their authors. In that case you can use a bulk get. This is a normal get operation, but with multiple keys as arguments. Those objects will be loaded in one batch. The approach would be as follows:
  1. select all books
  2. build set of all required authors for all books
  3. batch get required authors
  4. iterate over books
  5. get author for book from in-memory list of authors
Objectify ofy = ObjectifyService.begin();
List<Book> books = ofy.query(Book.class).list();

StringBuilder sb = new StringBuilder();

// Collect the keys of only those authors that are actually referenced.
Set<Key<Author>> authorKeys = new HashSet<Key<Author>>();
for (Book book : books) {
    authorKeys.addAll(book.getAuthorKeys());
}

// One batch get loads all required authors.
final Map<Key<Author>, Author> authorMap = ofy.get(authorKeys);

for (Book book : books) {
    sb.append(book.getTitle());
    sb.append(": ");

    for (Key<Author> authorKey : book.getAuthorKeys()) {
        final Author author = authorMap.get(authorKey);
        sb.append(author.getFirstname()).append(' ').append(author.getLastname()).append(", ");
    }

    sb.append("<br>");
}
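
To make the difference in DataStore round trips concrete, here is a self-contained sketch that counts them for the naive and the batched approach. FakeDatastore is a hypothetical stand-in (it is neither Objectify nor the AppEngine API), and authors are reduced to plain name strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the DataStore that counts round trips.
class FakeDatastore {
    private final Map<Long, String> authorsById = new HashMap<>();
    int roundTrips = 0;

    void store(long id, String name) {
        authorsById.put(id, name);
    }

    // Single get: one round trip per call.
    String get(long id) {
        roundTrips++;
        return authorsById.get(id);
    }

    // Batch get: one round trip for the whole set of ids.
    Map<Long, String> getAll(Set<Long> ids) {
        roundTrips++;
        Map<Long, String> found = new HashMap<>();
        for (Long id : ids) {
            String name = authorsById.get(id);
            if (name != null) {
                found.put(id, name);
            }
        }
        return found;
    }
}

class BatchGetSketch {
    // Naive: one get per author reference.
    static List<String> naive(FakeDatastore ds, List<List<Long>> authorIdsPerBook) {
        List<String> lines = new ArrayList<>();
        for (List<Long> ids : authorIdsPerBook) {
            List<String> names = new ArrayList<>();
            for (Long id : ids) {
                names.add(ds.get(id));
            }
            lines.add(String.join(", ", names));
        }
        return lines;
    }

    // Batched: collect the distinct ids first, fetch once, then join in memory.
    static List<String> batched(FakeDatastore ds, List<List<Long>> authorIdsPerBook) {
        Set<Long> needed = new LinkedHashSet<>();
        for (List<Long> ids : authorIdsPerBook) {
            needed.addAll(ids);
        }
        Map<Long, String> authors = ds.getAll(needed);

        List<String> lines = new ArrayList<>();
        for (List<Long> ids : authorIdsPerBook) {
            List<String> names = new ArrayList<>();
            for (Long id : ids) {
                names.add(authors.get(id));
            }
            lines.add(String.join(", ", names));
        }
        return lines;
    }

    public static void main(String[] args) {
        FakeDatastore ds = new FakeDatastore();
        ds.store(1L, "Author A");
        ds.store(2L, "Author B");
        List<List<Long>> books = Arrays.asList(
                Arrays.asList(1L), Arrays.asList(1L, 2L), Arrays.asList(2L));

        naive(ds, books);
        System.out.println("naive round trips: " + ds.roundTrips);
        ds.roundTrips = 0;
        batched(ds, books);
        System.out.println("batched round trips: " + ds.roundTrips);
    }
}
```

The counts exclude the initial query for the books, which both approaches share. The naive count grows with the number of author references, while the batched count stays at one.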

In the graph below you can see the difference in performance is dramatic. For a dataset of 1000 books and 5 authors the first approach takes over 20 seconds, while the other approaches are around 200-300ms.


Performance gain 6 - De-normalize
In some cases you query two related entities so often that you would be better off de-normalizing the data. In the SQL example above (employees and departments) we could get rid of the join altogether if we just added a departmentName field to the employee entity. Is that a better approach? Well, it depends. It’s definitely faster, but you have the overhead of keeping the two fields in sync somehow.
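
The write-side cost can be sketched as follows, using the employee/department example. All class and method names here are hypothetical, and a plain list stands in for the DataStore: reads need no join, but renaming a department means rewriting the copy on every employee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity: departmentName is a denormalized copy of the department's name.
class Employee {
    final String name;
    final long departmentId;
    String departmentName;

    Employee(String name, long departmentId, String departmentName) {
        this.name = name;
        this.departmentId = departmentId;
        this.departmentName = departmentName;
    }
}

class DenormalizeSketch {
    // Read side: no second lookup and no join needed.
    static String describe(Employee employee) {
        return employee.name + " (" + employee.departmentName + ")";
    }

    // Write side: a rename must be copied to every employee of that department.
    static int renameDepartment(List<Employee> employees, long departmentId, String newName) {
        int updated = 0;
        for (Employee employee : employees) {
            if (employee.departmentId == departmentId) {
                employee.departmentName = newName;
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<Employee> employees = new ArrayList<>();
        employees.add(new Employee("Alice", 1L, "Sales"));
        employees.add(new Employee("Bob", 2L, "Support"));

        int updated = renameDepartment(employees, 1L, "Marketing");
        System.out.println(updated + " updated, " + describe(employees.get(0)));
    }
}
```

Whether this pays off depends on how often department names change compared to how often employees are read.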


I hope this article helps in getting applications to run better on AppEngine. It's not hard at all, just different. And you'll get a great platform for it in return.

40 comments:

  1. Thx for this. You seem to be mixing book & employee examples--the post might be clearer if you stuck with one.

  2. I see you use JSF2. myfaces or mojarra? Do you use a component library like richfaces or primefaces?

    Which jax-rs impl do you use?

    I read that Seam doesn't work on appengine. appengine doesn't support CDI so you use Weld?

    Thx.

  3. For JSF2 I use Mojarra, I don't use component libraries a lot (I prefer plain jQuery in many cases) but I have good experiences with PrimeFaces on GAE. I didn't try other component libraries, but according to this post RichFaces 4 has GAE support: http://mkblog.exadel.com/2010/10/richfaces-4-m3-gae-support-new-richfaces-4-book/

    For clarity: CDI is just a specification. This specification is not supported out of the box by GAE because it's just a Servlet container. Weld is the reference implementation of CDI and works quite well on GAE because of its Servlet support. When you're talking about Seam, I guess you mean Seam 2. With JSF 2 and CDI (Weld) you won't really need Seam any more, because they are the evolution of Seam back into the Java EE platform. Seam 3 is built on top of CDI and I guess that some of the modules will play well on GAE too. Take a look at http://seamframework.org/Seam3 for currently available modules.

    For JAX-RS I've used both RESTEasy and Jersey with success, both work well on GAE. RESTEasy has better CDI integration and slightly better startup time though on GAE, so I slightly prefer RESTEasy.

  4. ok... your example about "DataStore usage" is good.

    but if I have an entity with 1001 rows?

  5. Great article! Could you provide some real numbers:
    - what's the minimal startup time you could achieve with a decent framework for development?
    - 1000 books and 5 authors take 300ms in the best case. How does it scale - what would it take to do the same with 1M books and 50k authors? 10M books...

    Thank you

  6. Thanks for the feedback. I agree larger datasets would be a useful addition to the examples because there are some limitations in that area too. I will try to provide some examples and startup time numbers later this week.

  7. I've also looked at this site: http://gaejava.appspot.com/

    It gives me up to a 5-second delay removing 100 records with JDO. Is that normal?

    Another thing is that every time I run the test the results vary (especially in the JDO case). Is that likely to be the startup time?

  8. Just to give you a heads up, I'm working on a new article about dealing with large datasets on AppEngine. I have to spread testing over a few days because those kinds of numbers eat up my daily quota very quickly, but I'll publish within a few days. Very positive results so far!

  9. One thing to remember when working with large datasets is that doing anything that makes your user wait results in a crappy user experience. It always boggles my mind when people talk about not being able to fetch large data sets into memory, or mutate large numbers of models or entity groups in 30 seconds. These aren't things you should be doing while the user is waiting for the next page anyway.

    Tasks took all of this stuff to the background, and that's where it should stay.
