Java, OSGi, Amdatu and a bit of JavaScript: Tuning Java applications on Google AppEngine

Wednesday, December 1, 2010

Tuning Java applications on Google AppEngine

Introduction to AppEngine
AppEngine is the Platform as a Service (PaaS) offering from Google. It allows you to deploy and run Java and Python applications on Google's infrastructure. Java applications run in a sandboxed Servlet container that scales automatically. To deploy an application you just push a button in your IDE, without having to set up and manage an application server.

AppEngine problems
From time to time you’ll find some very negative feedback about AppEngine. In most cases this is not because AppEngine is bad, but because the applications were not designed to run on AppEngine. This article will give you some hints about how to design an application that works and performs well on AppEngine.

Why bother using AppEngine?
Google AppEngine is one of the most interesting solutions for Java developers when looking at cloud platforms. It allows you to deploy applications within seconds and offers a broad range of services to make development easier. And it’s cheap; for small applications you don’t pay anything at all. This makes it the perfect platform for small-scale applications that can’t justify a dedicated server. The lack of good shared hosting solutions pushes most developers away from Java into the PHP world for these kinds of small applications, but AppEngine solves that problem.
All this goodness comes at a price though. When you write applications the same way you would for a dedicated server, you’ll soon run into performance problems, even with the smallest application. That doesn’t mean AppEngine offers bad performance, but its architecture is so fundamentally different that you have to design your applications to match it. When you write your applications specifically for AppEngine you’ll unleash its full power and it will be a great platform.

The two main problems
The problems that you’ll face when using AppEngine come in two flavors which will both be discussed in this article:

Instance startup times
DataStore related problems

Application startup time
The first problem, instance startup time, is something you normally don’t care about. Most Java frameworks are designed to do as much processing as possible at application startup, because you don’t restart applications often anyway. That’s very different when running on AppEngine. The core feature of AppEngine is its scalability. An application that gets a lot of load will automatically start extra virtual machines to spread the load over multiple machines. This doesn’t only happen under massive load; you’ll see new instances starting quite early. You’ll even run into this when your application doesn’t get any load at all. To avoid wasting resources on idle applications, AppEngine stops all instances of an application when it hasn’t received any requests for about a minute. That means that no matter what application you have, you’ll have to deal with starting new instances. Each time an instance is started you face a cold startup of your application, including loading and starting all the frameworks you use. Every second counts now, because the user will not receive any response while the application is starting.

Google announced that in AppEngine 1.4 there will be the possibility to pay for reserving instances and the availability of an API to “warm up” new instances. This solves part of the startup problem, but you’ll have to pay for it. This might be no problem for large applications, but is exactly the thing we were trying to avoid for smaller applications.

Performance gain 1 - Choose frameworks based on startup time
Many popular Java frameworks add up to 25 seconds of startup time. Users will not wait 25 seconds to see a web page, so we’ll have to improve this. The most important step is to get rid of frameworks that take very long to start up, and to configure the frameworks you do use for fast startup. This means not every framework is a good fit for AppEngine. An unfortunate example is Grails. Although Grails is one of my favorite frameworks in other environments, its 20+ second startup time is simply unacceptable on AppEngine. So, would I advise getting rid of all frameworks and using Servlets/JSP directly? Not really. That would set you back too much on productivity and code maintainability, and it’s not necessary either.

The two stacks I used a lot on AppEngine are the following:

Weld, JSF2 and JAX-RS (more or less a stripped down Java EE 6 Web Profile)
Spring 3.0 including Spring Web MVC

There are many good alternatives to these stacks, but always test startup time first!
At the moment Spring still offers significantly better startup performance after some tuning. The Weld team is working hard on dramatically improving Weld’s startup time, which should make it a perfect fit for AppEngine in the upcoming version.

Performance gain 2 - Get rid of JPA/JDO
AppEngine offers two standard APIs to work with the DataStore: JPA and JDO. Remember that the DataStore is not a relational database. Because of that, both JPA and JDO lose some of their power:

Relationship mappings are very limited
Join queries are not supported
Polymorphic queries are not supported
Caching support works differently

What’s left is just basic mapping between Java classes and DataStore entities, without the real power of JPA/JDO. We still have to deal with the complexity of the APIs however, and worse, with the overhead of the frameworks. Both JPA and JDO add seconds to the startup time of an application. This is bad, and because the frameworks can’t be used to their full potential it’s not really worth it. Instead we need something that more closely matches the possibilities of the DataStore. There’s a framework doing just that: Objectify. The framework uses JPA annotations to map your Java classes to entities. The API to persist and query entities is completely different however, and matches the low-level DataStore API much more closely. The programming model is a lot easier because the API doesn’t contain any features that the DataStore doesn’t support anyway. Even better, it only adds milliseconds to the application startup time.

Performance gain 3 - Don’t use classpath scanning
Whenever I use Spring I use annotations as much as possible to keep my XML configuration to a minimum. For declaring components I use @Controller/@Component instead of bean configuration in XML. This means, however, that the framework must scan the classpath for annotated classes at startup, which adds startup time. On AppEngine it’s always better to reduce class scanning, even if that means declaring beans explicitly again.
Another example is RESTEasy. Normally I just let the framework scan for @Path annotations, but on AppEngine it’s better to use an explicit Application class instead. These are just two examples of frameworks I use a lot, but there are many other frameworks that give you this choice.

Performance gain 4 - Use memcache
Caching is useful for most web applications, and AppEngine gives you a great infrastructure for it. On AppEngine you can use MemCache, which is a highly scalable distributed cache.
From an API point of view, using MemCache is very similar to using a HashMap, with methods such as put, get, delete and contains. Data in the cache can disappear at any moment (it’s not persistent), but will normally live until it expires. The expiration time is something you specify when you put something in the cache. The general idea is to put as much data in MemCache as you usefully can. Most web applications are read-mostly, which means there are many more users reading data than writing data.

Most people start by caching data from the DataStore. The DataStore is relatively slow (compared to a local RDBMS), so that’s a quick win. Objectify even supports this declaratively with annotations. You can go a step further though. For RESTful Web Services it’s useful to place JSON strings in the cache. Converting an object graph to a JSON string costs time, so why would you do that over and over again if the data didn’t change? The same thing is true for pages. You could create a Servlet filter that simply returns a cached page (the HTML) instead of re-rendering a page with data that didn’t change.
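
As a rough illustration of that idea, here is a minimal cache-aside sketch in plain Java. It deliberately does not use the real MemCache API: ExpiringCache is a hypothetical in-memory stand-in that only mimics the put/get/delete/contains behavior with an expiration time, and renderBooksJson, the "books.json" key and the 60 second expiration are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a cache with per-entry expiration.
 * NOT the AppEngine MemCache API; it only mimics the put/get/delete/contains
 * behavior described in the text. Not thread-safe, and the current time is
 * passed in explicitly so the expiry behavior is easy to follow.
 */
class ExpiringCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void put(String key, String value, long ttlMillis, long nowMillis) {
        entries.put(key, new Entry(value, nowMillis + ttlMillis));
    }

    /** Returns the cached value, or null when it is missing or expired. */
    String get(String key, long nowMillis) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            entries.remove(key); // expired entries behave like misses
            return null;
        }
        return entry.value;
    }

    boolean contains(String key, long nowMillis) {
        return get(key, nowMillis) != null;
    }

    void delete(String key) {
        entries.remove(key);
    }
}

class JsonCacheDemo {
    /** Counts how often the expensive rendering actually ran. */
    static int renderCount = 0;

    /** Placeholder for an expensive object-graph-to-JSON conversion. */
    static String renderBooksJson() {
        renderCount++;
        return "[{\"title\":\"Example\"}]";
    }

    /** Cache-aside: try the cache first, render and store only on a miss. */
    static String cachedBooksJson(ExpiringCache cache, long nowMillis) {
        String json = cache.get("books.json", nowMillis);
        if (json == null) {
            json = renderBooksJson();
            cache.put("books.json", json, 60_000L, nowMillis);
        }
        return json;
    }
}
```

Calling cachedBooksJson twice within the expiration window renders the JSON only once; after the entry expires (or after a delete when the underlying data changes) the next call renders it again. The same shape should apply to whole HTML pages returned from a Servlet filter.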

DataStore usage
AppEngine’s DataStore is a non-relational, schemaless data store. Wait, let me repeat that: The DataStore is NOT relational. This is probably the most important thing to keep in mind while developing AppEngine applications. "No problem" you might say, "those NoSQL data stores are ultra scalable so who would ever bother about performance?" Yes, the DataStore is extremely scalable. It has to store data for a virtually infinite number of applications that all store a virtually infinite amount of data. To be able to do that the DataStore must be distributed, so yes it's scalable. But that doesn't really go well together with traditional relational data.

Performance gain 5 - Join in-memory
Because the DataStore is so fundamentally different than a relational database, you must work with it in a different way too. First of all, there are no joins. The DataStore is basically one very large table, where each row can have its own set of columns. If there is only one table, a join doesn’t make much sense. Of course you still need relations between entities in your application, so we have to come up with something for that. Let’s take the following simple SQL query as an example:

SELECT emp.name, dep.name FROM employee emp
LEFT JOIN department dep ON dep.id = emp.dep_id

A first naive approach on AppEngine, shown here for the equivalent relation between books and their authors, could be:

select all books
iterate over books
iterate over authorKeys for each book
get author for each key

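import java.util.List;
import com.googlecode.objectify.Key;
import com.googlecode.objectify.Objectify;
import com.googlecode.objectify.ObjectifyService;

Objectify ofy = ObjectifyService.begin();
// One query for all books
List<Book> books = ofy.query(Book.class).list();

StringBuilder sb = new StringBuilder();

for (Book book : books) {
    sb.append(book.getTitle());
    sb.append(": ");
    for (Key<Author> authorKey : book.getAuthorKeys()) {
        // One DataStore get per author reference: this is the N + 1 problem
        final Author author = ofy.get(authorKey);
        sb.append(author.getFirstname()).append(" ").append(author.getLastname()).append(", ");
    }

    sb.append("<br>");
}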

For each book we simply do another get for every related author. Now we have a performance problem. If we have 500 books with one author each, we end up with 500 + 1 DataStore calls (the N + 1 problem). This approach wouldn’t perform on a relational database, and it doesn’t perform on AppEngine either.
One approach I use a lot in this case is an “in-memory join”:

select all authors
build in-memory map of authors (key=authorId, value=author)
iterate over books
get author for book from in-memory map of authors

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.googlecode.objectify.Key;
import com.googlecode.objectify.Objectify;
import com.googlecode.objectify.ObjectifyService;

Objectify ofy = ObjectifyService.begin();
List<Book> books = ofy.query(Book.class).list();

StringBuilder sb = new StringBuilder();

// Second (and last) query: load all authors and index them by id
final List<Author> authors = ofy.query(Author.class).list();
final Map<Long, Author> authorMap = new HashMap<Long, Author>();
for (Author author : authors) {
    authorMap.put(author.getId(), author);
}

for (Book book : books) {
    sb.append(book.getTitle());
    sb.append(": ");

    for (Key<Author> authorKey : book.getAuthorKeys()) {
        // In-memory lookup instead of a DataStore get
        final Author author = authorMap.get(authorKey.getId());
        sb.append(author.getFirstname()).append(" ").append(author.getLastname()).append(", ");
    }

    sb.append("<br>");
}

That seems very counter-intuitive if you come from the relational world. Why do something in code that the database can do for you? Well, that’s the thing: in this case the database can’t. CPU cycles are relatively cheap on AppEngine, so that’s not really a bottleneck either. And the result can be cached in MemCache: either the “joined” set of books/authors, or just the full set of authors (e.g., if books change more often).

This doesn’t work if you have millions of authors. You don’t want to (and in practice can’t) load millions of authors into memory just to link 500 books to their authors. In that case you can use a bulk get. This is a normal get operation, but with multiple keys as arguments. Those objects will be loaded in one batch. The approach would be as follows:

select all books
build set of all required authors for all books
batch get required authors
iterate over books
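get author for book from in-memory map of authors

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import com.googlecode.objectify.Key;
import com.googlecode.objectify.Objectify;
import com.googlecode.objectify.ObjectifyService;

Objectify ofy = ObjectifyService.begin();
List<Book> books = ofy.query(Book.class).list();

StringBuilder sb = new StringBuilder();

// Collect every author key that is actually referenced (duplicates collapse in the set)
Set<Key<Author>> authorKeys = new HashSet<Key<Author>>();
for (Book book : books) {
    authorKeys.addAll(book.getAuthorKeys());
}

// One batch get for all required authors
final Map<Key<Author>, Author> authorMap = ofy.get(authorKeys);

for (Book book : books) {
    sb.append(book.getTitle());
    sb.append(": ");

    for (Key<Author> authorKey : book.getAuthorKeys()) {
        final Author author = authorMap.get(authorKey);
        sb.append(author.getFirstname()).append(" ").append(author.getLastname()).append(", ");
    }

    sb.append("<br>");
}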

In the graph below you can see the difference in performance is dramatic. For a dataset of 1000 books and 5 authors the first approach takes over 20 seconds, while the other approaches are around 200-300ms.
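
To make the cause of that gap a bit more tangible, the following plain Java sketch counts round trips instead of measuring time. It does not talk to the DataStore at all: CountingStore is a hypothetical in-memory stand-in, authors are reduced to a name string, and the initial query for the books is left out, so the counters only cover the author lookups.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a data store that counts round trips. */
class CountingStore {
    private final Map<Long, String> authorsById = new HashMap<>();
    int roundTrips = 0;

    CountingStore(Map<Long, String> authorsById) {
        this.authorsById.putAll(authorsById);
    }

    /** Single get: one round trip per call. */
    String get(long id) {
        roundTrips++;
        return authorsById.get(id);
    }

    /** Batch get: one round trip, no matter how many ids are requested. */
    Map<Long, String> getAll(Set<Long> ids) {
        roundTrips++;
        Map<Long, String> result = new HashMap<>();
        for (Long id : ids) {
            String name = authorsById.get(id);
            if (name != null) {
                result.put(id, name);
            }
        }
        return result;
    }
}

class JoinStrategies {
    /** Naive approach: one get per author reference (the N + 1 pattern). */
    static List<String> naive(CountingStore store, Map<String, List<Long>> books) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Long>> book : books.entrySet()) {
            List<String> names = new ArrayList<>();
            for (Long authorId : book.getValue()) {
                names.add(store.get(authorId));
            }
            lines.add(book.getKey() + ": " + String.join(", ", names));
        }
        return lines;
    }

    /** Batch approach: collect the ids first, fetch them in one call, join in memory. */
    static List<String> batched(CountingStore store, Map<String, List<Long>> books) {
        Set<Long> neededIds = new HashSet<>();
        for (List<Long> authorIds : books.values()) {
            neededIds.addAll(authorIds);
        }
        Map<Long, String> authorsById = store.getAll(neededIds);

        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<Long>> book : books.entrySet()) {
            List<String> names = new ArrayList<>();
            for (Long authorId : book.getValue()) {
                names.add(authorsById.get(authorId));
            }
            lines.add(book.getKey() + ": " + String.join(", ", names));
        }
        return lines;
    }
}
```

Both methods produce the same lines, but naive makes one round trip per author reference while batched makes exactly one. Assuming each round trip to a remote store costs far more than an in-memory lookup, that count is a plausible explanation for the difference described above.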

Performance gain 6 - De-normalize
In some cases you query two related entities so often that you would be better off de-normalizing the data. In the employee/department example above we could get rid of all the extra code if we just added a departmentName field to the employee entity. Is that a better approach? Well, it depends. It’s definitely faster, but you have the overhead of keeping the two fields in sync somehow.
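
The trade-off can be sketched in plain Java, without any DataStore code. The classes below are hypothetical and only model the idea: Employee carries a copy of the department name, so listing employees needs no second lookup, while renaming a department has to update every copy.

```java
import java.util.ArrayList;
import java.util.List;

class Department {
    final long id;
    String name;

    Department(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class Employee {
    final String name;
    final long departmentId;
    String departmentName; // de-normalized copy of Department.name

    Employee(String name, long departmentId, String departmentName) {
        this.name = name;
        this.departmentId = departmentId;
        this.departmentName = departmentName;
    }
}

class StaffDirectory {
    private final List<Employee> employees = new ArrayList<>();

    /** Copies the department name onto the employee at write time. */
    void hire(String name, Department department) {
        employees.add(new Employee(name, department.id, department.name));
    }

    /** Reading is cheap: no join and no second lookup are needed. */
    List<String> listing() {
        List<String> lines = new ArrayList<>();
        for (Employee employee : employees) {
            lines.add(employee.name + " (" + employee.departmentName + ")");
        }
        return lines;
    }

    /**
     * The cost of de-normalizing: a rename must touch every employee that
     * holds a copy. Returns the number of employees that were updated.
     */
    int renameDepartment(Department department, String newName) {
        department.name = newName;
        int updated = 0;
        for (Employee employee : employees) {
            if (employee.departmentId == department.id) {
                employee.departmentName = newName;
                updated++;
            }
        }
        return updated;
    }
}
```

Whether this pays off depends on how often the copied field changes compared to how often it is read. For data that is read on every page view but renamed rarely, the extra write work is usually a reasonable price.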

I hope this article helps in getting applications to run better on AppEngine. It's not hard at all, just different. And you'll get a great platform for it in return.

40 comments:

Ken Keller, December 6, 2010 at 12:55 PM
Thx for this. You seem to be mixing book & employee examples--the post might be clearer if you stuck with one.

Ken Keller, December 6, 2010 at 12:59 PM
I see you use JSF2. myfaces or mojarra? Do you use a component library like richfaces or primefaces?

Which jax-rs impl do you use?

I read that Seam doesn't work on appengine. appengine doesn't support CDI so you use Weld?

Thx.

Unknown, December 6, 2010 at 1:36 PM
For JSF2 I use Mojarra, I don't use component libraries a lot (I prefer plain jQuery in many cases) but I have good experiences with PrimeFaces on GAE. I didn't try other component libraries, but according to this post RichFaces 4 has GAE support: http://mkblog.exadel.com/2010/10/richfaces-4-m3-gae-support-new-richfaces-4-book/

For clarity: CDI is just a specification. This specification is not supported out of the box by GAE because it's just a Servlet container. Weld is the reference implementation of CDI and works quite well on GAE because of it's Servlet support. When you're talking about Seam, I guess you mean Seam 2. With JSF 2 and CDI (Weld) you won't really need Seam any more, because they are the evolution of Seam back into the Java EE platform. Seam 3 is build on top of CDI and I guess that some of the modules will play well on GAE too. Take a look at http://seamframework.org/Seam3 for currently available modules.

For JAX-RS I've used both RESTEasy and Jersey with success, both work well on GAE. RESTEasy has better CDI integration and slightly better startup time though on GAE, so I slightly prefer RESTEasy.

Felipe, December 7, 2010 at 12:58 PM
ok... your example about "DataStore usage" is good.

but if I have an entity with 1001 rows?

Unknown, December 7, 2010 at 1:04 PM
Great article!. Could you provide some real numbers:
- what's your minimal startup time you could achieve having decent framework for development?
- 1000 authors and 5 authors take 300ms in the best case . How does it scale - what would it take to do the same with 1M books and 50k authors? 10M books...

Thank you

Unknown, December 7, 2010 at 1:42 PM
Thanks for the feedback. I agree larger datasets would be a useful addition to the examples because there are some limitations in that area too. I will try to provide some examples and startup time numbers later this week.

Unknown, December 7, 2010 at 1:42 PM
I've also looked at this site: http://gaejava.appspot.com/

It gives me up to 5seconds delay removing 100 records with JDO. Is that normal?

Another thing that every time I run the test the result vary (especially JDO) case. Is that likely to be the startup time?

Unknown, December 12, 2010 at 8:17 AM
Just to give you a heads up, I'm working on a new article about dealing with large datasets on AppEngine. I've to spread testing over a few days because those kind of numbers eat up my daily quota very very quickly, but I'll publish within a few days. Very positive results so far!

Jay, December 13, 2010 at 12:58 PM
One thing to remember when working with large datasets is that doing anything that makes your user wait results in a crappy user experience. It always boggles my mind when people talk about not being able to fetch large data sets into memory, or mutate large numbers of models or entity groups in 30 seconds. These aren't things you should be doing while the user is waiting for the next page anyway.

Tasks took all of this stuff to the background, and that's where it should stay.