Great Expectations

What Can Happen When You Think You're a Framework Developer Too

Frameworks. We all use them, some of them we love, some of them we hate, but we use them.

There are popular ones like Spring and Hibernate, not so popular ones like Guice and iBatis.

Popularity aside, we use frameworks because we can easily leverage the functionality they bring to the table, so we don't have to reinvent the wheel.

Heck, even Liferay has its framework stuff, not just Service Builder but there's Liferay MVC, workflow, assets, etc.

There is a side effect I see a lot from developers who have been using frameworks for a long time... They spend so much time coding using a framework, they start to think that they too can easily create a framework.

Too many times I've seen this end in miserable failure.

Let me share an anecdote...

I happened to be helping a client with some performance and stability issues. They mentioned that all of their portlets were based on a framework. They told me its name, but it was one that I hadn't heard of before. So I asked, "Can I see it?" And that's where the fun began.

The framework itself took a basic Spring-like approach. Classes would be decorated with an @Controller-like annotation and methods would be decorated with an annotation for the type of portlet lifecycle event they were for and a key used for the lookup.

The main portlet controller would basically search for all classes with the annotation and build out maps for the lifecycle event and key value with the corresponding Method to invoke (the whole system was based on reflection).
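To make that description concrete, here is a minimal sketch of annotation-driven dispatch in plain Java. Everything in it is an assumption made for illustration: the `Controller` and `Handles` annotations, the `Phase` enum, the `GreetingController` class and the `"PHASE:key"` map layout are hypothetical names, not the client's actual code. It also takes an explicit list of candidate classes instead of scanning the classpath, and it leaves out the portlet API entirely.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DispatchSketch {

    // Hypothetical lifecycle phases; the real framework's names are unknown.
    enum Phase { ACTION, RENDER }

    // Hypothetical marker for classes the framework should pick up.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Controller {}

    // Hypothetical method annotation: lifecycle phase plus lookup key.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Handles {
        Phase phase();
        String key();
    }

    @Controller
    static class GreetingController {
        @Handles(phase = Phase.RENDER, key = "view")
        public String view() { return "rendered view"; }

        @Handles(phase = Phase.ACTION, key = "save")
        public String save() { return "saved"; }
    }

    // Builds a "PHASE:key" -> Method map from the annotated classes.
    static Map<String, Method> buildRegistry(List<Class<?>> candidates) {
        Map<String, Method> registry = new HashMap<>();
        for (Class<?> type : candidates) {
            if (!type.isAnnotationPresent(Controller.class)) {
                continue; // not a controller, ignore it
            }
            for (Method method : type.getDeclaredMethods()) {
                Handles handles = method.getAnnotation(Handles.class);
                if (handles != null) {
                    registry.put(handles.phase() + ":" + handles.key(), method);
                }
            }
        }
        return registry;
    }

    // Looks up the handler and invokes it reflectively on a new instance.
    static Object dispatch(Map<String, Method> registry, Phase phase, String key) {
        Method handler = registry.get(phase + ":" + key);
        if (handler == null) {
            throw new IllegalArgumentException("No handler for " + phase + ":" + key);
        }
        try {
            Object controller =
                handler.getDeclaringClass().getDeclaredConstructor().newInstance();
            return handler.invoke(controller);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Handler invocation failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Method> registry =
            buildRegistry(Arrays.asList(GreetingController.class, String.class));
        System.out.println(dispatch(registry, Phase.RENDER, "view")); // prints "rendered view"
        System.out.println(dispatch(registry, Phase.ACTION, "save")); // prints "saved"
    }
}
```

The registry only needs to be built once, since it holds nothing but `Method` handles. What happens at dispatch time, namely which object the method is invoked on, is a separate design decision.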

To create a new portlet using this framework, you wouldn't need a Portlet class yourself, you could just start off with the annotated controller class and the portlet.xml defined to use the framework's Portlet class.

Sounds easy enough, but digging a little further revealed a ton of issues.

First, the component classes were all expected to inherit from a base class, but the base class (and the subclasses) retained state information, namely the request object, the response object, the portlet configuration object, ...

I'm sure the original developer thought it would be easier for the various component implementors to be able to rely on all of this context information just being there rather than passing it around.

That ended up being a really bad idea though; on each incoming portlet request, the framework portlet would instantiate a brand new controller object, passing in the relevant context details. The component instance would be used to service the request, but since it was bound to the current request context, it couldn't be reused and was discarded, left for the garbage collector to clean up.
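The difference between the two designs can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the client's code: `RequestContext` is a hypothetical stand-in for the request, response and portlet configuration objects, and the class and method names are made up. The instance counter exists only to make the per-request allocation visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ControllerLifecycleSketch {

    // Hypothetical stand-in for the request/response/config objects.
    static final class RequestContext {
        final String requestId;
        RequestContext(String requestId) { this.requestId = requestId; }
    }

    // Counts how many stateful controllers have been created.
    static final AtomicInteger STATEFUL_INSTANCES = new AtomicInteger();

    // The pattern described above: context lives in a field, so the
    // instance is tied to one request and cannot be reused.
    static class StatefulController {
        private final RequestContext context;

        StatefulController(RequestContext context) {
            this.context = context;
            STATEFUL_INSTANCES.incrementAndGet();
        }

        String render() { return "view for " + context.requestId; }
    }

    // The alternative: no per-request fields, context is passed in,
    // so a single instance can serve every request.
    static class StatelessController {
        String render(RequestContext context) {
            return "view for " + context.requestId;
        }
    }

    static final StatelessController SHARED = new StatelessController();

    // One new controller per request, discarded afterwards.
    static String serveStateful(RequestContext context) {
        return new StatefulController(context).render();
    }

    // Same result, no controller allocation per request.
    static String serveStateless(RequestContext context) {
        return SHARED.render(context);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            RequestContext context = new RequestContext("req-" + i);
            System.out.println(serveStateful(context));
            System.out.println(serveStateless(context));
        }
        System.out.println("stateful instances: " + STATEFUL_INSTANCES.get());
    }
}
```

Both paths produce the same output for a given request. The stateless variant pays for it by passing the context around explicitly, which is exactly the inconvenience the base class was trying to hide.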

The code passed unit tests, passed system tests, passed all kinds of QA reviews... Heck, the code worked fine when initially deployed to production.

But over time performance and stability issues started showing up. Eventually the nodes would crash with an OOM error. The framework team said the issue was all Liferay's fault, but in order to mitigate the issue the client had to supplement with front-side caching (to reduce backend calls) and perform regular restarts in off hours (started out being weekly, then dropped to every 4 or 5 days, then down to every other day, and now they're on the cusp of restarting daily).

So now I'm in, looking around the client's code, I'm seeing all of this, and I have to tell them that I can't really help them.

"Why not? You're the Liferay guy, help us tune the environment so we don't have these problems..."

But I can't. And now I have to additionally tell them that their framework, the internal one they built, that they have a lot of time and money and sweat and tears invested in, that this framework is the source of all of their problems.

"What do you mean? It has passed all of the tests..."

I then had to spend time talking about what was going on at scale...

If a page had 10 of these portlets on it, one of them might be getting an Action event, but all 10 would be getting a Render event. So there would be 11 new objects instantiated on each page. After the request was completed, 11 objects would then be subject to garbage collection...

When there are only 10 incoming requests, that is 110 new objects created that then need to be disposed of. So when they were testing, even over a relatively long timeframe, there was no apparent issue because there was no significant multiplier involved.
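The arithmetic is simple enough to write down as a back-of-the-envelope sketch. The class and method names are made up for illustration, and the one-million-request figure at the end is a hypothetical load, not a number from the client's site.

```java
class AllocationEstimate {

    // Every portlet on the page gets a Render event; each Action event
    // adds one more controller on top of that.
    static long controllersPerPage(int portletsOnPage, int actionEvents) {
        return (long) portletsOnPage + actionEvents;
    }

    // Total controllers created (and later discarded) for a number of page requests.
    static long controllersCreated(int portletsOnPage, int actionEvents, long pageRequests) {
        return controllersPerPage(portletsOnPage, actionEvents) * pageRequests;
    }

    public static void main(String[] args) {
        System.out.println(controllersPerPage(10, 1));             // prints 11
        System.out.println(controllersCreated(10, 1, 10));         // prints 110
        // Hypothetical production load, for scale only.
        System.out.println(controllersCreated(10, 1, 1_000_000L)); // prints 11000000
    }
}
```

The per-page cost never changes; only the multiplier does, which is why a test run with a handful of requests shows nothing unusual.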

Even after production deployment, site adoption was low enough such that the accumulated impact of the allocation took a long time to become apparent.

It was only after site adoption grew to the point where many concurrent requests were coming in that thousands of objects were being created and discarded for GC. It didn't help that the objects themselves retained weak references so the GC couldn't get rid of all of them.

Eventually the cost of creating and then garbage collecting all of those controllers was simply too much, and the whole node would die with OOM errors.

Fronting with a caching appliance helped, because that took a bunch of the requests away from the portal so the inevitable crash was somewhat delayed.

Restarting the nodes on a regular basis effectively reset the memory back to zero, so it was more of a preemptive step to avoid the node crashing at a bad time.

So this custom framework that everything they had was based on? The one the portlet developers thought was easy to use and adopt? Well now they know it is unsalvageable junk. Everything based on the framework now needs to be rewritten, as all code in the framework depends upon the contained context data, so refactoring would likely be more work and more error-prone than just starting over.

Let's just say they were not happy campers. I was happy they didn't want to kill the messenger...

I guess the moral here is that it really is hard to be a framework developer. There's just so much that can go wrong. You can have a solid, working implementation but if it suffers from scaling issues, it's not really solid after all.

I guess if you think you can write your own framework, I'd suggest that you stop, take a breath, then ask yourself what kind of trouble you'll be in or worse, how much trouble your company or site will be in, if you get something really wrong and need to start over. Are you willing to pay that bill when/if it comes due?

If not, you're always safer (and more protected) when you build off of an established, tested and well adopted framework. Sure you might not like some of the framework design decisions, but those can be easier and cheaper to live with.

Well... that's some moralizing piece of advice. Thanks for sharing, David!

Yeah, I know it can be hard picking up a framework, especially for large ones like Spring Portlet MVC. But there's a lot of time, experience, knowledge, testing, documentation, security and performance testing that gets baked into those frameworks, all things you don't get if you go it alone.

I'm pretty sure that if *it* (the framework) had been started as open source from the beginning (or at least opened up at some point), they would have discovered the fundamental issues faster and the framework wouldn't have spread so far through their projects.

I can imagine they even asked questions on public forums and shared pieces of their code to get help, but without seeing the whole thing nobody could really catch these issues and point out that they should not be going in that direction.

So for me one of the most important takeaways is that good frameworks usually emerge from the open source space, not from internal projects.

Great article, and it definitely highlights something I find myself asking clients more often than I'd like to:

"If the feature/solution is already in the product you bought/selected (Liferay), why are you gravitating towards writing your own?"

Most of the time I think it's more a case of ignorance, though I don't think it is done maliciously. It certainly feels like timelines for projects are getting shorter and shorter and that the tasks that were the jobs of many in the past are being wedged into just a few. So the pressure to deliver and the gap in knowledge definitely don't help. I couldn't agree more with what you have written though, and if nothing else it makes a great case for peer review activities or more XP-style coding.

Thanks for sharing, David.