Friday, 31 January 2014

Zero Downtime - Application Deployment Headaches



During the design and development phase of any project, one of the most commonly overlooked things is the deployment and management strategy. The PM is concerned about delivery, the developer about application architecture, the manager about the costs of development and operations, and all of them are too busy fighting existing fires.

Over the years in development I have come across many deployment strategies and the pitfalls of choosing one option over another. Whilst I have been an advocate of "Zero Downtime", achieving it in practice is a totally different matter. You can employ various clever techniques, but with a lot of application functionality moving to the network (like cached JavaScript) or the user's browser, addressing a clean deployment strategy upfront is becoming much more important.

Whilst there are many tools on the market (like LiveRebel), they come at a cost, and introducing another system into the ecosystem introduces risk.

Below are some of the deployment techniques that I have come across, with their major benefits and concerns:

Note: This article doesn't go into the detail of each technique; it just highlights some options. There may be more techniques that I have not highlighted here; if you know of any, please do share.

Apache + Tomcat X

Single Tomcat deployment: 

If you are serving your application from a single Tomcat, downtime is unavoidable: deploying on a single instance takes out all sessions, and no requests are served until the server comes back up.
 

Multi-Tomcat cluster behind an Apache proxy:

When an application is served by multiple Tomcats behind an Apache proxy (using mod_jk), session management is the key consideration. Think about:


  1. Sticky Sessions
  2. Session Replication

Using sticky sessions with session replication gives zero downtime for deployments: one node can be taken down and all users on that server are moved over to the next node. Key considerations for this strategy are:
  1. Session replication adds application complexity and memory pressure. More memory is required on each server to store the complete session information for the whole system (or a buddy server is needed).
  2. Session replication increases node-to-node communication, which may have an impact on CPU performance.
  3. You might be better off taking a hit on user sessions than compromising server performance.
Whichever option is taken, remember that someone needs to manage it, so plan up front.
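
To make the interaction between sticky sessions and session replication concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Tomcat or mod_jk code: the `Cluster` and `Node` classes, the node names and the "copy on every write to all live nodes" replication are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    sessions: dict = field(default_factory=dict)  # session id -> session data


class Cluster:
    """Toy model of a proxy with sticky sessions in front of several nodes."""

    def __init__(self, names, replicate):
        self.nodes = [Node(n) for n in names]
        self.replicate = replicate
        self.sticky = {}  # session id -> name of the node that owns the session

    def request(self, sid, data=None):
        """Route a request; return (node name, session data seen by that node)."""
        live = [n for n in self.nodes if n.up]
        if not live:
            raise RuntimeError("no node available: this is downtime")
        by_name = {n.name: n for n in live}
        # Sticky routing: keep using the owning node while it is up,
        # otherwise fail over to the first live node.
        node = by_name.get(self.sticky.get(sid)) or live[0]
        self.sticky[sid] = node.name
        session = node.sessions.setdefault(sid, {})
        if data:
            session.update(data)
            if self.replicate:
                # Replication: copy the updated session to every other live node.
                for other in live:
                    if other is not node:
                        other.sessions[sid] = dict(session)
        return node.name, dict(session)

    def take_down(self, name):
        """Simulate stopping a node for deployment."""
        for n in self.nodes:
            if n.name == name:
                n.up = False
                n.sessions.clear()  # in-memory sessions die with the node


def survives_failover(replicate):
    """Log a user in, stop their node, and check whether the session is still there."""
    cluster = Cluster(["tomcat1", "tomcat2"], replicate=replicate)
    first, _ = cluster.request("s1", {"user": "alice"})
    cluster.take_down(first)
    second, session = cluster.request("s1")
    return second != first and session.get("user") == "alice"
```

In this model `survives_failover(True)` keeps the user logged in after their node is stopped, while `survives_failover(False)` loses the session. The copy made to every live node on each write is also where the extra memory and node-to-node traffic listed above come from.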


Apache + JBoss AS 7

JBoss AS 7 comes in two major flavors: standalone mode and domain mode. Whilst this is independent of the concept of clustering, the deployment strategy depends very much on the choice made upfront.

JBoss in Standalone Mode:

In standalone mode, the server is responsible for its own resources and doesn't communicate with other nodes in the same cluster. The pros and cons are:
  • Pro: The server is independent and can be taken down as and when required.
  • Pro: Suitable for environments with a single standalone server (or maybe two).
  • Con: As the number of servers increases, management and deployment become an issue.
  • Con: Configuration and application roll-outs need to be done separately on each server.
  • Con: Rollback is managed manually.

In terms of achieving zero downtime deployment, this model lends itself best to zero customer impact: as with the Tomcat setup mentioned above, each server can be taken down in turn if the sessions are shared between nodes. If the sessions are not shared, users will be logged out of their session and automatically asked to log in on another node.


This strategy has the potential to bounce a single user N-1 times, where N is the number of nodes in the cluster.

JBoss in Domain Mode:

For a cluster that contains several nodes and is likely to scale horizontally (adding more nodes over time), domain mode is best suited as a deployment strategy. Below are the main pros and cons:
  • Pro: All servers are managed through a single master node, and changes are propagated to the client nodes.
  • Pro: No single point of failure for the application.
  • Pro: Deployment rollbacks are done automatically.
  • Con: At the time of deployment, the cluster can become unavailable, as the application is deployed on all nodes by the master node.

To achieve zero downtime for a deployment in domain mode, much more thinking and planning is required.
This can be partially achieved by setting up a parallel cluster and deploying the new application on it whilst the old cluster serves client requests. Once the new application is deployed and verified on the parallel cluster, that secondary cluster is made the primary cluster and the old cluster is taken down. The old cluster can act as the rollback option in case something goes wrong on the new cluster.

All of the above can be done from the master node's JBoss console. Key considerations for this strategy are:
  • More memory is required to support two clusters.
  • The deployment should be scheduled for the time of least load on the servers.
  • Deployment timelines increase, as more activities are required.
  • If the application is load balanced through load balancers, additional load balancer configuration is required for the parallel deployment.
  • An automated process is required for switching from one application cluster to the other.
  • Separate DNS entries are required for the secondary service. These can be used to verify that the application is deployed correctly.
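
As a rough illustration of the automated switch mentioned in the list above, the sketch below models two clusters behind one switchable entry point. It is a hedged toy model in Python, not JBoss or load balancer code: the `ParallelClusters` class, the cluster names "A" and "B" and the caller-supplied `verify` check are all hypothetical.

```python
class ParallelClusters:
    """Toy model of two clusters behind one switchable entry point."""

    def __init__(self, primary_version):
        # Cluster name -> deployed application version (None = nothing deployed).
        self.clusters = {"A": primary_version, "B": None}
        self.primary = "A"

    @property
    def secondary(self):
        return "B" if self.primary == "A" else "A"

    def serve(self):
        """Version that customers currently see."""
        return self.clusters[self.primary]

    def deploy(self, version, verify):
        """Deploy to the idle cluster, verify it, then switch traffic over.

        `verify` is a caller-supplied check, for example a smoke test run
        against the secondary DNS entry. Returns True if the switch happened.
        """
        target = self.secondary
        self.clusters[target] = version  # the old cluster keeps serving meanwhile
        if not verify(version):
            return False  # customers never saw the new version
        self.primary = target  # the switch itself
        return True

    def rollback(self):
        """Point traffic back at the previous cluster, which still runs the old version."""
        if self.clusters[self.secondary] is None:
            raise RuntimeError("no previous cluster to roll back to")
        self.primary = self.secondary


# Usage: a failed verification leaves customers on the old version;
# a successful one switches them, and rollback switches them back.
site = ParallelClusters("1.0")
site.deploy("2.0-broken", verify=lambda version: False)
site.deploy("2.0", verify=lambda version: True)
site.rollback()
```

The point of the design is that the switch and the rollback are the same cheap operation (changing which cluster is primary), which is what makes the extra memory and configuration worth considering.
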
At the end of the day, it is a decision that weighs business needs, cost and the technical ability of the team: whether to achieve zero downtime, or to accept some customer impact.




4 comments:

  1. Hello,
    I am interested in Apache + Jboss AS 7 option.
    I have a JBoss AS7 cluster with 2 nodes. I am looking to 'update' my application with zero down time.

    One question I have: once the deployment on the first node is over, it needs to reconnect to the cluster. From that moment until the second node is detached, the application will not be consistent, because two versions of the application are running.

    What is your suggestion on that?

    1. Hi Seby,

      I assume you are talking about JBoss AS in standalone mode. If the two cluster nodes are configured and started in standalone mode, there is no communication between the applications on the nodes as there is in domain mode. So if you have configured sticky sessions on your Apache, all requests going to the updated server will have access to the new application. I have not encountered any problems doing this in production applications for years; do you have a more specific case? The only thing to be careful about is database incompatibility, i.e. you upgrade the application on one node, upgrade the database so that the new application can start, and that breaks the old application because it is not compatible with the upgraded database.

      In domain mode, it is better to set up a parallel cluster: even though it is resource intensive, it is a much cleaner approach.

      Hope this helps.

  2. Thank you for the reply. Apologies for not providing the exact details in my question. Here are the details.

    I have a JBoss AS7 cluster with 2 nodes. I am looking to 'update' my application with zero down time. [This application is not yet deployed in production it is in development stage]

    In my case, in some scenarios I need to go for an unplanned deployment. This can happen when some business logic changes [this can happen once in a year or two; in all such cases there will be no session structure update, no database structure change, etc.]

    Following are the configurations/steps in my plan to achieve this.

    Configurations:

    1. Session replication is enabled.
    2. Load balancing is done with Apache mod_jk.
    3. JBoss AS 7.1 in standalone mode.

    Steps:

    1. Take out node1 from the cluster.
    2. Deploy new war file in node1.
    3. Attach node1 in cluster.
    4. Take out node2 from the cluster.
    5. Deploy new war file in node2
    6. Attach node2 in cluster.

    A) During step1 through step2, customer request will be served from node2. Since session replication is enabled, node2 can also serve the request of those users who were on node1[users before detaching node1].
    B) On step3, node1 is back into cluster and it will get session replicated from node2. So node1 is good to handle request of node2 users as well.
    C) Now I am taking out node2 for deploying the new version of application.

    Questions:

    The time lag between step 3 and step 4 can cause inconsistency, because two versions of application will be up and running at the same time. How can I mitigate this inconsistency?

    Can you give some idea on the time lag that can happen between step 3 and step4 ? Remember step 4 can happen only when the session replication is completed.

    Is the point B) a valid statement?

    What is the trigger for a session replication?

    Can I force a session replication ?

    [This is a Spring based web application]

    Thank You.

  3. Hi Shehzad Nagi, can you pls comment on my reply
