
Wednesday, October 23, 2013

Business Continuity, Disaster Recovery, Resiliency

"Court disaster long enough and it will accept your proposal." - Mason Cooley

Independent Software Vendors (ISVs), like any organization, must engage in Business Continuity Planning (BCP), also called Business Continuity and Resiliency Planning (BCRP). This is especially critical for ISVs that provide Software as a Service (SaaS) solutions to their customers. The process identifies exposure to internal and external threats that can disrupt or, worse, interrupt the operations that are the lifeblood of the business. Once these risks are identified, a recovery plan is developed to return the business to full operations. With a recovery plan in place, the business can evaluate, armed with the knowledge of the risks identified, what hard and soft assets can be applied to prevent a disruption from occurring in the first place, improving the resiliency of the business.

Some objectives we listed for the BCRP in our last planning cycle:

  • Identify risks, critical production components and the impacts of their failure.
  • Establish systems to monitor the health of these critical production components (a minimal monitoring sketch follows this list).
  • Document recovery procedures to restore critical production components in the event of failure within a time frame that does not breach the customer End User License Agreement (EULA) or Service Level Agreement (SLA). These procedures also help avoid the confusion experienced during an outage.
  • Identify personnel that must be notified in the event of an outage.
  • Create a plan to communicate with key people during the recovery and an escalation procedure.
  • Establish a testing procedure to validate the recovery plans.
  • Establish a process in which the plan can be maintained, updated and tested periodically.
  • Serve as a guide for the IT or Network Services Team.
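
As a concrete illustration of the monitoring objective above, here is a minimal health-check sketch in Python. The component names and URLs are hypothetical placeholders, not part of our actual environment; the point is simply that each critical production component should expose something a monitor can poll.

```python
# Minimal health-check sketch for critical production components.
# Component names and URLs below are hypothetical placeholders.
import urllib.request

CRITICAL_COMPONENTS = {
    "web-frontend": "https://example.com/health",
    "api-service": "https://api.example.com/health",
    "database-proxy": "https://db-proxy.example.com/health",
}

def check_component(name: str, url: str, timeout: float = 5.0) -> bool:
    """Return True if the component answers its health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def run_health_checks() -> dict:
    """Check every critical component and report which ones are healthy."""
    return {name: check_component(name, url) for name, url in CRITICAL_COMPONENTS.items()}

if __name__ == "__main__":
    for name, healthy in run_health_checks().items():
        print(f"{name}: {'OK' if healthy else 'FAILED - start notification per the BCRP'}")
```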

The BCRP process should be considered cyclical, something that should be executed at least once a year. The BCRP Cycle is composed of three main phases (the 3 R's): Risk Analysis, Recovery or Solution Design, and Resiliency or Maintenance. At a more detailed level we can define the BCRP Cycle using the diagram below.
Risk Analysis
This phase should identify exposure to internal and external threats that can disrupt or, worse, interrupt the operations of the company. Establishing 'Severity Levels' is useful in this phase. For example:


  • Level 1 Disaster Recovery - Severe Outage
    This level is assigned to those risk scenarios where the disruptions are, as the name implies, disastrous: they affect the availability of a component that interrupts operations completely and cannot be fixed at the production site, so operations have to be moved to a new location. These disruptions by definition will result in instant escalation (chain of notification and approval) to move the Production Site to the Disaster Recovery site, a significant declaration. Think of a meteorite hitting your data center or, as actually happened this year at a local data center here in DC, a backhoe cutting your data center's internet trunk. Here the potential for EULA/SLA breach is high, depending on the failover time to the new site.
  • Level 2 Operational Recovery - Outage
    This level is assigned to those risk scenarios where the disruptions affect the availability of a component that interrupts operations completely but can be fixed at the Production Site. The recovery plan for these risk scenarios should complete well within the time that might lead to a breach of any customer EULA/SLA. There will be an escalation procedure in place that moves a Level 2 risk to a Level 1 risk should recovery take longer than expected (a sketch of this escalation rule follows this list).
  • Level 3 Offline Recovery - Redundant Outage
    This level is assigned to those risk scenarios where the disruption has no effect on operations, i.e. a pooled web server goes down and the load balancer automatically takes it out of circulation. The minimal impact of these outages is usually due to built-in resiliency; nevertheless, the outage must be addressed to bring the system back to complete health.
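
The escalation rule mentioned under Level 2 can be captured in a few lines. The sketch below is only illustrative; the four-hour window is an assumed example, not a value from our plan.

```python
# Sketch of the Level 2 -> Level 1 escalation rule described above: an
# Operational Recovery incident escalates to Disaster Recovery when recovery
# runs past the window that protects the customer EULA/SLA.
# The time limit is a hypothetical example, not a value from the plan.
from dataclasses import dataclass
from datetime import datetime, timedelta

LEVEL_2_RECOVERY_LIMIT = timedelta(hours=4)  # assumed operational recovery window

@dataclass
class Incident:
    started: datetime
    level: int  # 1 = Disaster Recovery, 2 = Operational Recovery, 3 = Offline Recovery

def maybe_escalate(incident: Incident, now: datetime) -> Incident:
    """Escalate a Level 2 incident to Level 1 if recovery exceeds the allowed window."""
    if incident.level == 2 and (now - incident.started) > LEVEL_2_RECOVERY_LIMIT:
        incident.level = 1  # triggers the chain of notification and the failover decision
    return incident
```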


Recovery / Solution Design
First it's important to define the two types of recovery planning that were outlined in the Severity Level definitions above:

  • Disaster Recovery
    The process of establishing procedures to recover operations in a location other than the primary production facility after a declaration of disaster.
  • Operational Recovery
    The process of establishing procedures to recover production in the same location; it does not require a declaration of disaster.
This phase should produce procedures and identify the soft and hard assets needed to recover from a disruption and bring the business back to full operation, in both the Disaster and Operational Recovery scenarios.

It is also important to define two numbers that will ultimately have a significant impact on the solution design (a small worked check follows this list):

  • Recovery Point Objective (RPO) - the acceptable latency of data that will not be recovered (usually driven by transaction volume and speed).
  • Recovery Time Objective (RTO) - the acceptable amount of time to restore operations (usually driven by EULA/SLA financial impact).
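
To make these two numbers concrete, here is a small worked check. The 15-minute RPO and 4-hour RTO are assumed example targets, and the logic assumes simple periodic backups.

```python
# Worked check of RPO/RTO, using assumed example targets.
from datetime import timedelta

RPO = timedelta(minutes=15)  # max acceptable data loss window (assumed example)
RTO = timedelta(hours=4)     # max acceptable time to restore operations (assumed example)

def meets_objectives(backup_interval: timedelta, measured_recovery: timedelta) -> bool:
    """With periodic backups, worst-case data loss equals the backup interval,
    so the interval must not exceed the RPO; the measured (tested) recovery
    time must not exceed the RTO."""
    return backup_interval <= RPO and measured_recovery <= RTO

# Hourly backups fail a 15-minute RPO even if recovery only takes two hours.
print(meets_objectives(timedelta(hours=1), timedelta(hours=2)))  # False
```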


Implementation
This phase should establish the necessary monitors, documentation, communication protocols, testing and failover environments to successfully execute and test the Recovery / Solution Design. Disaster Recovery will require a failover operations site be put in place at a different location; this site might be a good candidate for conducting testing and validation of the Recovery / Solution Design rather than jeopardizing production.

Testing and Validation
This phase should execute the Recovery / Solution Design through forced disruptions (on the failover or test platform), which, if successful, will validate the recovery plans and provide metrics for the impact on customer EULA/SLAs. Part of the validation process is to benchmark your Disaster Recovery failover site to make sure its performance and throughput are acceptable; remember, you're recovering for all your customers, not just a select few.
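
A very rough way to start benchmarking the failover site is sketched below. The endpoint is hypothetical, and sequential requests only give a baseline; a real validation should replay production-like load.

```python
# Rough throughput baseline for the disaster recovery failover site.
# The endpoint URL is hypothetical; real validation should replay production-like load.
import time
import urllib.request

FAILOVER_URL = "https://dr-site.example.com/api/ping"  # hypothetical endpoint

def measure_throughput(requests: int = 100) -> float:
    """Issue sequential requests and return requests per second as a rough baseline."""
    start = time.monotonic()
    for _ in range(requests):
        with urllib.request.urlopen(FAILOVER_URL, timeout=10) as resp:
            resp.read()
    return requests / (time.monotonic() - start)

if __name__ == "__main__":
    print(f"{measure_throughput():.1f} requests/second")
```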

Maintenance and Resiliency
This phase is a post-mortem: what did you learn from the Testing and Validation phase, and what parts of the Recovery / Solution Design had to be modified because they didn't work? This information should be fed back into the BCP. Since you have become so educated on your operations and the risks that can impact them, it is also useful to apply this knowledge to identify production components that may be candidates for resiliency improvements. Where these improvements involve capital expenses, they should be included in the next budgeting cycle, using your BCP to make the business case for the expenditure.

Good Luck!




Thursday, October 10, 2013

Moving Legacy Applications to the Cloud - Hybrid Approach with Virtualization

In my last post, "Moving Legacy Applications to the Cloud - Transposition versus Rewrite", I discussed a challenge many ISVs and corporate IT departments are grappling with today: moving legacy applications to the Cloud. One effective option discussed in that post was transposing code using frameworks and tools that are available today rather than undertaking a monolithic rewrite. Another option available today is the Hybrid Approach using virtualization.

The term Hybrid in this case means a cloud application that has two kinds of components yet produces the same results as if there were only one. Specifically, a value-add cloud application can be built around the legacy application, replacing some of its parts with ones that leverage Cloud capabilities, while other parts of the legacy application remain as is, exist in the cloud, and are streamed down to the user on demand.

As a real-world example, consider a hybrid cloud offering we just delivered to the market that we'll call The Tax Planner for the Web. Tax Planner is a desktop application that is the #1 market-leading product in professional tax planning software. The product was first delivered to the market as a programmed chip for the HP Calculator - yes, you read that right. It was then rewritten to run on the personal computer. It is a very complex application with many of the characteristics of a legacy application that I won't go into here. The application must be updated each year for new tax legislation and retains the previous years' tax calculations; in fact, it supports tax calculations going back to 1987. In addition, numerous spreadsheets or worksheets are used for data entry, basically mirroring the information you would enter into a complete Form 1040, so think Schedule A, C, D, etc. The customers, being tax professionals and accountants, are comfortable with the spreadsheet metaphor for data entry, and this was challenging to replicate well in a Cloud application before HTML5. So a monolithic rewrite to move the application to the Cloud to meet market demand is a challenging project.

The solution: a Hybrid Cloud Application that leveraged the Cloud to provide the features customers were looking for in a Cloud application and leveraged the desktop to run the remainder of the legacy application, providing a rich and responsive user experience. The legacy application uses a file metaphor to store clients' planning data, so it lacked the rich Client Management that customers were requesting. Customers were interested in backup capabilities that didn't require keeping track of files. Customers were interested in collaboration and workflow capabilities across their client data. So these were features that made sense to move to the Cloud. On the other hand, customers were not interested in learning a new user interface to enter plan data - think navigating the worksheets that comprise a complete Form 1040. They were happy with that part of the program.

To develop this Hybrid Cloud Application we needed a capability to stream a virtualized version of the shipping program down to the desktop on demand. The overall concept was to develop a native Cloud application that would include login/security, robust Client Management and other value-add features we could easily develop for the Cloud; then, when the user navigated to a client and clicked to open it, we would stream down the data-entry part of the legacy application with the client data. In fact, the same shipping desktop application would be used in the Cloud application as shipped to our conventional desktop customers; we would just handle File Open and Save through a new DLL that makes web service calls to the Cloud to display, open and save clients from the Cloud server.
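
To illustrate the idea (not the product's actual API), the open/save round trip handled by the replacement DLL amounts to something like the sketch below. The endpoint paths, payload shape and authentication scheme are hypothetical.

```python
# Conceptual sketch of the File Open/Save round trip routed through the Cloud.
# Endpoint paths, payload shape and auth are hypothetical, not the product's actual API.
import urllib.request

CLOUD_BASE = "https://planner.example.com/api"  # hypothetical Cloud server

def open_client(client_id: str, token: str) -> bytes:
    """Fetch a client's plan from the Cloud instead of the local file system."""
    req = urllib.request.Request(
        f"{CLOUD_BASE}/clients/{client_id}/plan",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def save_client(client_id: str, plan_bytes: bytes, token: str) -> None:
    """Write the plan back to the Cloud when the desktop application saves."""
    req = urllib.request.Request(
        f"{CLOUD_BASE}/clients/{client_id}/plan",
        data=plan_bytes,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )
    urllib.request.urlopen(req).close()
```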

To accomplish the virtualization and streaming functionality we conducted a broad market search for an appropriate technology. I won't go into all the available vendors, but we chose technology from Spoon (previously Xenocode), developed by Code Systems Corporation, a company founded by former Microsoft engineers and researchers. The technology enables application virtualization, portable application creation, and digital distribution.

We use Spoon Studio, which packages software applications into portable applications: single executable files that can be run instantly on any Windows computer. It emulates only the operating system features that are necessary for applications to run, reducing resource overhead. Virtualized portable applications run independently from other software, so there are no conflicts between them and other programs, i.e., no DLL conflicts.

We deploy the virtualized application on Spoon Server, which we call when the user selects a client, streaming the application down to the user's desktop. Spoon provides browser plugins for all popular browsers that handle the client side of receiving and launching the streamed application. Once the application is launched, we create a communication link for our web services back to our Cloud application for further processing. A couple of interesting features here: the initial download is fast, but subsequent launches of the application can be almost instantaneous because the plugin is smart enough to determine whether the current sandboxed application is up to date; if it is, it just launches the application rather than streaming it down again. This also ensures that the application is always the latest, up-to-date version, which is critical for tax planners.
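
The "is it up to date?" decision is handled by Spoon's plugin, but the idea behind it can be sketched as a simple version comparison. Everything below (the version endpoint, cache path and payload) is hypothetical and only illustrates the concept, not Spoon's implementation.

```python
# Illustration of the up-to-date check concept only; Spoon's plugin implements
# its own logic. The version endpoint, cache path and payload are hypothetical.
import json
import pathlib
import urllib.request

VERSION_URL = "https://planner.example.com/api/app-version"  # hypothetical
CACHE_FILE = pathlib.Path.home() / ".taxplanner" / "cached_version.json"

def needs_redownload() -> bool:
    """Stream the application again only when the server reports a newer version."""
    with urllib.request.urlopen(VERSION_URL) as resp:
        latest = json.load(resp)["version"]
    if not CACHE_FILE.exists():
        return True
    cached = json.loads(CACHE_FILE.read_text())["version"]
    return cached != latest
```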

The Hybrid Cloud application was a big hit with our customers; they loved the cloud features and were amazed they didn't have to learn a new interface. We were also recognized by the industry, earning a finalist position in the CPA Practice Advisor 2013 Innovation Awards.

You can learn more about Spoon on their website, and you can try a virtualized application yourself there as well.


Wednesday, September 25, 2013

Moving Legacy Applications to the Cloud - Transposition versus Rewrite


Moving legacy applications to the cloud is an issue many ISVs and business IT departments are grappling with today. Although there are a variety of technologies available to create new cloud applications, engaging in a monolithic rewrite of a legacy application in new technologies may not be a viable option from either a cost or a time-to-market perspective. In addition, the modern target platforms for the next generation of the legacy application are much more complex than the original target platform the legacy application was first built for. New target platforms are characterized by multi-platform access, device independence and mobile-enabled user interfaces.

There are two options worth considering to address this challenge; neither requires a rewrite, and both can be supported by agile best practices and continuous deployment.

Transposition versus Rewrite

The first option is to approach the legacy application as a real estate developer might approach an old house purchased as an investment. Although I am not a real estate agent, I would assume the questions are similar to the ones we ask when considering the evolution of a legacy application to new target environments. Is the house in such bad shape that it must be torn down and rebuilt from scratch - is it riddled with termites, is the foundation cracked? Combined with this analysis: is there really a budget to build a new house and still be profitable? Was time factored into the investment to build a new house, or was a quick turnaround built into the investment returns? Alternatively, if the answer is no, the house isn't a tear-down, then the question is what needs to be done to the house, on a more limited time frame and budget, to modernize it with the features that sell in today's real estate market - does it need the kitchen remodeled? Should some walls be removed to enlarge the master bedroom?

Now let's put these questions in the context of a legacy application. Do we need to start from a clean slate because nothing is salvageable - that is, the application is so buggy and unpredictable, with crashes so frequent, as to render it useless? In addition, we need to ask whether we have the budget to rewrite the application and whether the time to market fits the investment case. Alternatively, if the answer is no, then the question is what can be done to the legacy application, on a more limited time frame and budget, to modernize it to operate in the new target environments of today. And that is the idea of transposition: a unique paradigm that combines concepts from migration, rewrite and virtualization with a set of supporting technologies and integrated development environments into a single solution, reducing the time to market and budget required to transpose the legacy application into a modern application that can run on today's target environments. The leader in transposition techniques, a company headquartered in Israel called GizMox, refers to transposition as 'computer-aided rewrite that reproduces an application that runs on one computing architecture as an equivalent HTML5 application that will work multi-browser on multiple devices.'

The transposition magic is contained in their products Instant CloudMove and Visual WebGui. Below is the full transposition process.

The Transposition Process
The first phase of transposition is an assessment phase. GizMox has made available an assessment wizard that you can download and run on the source code of your legacy application. The tool analyzes the source of the legacy application using a virtual compiler to identify flow and dependencies, which it uses to build an Abstract Syntax Tree (AST) object model. It then employs out-of-the-box syntax translation and mapping of the tree functions to new functions or packages native to the new target environments, where possible, to provide metrics. The assessment provides a comprehensive report including an automation-level assessment and a breakdown of the required resources and packages.
Following this assessment report, GizMox offers a more thorough analysis of your application that outlines accurate costs and establishes a detailed work plan, including recommended personnel and time to market. Also offered is a free trial of Instant CloudMove, which actually transposes the application so you can develop your own Proof of Concept (POC); alternatively, you can contract GizMox to develop a POC based on a representative module of 10,000 lines of code, and GizMox will even help you create a representative module from your application.
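
To give a feel for what mapping tree functions and computing an automation level means, here is a toy illustration. The mapping table is invented for the example and has nothing to do with GizMox's actual rule set.

```python
# Toy illustration of function mapping and the automation-level metric.
# The mapping table is invented for the example, not GizMox's actual rule set.
FUNCTION_MAP = {
    "MsgBox": "alert",                      # hypothetical source -> target mapping
    "Form.Show": "window.open",             # hypothetical
    "FileSystemObject.OpenTextFile": None,  # no automatic mapping -> manual effort
}

def automation_level(calls_found: list[str]) -> float:
    """Fraction of discovered calls that have an out-of-the-box mapping."""
    mapped = sum(1 for call in calls_found if FUNCTION_MAP.get(call) is not None)
    return mapped / len(calls_found) if calls_found else 1.0

print(automation_level(["MsgBox", "Form.Show", "FileSystemObject.OpenTextFile"]))  # ~0.67
```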

The Instant CloudMove set of tools transposes most of the original legacy application and user interface code to its new (web-based) environment automatically, approximately 80-85%. The transposition is executed by a sophisticated engine that translates the source language into an intermediate target language. By transposing into an intermediate language, the original legacy application code and the target code are isolated from each other, so work can continue on the original legacy code and be translated back in should a merge be necessary, eliminating the need for a code freeze of the original source application. This frees the transposition team to work iteratively at their own pace to deliver the highest-quality application. It's also important to mention that using intermediate code enables 'push button' generation of code for the desired target environment, so from the intermediate code new retargeted applications can be generated at any time.

And because it integrates with the Microsoft Visual Studio IDE, it provides the flexibility to customize, upgrade and add pieces of code as you go. There are even sophisticated pattern-matching capabilities you can use to create your own mapping packages for the 15-20% of the code that is not automatically transposed.

Using simple drag and drop actions, you can redesign legacy user interfaces to new front-ends such as HTML5 using another GizMox product, Visual WebGui (VWG). VWG extends the ASP.NET APIs to support rich user interfaces and incorporates Ajax connections for further enhancements. By targeting VWG, you are practically targeting an enhanced ASP.NET application with an HTML5 user interface. Once your application is transposed into VWG HTML5, you will be able to inherit part or all of the generated forms and redesign them for mobile and tablet form factors using the VWG Visual Studio integrated designer.

I have been involved in many rewrites of legacy applications for many target environments over the years, and these are the kinds of tools and translators that each development team would create in-house to 'transpose' the legacy application to run in a new target environment. In most cases legacy applications have embedded rules and calculation engines, and in extreme cases the engines have a time dimension built up over years. Rewriting these rather than transposing them can be a risky proposition, especially when the customer can't tell the difference.


Here are some tables from the GizMox site that make the case for transposition over standard rewrite.

Comparing Instant CloudMove to a standard rewrite.

  • Quality of code - Standard Rewrite: High; Instant CloudMove: High
  • Level of automation - Standard Rewrite: Low; Instant CloudMove: High
  • Time to market - Standard Rewrite: Long; Instant CloudMove: Short
  • Ability to estimate resources, costs and completion times - Standard Rewrite: Low; Instant CloudMove: High
  • Built-in capability of adaptation to tablet and mobile 'touch' devices - Standard Rewrite: Requires multiple skill sets; Instant CloudMove: Built-in drag 'n drop simplicity
  • Level of MS Visual Studio integration - Standard Rewrite: Varies; Instant CloudMove: Very high
  • Risk - Standard Rewrite: High; Instant CloudMove: Low
  • Cost - Standard Rewrite: High; Instant CloudMove: Considerably lower




Below is a dollar-for-dollar comparison of smart rewrite against traditional rewrite, showing rate of progress in lines of code (LOC) per developer per day and cost per line in dollars:

  • Cutter Consortium - 185 LOC/day; $12.3-18.5 per line
  • Gartner - 170 LOC/day; cost N/A
  • BNA Software - 284 LOC/day; cost N/A
  • Tactical Strategy Group - 50 LOC/day; $15 per line
  • Forrester - rate N/A; $6-23 per line
  • Fedora Linux - rate N/A; $52 per line
  • HP - 100 LOC/day; $10-30 per line
  • Gizmox Instant CloudMove - 3,000-6,000 LOC/day; $0.35-1.2 per line

Published figures for LOC per day and cost per line for system rewrite.
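
To put those published rates in perspective, here is a quick back-of-the-envelope calculation. The 500,000-line application size is an assumption chosen purely for illustration.

```python
# Back-of-the-envelope effort comparison using the published LOC/day figures above.
# The 500,000-line application size is an assumption for illustration.
LOC = 500_000

def effort_days(loc_per_day: float) -> float:
    """Developer-days to cover the whole code base at a given daily rate."""
    return LOC / loc_per_day

print(round(effort_days(185)))    # ~2703 developer-days at the Cutter Consortium rate
print(round(effort_days(3000)))   # ~167 developer-days at the low end of Instant CloudMove
```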