Best Practices

BPM employs best practice benchmarking, as required, to establish design criteria and a strategic approach, and to determine the gaps between our client's "as is" system and the desired "to be" state. We maintain a database of best practices in a wide variety of systems disciplines and, as the specifics of the situation require, we conduct tailored best practice evaluations on behalf of a client.

Sample results for one benchmark participant (which happens to be one of the largest and most successful e-commerce sites in the world) are presented below as an example. This case study, which maintains the confidentiality of the client and participant, illustrates the value and comprehensiveness of the BPM best practice approach.

We invite you to compare your own site's performance against the design principles and metrics presented in this actual benchmark. We know you will benefit greatly by doing so. We also ask you to extrapolate the value of applying this same technique, analogously, to the challenges you may be facing in this or other systems disciplines. Though the benchmark below is specific to e-commerce infrastructure, we have applied this same approach for numerous clients in such diverse areas as telecommunications billing design, operations management process design, enterprise software cost evaluation, data center budgeting, and other related areas. We have learned the incredible power of this approach. We invite you to do so as well.

 

 

Internet Operations and Architecture

Criteria and Best Practice Sample Results

 

Business value (actual participant)

The participant took 3 years to go from no e-commerce applications to its current state, in which e-commerce accounts for:

  • 77% of revenue ($9 billion/year)

  • 80% of customer support transactions

  • Product cycle time reduced from 3 weeks to 2-3 days

  • 5% of shipments never touch hands (handled electronically)

  • $650 million annual savings (mostly through headcount reduction)

Applications

  • Customer service

  • Product catalog

  • Sales/order entry

  • Customized configurator

  • Tracking delivery of order to door

Volume

  • 12 million hits/day

  • Growth of 25% per month

  • 3500 orders/day (average size $45K/order)

  • 6000 configurable items in online catalog
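To put the 25% monthly growth figure in perspective, the short Python sketch below compounds the 12 million hits/day forward. It assumes the growth rate stays constant over the projection period, which the benchmark does not claim; `project_hits` is an illustrative name only, not part of the participant's tooling.

```python
def project_hits(daily_hits: float, monthly_growth: float, months: int) -> float:
    """Compound a daily hit volume forward at a fixed monthly growth rate.

    Assumption: growth is constant and compounds once per month.
    """
    return daily_hits * (1.0 + monthly_growth) ** months


if __name__ == "__main__":
    # Figures from the benchmark: 12 million hits/day, 25% growth per month.
    for months in (3, 6, 12):
        projected = project_hits(12_000_000, 0.25, months)
        print(f"after {months:2d} months: {projected:,.0f} hits/day")
```

Under that assumption, 12 million hits/day would become roughly 23.4 million after three months and about 175 million after twelve, which is why capacity is treated as a design concern throughout the rest of this benchmark.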

Architecture

  • Design principle: segregate functions, allocating distinct functions to distinct servers.

  • Design principle: application design is fat client (significant user/application interaction takes place in the browser, so that a complete transaction is sent to and processed by the server).

  • Design principle: use a caching engine for inbound and outbound traffic.

  • Design principle: design so that you are not dependent on any IP-specific design.

  • Design principle: design so that multiple database engines can hit the same database concurrently.

  • Design principle: design for cloned instances of the same service.

  • Design principle: design so that mixed packet sizes are not running at the same time.

  • Design principle: design for persistence (state must be known at any and every point at which the internet connection might get interrupted). This is necessary because of the unreliable nature of the internet connection. Our preferred approach is to capture state and send it to a state server; if you get disconnected, you can retrieve state from the state server. (This is the internet version of checkpoint restart.)

  • Design principle: avoid sequential design. Launch (and process) multiple queries concurrently; this is faster than sequential design. It also enables the screen to be painted faster (the screen can be painting while a longer-running query is being processed).

  • Lesson learned: Load Director is not the way to go. Big IP and Resonate are alternative load balancing products that enable you to distribute load across sites, which Load Director does not. They are therefore the preferred approach over Load Director.

  • Lesson learned: segregation of workloads is the only way to predict and manage capacity.

  • Lesson learned: caching for inbound traffic proved highly valuable (reduced I/O by 50%).

  • Lesson learned: choose large-capacity boxes (e.g. E10000s). It is easier and less costly to manage a smaller number of large-capacity devices. The opposite approach, a larger number of smaller boxes, adds an administrative burden (e.g. contrast with Dell, which adds 70 NT servers per month).

  • Lesson learned: manage capacity by dynamically balancing load in real time, both across configurations and across processors within a single configuration.
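The persistence principle in the list above (capture state, send it to a state server, retrieve it after a disconnect) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `StateServer` is a hypothetical in-memory stand-in for a networked state server, and `run_order`, its step names, and the `fail_at` parameter are invented here to simulate a dropped connection. None of it describes the participant's actual implementation.

```python
import json
from typing import Dict, List, Optional


class StateServer:
    """Hypothetical in-memory stand-in for a networked state server."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def save(self, session_id: str, state: dict) -> None:
        # Serialize so the checkpoint is independent of the caller's object.
        self._store[session_id] = json.dumps(state)

    def load(self, session_id: str) -> Optional[dict]:
        raw = self._store.get(session_id)
        return json.loads(raw) if raw is not None else None


def run_order(session_id: str, steps: List[str], server: StateServer,
              fail_at: Optional[int] = None) -> dict:
    """Run the steps in order, checkpointing after each completed step.

    If a checkpoint already exists for the session, resume after the last
    completed step. `fail_at` simulates the connection dropping just before
    the step with that index runs.
    """
    state = server.load(session_id) or {"done": []}
    for index in range(len(state["done"]), len(steps)):
        if fail_at is not None and index == fail_at:
            raise ConnectionError(f"connection lost before step {steps[index]!r}")
        state["done"].append(steps[index])
        server.save(session_id, state)  # checkpoint after every step
    return state
```

A first call that is interrupted leaves the completed steps on the state server; a second call with the same session ID picks up from the checkpoint and does not repeat earlier steps, which is the "checkpoint restart" behaviour the principle describes.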

Product set

  • Apache for Web server

  • Java, Visual Café, EJB (Enterprise JavaBeans, standardized APIs)

  • CGI, Perl, XML, HTML

  • CORBA objects (from Phisigenics)

  • Rational Rose

  • Raft (from Foglight) for dynamic load management and monitoring

  • Keynote for external Web testing/pinging in/out of US

  • ITO (from HP) for umbrella performance management

  • EMAN (from Cisco) for software-robot-like testing

  • IIOP (communications protocol, placed inside the firewall)

Change management process design

 

  • Content changes managed daily

  • Executable changes managed monthly

  • Hot changes to application 24x7 managed without taking system down

  • Have found they need to move to “more traditional” change control (slower)

  • Use “certification” procedures/training
      - constructed and teach certification classes
      - programmers must be certified
      - no software is permitted to be installed unless reviewed by a certified Java programmer

  • Network implements “freeze periods” around each quarter end and each month end, during financial reporting and manufacturing periods (to minimize the business impact of an outage)

  • All network changes must be submitted more than 24 hours in advance; each then goes to the daily 8am change management meeting for implementation scheduling

  • Design supports taking one box down and using a second box to support the load, with Load Directors redirecting the traffic, to provide 24x7 operation without preventative maintenance windows
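The two network change rules above (more than 24 hours' notice, and no changes during freeze periods) lend themselves to a simple automated check. The Python sketch below is a hedged illustration: the source does not say how long a freeze period lasts, so `FREEZE_DAYS_BEFORE_MONTH_END = 2` is purely an assumption, as are the function names. Quarter ends are covered because every quarter end is also a month end.

```python
import calendar
from datetime import date, datetime, timedelta

MIN_LEAD_TIME = timedelta(hours=24)  # from the text: submitted >24 hours in advance
FREEZE_DAYS_BEFORE_MONTH_END = 2     # assumption: the source gives no window length


def in_freeze_period(day: date,
                     freeze_days: int = FREEZE_DAYS_BEFORE_MONTH_END) -> bool:
    """True if `day` is within `freeze_days` of the last day of its month."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    return last_day - day.day <= freeze_days


def can_schedule(submitted: datetime, planned: datetime) -> bool:
    """A change is schedulable only with >24h notice and outside a freeze."""
    has_notice = planned - submitted > MIN_LEAD_TIME
    return has_notice and not in_freeze_period(planned.date())
```

A check like this would only pre-screen requests; in the process described above, scheduling itself is still decided at the daily change management meeting.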

Reliability design

  • Fail over designed within a data center and within network

  • If data center fails, its hot spare automatically/transparently takes over

  • If network fails, redundant routing dynamically reroutes around failure

  • Real-time replication implemented via LDAP, Oracle Symmetric replication services, Oracle Parallel Server, and home-grown store and forward

  • Network built with redundant physical topology. The network architecture supports every node with 2 routes in a triangle design; if one leg of the network to any node goes down, there is a redundant physical path to reach it
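The redundant-topology claim above (any one leg can fail and every node is still reachable) is easy to verify mechanically for a given network map. The following Python sketch shows one way to do that under simple assumptions: the network is given as a non-empty set of node names plus undirected links, and the function names are illustrative, not taken from the participant's tooling.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def reachable(links: Set[FrozenSet[str]], start: str) -> Set[str]:
    """Return every node reachable from `start` over the given links."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for link in links:
            if current in link:
                for other in link - {current}:
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
    return seen


def survives_any_single_link_failure(nodes: Iterable[str],
                                     links: Iterable[Tuple[str, str]]) -> bool:
    """True if all nodes are connected, and stay connected after any one link fails."""
    node_set = set(nodes)
    link_set = {frozenset(link) for link in links}
    start = next(iter(node_set))  # assumes at least one node
    if reachable(link_set, start) != node_set:
        return False
    return all(reachable(link_set - {failed}, start) == node_set
               for failed in link_set)
```

A three-node triangle passes this check, while a simple chain of the same three nodes does not, since losing either link in the chain isolates a node.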

Scalability design

Supports horizontal scalability design via:

  • Segregation of application functions (FTP, Content, Search, e-Commerce)

  • Dynamic load balancing achieved through load directors (used to dynamically balance load between multiple E10000s) and a home-grown application that dynamically configures/reconfigures individual CPUs within a single E10000. This software manages/balances capacity variances among regions through dynamic reconfiguration of CPUs to regions within a single E10000 complex.

  • Quarterly capacity reviews
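The combination of function segregation and load balancing across cloned servers can be sketched as follows. This Python example uses least-loaded dispatch, a common balancing policy chosen here for illustration; the source does not say which policy the participant's load directors apply, and the `Pool` class, `route` function, and server names are all hypothetical.

```python
from typing import Dict, List


class Pool:
    """A set of cloned servers dedicated to one application function."""

    def __init__(self, servers: List[str]) -> None:
        # Track the number of in-flight requests per server.
        self.active: Dict[str, int] = {name: 0 for name in servers}

    def acquire(self) -> str:
        """Pick the server with the fewest in-flight requests (first wins ties)."""
        name = min(self.active, key=lambda server: self.active[server])
        self.active[name] += 1
        return name

    def release(self, name: str) -> None:
        """Mark one request on `name` as finished."""
        self.active[name] -= 1


def route(pools: Dict[str, Pool], function: str) -> str:
    """Send a request to the pool for its function, then to the least-loaded clone."""
    return pools[function].acquire()
```

Because each function has its own pool, a surge in search traffic in this sketch never consumes FTP or e-commerce capacity, and a pool grows horizontally simply by adding another clone to its server list.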

Availability metrics

  • 99.95% host

  • 99.85% across all applications

  • Target is 6 sigma (99.999%)

  • Application groups measured by application availability (accountability widely communicated)
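Availability percentages are easier to compare when converted to allowable downtime. The small Python sketch below does that conversion for the figures above; it assumes a 365-day year of continuous (24x7) service, and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 24x7 service


def annual_downtime_hours(availability_percent: float) -> float:
    """Convert an availability percentage into hours of downtime per year."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


if __name__ == "__main__":
    for label, percent in (("host", 99.95),
                           ("all applications", 99.85),
                           ("target", 99.999)):
        hours = annual_downtime_hours(percent)
        print(f"{label}: {percent}% -> {hours:.2f} h/year ({hours * 60:.1f} min)")
```

Under these assumptions, 99.95% corresponds to about 4.4 hours of downtime per year, 99.85% to about 13.1 hours, and the 99.999% target to a little over 5 minutes.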

Support requirements

  • 20 staff manage all Web sites (includes web masters and system administrators)

  • 12 hour shifts, 3 days per week

  • 5 boxes managed per system administrator

 

Again, although the best practices benchmark presented above is specific to e-commerce infrastructure, we have applied this same approach for numerous clients in such diverse areas as telecommunications billing design, operations management process design, enterprise software cost evaluation, data center budgeting, and other related areas. We have learned the incredible power of this approach. We invite you to do so as well.

To see the outline that drove this particular best practice benchmark click: benchmark outline