Manageability Maturity Model

The Manageability Maturity Model is useful as a planning and self-assessment tool. To use it, look at the indications to determine your product's current manageability level, then look at the impact that being at this level has on customers and on total cost of ownership (TCO).
 
Then look at the kinds of investments that are reasonable, and at the common issues teams face as they enter each level. The tool is useful for engaging the MP (management pack) Best Practices team in a discussion of near-term, mid-term, and long-term investments for improvement, and for framing trade-offs between different approaches.
 
Level 0: Chaotic
Level 1: Documented
Level 2: Basic
Level 3: Standardized
Level 4: Rationalized
Level 5: Dynamic
Indications
Instrumentation exists
Goal is developer-level diagnostics and tracing
Self Documenting
All of level 0
Events have KB articles; Event Viewer links
Knowledge articles focus on troubleshooting
Rudimentary MP possible, but afterthought if present
Instrumentation supports troubleshooting.
Details describe symptoms; they do not focus on causes or on the actions available.
All of level 1
Symptom-biased instrumentation
Few root causes are identified by direct instrumentation
KB articles focus on diagnosis, then repair and remedy
Transients look like problems
False alarms exceed actionable alerts
Change in instrumentation approach
Root causes starting to be identified by instrumentation
Instrumentation focuses on issues not just symptoms
MP grade is mostly actionable
Matching knowledge article starts to include proactive maintenance steps
Tier 1 starts to close more alerts than Tier 2
Root cause issue detection supported by instrumentation
MP enables high levels of automation
Tasks support automated repair via human-driven “launch”
Focus on prevention starting to appear
 
All of level 4
MP contains diagnosis and remediation actions linked to health state transitions
Health state is highly actionable, and customers can turn on “automate this” when they come to trust a monitor’s accuracy
Focus on proactive
Pro-Packs included
Characteristics
How-To by Google
Product Specific Books are primary resource
MSDN & Blog articles
Product-specific websites and blogs for self-serve community knowledge
 
All of level 0
MMD -> TechNet documentation of events
Manual troubleshooting is normal
Event Viewer link
MPs are 2K5 conversions or equivalent.
High noise rate if monitored
MMD -> MP converter used
MP contains few problem-specific diagnostics and tasks
MP automates TechNet KB presentation
Events trigger alerts
MP increases costs due to high tuning costs and high false alarm rates
Knowledge articles focus on avoiding the problems
Monitors outnumber rules in MP
Actionable alert ratio > 50%
Common issues managed by tasks
Management pack reduces outage duration
SLA becomes the focus for operations
Change control in place
Diagnosis starting to become automated (causality by model)
Tier 1 handles most issues
Actionable alert ratio > 70%
Common issues automatically repaired by tasks
Prolonged Outages uncommon
MP automates diagnosis and has links to tasks
Management packs drive real-time decisions
Many issues handled by resolvers and tasks launched by tier 1
Actionable alert ratio > 85%
Tricky issues become manageable by T1
MP automates repair
Correlation used to disambiguate symptomatic measures
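
The actionable alert ratio thresholds listed above (> 50%, > 70%, > 85%) lend themselves to a quick self-assessment calculation. The following is a minimal sketch, not part of the model itself: the function names are hypothetical, and pairing the three thresholds with Levels 3, 4, and 5 is an assumption based on the order in which they appear in the Characteristics row.

```python
from typing import Optional

# Thresholds quoted from the Characteristics row. Pairing each threshold
# with a level is an ASSUMPTION based on the order in which they appear.
ASSUMED_THRESHOLDS = [
    (0.85, 5),  # "Actionable alert ratio > 85%"
    (0.70, 4),  # "Actionable alert ratio > 70%"
    (0.50, 3),  # "Actionable alert ratio > 50%"
]


def actionable_alert_ratio(actionable: int, total: int) -> float:
    """Return the fraction of alerts that an operator could act on."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= actionable <= total:
        raise ValueError("actionable must be between 0 and total")
    return actionable / total


def suggested_level(ratio: float) -> Optional[int]:
    """Return the highest assumed level whose threshold the ratio exceeds.

    Returns None when the ratio is at or below 50%, which this sketch
    treats as Level 2 or lower.
    """
    for threshold, level in ASSUMED_THRESHOLDS:
        if ratio > threshold:
            return level
    return None
```

For example, 30 actionable alerts out of 40 gives a ratio of 0.75, which this sketch places at Level 4. The ratio is only one of the listed characteristics, so treat the result as a starting point for discussion rather than a final assessment.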
Customer Experience
Bring Specialists
Product Expertise required
Random events to learn
Manual troubleshooting
General admin certified expertise required for each product
Browsing event log entries required to detect issues
Escalation to product specialists
MPs are noisy and consist mostly of rules that trigger alerts
High MP tuning costs
Thresholds monitored by MPs are tunable
MP generates many alarms, most of which are not actionable
Customers using MP catalog extensively
Mix of MPs becomes a cost of ownership issue
Alert floods take down monitoring systems
 
 
Issue detection causes direct alerts that are actionable
Actionable alerts outnumber non-actionable
MP tuning costs reach “reasonable” levels
Consolidation rules are present
Customers learn which MPs are noisy and avoid them
Trust in monitoring systems emerging
Perceptions of product manageability = “ok”
Few noise level alerts
Out-of-the-gate tuning is a small investment
Perception of product manageability = “good”
Small staff can run many applications and hosts
Skill levels required to do Tier 1 are optimized
Product-specific SLM included in MP
Problems detected as they happen, before outage conditions
MP tuning costs are minimal
Customers equate great management experience with product quality
Task worker UI intuitive and helps to see broader business state
MP provides product-specific rollups for dashboards and SLM
Impact
High reliance on product-specific expertise
Must hire MS certified product specialists to manage portfolio
MSFT servers have dubious reputation for manageability
Slight cost reduction
MSFT is seen as helping with costs
Source of information centralized, “feels good”
MP enables automating some of operations' manual efforts
Tuning costs become a concern
Closing monitors vs closing alerts is next frontier in cost
Costs increase because non-actionable alerts happen frequently
Instrumentation is ambiguous, wish for root cause analysis
Mix of MPs becoming a cost and performance concern
Health monitors are considered accurate
Costs come back into reasonable levels
Ops manager seen as major cost saver when instrumentation and MP maturity are reasonable
Ambiguous alarms (those requiring escalation) < 50% of all alarms
Small staff can manage thousands of machines, hundreds of applications
Escalations are automated/tracked.
IT management costs are competitive advantage
“Noise” less than 20% of all alarms
IT management costs are best of breed
Flexible capacity management lowers capital costs
MSFT advantage due to best of breed integrated management
Escalations and workflows can be fully automated and triggered by monitors
 

 
