Model Maturity Web

An (evolving) rubric for gap assessment & road mapping

Richard Arthur
Nov 16, 2020
Model Maturity Web for assessing scientific software (See also PDF-slides)

For nearly two decades, I have found myself in the position of communicating opportunities to advance the state of the art in science and engineering through the use of digital models.

These opportunities clash with centuries of well-established empirical practices for scientific discovery and technology development via the scientific method. However, the scientific method itself does not preclude using digital models in place of the physical testing and experimentation we have traditionally performed — so long as the results can be sufficiently trusted and understood as a proxy for the real-world problem.

Goal: A Rubric for Model-based Strategies

The existence of a framework through which we can clearly assess and describe the readiness of a modeling approach enables us to communicate:

  1. identified risks and gaps (as opportunities for improvement)
  2. comparative strengths (across alternative approaches, including physical)
  3. aspirational capabilities (to pursue as a pragmatic target or pilot study)

to guide decision-making relative to strategies, investments, prioritization of resources, and responsible limitations on use.

It is essential that the rubric consider multiple factors rather than attempt to derive an over-simplified score that misleads without considerable further explanation and caveats. Therefore, the rubric should communicate loosely-coupled factors independently and simultaneously.

Additionally, the rubric should afford a variety of acceptable targets based upon context of use. Therefore, the weighting or pass/fail across different factors should be adjustable.

Form and Function

Regarding the form of the visualization, my personal experience employing a “spiderweb” chart with diverse audiences over the past decade indicates it is sufficiently, if not exceptionally, intuitive for the task. Additionally, the Excel “Radar” chart can be used to easily create, modify and render assessment instances. The center of the web is a zero score and the outer rim is the best score (typically a 5). The chart below contains two solution options: red performs poorly compared to green.

Red scores: 1 (Confidence, Robustness, Productivity, Flexibility), 2 (Realism, Sustainability, Scalability) & 3 (Accuracy) vs. Green: 3 (Productivity), 4 (Realism, Confidence, Robustness, Scalability) & 5 (Accuracy, Sustainability, Flexibility)
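
For those who prefer scripting to Excel, a minimal sketch of the same web rendered with Python and matplotlib might look like the following. The scores are taken from the red/green example above; the ordering of the axes around the web is merely illustrative.

```python
# Sketch: render a Model Maturity Web as a radar ("spiderweb") chart.
# Axis order is illustrative; scores follow the red/green example above.
import numpy as np
import matplotlib.pyplot as plt

axes = ["Realism", "Accuracy", "Confidence", "Robustness",
        "Productivity", "Sustainability", "Scalability", "Flexibility"]
red   = [2, 3, 1, 1, 1, 2, 2, 1]   # weaker option
green = [4, 5, 4, 4, 3, 5, 4, 5]   # stronger option

angles = np.linspace(0, 2 * np.pi, len(axes), endpoint=False).tolist()
angles += angles[:1]               # repeat the first angle to close the polygon

fig, ax = plt.subplots(subplot_kw={"polar": True})
for scores, color in [(red, "red"), (green, "green")]:
    values = scores + scores[:1]
    ax.plot(angles, values, color=color, label=f"{color} option")
    ax.fill(angles, values, color=color, alpha=0.15)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(axes)
ax.set_ylim(0, 5)                  # center is 0, outer rim is the best score
ax.legend(loc="upper right")
plt.show()
```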

It is also my experience that this chart is most compelling when it has exactly eight axes. I won’t debate this here — just try alternatives and I think you’ll agree. As a bonus, this also compels a necessary simplification of any rubric with more than eight factors. The eight in my first exposure to this kind of chart were aimed at System Performance [Whitworth, et al., CACM 2006].

CACM May 2006/Vol 49, №5

Some of the factors within that 2006 model even map nicely to the most recent form. In fact, the rubric has evolved through at least 5 intervening iterations, including the version on pages 16–18 of the 2019 Explore report from the U.S. Council on Competitiveness.

Rather than dive into specifics of the various contributing frameworks to date, I encourage reviewing the references at the end of this article. Leveraging the exceptional prior work from the Dept. of Energy, NASA, ASME, and the Software Engineering Institute at CMU (most notably, Sandia’s Predictive Capability Maturity Model, or PCMM) provides a sound foundation on which to base our factor attributes. Some of the ongoing discussion stems from academic debate regarding which factors merit primary axes, and the taxonomical label and scope of each category.

The pragmatic function of the rubric is clarity and completeness in assessing and communicating the risks, gaps, strengths, and aspirational targets as described in the goals above. Specifically, three functional categories for the rubric to characterize are:

  1. TRUST in the approach: Model Competence
  2. VALUE in adopting: Cognitive Augmentation
  3. DURABILITY of that TRUST and VALUE: Architecture

While technically all problem-solving employs models (even if solely mental models), the dramatic and sustained improvements in microelectronics over the past 50 years have driven breakthroughs in data storage and processing that are transforming scientific research and engineering development.

The digital paradigm fundamentally changes the rules from physical approaches — virtual representations can be created, replicated and transmitted near-instantaneously, nearly free. This potent swap of electrons for material can significantly reduce traditional costs and constraints.

Digital models allow study under a “microscope” spanning otherwise impossibly large or minuscule scales of space, or where time can be stopped or even reversed to flow backward. But to fully exploit these digital advances, the data and their processing by capable computational models must underlie collaboration and consistently span stages of product lifecycles.

Model Competence

Perhaps the greatest perceived risk in shifting from physical to digital models is the well-justified distrust in model results often cited through the adage “Garbage In — Garbage Out.”

Therefore, before any other considerations, our rubric must consider attributes of the model to test its credibility in the context of the intended use. Specifically, how can we characterize trust in the modeling approach?

Can we assert a “Region of Competence” for a model: a regime where its use is numerically stable (ROBUSTNESS) with minimal simplifying constraints (REALISM), and where we can quantify the bounds of uncertainty (CONFIDENCE) of results with validated, predictive ACCURACY?

Formalisms around “VVUQ” (q.v.) — verification, validation and uncertainty quantification — embody an essential component of delivering model competence. Sandia’s PCMM (q.v.) — the predictive capability maturity model — is an exemplar rubric for these factors. Further, employing model competence requires expertise in numerical and measurement disciplines to apply correctly, unless that expertise is wholly embodied in the models.
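
To make the CONFIDENCE factor a bit more concrete, here is a minimal sketch of one ingredient of VVUQ: forward propagation of input uncertainty by Monte Carlo sampling. The model function is a stand-in placeholder, not any particular solver.

```python
# Sketch: bound output uncertainty by propagating an uncertain input
# through a (placeholder) model via Monte Carlo sampling.
import random
import statistics

def model(x: float) -> float:
    # Hypothetical response of interest; a real solver would go here.
    return 3.0 * x ** 2 + 2.0

# Treat the input as uncertain, e.g. measured as 1.0 +/- 0.1 (one std. dev.).
samples = [model(random.gauss(1.0, 0.1)) for _ in range(10_000)]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"predicted output: {mean:.2f} +/- {2 * stdev:.2f} (approx. 95% bounds)")
```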

Cognitive Augmentation

No matter how good the model may be, it is only valuable if it can be used efficiently and effectively. Here we consider the net benefit to the end-user (and the ease of expanding the user community) as well as the effort needed for continuity of use by the support team. In effect — can we rely upon stable operational availability and consistently beneficial throughput?

Can we affirm that the labor invested in software management and end use of the model will yield PRODUCTIVITY from efficient workflows, reduce waste/rework and improve quality (SUSTAINABILITY)?

The software Capability Maturity Model (q.v.) and Agile development practices like continuous test/integration/deployment contribute to implementation stability and quality, while the UX (user experience) practice can significantly improve human-machine interaction and workflow productivity.

Architecture

The final pitfall to examine considers the modeling software and implementation approach within the organic systems architecture spanning the software ecosystem and the ever-advancing disruptions in the underlying hardware infrastructure. If indeed we trust the approach and find its use valuable, is the approach agile in the face of change?

Can we assure the implementation will perform capably on current and emerging HPC hardware (SCALABILITY), and is interoperable with other software and extensible with new features (FLEXIBILITY)?

In short — cleverly balancing the adoption of frameworks and standards for compatibility, portability and optimization against the added effort to employ design abstractions and hedge the implementation against a vast set of speculative architecture features.

Holistic Collaborative Assessment: Co-Design

Through these eight goals, we then form our spiderweb as a lens into the trustworthiness of the modeling approach, the ease with which it can be deployed, used and supported to yield valuable throughput in science and engineering — and then the architectural adaptability relative to both hardware and software. The knowledge and expertise to make such assessments span domain expertise, computational science and systems engineering — components of what the national labs have called Co-Design.

Co-design refers to collaboration between technical stakeholders to more effectively navigate the problem-solution trade-off space. A domain expert understands nuances of the inputs, the problem itself, requisite fidelity, etc. A computational scientist considers performance, scalability, usability, and productivity across numerical-methods options (algorithms, data structures, frameworks, libraries, etc.), exploiting advances in computing architecture to break through prior bottlenecks impeding problem size or performance efficiency. Finally, a systems architect can guide procurement or even influence vendor roadmaps toward hardware infrastructure well-tuned for the problem sets seeking to surpass established barriers.

Scoring Factors

Assignment of scores should be guided by leveraging well-established 4-to-5 graded frameworks such as CMM or PCMM. Abstractly, we can measure a score in a generalized form: each attribute axis receives a graded level from 0 to 5, weighted or thresholded according to the context of use.
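
As a sketch of that generalized form (my own shorthand here, not a formal definition from CMM or PCMM), each axis carries a graded level plus recorded evidence, and a context of use supplies adjustable pass/fail thresholds:

```python
# Sketch: per-axis graded levels checked against context-specific thresholds.
from dataclasses import dataclass

@dataclass
class AxisAssessment:
    axis: str      # e.g., "Accuracy"
    level: int     # graded 0 (center of the web) to 5 (outer rim)
    evidence: str  # rationale recorded for the assigned grade

def find_gaps(assessment: list, thresholds: dict) -> list:
    """Return the axes scoring below the pass/fail threshold for this context of use."""
    return [a.axis for a in assessment if a.level < thresholds.get(a.axis, 0)]

# Hypothetical context of use demanding strong Accuracy and Confidence;
# axes without a threshold are unconstrained in this context.
gaps = find_gaps(
    [AxisAssessment("Accuracy", 3, "validated against limited legacy test data"),
     AxisAssessment("Confidence", 1, "no uncertainty quantification yet")],
    thresholds={"Accuracy": 4, "Confidence": 4},
)
print(gaps)  # ['Accuracy', 'Confidence'] -> gaps to address on the roadmap
```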

By example, consider for each attribute axis an aspirational question and potential descriptions of the extremes — exemplary and poor:

Realism: Can solve problems relevant to collaborative decision-making with acceptable, minimal or no assumptions / abstractions / simplifications?

Accuracy: Can cite results to key decision-making stakeholders without needing additional supporting data?

Confidence: Can credibly/sufficiently bound the error on solution results informing decision-making?

Robustness: Can assert limitations on valid parametric solution space over which the model can be applied and assess input sensitivities?

Productivity: Can DevOps modify/build/test/install / Can users set up problems, run & interpret results (with minimal instruction / effort — modern tools, processes, scripts)?

Sustainability: Can confidently add new DevOps to manage / Can confidently expand users to apply (with minimal instruction / effort / rework — modern UX & build/test/install/execute)?

Scalability: Can reduce wall-clock time and/or increase problem size/complexity and/or efficiently apply over expanding problem ensemble with additional hardware (and affordable software costs)?

Flexibility: Can adopt/adapt to new architecture ecosystems and use case workflows (with minimal instruction / effort — modern design for build / interoperate + leverage community)?

FAIR Data (and Software) Principles

Data have become a critical currency of scientific legitimacy, underlying essential aspects of the Scientific Method such as independent reproducibility and advancing community knowledge. To this end, four guiding principles have been identified and encouraged through academic and government programs — composed into the acronym FAIR.

FAIR data principles should be considered in maturity assessments; at minimum regarding Sustainability.

Findable: unique and persistent identifier + search-assisting metadata (crucial for Productivity, contributing to Sustainability)

Accessible: persistently retrievable (w/authorization) via standard protocols (crucial for Flexibility, contributing to Sustainability)

Interoperable: defined structure to integrate with diverse workflows / apps (crucial for Flexibility, contributing to Productivity and Sustainability)

Reusable: descriptive metadata of provenance, licensing, & use of standards (crucial for Sustainability, contributing to Flexibility and Robustness)
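
As a hypothetical illustration, a minimal FAIR-style metadata record for a simulation dataset might carry fields along these lines (the field names and values are illustrative, not a prescribed schema):

```python
# Illustrative (hypothetical) metadata record touching each FAIR principle.
dataset_metadata = {
    # Findable: unique, persistent identifier + search-assisting metadata
    "identifier": "doi:10.xxxx/placeholder",           # placeholder DOI
    "keywords": ["simulation", "validation", "ensemble"],
    # Accessible: persistently retrievable (w/authorization) via standard protocols
    "access_url": "https://data.example.org/dataset",  # hypothetical endpoint
    "access_protocol": "HTTPS with authorization",
    # Interoperable: defined structure to integrate with diverse workflows/apps
    "format": "HDF5",
    "vocabulary": "community-standard variable names and units",
    # Reusable: descriptive metadata of provenance, licensing & use of standards
    "license": "CC-BY-4.0",
    "provenance": "solver version, commit hash, and archived input deck",
}
```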

Conclusion and Opportunity

Digital technology can deliver immense value in providing tools that are particularly effective against the “seven deadly wastes” (or Muda) of LEAN. Reusable virtualized assets counter physical inventory, production lag and overproduction. Automation-driven consistency and instant situational visibility increase productivity and reduce waste, defects and rework. Model-based digital thread workflows streamline enterprise “motion” and reduce dependency-based waiting/hand-offs and underuse of talent.

Advances in computing hardware have driven development of sophisticated software systems now blossoming in the connected data and processes of the digital thread — powering the transformational practice of digital engineering. But reliance upon the models — trust in the models, value from the models, durability of investment in the modeling approaches — is what urges the assessment of modeling maturity.

The Model Maturity Web is proposed as a tool for assessing these factors: to advise in selection between alternative strategies, to set targets for the desired state of practice, and to guide incremental steps to be taken when setting a roadmap toward those targets.

Additional References

