Linda Northrop just gave an interesting talk at ICSE 2013 about ultra-large-scale (ULS) systems.
My takeaways from this talk are the following points:
- ULS refers to systems with large volumes of most of the following factors, which combine synergistically to increase complexity: source code in multiple languages and architectures, data, device types and devices, connections, processes, stakeholders, interactions, domains (including policy domains) and emergent behaviors.
- ULS systems run in a federated manner; they are on all the time, with inevitable failures handled and recovered locally, so as not to affect the system as a whole. The analogy to the functioning of a city (where fires occur every day) was very apt.
- Build-time and run-time are one and the same: pieces of a system need to be replaced on the fly, and dynamic updating and reconfiguration need to be possible.
- ULS systems inevitably involve 'wicked' problems with inconsistent, unknowable requirements that change as a result of their solution.
- Development can neither be entirely agile (due to the need to co-ordinate some aspects of the system on a vast scale), nor follow traditional 'requirements-first' engineering. On the other hand, parts of a system can be developed in an agile manner.
- All areas of software engineering and computer science research can be used to help solve issues in ULS. Examples include HCI studies of how diverse groups of users use diverse parts of such systems, or computational intelligence applications to such systems.
She gave some examples, including the smart grid, climate modelling, intelligent transportation and healthcare analytics. Actually, it is not clear to me that climate modelling necessarily fits the definition. It may have large volumes of code, and run in a distributed manner, with federated models, and quite a few stakeholders and policy domains, but do a majority of the other factors above apply? Perhaps.
From my perspective, the key to ensuring that ULS systems can be built and work properly is to apply the following techniques and technologies. However, in order to do this we need to properly educate computer scientists and software engineers about these items, which we know about today but which are not universally taught, and hence not applied:
- Model driven development (with tools that generate good quality code in multiple languages and for multiple device types)
- Distributed software architecture and development
- Rugged service interfaces so subsystems can be independent of each other, and have failsafe fallbacks
- Test-driven development: Where requirements are unknowable, it is still possible to specify those parts of systems that can be understood with rigorous tests. Subsystems so-specified can then be confidently plugged together as requirements evolve.
- Spot-formality: Formal specification of parts of a federated ULS system that are critical to safety, the economy, or the environment.
- Usability and HCI to ensure that the human parts of the system interact with the non-human parts effectively.
My Umple research helps address item 1, and is moving towards addressing items 2, 3 and 5. We deploy items 4 and 6 in the development of Umple.
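To make items 3 and 4 a little more concrete, here is a minimal sketch in Python of a service call with a failsafe fallback, specified by a couple of tests. This is only an illustration under my own assumptions, not something from the talk or from Umple: the names `with_fallback`, `flaky_meter_reading` and `last_known_reading` are hypothetical, and a real ULS subsystem would need timeouts, logging and much more.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T]) -> Callable[[], T]:
    """Wrap a call to another subsystem so that a failure is handled locally."""
    def call() -> T:
        try:
            return primary()
        except Exception:
            # Contain the failure here instead of letting it spread
            # to the rest of the system.
            return fallback()
    return call


def flaky_meter_reading() -> float:
    # Hypothetical subsystem (e.g. a smart-grid meter service) that is down.
    raise ConnectionError("meter service unreachable")


def last_known_reading() -> float:
    # Hypothetical failsafe: a cached value used when the service fails.
    return 42.0


read_meter = with_fallback(flaky_meter_reading, last_known_reading)

# Tests acting as the specification of the parts we do understand:
# a failing subsystem yields the fallback value, a healthy one is untouched.
assert read_meter() == 42.0
assert with_fallback(lambda: 7.0, last_known_reading)() == 7.0
```

The point of the design is that the caller depends only on the wrapped interface, so the subsystem behind it can fail or be replaced on the fly without the failure propagating.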