Wednesday, May 29, 2013

My policy on link spam in comments on my blog

More and more often I receive emails asking me to moderate 'link spam': links embedded in a comment on my blog that are primarily or solely intended for 'search engine optimization'.

The comments often say something like 'Great blog, good points'. Sometimes they are actually well-thought-out comments on the material in the post, and are attached to a relevant post. However, I do not accept comments with links unless the comment and all the links meet the following criteria:

  1. The text on which the link is placed makes it clear to the reader of the link where it points, e.g. your company, your product, yourself or an informational site.
  2. The comment itself says something relevant, and is not there for the sole purpose of exposing the link.
  3. I believe that the linked page has relevant information (or a product or service) that matches the subject of my blog post and adds value to the post. The information does not have to agree with what I have said; in fact I welcome argument and contradiction.
  4. If the comment mentions a product, service or company then what is being marketed is something that I am not morally opposed to and think readers of the post could potentially benefit from (although I would not ever endorse or even verify products or services in links).
  5. The poster uses a verifiable identity. They must give their email, some other legitimate means of contacting them, or else the linked page or site needs to list a person with this name when searched. I sometimes will contact the person to verify it is them.
  6. The site being linked to is, in my opinion and at the surface level, legitimate and respectable and neither plastered with advertisements nor poorly crafted.

Here's an example: Today there was a comment from an accounting company on my post about solar energy. Upon visiting the company's website it seems the company provides services to help people cost-justify solar installations. Points 3, 4 and 6 seemed to be satisfied, so I would have accepted the link if the other rules had been followed.

Here is the text of the comment, however:

"Hi, nice post. Well what can I say is that these is an interesting and very informative topic on solar energy financial management. Thanks for sharing your ideas, its not just entertaining but also gives your reader knowledge. Good blogs style too, Cheers!"

This kind of 'flattery' adds little of relevance. I would have accepted it as friendly encouragement if there were no links, but the presence of a link makes such wording violate rule 2, since there is no additional useful information.

To add something even slightly useful and not break rule 2, the poster could have said, "People buying solar installations may need help doing the needed financial analysis; companies like ours can help with that."

The link was buried under "solar energy financial management". Since the page linked was not a general page about that topic (e.g. a Wikipedia page or some other purely unbiased information site), rule 1 was violated. To avoid breaking rule 1, the linker needed to put the link on the name of the company.

Furthermore, the person leaving the comment gave the name of a person, but a search yielded no such person at the company in question, violating rule 5.

I suggest that bloggers in general adopt rules similar to mine.

Friday, May 24, 2013

UML in Practice talk at ICSE: And How Umple Could Help

I just finished attending the ICSE talk by Marian Petre of the Open University, entitled "UML in Practice".

She conducted an excellent interview-based study of 50 software developers in a wide variety of industries and geographical locations. Her key question was, "Do you use UML?"

She found that only 15 out of 50 use it in some way, and none use it wholeheartedly.

A total of 11 use it selectively, adapting it as necessary depending on the audience. Within this group, use of diagram types was: class diagrams: 7, sequence diagrams: 6, activity diagrams: 6, state diagrams: 2 and use case diagrams: 1.

Only 3 used it for code generation; these were generally in the context of product lines and embedded software. Such users, however, tended not to use it for early phases of design, only for generation.

One used it in what she called 'retrofit' mode, i.e. "Not unless the client demands it for some reason".

That leaves the 35 software developers who do not use it (70%). Some reported historical use, and some of these did in fact model using their own notation.

The main complaints were that it is unnecessarily complex, lacks an ability to represent the whole system, and has difficulties when it comes to synchronization of artifacts. There were also comments about certain diagram types, such as state machines being used only as an aid to thinking. In general, diagram types were seen as not working well together.

She did comment on the fact that UML is widely taught in educational programs.

My overall response to this paper is, 'bingo'. The paper backs up research results we have previously published, which served as a motivation for the development of Umple.

Features of Umple that are explicitly designed to help improve UML adoption include:
  • Umple can be used to sketch (using UmpleOnline) and the sketch can become the core of high quality generated code later on.
  • It is a simplified subset of UML, combatting the complexity complained about in Petre's research.
  • It explicitly addresses synchronization of artifacts by merging code and UML in one textual form: UML, expressed textually, is simply embedded in code, with the ability to generate diagrams 'on the fly', and to edit the code by editing either the code or those diagrams.
  • It integrates diagram types: State machines work smoothly with class diagrams, for example.
  • Diagrams like state machines finally become useful in a wide variety of systems, not just embedded systems.

I hope that if Umple becomes popular, a similar study in a few years could report quite different results.

Scaling up Software Engineering to Ultra-Large Systems: Thoughts on an ICSE Keynote by Linda Northrop

Linda Northrop just gave an interesting talk at ICSE 2013 about ultra-large scale systems (ULS).

My takeaways from this talk are the following points:

  • ULS refers to systems with large volumes of most of the following factors all combined together synergistically to increase complexity: source code in multiple languages and architectures, data, device types and devices, connections, processes, stakeholders, interactions, domains (including policy domains) and emergent behaviors.
  • ULS systems run in a federated manner; they are on all the time, with inevitable failures handled and recovered locally, so as not to affect the system as a whole. The analogy to the functioning of a city (where fires occur every day) was very apt.
  • Build-time and run-time are one and the same: Pieces of a system need to be replaced on the fly, and dynamic updating and reconfiguration need to be possible.
  • They inevitably involve 'wicked' problems with inconsistent, unknowable requirements that change as a result of their solution.
  • Development can neither be entirely agile (due to the need to co-ordinate some aspects of the system on a vast scale), nor follow traditional 'requirements-first' engineering. On the other hand, parts of a system can be developed in an agile manner.
  • All areas of software engineering and computer science research can be used to help solve issues in ULS. Examples include HCI studies of how diverse groups of users use diverse parts of such systems, or computational intelligence applications to such systems.

She gave some examples including the smart grid, climate modelling, intelligent transportation and healthcare analytics. Actually, it is not clear to me that climate modelling necessarily fits the definition. It may have large volumes of code, and run in a distributed manner, with federated models, and quite a few stakeholders and policy domains, but do a majority of the other factors above apply? Perhaps.

From my perspective, the key to ensuring that ULS systems can be built and work properly is to apply the following techniques and technologies. However, in order to do this we need to properly educate computer scientists and software engineers about these items, which are known today but not universally taught, and hence not applied:

  1. Model driven development (with tools that generate good quality code in multiple languages and for multiple device types)
  2. Distributed software architecture and development
  3. Rugged service interfaces so subsystems can be independent of each other, and have failsafe fallbacks
  4. Test-driven development: Where requirements are unknowable, it is still possible to specify those parts of systems that can be understood with rigorous tests. Subsystems so-specified can then be confidently plugged together as requirements evolve.
  5. Spot-formality: Formal specification of parts of a federated ULS system that are critical to safety, the economy, or the environment. 
  6. Usability and HCI to ensure that the human parts of the system interact with the non-human parts effectively.
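
As a rough illustration of items 3 and 4 together, here is a minimal Python sketch (my own, not taken from the keynote). It wraps a subsystem call so that a failure is handled locally with a failsafe fallback, and then pins down that behaviour with tests. The smart-meter functions and the `with_fallback` helper are hypothetical names invented for this example.

```python
from typing import Callable

Reader = Callable[[str], float]


def with_fallback(primary: Reader, fallback: Reader) -> Reader:
    """Wrap a subsystem call so that its failures are handled locally."""
    def call(meter_id: str) -> float:
        try:
            return primary(meter_id)
        except Exception:
            # The failure is contained here; callers only see the fallback value.
            return fallback(meter_id)
    return call


# Hypothetical subsystems, invented purely for illustration.
def unreachable_meter(meter_id: str) -> float:
    raise ConnectionError("meter unreachable")


def healthy_meter(meter_id: str) -> float:
    return 17.5


def last_known_reading(meter_id: str) -> float:
    return 42.0


# Tests acting as the specification of the interface's behaviour.
def test_failure_is_contained() -> None:
    read = with_fallback(unreachable_meter, last_known_reading)
    assert read("meter-1") == 42.0


def test_primary_is_used_when_healthy() -> None:
    read = with_fallback(healthy_meter, last_known_reading)
    assert read("meter-1") == 17.5


test_failure_is_contained()
test_primary_is_used_when_healthy()
```

The point of the sketch is only that the tests state what the interface guarantees, even when the wider system requirements are still unknown.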

My Umple research helps address item 1, and is moving towards addressing items 2, 3 and 5. We apply items 4 and 6 in the development of Umple.

Sunday, May 19, 2013

Some lessons from MiSE at ICSE

I just finished attending the two-day Modeling in Software Engineering workshop at the International Conference on Software Engineering in San Francisco.

Here are some of the take-away lessons for me (these do not necessarily reflect the ideas of the speakers, but my interpretations and/or extensions of their ideas):

Industrial use of modeling: There was very interesting discussion about the use of modeling in industry, and there seemed to be two key, related directions for such use. Michael Whalen on Saturday gave many examples of the use of MATLAB and Simulink in various critical systems (particularly the use of Stateflow). Lionel Briand, on the other hand, talked about using UML and its profiles to solve various engineering problems; again, however, he mostly focused on critical systems. In a panel he pointed out that most of the Simulink models he had worked with are just graphical representations of what could just as well be written in code (i.e. with little or nothing in the way of additional abstraction).

What struck me was that both presenters, and others, seemed to embrace what I might call 'scruffy' modelling: Briand talked about users adapting UML to their needs, and others talked about Simulink as a tool that does not have the formal basis of competing tools, but nonetheless serves its users well.

Many people in the workshop pointed out that we need to boost the uptake of modelling. Various ways to achieve this were emphasized:

  • Improve education in modelling
  • Build libraries of examples, including exciting real-world ones, and ones that show scaling up
  • Make tools that are simpler and/or better so more 'ordinary' developers will consider taking up modelling
  • Allow modeling notations to work with each other and other languages and tools

It turns out that all four of these have long been objectives of my Umple project. So it seems to me that if the Umple project pushes on at its present pace, we stand to have a big impact.

Speaking of Umple, I gave a short presentation that seemed to be well received, although my personal demonstrations to a number of participants seemed much more effective, with people appearing to be quite impressed. The lesson from this is that people really can see the advantages of our approach, but a hands-on and personal approach may work best as a way to help people see the light.

Context: Another theme of the MiSE workshop that repeatedly appeared was 'context'. Briand pointed out that understanding the problem and its context is critical before working on a model-based solution; the modelling technique to be used will depend deeply on this context. Context can be requirements vs. design, or the specifics of the domain, the fact that space systems must be radiation hardened, or some aspect of the particular problem.

In my opinion, they are certainly right: Understanding the context is critical, and the tool, notation or technique needs to be selected to fit the context. However, I also believe that we need to work on generalities that can apply to multiple contexts, in the same manner that general-purpose programming languages can be used in multiple contexts. For example, the general notion of concept/class generalization hierarchies can be applied in almost every context, whether it be modeling the domain, specifying requirements for the types of data to be handled, or designing a system for code generation. I think state machines can also be applied in a wider variety of contexts, where people currently do not apply them: They are applied in many real-time systems, and they have been applied for specifying the navigation in user interfaces. But in my experience they can also be applied in other kinds of systems, such as in this Umple example.
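
To give a feel for what a state machine in a non-real-time, non-UI setting might look like, here is a small sketch in plain Python (not Umple, and not the example referred to above). The course-section domain, the state names and the event names are all hypothetical, chosen only for illustration.

```python
from enum import Enum, auto


class Status(Enum):
    PLANNED = auto()
    OPEN = auto()
    FULL = auto()
    CANCELLED = auto()


# (current state, event) -> next state; any pair not listed is ignored.
TRANSITIONS = {
    (Status.PLANNED, "open_registration"): Status.OPEN,
    (Status.OPEN, "class_size_reached"): Status.FULL,
    (Status.FULL, "student_dropped"): Status.OPEN,
    (Status.PLANNED, "cancel"): Status.CANCELLED,
    (Status.OPEN, "cancel"): Status.CANCELLED,
    (Status.FULL, "cancel"): Status.CANCELLED,
}


class CourseSection:
    """A hypothetical business object whose life cycle is a state machine."""

    def __init__(self) -> None:
        self.status = Status.PLANNED

    def handle(self, event: str) -> bool:
        """Apply an event; return True if it caused a transition."""
        next_status = TRANSITIONS.get((self.status, event))
        if next_status is None:
            return False  # the event is not meaningful in this state
        self.status = next_status
        return True


section = CourseSection()
section.handle("open_registration")   # PLANNED -> OPEN
section.handle("class_size_reached")  # OPEN -> FULL
section.handle("open_registration")   # ignored: not valid when FULL
```

Keeping the transitions in one table makes the allowed life cycle of the object visible in a single place, which is the kind of benefit I have in mind for ordinary information systems.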

Testing: An interesting theme that came up several times related to testing: It was pointed out that it is worthwhile to generate tests from a model, but it must also be recognized that in the context of a model used to generate code, these tests serve only to verify that the code generator is working properly! Such tests do not validate the model. Additional testing of the system is always essential.

Semantics and analysis: There was a lot of agreement that the power of modeling abstractions can be leveraged to enable analysis of the properties of systems. To do this, however, it seems to me that semantics needs to be pinned down and better defined. 'Scruffy' use of UML and Simulink seems to detract from these possibilities. Again, one of the objectives of Umple is to select a well-defined subset of UML, to define the semantics of this very well, and to be able to analyse system designs in addition to generating systems from the models.