Thursday, January 13, 2011

Some Commentary on a Recent eetimes Article and a Referenced Paper

An eetimes article in their online edition of 1/10/2011, “Rapid development and reusable design for the connected car” by Kristopher Cieplak, was of interest to me.  See http://www.eetimes.com/design/embedded-internet-design/4212037/Rapid-development-and-reusable-design-for-the-connected-car?Ecosys.  The article refers to a framework that seemed to have similarities to that of my Exploratory Project.  Further, it referenced a paper by another QNX Software Systems author, Ben VandenBelt, “Persistent Publish/Subscribe for Embedded Industrial Applications”, that can be located at www.qnx.com.  This latter paper is about the framework.

Some paragraphs of interest are as follows along with my interspersed comments concerning some of them.  In my comments I use application to refer to either a server or client.  An application can be a server to particular applications of the network and a client of others.  An application can refer to a partition or a separately loadable Windows process.  An application consists of numerous components that can also be a server to other components of the application and a client of others; that is, the component can be moved from one application to another without the need to modify the component.

The VandenBelt paper mentions “send/receive/reply (or synchronous) messaging” which is one of the options that can be selected for mC.  However, VandenBelt states that “Send/receive/reply messaging closely couples sender and receiver.  Every server communicates directly with its clients, and must know how to respond to all client messages.  With messaging thus closely coupled, a change to one software component may require changes to other software components, slowing or hindering system development and increasing system fragility.” 

That is not a problem with mC.  The application that treats any particular message topic can be changed, because the set of applications determines at power up, via the subscription/registration messages, which application and software component treats each message.  The reply is returned to whatever component sent the request.
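
The idea can be sketched in a few lines of Python.  This is only an illustration under my own assumptions, not the mC interface: the class name `Framework`, the `register` and `request` methods, and the two example components are all hypothetical.  It shows a request being routed by topic to whichever component registered for it, with the reply coming back to the sender.

```python
from typing import Callable, Dict, Optional

# Hypothetical names throughout; this is not the mC API.
Handler = Callable[[bytes], Optional[bytes]]


class Framework:
    """Routes each request topic to whichever component registered for it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, topic: str, handler: Handler) -> None:
        # Done at power up; the requester never names the treating component.
        self._handlers[topic] = handler

    def request(self, topic: str, data: bytes) -> Optional[bytes]:
        # The reply is returned to the caller, i.e. the component that sent it.
        handler = self._handlers.get(topic)
        if handler is None:
            return None
        return handler(data)


def nav_component(data: bytes) -> bytes:
    return b"nav:" + data


def replacement_component(data: bytes) -> bytes:
    return b"new:" + data


framework = Framework()
framework.register("position", nav_component)
first_reply = framework.request("position", b"query")

# Moving the topic to another component only changes the registration;
# the requesting code above is untouched.
framework.register("position", replacement_component)
second_reply = framework.request("position", b"query")
```

The requester only knows the topic name, so the treating component can be replaced or moved without changing the sender.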

However, the point is made that “as a system increases in scale and as diverse components are added to it, that system rapidly grows in complexity and becomes increasingly brittle – difficult to upgrade and scale while ensuring performance and reliability.”  mC has never had to grow much, so the subscription/registration messages might become a problem at a larger scale.  And, of course, as a one-person experiment, nothing has been done, as yet, concerning an application dropping offline, coming back online, etc.

mC does use direct point-to-point connections – at least it doesn’t use a series of applications such that one (or more) application is merely passing the message along towards its eventual destination – if that’s what is meant by “direct”.  Of course it doesn’t do that for asynchronous messages either.  It requires that each application with its set of components can communicate directly with each of the other applications that are either its clients or its servers (or both).

The VandenBelt paper refers to an Embedded Design article “Making the case for commercial communication integration middleware” by Jerry Krasner.  That article seems to mean by “direct” only that the applications are “socketed” together and that the old applications that now use the new application have to be reworked to take into account the protocol being used by the new application.

The mC framework supports an integrated set of applications where each uses the same protocol.  However, non-compliant applications can be added to the network by adding a translation component to a compliant application.  For instance, a Display Application was added to the exploratory project to interface to Windows to display Windows forms and controls and transmit click, enter, etc events to compliant applications for treatment.

To avoid the need to modify existing components, except any that must treat the Windows event or produce a modification of a form or control, a translation component was added to each application that must directly interface with one of the Display Application “layers”.  These translation components are receive/transmit components that receive the non-standard message (non-standard in regard to the mC framework protocol) and convert it to contain the standard protocol header while retaining the data of the received message and then publishing it to be forwarded by the framework to the component (in whichever application it might reside in) that had subscribed for it.

The reply, if any, and other messages being sent to the non-standard application are first sent to the transmit sub-component of the translation component (which has previously subscribed for them) to be converted to the protocol of the non-standard application and then sent to it. 

In this way the non-standard application stays as is and the treating component stays as is except for the addition of the translation component to the application of the treating component.  If such a minimal change requires that all the unchanged components be retested, then, to avoid this problem, it should be possible to insert a new translation application (i.e., partition) that takes the place of the translation component.  Then the message would be routed from the non-standard application to the translation application and from it to the standard protocol application and vice versa.
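
A minimal sketch of such a translation component follows.  The header layout (topic id, sender id, data length), the `subscribe`/`publish` helpers, and the topic and application numbers are assumptions made up for the example; they are not the mC protocol.  The receive side wraps the non-standard data in the assumed standard header and publishes it, and the transmit side strips the header again.

```python
import struct
from typing import Callable, Dict, List

# Assumed standard header (hypothetical): topic id, sender id, data length.
HEADER = struct.Struct("<HHI")

subscribers: Dict[int, List[Callable[[bytes], None]]] = {}


def subscribe(topic_id: int, callback: Callable[[bytes], None]) -> None:
    subscribers.setdefault(topic_id, []).append(callback)


def publish(message: bytes) -> None:
    # Forward the message to every component subscribed to its topic.
    topic_id, _sender, _length = HEADER.unpack_from(message)
    for callback in subscribers.get(topic_id, []):
        callback(message)


def to_standard(topic_id: int, sender_id: int, raw: bytes) -> bytes:
    """Receive side: prefix the non-standard data with the standard header."""
    return HEADER.pack(topic_id, sender_id, len(raw)) + raw


def from_standard(message: bytes) -> bytes:
    """Transmit side: strip the standard header, keeping the original data."""
    _topic, _sender, length = HEADER.unpack_from(message)
    return message[HEADER.size:HEADER.size + length]


CLICK_TOPIC = 7      # hypothetical topic id
DISPLAY_APP = 99     # hypothetical id of the non-standard application

received: List[bytes] = []
subscribe(CLICK_TOPIC, received.append)   # the treating component

# A click event arrives from the Display Application in its own format.
publish(to_standard(CLICK_TOPIC, DISPLAY_APP, b"button=OK"))
```

The treating component sees an ordinary framework message, and the data of the original message is carried through unchanged.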

Other paragraphs of interest to me from the Ben VandenBelt paper are:
An Object-based System


Publishing is asynchronous, PPS objects are integrated into the PPS filesystem pathname space.  Publishers modify objects and their attributes, and write them to the filesystem.  When any publisher changes an object, the PPS service informs all clients subscribed to that object of the change.  PPS clients can subscribe to multiple objects, and PPS objects can have multiple publishers as well as multiple subscribers.  Thus, publishers with access to data that applies to different object attributes can use the same object to communicate their information to all subscribers to that object.

This seems directly similar to Upon Demand topics – objects as referred to in the VandenBelt paper – of the mC framework where the consumer component can subscribe to be notified when a new instance of the topic is published.

Push or Pull?

In its default implementation, the QNX PPS service acts as a push publishing system; that is, publishers push data to objects, and subscribers read data upon notification or at their leisure.

However, some data, such as packet counts on an interface, changes far too quickly to be efficiently published through PPS using its default push publishing.

QNX PPS therefore offers an option that allows a subscriber to change PPS into a pull publishing system.  When a subscriber that opened an object with this option issues a read() call, all publishers to that object receive a notification to write current data to the object.  The subscriber’s read blocks until the object’s data is updated, then returns with the new data.

With this pull mechanism, the PPS subscriber retrieves data from the publisher at whatever rate it requires – in effect, on-demand publishing.

Persistence

A Persistent Publish/Subscribe service maintains data across reboots. …

System Scalability

With PPS, publisher and subscriber do not know each other; …

The loosely-coupled PPS messaging model also simplifies the integration of new software components.  Since publisher and subscriber do not have to know each other, developers adding components need only to determine what these new components should publish, and what data they need other PPS clients to publish.  No fine-tuning of APIs is required, and system complexity does not increase as components are added.


With PPS, components do not even need to be aware of each other’s existence on the system.

The pull publishing concept represented by
“QNX PPS therefore offers an option that allows a subscriber to change PPS into a pull publishing system.  When a subscriber that opened an object with this option issues a read() call, all publishers to that object receive a notification to write current data to the object.  The subscriber’s read blocks until the object’s data is updated, then returns with the new data.

With this pull mechanism, the PPS subscriber retrieves data from the publisher at whatever rate it requires – in effect, on-demand publishing.”
seems interesting.  mC has a concept that somewhat corresponds.  It uses Publish/Subscribe for the Topic Delivery Protocol (versus Point-to-Point) as well as Most Recently Published (MRP) for Topic Permanence.  (Note:  Permanence in mC is different from Persistence.)  For this combination, mC prepares an MRP instance of the published topic each time the publishing component executes and writes the topic.  However, the subscribers are not notified.  Instead, each one gets the most recently published data when it reads.  Therefore, the subscriber retrieves the most current data at its own rate.  Since a low priority subscriber could still be reading a previously published instance, an unused buffer is used to publish each new MRP message instance.
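
The buffering can be sketched as follows.  This is an illustration under my own assumptions rather than the mC code: the class `MrpTopic`, its method names, and the choice of three slots are hypothetical.  The publisher always writes into a slot that is neither the latest instance nor still being read, and readers simply pick up whatever is latest when they get around to reading.

```python
import threading
from typing import List, Optional


class MrpTopic:
    """Most Recently Published topic: readers poll, no notification is sent."""

    def __init__(self, slots: int = 3) -> None:
        self._data: List[Optional[bytes]] = [None] * slots
        self._readers = [0] * slots   # readers currently using each slot
        self._latest = -1             # slot index of the most recent instance
        self._lock = threading.Lock()

    def publish(self, data: bytes) -> None:
        with self._lock:
            # Use a slot that is neither the latest nor still being read.
            for i in range(len(self._data)):
                if i != self._latest and self._readers[i] == 0:
                    self._data[i] = data
                    self._latest = i
                    return
            raise RuntimeError("no unused buffer; more slots are needed")

    def acquire(self) -> Optional[int]:
        """Begin reading the most recent instance; returns its slot index."""
        with self._lock:
            if self._latest < 0:
                return None
            self._readers[self._latest] += 1
            return self._latest

    def data(self, slot: int) -> Optional[bytes]:
        return self._data[slot]

    def release(self, slot: int) -> None:
        with self._lock:
            self._readers[slot] -= 1


topic = MrpTopic()
topic.publish(b"count=1")
slow = topic.acquire()          # a low priority reader starts on instance 1

topic.publish(b"count=2")       # publisher keeps running meanwhile
topic.publish(b"count=3")

slow_view = topic.data(slow)    # the slow reader's instance was not overwritten
topic.release(slow)

fast = topic.acquire()          # a later read gets the most recent instance
fast_view = topic.data(fast)
topic.release(fast)
```

With one slow reader, three slots are enough in this sketch: one holds the latest instance, one may be held by the reader, and one is always free for the next publish.  More concurrent readers would need more slots.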

The following is extensive quoting of the eetimes article in regard to features where the QNX framework seems similar to my exploratory project or could be applicable to it.

Persistent Publish/Subscribe

PPS is an object-based service with publishers and subscribers in a loosely-coupled messaging architecture.  Any PPS client to the service can be a publisher only, a subscriber only, or both a publisher and a subscriber, as required by the implementation.

Publishers and subscribers only need to be able to read and write objects and their attributes in the PPS file system pathname space. Subscribers must of course know what objects and attributes interest them, and publishers must know what objects and attributes may interest subscribers, but neither publisher nor subscriber needs to know anything more about other parts of the system. Objects are written to permanent storage, offering persistence across reboots.

We implemented PPS to handle messaging between Adobe Flash applications and all data source publisher components; that is, for Webkit (browser), Bluetooth, GPS, audio volume control, etc. Chief among the advantages offered by the PPS messaging model is that the API between the components is consistent and loosely coupled.

Just as PPS allows us to redesign our HMI without touching the underlying applications, it allows us to add new components – vehicle telematics or ITS awareness, for instance – to our QNX CAR implementation without the time-consuming development usually required with other messaging paradigms. All that is needed is that all parties know what they need to publish, and what they need to read in from PPS. Further, this architecture ensures that other components do not require changes to accommodate new additions – changes which, as every software developer knows, at best require exhaustive testing, at worst introduce errors.

PPS thus provides us with a messaging model that allows us to add new components to our in-vehicle system with minimal effort beyond the HMI work required so that users can use the new capabilities offered.

Resource separation

The technologies we chose for QNX CAR offer two techniques (beyond standard process and memory protection) for managing the impact of new applications on an in-vehicle system. First, the Adobe Flash-based HMI allows us to run a secondary Flash player whose virtual machine serves as a "sandbox" for running untrusted applications. Second, the QNX  Neutrino RTOS offers adaptive partitioning, a unique technology that dynamically offers unused CPU time to processes that need it, but guarantees resources to critical processes.

Secondary Flash player

To ensure that a newly introduced application does not introduce problems to our system, we chose to implement a secondary Adobe Flash player. The player is reserved for untrusted applications; that is, for applications that we have not confirmed will run cleanly without adverse effects on the reliability and performance of other applications, or of the system as a whole.

Like all Flash players, this secondary player runs in its own virtual machine environment, and hence can be neatly separated from the rest of the system.  Applications in this secondary player’s virtual machine environment can not deprive applications in the primary player or other components in the system of the resources they require.

This simple approach allows us to try virtually any Flash application we choose, without worrying that it might bring down the system. In fact, any developer could write an application and run it on this secondary player without danger. He could use adaptive partitioning to ensure that the secondary Flash player did not interfere with applications in the primary Flash player, and that the applications this secondary player ran would not starve the system.

For a more detailed description of QNX PPS, see Ben VandenBelt, "Persistent Publish/Subscribe for Embedded Industrial Applications". QNX Software Systems, 2010. www.qnx.com.

Partitioning

Resource partitions are the traditional method implemented in OSs to protect different applications or groups of applications from each other. They are virtual walls that prevent one application from corrupting another application, or from starving it of  resources. The primary resource protected by partitions is CPU time, but partitioning can be used to protect other shared resources, such as memory or file space (disk or flash).

Fixed partitioning guarantees that processes will get the resources specified by the system designer, but lacks flexibility.

Rigid partitions

Traditional, rigid partitions are relatively easy to set, and they are effective; they guarantee that each application receives the resources specified by the system designer. They may not always be the best solution in embedded systems, such as in-vehicle systems, however.

In-vehicle systems are embedded systems, which means that the boards they run on are usually constrained by power, heat and cost considerations. Power requirements are less stringent for in-vehicle systems than for consumer devices, such as smart phones. On the other hand, however, these systems are expected to run for the lifetimes of their vehicles, during which they can expect a steady increase in the load they are expected to handle, as more and more technologies, capabilities and applications become available.

In short, an in-vehicle system OS must not only be able to guarantee resources to critical processes, it must also not waste resources. When an application requires more resources than the original design foresaw, the OS must find a way to provide them. It must not leave resources waiting unused because they are reserved for critical processes. It must offer them to whatever process needs them, but be able to get them back when they are needed to ensure the system always meets designed reliability and performance requirements.

Adaptive partitioning

To meet these apparently conflicting requirements typical of embedded systems, the QNX  Neutrino Real Time OS (RTOS) implements adaptive partitioning. Adaptive partitioning is more flexible than traditional fixed partitioning models. It guarantees time to specified processes, just like traditional partitions. However, unlike traditional partitions, which are fixed, adaptive partitioning automatically adapts partitions to runtime conditions.

An adaptive partition is a set of threads that work on a common or related goal or activity. Like a static partition, an adaptive partition has a budget allocated to it that guarantees its minimum share of the CPU's resources, but it responds to dynamically changing runtime loads in order to always make full use of all available CPU cycles.

Adaptive partitioning is more flexible than rigid partitions. It is a set of rules that protect specified threads and groups of threads, and is an excellent solution for dynamic embedded systems.

First, unlike a static partition, an adaptive partition is not locked to a fixed set of code in a static partition; developers can dynamically add and configure adaptive partitions, as required.

Second, an adaptive partition behaves as a global hard realtime thread scheduler under normal load, but can still provide minimal interrupt latencies even under overload conditions. Third, with adaptive partitioning, the thread scheduler distributes a partition's unused budget among partitions that require extra resources when the system isn't loaded.

Adaptive partitioning is thus ideal for dynamic systems – systems that must perform consistently despite rapidly changing demands made on them.
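
(As an aside of my own, the budget idea quoted above can be illustrated with a small sketch.  It is not the QNX scheduler's algorithm.  The function name `allocate`, the partition names, and the rule of sharing unused time in proportion to budget are all assumptions for the example.  Each partition first gets the smaller of its budget and its demand, and whatever is left over is then offered to the partitions that want more.)

```python
from typing import Dict


def allocate(budgets: Dict[str, float], demands: Dict[str, float]) -> Dict[str, float]:
    """Grant min(budget, demand) to each partition, then share the slack.

    Values are percentages of CPU time; budgets are assumed to total at
    most 100.  Sharing slack in proportion to budget is an assumption of
    this sketch, not a statement of how any real scheduler works.
    """
    grant = {p: min(budgets[p], demands[p]) for p in budgets}
    slack = 100.0 - sum(grant.values())
    hungry = {p for p in budgets if demands[p] > grant[p]}
    while slack > 1e-9 and hungry:
        weight = sum(budgets[p] for p in hungry)
        if weight == 0:
            break  # only zero-budget partitions remain; nothing to weight by
        given = 0.0
        for p in list(hungry):
            share = slack * budgets[p] / weight
            extra = min(share, demands[p] - grant[p])
            grant[p] += extra
            given += extra
            if demands[p] - grant[p] <= 1e-9:
                hungry.discard(p)
        slack -= given
    return grant


budgets = {"hmi": 50.0, "nav": 30.0, "media": 20.0}

# Light HMI load: its unused time is offered to the other partitions.
light = allocate(budgets, {"hmi": 20.0, "nav": 60.0, "media": 40.0})

# Overload: every partition falls back to its guaranteed budget.
overload = allocate(budgets, {"hmi": 100.0, "nav": 100.0, "media": 100.0})
```

In the first case the HMI partition uses only part of its budget and the remainder goes to the other two; in the second case every partition wants everything and each receives exactly its guaranteed share.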

Conclusion

The advent of the connected car brings in-vehicle systems into the rapid delivery cycle of consumer device technologies. Automobile manufacturers and tier one suppliers can design their in-vehicle systems both to mitigate the challenges of reconciling the development cycles of software and steel, and to open the door to new opportunities.

Our experience with QNX CAR suggests that a system with an Adobe Flash Lite user interface communicating with underlying components through PPS messaging is the most effective solution. It facilitates user interface branding, localization and customization without impact on underlying components, and it simplifies the addition (both during development and in the future) of new applications and components.

Running a secondary Flash player allows untrusted applications to be downloaded and used while protecting other applications and the system.

The QNX Neutrino RTOS's adaptive partitioning technology is also available to make maximum use of available computing power; more flexible than rigid partitions, adaptive partitioning offers available CPU cycles to processes that can use them, while guaranteeing resources to critical processes and applications.

Much of the QNX CAR system concerns the RTOS, which is not applicable to the Exploratory Project since the latter is Windows based.  Only the portions about easily adding or moving applications seem directly applicable and similar, so the article is an example of the usefulness of this ability.  While adaptive partitioning might be something that could be added to an aircraft system, it could only be used for Level C or D applications.  I would suppose that it could never be used for more critical applications, so such an operating system would need to limit adaptive scheduling to particular applications.