RE: IoT OS

From: Rang, Anton <anton.rang_at_isilon.com> Date: Fri, 22 Jan 2016 17:08:32 +0000

>Say all objects are connected peer to peer with wifi, some of them are connected to internet through gsm network or wifi to a box.
>These objects are moving in space, and for some reasons, connections are dynamic and can be severely impaired or lost.
>
>They have incoming local streams of data (eg HD videos, accelerometer, GPS, other wifi and gsm signals, etc).
>
>I would like to abstract the CPU layer, storage layer, and internet connection so that in realtime results of one of my objects are saved
>if this object dies, so that if one of the objects giving internet access to the group loses its connection, the redundancy allows the group of objects not to lose internet connection.
>
>Can I consider these as different load balancing layers ? Do you recommend to implement this at the kernel layer or at an API layer ?
>Can I see that as a lightweight cluster ?
>
>I think the API is more flexible, especially if I have a heterogeneous (by CPU, OS) set of connected objects. However, working at the kernel level allows existing programs not to be rewritten.
>What are your thoughts ?

===

OK, I think I understand your question now.

This isn't the right list for it, though I'm not sure where the right place to go would be -- it's not FreeBSD-specific, in any case. There are academic research groups looking into this type of problem; for instance, in the area of sensor networks (ACM Transactions on Sensor Networks covers some of these areas). There may be USENET groups which cover this area.

To cover your three areas, which I think require somewhat different solutions --

(a) CPU layer.  I don't really recommend trying to abstract this.  You could use a virtual machine to hide the underlying architecture, and checkpoint state periodically, but this is likely to slow down execution too much to be useful.  If the issue is that a service may become unavailable, I'd recommend a middleware layer which can detect this and recover by starting a new instance of the service. Middleware layers like ZeroMQ, and clustering software, may be a useful starting point.  This does mean that stateful connections (like reading a video stream) won't recover cleanly, though; the client would need to reconnect to attach to the new instance of the service.  If you really need transparent recovery of such connections, it's going to be hard.
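
As a rough illustration of the detect-and-restart idea in (a), here is a minimal Python sketch of a heartbeat monitor. Everything in it is an assumption made for the example: the HeartbeatMonitor class, the restart callback, and the timeout value are hypothetical, and how heartbeats travel between nodes (ZeroMQ or anything else) is left out entirely.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks when each service was last heard from and restarts silent ones.

    Hypothetical sketch: `restart` stands in for whatever actually starts a
    new instance of a service on some node.
    """

    def __init__(self, timeout_s: float, restart: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.restart = restart
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, service: str) -> None:
        # Called whenever a heartbeat message from `service` arrives.
        self.last_seen[service] = self.clock()

    def check(self) -> List[str]:
        # Restart every service whose last heartbeat is older than the timeout.
        now = self.clock()
        restarted = []
        for service, seen in list(self.last_seen.items()):
            if now - seen > self.timeout_s:
                self.restart(service)
                # Give the new instance a full timeout before judging it.
                self.last_seen[service] = now
                restarted.append(service)
        return restarted
```

A node would call heartbeat() as messages arrive and check() on a timer. As noted above, clients holding a stateful connection to the old instance still have to reconnect on their own.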

(b) Storage layer.  Look into highly-available clustered storage solutions.  If you can use key-value or some other simplified storage model, do it.  There are clustered file systems but probably none freely available that would work on the scale you envision and give decent performance.  There are more alternatives if you're flexible about the format in which you're storing data (e.g. replicated object stores).
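
To make the key-value suggestion in (b) a little more concrete, here is a minimal in-memory Python sketch of a store that writes each value to every reachable replica and reads back the newest surviving copy. The Replica and ReplicatedStore names, the write quorum, and the single-writer version counter are all assumptions for illustration; a real clustered store also has to handle network transport, concurrent writers, and repair of stale replicas.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """One node's local copy of the data (hypothetical, in-memory only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: Dict[str, Tuple[int, bytes]] = {}  # key -> (version, value)


class ReplicatedStore:
    """Writes to all live replicas; a write succeeds if enough of them ack."""

    def __init__(self, replicas: List[Replica], write_quorum: int) -> None:
        self.replicas = replicas
        self.write_quorum = write_quorum
        self.version = 0  # assumes a single writer, so a counter is enough

    def put(self, key: str, value: bytes) -> bool:
        self.version += 1
        acks = 0
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = (self.version, value)
                acks += 1
        # Note: a failed write is not rolled back on the replicas it reached.
        return acks >= self.write_quorum

    def get(self, key: str) -> Optional[bytes]:
        # Return the highest-versioned copy found on any live replica.
        best: Optional[Tuple[int, bytes]] = None
        for replica in self.replicas:
            if replica.alive and key in replica.data:
                if best is None or replica.data[key][0] > best[0]:
                    best = replica.data[key]
        return None if best is None else best[1]
```

With three replicas and a write quorum of two, a result saved by one object remains readable after any single replica disappears.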

(c) Networking layer; or internet. If you can drop & re-establish a connection, or if every node has its own IP address (IPv6), this should be pretty straightforward; software could detect loss of connection and change the routing used to go through a different system. If not, you'll be a bit limited since mirroring TCP state between nodes would be too slow. This is a case where the existing operating system kernels are likely to do most of what you need; you simply need to add a layer to detect routing problems and select a new internet gateway appropriately.
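
The gateway-selection layer described in (c) could look something like the following Python sketch. The select_gateway function and the is_reachable probe are hypothetical names; the probe stands in for whatever health check is used (a ping, a test connection), and applying the choice to the routing table is platform-specific and not shown.

```python
from typing import Callable, Iterable, Optional


def select_gateway(current: Optional[str],
                   candidates: Iterable[str],
                   is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Pick the internet gateway to route through.

    Keeps the current gateway while it still works (to avoid flapping),
    otherwise falls back to the first reachable candidate in priority order.
    Returns None when no gateway is reachable.
    """
    if current is not None and is_reachable(current):
        return current
    for gateway in candidates:
        if gateway != current and is_reachable(gateway):
            return gateway
    return None
```

Each node would run this periodically and update its default route only when the returned gateway differs from the one in use.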

I'd avoid implementing any clustering within the kernel, in part because if you have a wide variety of objects you may not want the same kernel on all of them, and in part because debugging & recovery are much harder. You're unlikely to want to run most existing software on such a system anyway (especially if the objects have relatively weak processors); you're better off writing to a set of clustering APIs for storage and state, at least. For networking, as mentioned, you can likely use the existing TCP stack & just add controls to redirect traffic as needed.

-- Anton