Reactive Data Processing (The Story)

Posted by moodyharsh on 2016-06-30


Genesis


I came across Pure Data ~ 2 years back.

This year I came across the notions of asset pipelines and streams
thanks to node.js and gulp.

This struck me as peculiar, since pipelines are a part of shell
programming as well.

I investigated further and found a multitude of names like

  1. Pipes and Filters
    1.1 Streams
  2. Component Oriented Programming
  3. Data Flow Programming

The following seemed related as well

  1. Event Oriented Programming
    1.1 State Machines
  2. Workflow Engines
  3. Messaging
    3.1 OOP
    3.2 Parallelism
  4. Spreadsheets

More importantly, Data Flow is already the norm in established
Engineering domains.

  1. Electronics
    1.1 DSP
  2. Avionics

Making Music has taught me a bit of DSP.
I had to learn hands-on what tweaking a DSP filter felt like,
and I came to understand effect chains and multi-track recording.

I found some free time on my hands, so I decided to implement a
tiny framework for Data Flow.

I read javelin’s source code and felt confident enough to attempt
this. javelin implements a powerful Spreadsheet Engine in very
few lines. It is part of the hoplon web framework.

I decided to call it Data Processing rather than Data Flow Processing
as a homage to the pre-computing Data Processing Machines of IBM.

I disliked Software at this point, having found that most
Music Software pales in comparison to the feel of Hardware Synths,
so I used Breadboard and Electronics analogies instead of Software ones.

I set out to answer two questions:

  • Can Low Level ideas Scale?
  • Are they Readable?

As of now, I believe Low > High on even more accounts.

Implementation


I scanned Pure Data’s implementation and implemented
inlets/outlets and objects (Systems) in js.
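A minimal sketch of the inlet/outlet idea, written in Python rather than the original js. The class names Inlet, Outlet, System and Sink are hypothetical, not RDP’s actual API: each System applies a function to whatever arrives at its inlet and sends the result out of its outlet to every connected inlet, much like patching objects together in Pure Data.

```python
from typing import Any, Callable, List


class Inlet:
    """Hypothetical inlet: hands each incoming value to its owner's handler."""

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self.handler = handler

    def receive(self, value: Any) -> None:
        self.handler(value)


class Outlet:
    """Hypothetical outlet: fans a value out to every connected inlet."""

    def __init__(self) -> None:
        self.targets: List[Inlet] = []

    def connect(self, inlet: Inlet) -> None:
        self.targets.append(inlet)

    def send(self, value: Any) -> None:
        for inlet in self.targets:
            inlet.receive(value)


class System:
    """A Pure Data-style object: one inlet, one outlet, one processing step."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn
        self.inlet = Inlet(self.on_input)
        self.outlet = Outlet()

    def on_input(self, value: Any) -> None:
        self.outlet.send(self.fn(value))


class Sink(System):
    """End of the chain: keeps what it receives so it can be inspected."""

    def __init__(self) -> None:
        super().__init__(lambda value: value)
        self.received: List[Any] = []

    def on_input(self, value: Any) -> None:
        self.received.append(value)


# Wire double -> increment -> sink, then push one value in.
double = System(lambda x: x * 2)
increment = System(lambda x: x + 1)
sink = Sink()
double.outlet.connect(increment.inlet)
increment.outlet.connect(sink.inlet)
double.inlet.receive(5)
print(sink.received)  # [11]
```

Because Systems only know their own outlets, rewiring the chain never touches the processing code.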

I added abstractions of Namespace and Symbol, taken from Lisp, for addressability.
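One way addressability via Namespace and Symbol might look, as a sketch: Symbols here are value-equal names standing in for Lisp’s interned symbols, and the names Symbol, Namespace, Registry and the "namespace/name" address format are all hypothetical assumptions, not taken from RDP.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Symbol:
    """A name used to address a System; frozen, so equal names hash equally."""

    name: str


class Namespace:
    """Hypothetical binding table from Symbols to Systems (or any value)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.bindings: Dict[Symbol, Any] = {}

    def define(self, sym: Symbol, value: Any) -> None:
        self.bindings[sym] = value

    def resolve(self, sym: Symbol) -> Any:
        if sym not in self.bindings:
            raise KeyError(f"{self.name}/{sym.name} is unbound")
        return self.bindings[sym]


class Registry:
    """Resolves qualified addresses such as 'app/timer' to bound values."""

    def __init__(self) -> None:
        self.namespaces: Dict[str, Namespace] = {}

    def ns(self, name: str) -> Namespace:
        # Create the namespace the first time it is mentioned.
        if name not in self.namespaces:
            self.namespaces[name] = Namespace(name)
        return self.namespaces[name]

    def lookup(self, address: str) -> Any:
        ns_name, _, sym_name = address.partition("/")
        return self.ns(ns_name).resolve(Symbol(sym_name))


registry = Registry()
registry.ns("app").define(Symbol("timer"), "timer-system")
print(registry.lookup("app/timer"))  # timer-system
```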

At this point I felt a dire need for State and Data Storage handling.

Having implemented a prototype of Entity Systems earlier,
I added it to RDP.

State was simply implemented as this["foo"] = ... since
Systems are Live Objects.

I started writing an example app called Esti.

The first challenge was to define Data.
I settled on the definition of Data as a Map<Slot, Scalar>.

Slot is a string.
I modelled Scalar after Lisp / Perl.
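The Map<Slot, Scalar> definition can be sketched as follows, assuming a Scalar is any single atomic value (nothing nested). The helpers make_data and as_number are hypothetical; as_number only approximates Perl’s numeric context, since Perl would, for example, read a numeric prefix such as "3abc" as 3, while this sketch returns 0.

```python
from typing import Dict, Union

# A Scalar holds one atomic value: no nested lists or maps.
Scalar = Union[None, bool, int, float, str]
Slot = str
Data = Dict[Slot, Scalar]


def make_data(**slots: Scalar) -> Data:
    """Build a Data map, rejecting any non-scalar value."""
    for slot, value in slots.items():
        if value is not None and not isinstance(value, (bool, int, float, str)):
            raise TypeError(f"slot {slot!r} holds a {type(value).__name__}, not a Scalar")
    return dict(slots)


def as_number(value: Scalar) -> float:
    """Loosely Perl-like numeric context: None and non-numeric strings become 0."""
    if value is None:
        return 0.0
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return 0.0
    return float(value)


task = make_data(title="Write post", estimate="90", done=False)
print(as_number(task["estimate"]) + 30)  # 120.0
```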

The second challenge was UI development.

Although I love Pure Data’s visual Live Object Oriented Programming,
I decided on a more traditional UI with Reactjs. DSP-like Data Flow can be done
in a traditional UI; Music Software is an example of that.

I added a Message Bus for decoupling UI from Data Flow.
When Systems were done processing Data they could trigger an Impulse on the Bus.
The Bus would then interrupt another System.

In Electronics terms, messages are like Discrete Signals.
Pure Data also has a similar notion.
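A rough sketch of the Message Bus decoupling, assuming impulses are named strings and the bus calls subscribers synchronously. Bus, render_ui and finish_processing are hypothetical names; render_ui merely records what a React re-render would receive.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class Bus:
    """Hypothetical Message Bus: a System fires a named impulse, subscribers are interrupted."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, impulse: str, handler: Handler) -> None:
        self.subscribers[impulse].append(handler)

    def trigger(self, impulse: str, payload: Any = None) -> None:
        # Like a discrete signal: each subscriber runs once, synchronously, in order.
        for handler in list(self.subscribers.get(impulse, [])):
            handler(payload)


bus = Bus()
rendered: List[Any] = []


def render_ui(tasks: Any) -> None:
    # Stand-in for a React re-render; it only sees the payload, not the System.
    rendered.append(tasks)


def finish_processing(tasks: List[str]) -> None:
    # A System done with its Data fires an impulse instead of calling the UI.
    bus.trigger("tasks-updated", tasks)


bus.on("tasks-updated", render_ui)
finish_processing(["write post"])
print(rendered)  # [['write post']]
```

Neither side imports the other; they only share the impulse name.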

The Application was pretty straightforward.
It had a readable list of if/else statements in one place, unlike OOP.

During development, I discovered many important properties of Data flow based design

  1. Diagram Generation

It is difficult to explain to a stranger what is happening here.
But to a co-developer or manager?

One can easily point out things like

  • What systems are working
  • Where the problems are
  • How does the data flow
  • Where new ones are needed

  2. Simulation

This.

That’s 330 task events, each of which has a clock that ran at least
randomInt(3600, 7200) ticks, simulating the app in ~100 lines of js.

Since Data between Systems is communicated explicitly, it is very easy to fake.
Pure Messaging offers the same advantage.
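A sketch of faking Data for a simulation, assuming a hypothetical TimerSystem that only counts tick messages. Python’s random.randint(3600, 7200), which includes both ends, stands in for the original randomInt; its exact bounds are an assumption. Tasks are run one after another rather than interleaved, for simplicity.

```python
import random
from collections import Counter
from typing import Dict, List


class TimerSystem:
    """Hypothetical System under test: counts clock ticks per task."""

    def __init__(self) -> None:
        self.elapsed: Counter = Counter()

    def receive(self, data: Dict[str, int]) -> None:
        self.elapsed[data["task"]] += 1


def simulate(system: TimerSystem, n_tasks: int, seed: int = 0) -> List[int]:
    """Drive the System with faked tick Data instead of a real clock or UI."""
    rng = random.Random(seed)
    # Assumed stand-in for randomInt(3600, 7200); randint includes both ends.
    budgets = [rng.randint(3600, 7200) for _ in range(n_tasks)]
    for task, budget in enumerate(budgets):
        for _ in range(budget):
            system.receive({"task": task})
    return budgets


timer = TimerSystem()
budgets = simulate(timer, 330)
print(len(timer.elapsed), sum(timer.elapsed.values()) == sum(budgets))  # 330 True
```

The System cannot tell fake ticks from real ones, which is the whole point.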

  3. Step Debugging?

This is like an Electronics Engineer reading the input and output
signals of a Component.
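In the same spirit as probing a Component on a bench, here is a sketch of a hypothetical Probe wrapper that records every input and output of a stage. Probe and run_chain are illustrative names, not part of RDP.

```python
from typing import Any, Callable, List, Tuple


class Probe:
    """Hypothetical test-bench probe: records each input and output of a stage."""

    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name = name
        self.fn = fn
        self.trace: List[Tuple[Any, Any]] = []

    def __call__(self, value: Any) -> Any:
        result = self.fn(value)
        self.trace.append((value, result))
        return result


def run_chain(stages: List[Callable[[Any], Any]], value: Any) -> Any:
    # Pass the value through each stage in order, like a signal chain.
    for stage in stages:
        value = stage(value)
    return value


parse = Probe("parse", int)
to_seconds = Probe("to_seconds", lambda minutes: minutes * 60)
result = run_chain([parse, to_seconds], "5")
for probe in (parse, to_seconds):
    print(probe.name, probe.trace)
# parse [('5', 5)]
# to_seconds [(5, 300)]
```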

I feel

Electronics is like Broadway and Software is more of a Movie.

  4. RPC

I call this “mirroring”: Signals are spread across all interested Systems,
in this case Signals from the User.



I picked a Japanese symbol (~ transfer) for a logo.


Limbo


At this point, I discovered

  1. A book
  2. NoFlo
  3. fbp

I interacted with the fbp community and discovered
that RDP belongs to the reactive end of the spectrum. NoFlo is
both reactive and classic.

Hole in the Flow


RDP 0.3 has a major flaw.

Because I used this["foo"] to hold State, a System could only be part of
one Data Flow at a time.

This seems fine for single-user applications, but for
multi-user applications (servers, games, ...) it causes data corruption.

Making RDP functional felt like yet another puritan way of avoiding the problem.

It was hopeless until I found out about Monsoon: an explicit token-store architecture.

TL;DR Systems can have as much State as they want,
as long as a new Token is issued for each Data Flow.

A Token has a Frame analogous to:

  1. A Session Object (more)
  2. A Stackframe (less)
  3. A Continuation (lesser)

This makes RDP unique as it implements Data Flow Programming on top
of a Data Flow Processor Architecture.
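A minimal sketch of the token idea, assuming State is kept in a dict-valued frame carried by each Token. Token, CartSystem and the frame layout are hypothetical and far simpler than Monsoon’s hardware frame memory. Unlike the this["foo"] approach, the System itself stays stateless, so one instance can serve many interleaved Data Flows.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List

_next_token = itertools.count(1)


@dataclass
class Token:
    """One Data Flow activation; its Frame holds that flow's private State."""

    number: int = field(default_factory=lambda: next(_next_token))
    frame: Dict[str, Any] = field(default_factory=dict)


class CartSystem:
    """Hypothetical System: State goes into the Token's frame, never onto self."""

    def add(self, token: Token, item: str) -> None:
        cart: List[str] = token.frame.setdefault("cart", [])
        cart.append(item)

    def count(self, token: Token) -> int:
        return len(token.frame.get("cart", []))


# Two users drive the same System instance, interleaved.
cart = CartSystem()
alice, bob = Token(), Token()
cart.add(alice, "synth")
cart.add(bob, "cable")
cart.add(alice, "pedal")
print(cart.count(alice), cart.count(bob))  # 2 1
```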

Nojs


C++ style inheritance is needed for RDP.
js sucks for this.

Coffeescript’s model of OOP doesn’t translate well
for other js programmers.

I ported RDP to ruby because of this.
Thanks to opal, I can target js as well.

I renamed Data to Datron.
It has a merge method, d1.merge(d2), which I hope makes it behave more like electrons.
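A sketch of what a Datron merge could look like, in Python rather than the ruby port. It assumes merge behaves like Ruby’s Hash#merge: it returns a new Datron, and on a shared Slot the argument’s value wins. The Datron class here is hypothetical, not the actual implementation.

```python
from typing import Dict, Union

Scalar = Union[None, bool, int, float, str]


class Datron:
    """Hypothetical sketch: a Map<Slot, Scalar> that combines by merging."""

    def __init__(self, slots: Dict[str, Scalar]) -> None:
        self.slots = dict(slots)

    def merge(self, other: "Datron") -> "Datron":
        # Like Ruby's Hash#merge: a new Datron; on a shared Slot, `other` wins.
        return Datron({**self.slots, **other.slots})

    def __repr__(self) -> str:
        return f"Datron({self.slots!r})"


d1 = Datron({"task": "write post", "elapsed": 120})
d2 = Datron({"elapsed": 180, "done": True})
d3 = d1.merge(d2)
print(d3)  # Datron({'task': 'write post', 'elapsed': 180, 'done': True})
```

Returning a new Datron leaves d1 and d2 untouched, so a merge never corrupts Data another System still holds.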

Viva Low Level!


The move from Assembly to High Level languages has resulted in the
loss of a significant number of Abstractions:

  1. Memory Management

Manual management and Layout leads to efficient Data Structures.
High Level languages are plagued by overuse of Maps.

Low Level programmers are more keenly aware of Memory Corruption, Leaks,
Estimation and Calculation.

Since a good Data Structure can reduce Algorithmic Complexity,
Memory Management can affect perceived Speed and Power Consumption.

  2. Interrupts

Assembly languages have First Class Event Oriented Programming built-in.
High Level languages use external APIs.

In this sense, Go’s Channels are a step backward.

  3. Advanced Data Structures

Built-in support for matrices, queues, stacks, bounded buffers, caches …

Aren’t High Level languages narrow in providing just
Array operations?

  4. Awareness of OS and Concurrency

Again, built-in support.
High Level languages create a rigid shield between the programmer
and the very system they are trying to code.

  5. 1-1 Mapping with the Host

No action at a distance.
This makes Debugging straightforward.


Co-routines offer the highest order of flexibility,
and they are just one of the Assembly Programmer’s Hacks.

Have we forgotten how the Wheel was Invented?