> Programming Techniques

Sometimes in the middle of a forum discussion, I find an opportunity to go off on a lesser-known topic I find particularly interesting, or one I've pursued extensively in the past. But even for people who've read my posts, it might be hard to distinguish my personal passions from other, more context-sensitive recommendations I make. So here's some pure, unsolicited enthusiasm, in the form of a list of projects I've worked on and the approaches they exemplify.

Project: Era

A module system for cross-platform programs. This project is incomplete, but the roadmap toward a first usable version seems straightforward.

Meaning-preserving modularity

Software updates are frequent—and frequently frustrating, as they discontinue entire features, introduce bugs, or merely change some incidental details that other systems shouldn't have depended upon... but did.

Mathematics tames this problem pretty well. A mathematical field might drift away from one methodology and toward another, but that doesn't make the old material any less rigorous.

I propose we develop our software so that it uses hard-to-break techniques to keep itself well-defined in the face of external changes. If a program's "import" criteria determine a mathematically unique entity, but do so in a nonconstructive way, then the implementation (construction) of that entity can be maintained as a separate module. If a program's imports refer to a cryptographically identifiable author, then the imported definitions are almost certainly going to be unique unless there's been an intentional attack or the author has made a mistake.

If a program's imports just refer to some simple author identifier string, then name collisions are quite possible, just as they are with the module systems in common use today. My goal isn't to stop us from building systems with flimsy imports like this, but to make it stand out as a red flag.

Era is my implementation of this kind of module system. Furthermore, Era will be platform-independent; if differences between platforms exist, they'll be represented as different sets of always-installed modules.

Weak expressiveness, strong meta-theoretical usefulness

I believe the programmers of the future will know what they need better than we do, and therefore any long-lived tool must be flexible enough to encompass their needs.

Many language designers use that reasoning as justification to make their languages very expressive, emphasizing such features as reflection, late binding, self-modifying code, and code generation.

At this point I use this reasoning as justification to prohibit programs from breaking through module boundaries. Programmers should have the ability to publish modules which other programmers don't have the ability to misuse. This way they can develop improved versions without fear that some users rely on petty internal details.

I think this is point of view is essential for developing a meaning-preserving module system. I'm making Era's most primitive operations very inexpressive, so that I can rigorously guarantee meaning-preserving modularity. I even hope to make it significantly less expressive than at least one well-established proof assistant (e.g. Coq, Agda, or Twelf), so that people writing proofs or verified software can share their code as Era modules. (We'll see how well I succeed at that.)

I also tend to deemphasize expressiveness when I'm thinking about potential IDE support or pondering the idea of a language for writing drama-managed interactive stories.

Project: Underreact

A framework for reactive programming in JavaScript.

Reactive demand programming

David Barbour has been building a programming model he calls reactive demand programming (RDP), and Underreact is my implementation of this model.

RDP is a reactive model, where instead of functions transforming values, we have behaviors transforming time-varying signals. Unlike other popular reactive models (namely functional reative programming (FRP)), RDP rejects instantaneous events, feedback loops, and local integration/accumulation operators, and it goes out of its way to permit behaviors to have side effects (prolonged over time) and a first-class distributable representation. RDP also ensures outputs are equal in duration to inputs ("duration coupling"), which makes it easier to trace them back an original cause and easier to let off the gas.

RDP's continuous-time semantics easily map to the ways a program actually interacts with the world. In contemporary programming languages, programmers must integrate keyboards, screens, and other sensors and actuators using event handlers and instantaneous side effects. This requires programmers to explicitly model histories and other intermediate state if they want to coordinate events over a period of time. But since programmers' primary building blocks use instantaneous events, the task of managing this state becomes almost as complex as the original problem.

RDP nips this vicious cycle in the bud by modeling I/O interfaces and state resources in continuous time in the first place. It's easy to express a behavior that happens as long as a button is held down, or as long as the button is in a different position than it was five milliseconds ago.

This isn't a complete explanation of RDP, and I won't be surprised if this is confusing. Perhaps it would help to read David Barbour's own explanation in the Sirea readme.

External state

Technically, external state is already an essential part of RDP. However, David Barbour's descriptions of RDP's external state seem to be somewhat inconsistent with the way I see it, so I'll elaborate here.

RDP treats state resources as being external to the main program semantics. So if the program enters a malfunctioning state, the information to diagnose it won't be encapsulated inside the running program itself. This makes it easier to augment RDP state resources with abilities like crash tolerance and dashboard visualization.

I'm interested in technologies that will reduce ethical confusion into the far, far future. I'm concerned that even our communication avenues will become ethically controversial, due to their (someday) exploitation or negligent byproduct of independent moral agents. External state is a nice way to circumvent this possibility. If our networks are stateless, then if something happens over the network and we're looking to assign blame or praise, chances are we won't assign it to the network itself. After all, the network doesn't remember what it "did," and it won't repeat the action unless similar inputs are given. Thus, we're unlikely to ever reach the conclusion that a stateless network is a person, or that it otherwise deserves the kinds of treatment we extend toward people.

Graphs, not ASTs

I think ASTs are an inconvenient consequence of our need to represent programs as one-dimensional text. I'm interested in making a programming language that reduces the need for ASTs by parsing the text directly into something like a multiplicative linear logic (MLL) proof net (which can be visualized very neatly as a box-and-arrow graph).

I haven't gotten very far with this idea yet, but it has influenced the way I think about my designs: I like to reason about programming language features in terms of data-control flow, even when that flow crosses the compile-time/run-time divide, distributes across multiple machines, or coils in fractal patterns due to state or recursion. Underreact is a project where I can potentially combine multiple kinds of data-control flow semantics into one set of operators, so it's also going to be the main compile target target if and when I develop a graph-targeting textual syntax.

Reactive staged programming

Recently, David Barbour and I hit upon an interesting spin on RDP, where we can potentially pass around first-class collections where each element exists at a different stage of a multi-stage program. Perhaps the term "data-control flow" would be more apt than "reactive" here, but think of it this way: We observe some values when stage A runs, some values when the generated stage B runs, etc., so we react to the execution of each generated program.

The crux of this technique is that each program stage is generated by recording the execution path of its predecessor stage, rather than constructing it from first-class data structures.

This is pretty interesting to me, because I've long been frustrated with the use of quasiquotation for multi-stage programming. Variable hygiene is cumbersome to enforce, and this makes it awkward to write abstractions that operate in multiple stages. But if we don't need to do explicit quasiquotation—if we just need to provide inputs that live in a later stage—then abstractions are very easy to write.

The catch is, this will be cumbersome to specify as a language. If we branch on a value in a later stage, the semantics in an earlier stage is that we follow all the branches' execution paths, so that they can all generate the program branches that might run in the later stage. For the earlier-stage branch code to know which branch it's on, it needs an earlier-stage value that represents a prediction of the later-stage value. (If I were only thinking of branching on booleans, maybe this would be easier....)

I think this is ultimately what Underreact will become someday, once the details are all ironed out.

Project: Lathe

A set of general-purpose utilities in JavaScript, Arc, and Racket. The Arc utilities are extensive but stale, and the JavaScript utilities are what I focus on the most.

Predicate dispatch

As multimethod dispatch techniques go, predicate dispatch is the most rascally dynamic of the bunch. At run time, each extension can run some code to decide whether it will handle the current call, much like the code comprising an if statement's condition expression. Of course, unlike an if statement, the extensions don't have to be listed all at one place in the code.

Predicate dispatch is a good way to manage code organization in a standalone project, but it does nothing to manage the chaos of an undisciplined community of developers. These are some guidelines I follow to live with the chaos:

  • No overlaps. If a single method call could be handled by either of two extensions, but only one is chosen, chances are the choice will be inappropriate and surprising in a few reasonable use cases. This tends to mean a module should only provide extensions for cases it actually introduces, such as cases involving its own custom data types.
  • If no extension will respond to a call, that's a fatal program error. The program should not detect this case and use a fallback behavior. After all, that fallback behavior constitutes a case that will overlap with future extensions.
  • The predicates should be computationally simple. If a method promises a particular limit of resource usage (e.g. time-complexity), the extensions might break that promise if their collective dispatch-deciding code surpasses the limit.

Parts of Lathe have been developed under the ideal that a community could build up around this kind of self-discipline.

More recently, I've set my sights on formal enforcement of forward-compatibility, using these informal policies as guidance. See my Era module system. Era isn't to the point of having this kind of extensibility, but I am developing it with multiple kinds of extensibility in mind.

Self-organizing precedence rules

Before I developed those guidelines for predicate dispatch, I was somewhat influenced by the design of Inform 7. The problem domain of Inform 7 is interactive fiction, where the concepts at play are not only fuzzy, but downright expected to be inexact and cleverly unique. An Inform 7 world is developed using broad-stroke declarations and some finer-grained amendments. Overlaps abound.

With overlaps comes a need for precedence control. Andrew Plotkin from the IF community has noted that Inform 7's mechanisms for dealing with the precedence of these overlaps would be more expressive if story authors could define their own custom precedence rules. His presentation culminates in the observation that once the precedence algorithm itself is extensible this way, authors will want to resolve overlaps in precedence rules by defining more even precedence rules.

After considering the corner cases, I've come up with a pretty reasonable way to build this turtles-all-the-way-down model: Each precedence rule takes a set of rules and returns a DAG of precedence judgments on those rules. Once we've calculated every DAG, we merge them one by one, trimming certain edges along the way to avoid cycles. Our trimming decisions are based on whether one precedence rule is already known to have precedence over another, or—brace yourself—whether one rule judges another rule's precedence and isn't judged in return. This last point lets a hierarchy of meta-rules and meta-meta-rules arise organically!

If you'd like a more in-depth description of my algorithm, for now you'll have to read the Arc implementation (circularly-order) or JavaScript implementation (circularlyOrder) in Lathe's source code.

Like predicate dispatch, this self-organizing precedence rule system may be a very expressive basis for prototyping other precedence systems, but it's probably too open-ended to provide effective support for an undisciplined developer community. It even faces an expressiveness paradox: The precedence rules won't have much to go on if the rules they're trying to judge are opaque (e.g. procedures), and yet any choice of metadata will seem like pointless boilerplate if no precedence rules pay attention to it. The metadata needs to strike some nice middle ground for this technique to really pay off.

If some language does take the plunge and use a precedence system like this one, I don't really have any discipline advice to share, except to say that I strongly recommend to avoid overlaps in the first place!

I keep this technique in the back of my mind for whenever I want to model something as fuzzy as an interactive fiction world. I've particularly thought about using it as a basis for TCG rules. In TCGs, it's common for a new card's text to cause contradictions when it's combined with old cards, so overlap resolution is a necessity.

Project: static site generator

This very website. It just so happens I improve the implementation much more often than I update the content....

Runs on mostly browser-compatible Javascript

There's no groundbreaking insight here. I prefer to make code reusable when possible, and a program that runs in a Web browser can be used from any Web-capable device, with no installation.

Since Web browsers don't provide much in the way of filesystem access or shell script integration, I write my static site generator code so it runs on Node.js too.

I code mostly in JavaScript itself, rather than using a language that compiles to JavaScript. I like the fact that it's standardized, and the kinds of modularity, abstraction, and syntax I pursue are barely possible on the client-side Web in the first place (e.g. they require <iframe> acrobatics), let alone possible in the kind of "idiomatic" code most languages strive to generate. :)

Minimalistic, programmable preprocessor

When I work with textual syntax, I like to use syntaxes that are easy to describe directly in terms of character sequences, and I like to avoid escape sequences and other cruft. Because of this, I tend to choose syntaxes that are based on nested square brackets (no backslash proliferation when nesting!), with a whitespace-delimited word at the beginning of each nested region to establish a meaning (and hence a parser) for the rest of the contents. It's very lisp-esque, but with more room for programmatic command over the parsing process.

I once worked on standalone languages (Blade and Penknife) with this kind of syntax built in. Now I instead handle it with a JavaScript library, chops.js, which is maintained as part of Lathe.

In my static site generator, I use a DSL for generating HTML markup, and chops.js gives me exactly the kind of syntax I want for this purpose. I also use chops.js as a preprocessor to give JavaScript a multi-line string syntax.

Extension dependency resolution

Sometimes all the details of a program are close at hand (inductive), and all it takes is some data processing to execute the sucker. Other times, the program is a potentially infinite expanse (coinductive), and we just want to find what we're looking for and get the heck out.

A while back, I began making Blade, a compiled language with user-definable syntax. The design didn't fully congeal, let alone the implementation, but the challenges it presented were interesting. Any module could contribute an extension element to a multiset, and any module could ask for the final value of the multiset before continuing its own contributions. Hypothetically, a typical module would start parsing and then immediately wait, because its parser wouldn't be fully defined yet. But if a module is waiting, then how can we know any multiset is complete?

When the system is already this dynamic, my answer is to give modules a way to make assertions about what contributions they're going to make in the future. Properly used, these assertions can provide enough information for the language to make progress on its dependency resolution.

I didn't apply this technique much in practice until I refactored my static site generator to be more declarative in structure. Now each page and each bundle of functionality lives in its own extension file, and these files import and contribute to global multiset names, rather than coupling on global filenames. A page can almost trivially be added or deleted from the site by adding or removing its file from the project, although hyperlinks to that page in the rest of the site will have to be updated manually.

I haven't developed this idea to the point where it's part of an elegant and solid module system for community use, but I consider it a promising ingredient for such designs.

I've often considered this kind of technique for my Era module system, but I usually find a way to avoid it, since Era modules have a rather rigid top-level structure.

Project: Cairntaker

An incremental garbage collector written in JavaScript, with support for ephemeron tables and interning of all immutable values.

I haven't tested Carintaker at all yet, and I bet it's pretty slow too, but it's been a good exercise in surpassing many of JavaScript's semantic limits (limited stack size, limited-precision integers, no ephemeron support, etc.), and I'll start from here once I set out to write an abstraction-leak-free language atop JS.

Everything's an ephemeron table

At least conceptually, every value handled by Cairntaker is a mutable or immutable ephemeron table (aka weak map). Much like lisps get away with representing many data structures using cons cells, Cairntaker does the same with tables. This gives ephemera a chance to be a seamless part of a language's reachability semantics, well-integrated with an egal notion of equality (for key lookup).

Cairntaker lets programs manipulate simple JavaScript data chunks like strings and numbers, but it treats them as though they're ephemeron tables, just like anything else. As far as a program is concerned, Cairntaker's comparison operation discovers some obscure key on these tables whose value happens to uniquely distinguish each of these data chunks. Cairntaker's tables are not enumerable, so it's easy to suspend one's disbelief that this underlying table representation actually exists, and that we could look up this key if only our program had a reference to it.

Other pursuits

I haven't made much out of the following technique yet, but I might return to it at some point.

Partial evaluation

It's mostly just a side interest, but my "Fexpress" project aims to show that fexprs, when used the way I (and others?) would use them in practice, can be compiled to untyped lambda calculus with no need for an interpreter at run time. Furthermore, if there are a few odd use cases that do make an interpreter necessary, I expect them to impact only local regions of the compiled program.

I haven't found a working system yet, but this project has taken me on a journey through topics of partial evaluation and staged programming.

Elsewhere on this page I describe reactive staged programming, which might end up connecting to this in some way.

My Sites

© 2012–2013 Ross Angle (rocketnia)

This page was last updated 3-Nov-2013.