
Foolproof HTML,
what's under the hood?

Foolproof HTML editing is based on rows & tokens. Rows get parsed into a tree based on their indentation.

This model is very similar to how whitespace-sensitive shorthand template languages like Pug, Haml & Slim work. Instead of free-form text though, everything is in discrete blocks, atoms, or whatever you want to call them.

I don't know much about the theory of programming languages, but comparing to programming, I think this is comparable to nested blocks (that have identifiers) and expressions inside the blocks. I think abstract syntax trees work a bit like this too, except they are explicit trees, whereas this is an indented row structure that gets parsed to a tree. There are examples in a bit.

I think this is a case of knowing just enough to be dangerous. I'm trying to be pragmatic and do this in a way that allows me to make progress, instead of trying to do it "the right way" and getting stuck.

A document is an ordered list of rows.

Each row could have different rules applied to it based on the file language, what type the row is and where it is in the document. HTML element type rows could have children (rows that are indented more than the row we're looking at), but HTML text type rows could only have siblings, just like in the browser DOM. With CSS, an @import rule could not have children and could not be a child of any block, but other block types could be nested.
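
Here's a rough sketch of how rules like these might be checked. It assumes rows are objects with an indentation and a type, and that a row's parent is the nearest earlier row with less indentation. The RULES table, the row type names 'at-import' and 'block', and the function names are all made up for illustration, they aren't from Foolproof itself.

```javascript
// Hypothetical rule table, keyed by file language and then by row type.
const RULES = {
  html: {
    tag: { canHaveChildren: true, canBeChild: true },
    text: { canHaveChildren: false, canBeChild: true },
  },
  css: {
    'at-import': { canHaveChildren: false, canBeChild: false },
    block: { canHaveChildren: true, canBeChild: true },
  },
};

// The parent of a row is the nearest earlier row that is indented less.
function parentIndex(rows, index) {
  for (let i = index - 1; i >= 0; i--) {
    if (rows[i].indentation < rows[index].indentation) return i;
  }
  return -1; // no parent, so this is a top-level row
}

// Return the indexes of rows that sit somewhere the rules don't allow.
// Assumes every row type in `rows` has an entry in the rule table.
function findMisplacedRows(rows, language) {
  const rules = RULES[language];
  const misplaced = [];
  rows.forEach((row, index) => {
    const parent = parentIndex(rows, index);
    if (parent === -1) return; // top-level rows are always fine in this sketch
    const parentAllows = rules[rows[parent].type].canHaveChildren;
    const rowAllows = rules[row.type].canBeChild;
    if (!parentAllows || !rowAllows) misplaced.push(index);
  });
  return misplaced;
}

// A text row with an element indented under it breaks the HTML rules.
const badHtmlRows = [
  { indentation: 0, type: 'text', tokens: [] },
  { indentation: 1, type: 'tag', tokens: [] },
];
console.log(findMisplacedRows(badHtmlRows, 'html')); // [ 1 ]
```

An editor could use a check like this to refuse the indent in the first place, instead of reporting the problem afterwards.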

Each row has an ordered list of tokens.

Each token gets parsed with its own kind of logic. If a token is an HTML tag name, it should have autofill based on its location in the document, so if you were making an element inside a <ul>, it would automatically offer an <li>. In CSS, a background-image value could be parsed as a gradient, for example, and the editor could offer a tailored UI or suggestions for that. Many existing editors already have lots of useful stuff for these kinds of context-specific helpers (like Chrome DevTools has for editing CSS colors).
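
As a minimal sketch of the <ul> and <li> case: look up the parent row, read its tag name, and offer child tags from a lookup table. The CHILD_SUGGESTIONS table and the suggestTagNames function are hypothetical names, and the table only holds a few well-known parent and child pairs. A real editor would need a much fuller picture of the HTML content model.

```javascript
// Hypothetical lookup: parent tag name -> child tag names to offer first.
const CHILD_SUGGESTIONS = new Map([
  ['ul', ['li']],
  ['ol', ['li']],
  ['select', ['option', 'optgroup']],
  ['tr', ['td', 'th']],
]);

// Suggest tag names for the row at `index`, based on its parent row.
function suggestTagNames(rows, index) {
  const row = rows[index];
  for (let i = index - 1; i >= 0; i--) {
    // The nearest earlier row with less indentation is the parent.
    if (rows[i].indentation < row.indentation) {
      const tagName = rows[i].tokens.find((t) => t.type === 'tag-name');
      return tagName ? CHILD_SUGGESTIONS.get(tagName.content) || [] : [];
    }
  }
  return []; // top-level row, nothing specific to offer
}

// A new, still unnamed element one level inside a <ul>.
const listRows = [
  { indentation: 0, type: 'tag', tokens: [{ type: 'tag-name', content: 'ul' }] },
  { indentation: 1, type: 'tag', tokens: [{ type: 'tag-name', content: '' }] },
];
console.log(suggestTagNames(listRows, 1)); // [ 'li' ]
```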

This structure could work for stuff other than HTML

I'm trying some examples with other languages too, here we go.

If you think of the structure presented as javascript, here's the gist of it.

token = {type: '', content: ''}
row = {indentation: 0, type: '', tokens: [token, token, ...]}
rows = [row, row, ...]
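
To make the "rows get parsed to a tree" part concrete, here's one way it could go: keep a stack of open ancestors and hang each row under the nearest earlier row that is indented less. This is just a sketch under that assumption. The rowsToTree name and the children property are made up for illustration.

```javascript
// Turn a flat, ordered list of rows into a tree using only indentation.
function rowsToTree(rows) {
  // A virtual root that sits above every real row.
  const root = { indentation: -1, type: 'root', tokens: [], children: [] };
  const stack = [root]; // chain of open ancestors, deepest last

  for (const row of rows) {
    const node = { ...row, children: [] };
    // Drop every ancestor that is not indented less than this row.
    while (stack[stack.length - 1].indentation >= node.indentation) {
      stack.pop();
    }
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

const exampleRows = [
  { indentation: 0, type: 'tag', tokens: [{ type: 'tag-name', content: 'ul' }] },
  { indentation: 1, type: 'tag', tokens: [{ type: 'tag-name', content: 'li' }] },
  { indentation: 2, type: 'text', tokens: [{ type: 'text', content: 'One' }] },
  { indentation: 1, type: 'tag', tokens: [{ type: 'tag-name', content: 'li' }] },
];
const exampleTree = rowsToTree(exampleRows);
console.log(exampleTree.length);             // 1
console.log(exampleTree[0].children.length); // 2
```

The rows stay the source of truth here. The tree is derived whenever it's needed and can be thrown away afterwards.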

Take this bit of javascript.

function name () {}

That's a block that has a type of function and two tokens: keyword & identifier. It would be something like this in Foolproof.

rows = [
  {
    indentation: 0,
    type: 'function',
    tokens: [
      {type: 'keyword', content: 'function'},
      {type: 'identifier', content: 'name'}
    ]
  }
]

This next JSON block has a type of array and two strings as its children.

"people": [
  "John",
  "Jason"
]

rows = [
  {
    indentation: 0,
    type: 'array',
    tokens: [
      {type: 'key', content: 'people'},
    ]
  },
  {
    indentation: 1,
    type: 'string',
    tokens: [
      {type: 'string', content: 'John'},
    ]
  },
  {
    indentation: 1,
    type: 'string',
    tokens: [
      {type: 'string', content: 'Jason'},
    ]
  }
]

The array row must be given the type of array, so we know how to treat it and its children. I'm not sure if its token needs to be named key. Maybe the key is just a string like the string tokens in the string rows, since that's how the JSON spec describes keys. I haven't figured everything out yet. I'm not sure how this would work for empty arrays and such, these examples are quite cherry-picked.

Let's move on to HTML.

<label for="e">Type in your email</label>
<input id="e" type="email" value="john@jason.com">

That HTML would be expressed like this.

rows = [
  {
    indentation: 0,
    type: 'tag',
    tokens: [
      {type: 'tag-name', content: 'label'},
      {type: 'attribute-name', content: 'for'},
      {type: 'attribute-value', content: 'e'}
    ]
  },
  {
    indentation: 1,
    type: 'text',
    tokens: [{type: 'text', content: 'Type in your email'}],
  },
  {
    indentation: 0,
    type: 'tag',
    tokens: [
      {type: 'tag-name', content: 'input'},
      {type: 'attribute-name', content: 'id'},
      {type: 'attribute-value', content: 'e'},
      {type: 'attribute-name', content: 'type'},
      {type: 'attribute-value', content: 'email'},
      {type: 'attribute-name', content: 'value'},
      {type: 'attribute-value', content: 'john@jason.com'}
    ]
  },
]
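
Going the other way, rows like these could be written back out as HTML. The sketch below assumes every tag row starts with its tag-name token and that attribute names and values come in pairs. It does no escaping and doesn't reproduce whitespace between elements. rowsToHtml, openTag and the short VOID_ELEMENTS list are illustrative names, not Foolproof's.

```javascript
// Elements that never get a closing tag (a subset of HTML's void elements).
const VOID_ELEMENTS = new Set(['br', 'hr', 'img', 'input', 'link', 'meta']);

// Build the opening tag of a tag row from its tokens.
function openTag(row) {
  let html = '';
  for (const token of row.tokens) {
    if (token.type === 'tag-name') html += '<' + token.content;
    if (token.type === 'attribute-name') html += ' ' + token.content;
    if (token.type === 'attribute-value') html += '="' + token.content + '"';
  }
  return html + '>';
}

function rowsToHtml(rows) {
  let html = '';
  const open = []; // tag rows still waiting for their closing tag

  // Close every open element that is not indented less than `indentation`.
  const closeDownTo = (indentation) => {
    while (open.length && open[open.length - 1].indentation >= indentation) {
      html += '</' + open.pop().tokens[0].content + '>';
    }
  };

  for (const row of rows) {
    closeDownTo(row.indentation);
    if (row.type === 'text') {
      html += row.tokens.map((token) => token.content).join('');
    } else if (row.type === 'tag') {
      html += openTag(row);
      if (!VOID_ELEMENTS.has(row.tokens[0].content)) open.push(row);
    }
  }
  closeDownTo(0); // close whatever is still open at the end
  return html;
}

const formRows = [
  { indentation: 0, type: 'tag', tokens: [
    { type: 'tag-name', content: 'label' },
    { type: 'attribute-name', content: 'for' },
    { type: 'attribute-value', content: 'e' },
  ] },
  { indentation: 1, type: 'text', tokens: [{ type: 'text', content: 'Type in your email' }] },
  { indentation: 0, type: 'tag', tokens: [
    { type: 'tag-name', content: 'input' },
    { type: 'attribute-name', content: 'id' },
    { type: 'attribute-value', content: 'e' },
  ] },
];
console.log(rowsToHtml(formRows));
// <label for="e">Type in your email</label><input id="e">
```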

If the editor could support it, a token could be of any type. For example, you could have a PHP type token in there, and editing it could be like a CodeMirror instance with PHP-specific snippets & shortcuts.

There are probably a ton of unsolved problems here, but my gut feeling is that this model would work well for HTML & CSS type languages. I've seen quite a few whitespace-sensitive pre-processed HTML & CSS languages⁺ that look like this on the surface, and that's what gives me hope that this is the right way.

Why not a tree?

Well, I tried it. The first working prototype I built was completely tree based. It was kind of the situation I described in the beginning there. It felt "correct", but not easy to understand or develop. At first glance it doesn't look that different from a row based model, but when you start doing indentation or multiple selections, things just go nuts.
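
For comparison, here's roughly what indenting a multiple selection could look like with rows. It's a sketch that assumes a selection is a list of row indexes and that only the selected rows move. indentRows is a made-up name. In a tree, the same edit would mean detaching nodes and finding them new parents.

```javascript
// Shift the selected rows by `amount` levels, never going below zero.
// Returns a new array and leaves the original rows untouched.
function indentRows(rows, selectedIndexes, amount) {
  const selected = new Set(selectedIndexes);
  return rows.map((row, index) =>
    selected.has(index)
      ? { ...row, indentation: Math.max(0, row.indentation + amount) }
      : row
  );
}

const flatRows = [
  { indentation: 0, type: 'tag', tokens: [{ type: 'tag-name', content: 'ul' }] },
  { indentation: 0, type: 'tag', tokens: [{ type: 'tag-name', content: 'li' }] },
  { indentation: 0, type: 'tag', tokens: [{ type: 'tag-name', content: 'li' }] },
];
const nestedRows = indentRows(flatRows, [1, 2], 1);
console.log(nestedRows.map((row) => row.indentation)); // [ 0, 1, 1 ]
```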

— Ville / @sakamies


Notes