|
|
-
-
- Copied from the old Etherpad. Found in /infrastructure/ace/
-
- Goals:
-
- - no unicode (for efficient escaping, sightliness)
- - efficient operations for ACE and collab (attributed text, etc.)
- - good for time-slider
- - good for API
- - line-ending aware
- X more coherent (deleting or styling text merging with insertion)
- - server-side syntax highlighting?
- - unify author map with attribute pool
- - unify attributed text with changeset rep
- - not: reversible
- - force final newline of document to be preserved
-
- - Unicode bad:
- - ugly (hard to read)
- - more complex to parse
- - harder to store and transmit correctly
- - doesn't save all that much space anyway
- - blows up in size when string-escaped
- - embarrassing for API
-
-
- # Attributes:
-
- An "attribute" is a (key,value) pair such as (author,abc123456) or
- (bold,true). Sometimes an attribute is treated as an instruction to
- add that attribute, in which case an empty value means to remove it.
- So (bold,) removes the "bold" attribute. Attributes are interned and
- given numeric IDs, so the number "6" could represent "(bold,true)",
- for example. This mapping is stored in an attribute "pool" which may
- be shared by multiple changesets.
-
- Entries in the pool must be unique, so that attributes can be compared
- by their IDs. Attribute names cannot contain commas.
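The interning described above can be sketched as follows. This is a minimal illustration, not the actual Etherpad pool: the class name `AttributePool` and its methods `put` and `get` are hypothetical, and IDs are simply assigned in insertion order.

```python
class AttributePool:
    """Interns (key, value) pairs and hands out small integer IDs (hypothetical sketch)."""

    def __init__(self):
        self._id_by_attr = {}  # (key, value) -> numeric ID
        self._attr_by_id = []  # numeric ID -> (key, value)

    def put(self, key, value):
        """Return the ID for (key, value), adding the pair only if it is new."""
        if "," in key:
            raise ValueError("attribute names cannot contain commas")
        attr = (key, value)
        if attr not in self._id_by_attr:
            self._id_by_attr[attr] = len(self._attr_by_id)
            self._attr_by_id.append(attr)
        return self._id_by_attr[attr]

    def get(self, attr_id):
        """Return the (key, value) pair stored under attr_id."""
        return self._attr_by_id[attr_id]


pool = AttributePool()
bold = pool.put("bold", "true")
author = pool.put("author", "abc123456")
```

Because `put` never stores the same pair twice, two attributes are equal exactly when their IDs are equal, which is the uniqueness property the notes rely on.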
-
- A changeset looks something like the following:
-
- Z:5g>1|5=2p=v*4*5+1$x
-
- With the corresponding pool containing these entries:
-
- ...
- 4 -> (author,1059348573)
- 5 -> (bold,true)
- ...
-
- This changeset, together with the pool, represents inserting
- a bold letter "x" into the middle of a line. The string consists of:
-
- - a letter Z (the "magic character" and format version identifier)
- - a series of opcodes (punctuation) and numeric values in base 36 (the
- alphanumerics)
- - a dollar sign ($)
- - a string of characters used by insertion operations (the "char bank")
-
- If we separate out the operations and convert the numbers to base 10, we get:
-
- Z :196 >1 |5=97 =31 *4 *5 +1 $"x"
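The separation and base-10 conversion above can be reproduced with a short script. This is only a sketch: the function name `describe` is made up for illustration, and it performs no validation beyond checking the magic character.

```python
import re


def describe(changeset):
    """Split a changeset into (opcode, base-10 number) pairs plus the char bank."""
    # Everything after the first "$" is the char bank.
    body, _, bank = changeset.partition("$")
    if not body.startswith("Z"):
        raise ValueError("missing magic character Z")
    # Each token is one punctuation opcode followed by a base-36 number.
    tokens = re.findall(r"([:><|*+=\-])([0-9a-z]+)", body[1:])
    return [(op, int(num, 36)) for op, num in tokens], bank


ops, bank = describe("Z:5g>1|5=2p=v*4*5+1$x")
print(" ".join(f"{op}{n}" for op, n in ops), repr(bank))
```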
-
- Here are descriptions of the operations, where capital letters are variables:
-
- ":N" : Source text has length N (must be first op)
- ">N" : Final text is N (positive) characters longer than source text (must be second op)
- "<N" : Final text is N (positive) characters shorter than source text (must be second op)
- ">0" : Final text is same length as source text
- "+N" : Insert N characters from the bank, none of them newlines
- "-N" : Skip over (delete) N characters from the source text, none of them newlines
- "=N" : Keep N characters from the source text, none of them newlines
- "|L+N" : Insert N characters from the bank, containing L newlines. The last
- character inserted MUST be a newline, but not the (new) document's final newline.
- "|L-N" : Delete N characters from the source text, containing L newlines. The last
- character deleted MUST be a newline, but not the (old) document's final newline.
- "|L=N" : Keep N characters from the source text, containing L newlines. The last character
- kept MUST be a newline, and the final newline of the document is allowed.
- "*I" : Apply attribute I from the pool to the following +, =, |+, or |= command.
- In other words, any number of * ops can come before a +, =, or | but not
- between a | and the corresponding + or =.
- If +, text is inserted having this attribute. If =, text is kept but with
- the attribute applied as an attribute addition or removal.
- Consecutive attributes must be sorted lexically by (key,value) with key
- and value taken as strings. It's illegal to have duplicate keys
- for (key,value) pairs that apply to the same text. It's illegal to
- have an empty value for a key in the case of an insertion (+); the
- pair should just be omitted.
-
- Characters from the source text that aren't accounted for are assumed to be kept
- with the same attributes.
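To make the operations above concrete, here is a rough sketch of applying a changeset to plain text. It is an illustration under stated assumptions rather than the real implementation: the name `apply_to_text` is hypothetical, attributes (`*I`) and line counts (`|L`) are parsed but ignored, and none of the additional constraints are checked. The sample changesets in the usage lines are invented for a 12-character document.

```python
import re

# Header: magic "Z", ":old_len", then ">" or "<" with the length change,
# then the ops up to the first "$", then the char bank.
HEADER = re.compile(r"Z:([0-9a-z]+)([><])([0-9a-z]+)([^$]*)\$(.*)", re.DOTALL)
# One op: optional "*I" attributes, optional "|L" line count, then +, - or = and N.
OP = re.compile(r"(?:\*[0-9a-z]+)*(?:\|[0-9a-z]+)?([+=\-])([0-9a-z]+)")


def apply_to_text(changeset, text):
    """Apply a changeset to text, ignoring attributes (hypothetical sketch)."""
    m = HEADER.fullmatch(changeset)
    if m is None:
        raise ValueError("not a changeset")
    old_len = int(m.group(1), 36)
    sign = 1 if m.group(2) == ">" else -1
    new_len = old_len + sign * int(m.group(3), 36)
    if len(text) != old_len:
        raise ValueError("source text has the wrong length")
    ops, bank = m.group(4), m.group(5)
    out, src, bank_pos = [], 0, 0
    for op, count in OP.findall(ops):
        n = int(count, 36)
        if op == "+":    # insert n characters from the bank
            out.append(bank[bank_pos:bank_pos + n])
            bank_pos += n
        elif op == "=":  # keep n characters from the source text
            out.append(text[src:src + n])
            src += n
        else:            # "-": skip over (delete) n source characters
            src += n
    # Characters not accounted for are kept unchanged.
    out.append(text[src:])
    result = "".join(out)
    if len(result) != new_len:
        raise ValueError("result length does not match the header")
    return result


inserted = apply_to_text("Z:c>1|1=6=3*4*5+1$x", "hello\nworld\n")
deleted = apply_to_text("Z:c<1|1=6-1$", "hello\nworld\n")
```

Here `inserted` is `"hello\nworxld\n"` and `deleted` is `"hello\norld\n"`; in both cases the trailing characters are never mentioned in the changeset and are carried over implicitly.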
-
- Additional Constraints:
-
- - Consecutive +, -, and = ops of the same type that could be combined are not allowed.
- Whether combination is possible depends on the attributes of the ops and whether
- each is multiline or not. For example, two multiline deletions can never be
- consecutive, nor can any insertion come after a non-multiline insertion with the
- same attributes.
- - "No-op" ops are not allowed, such as deleting 0 characters. However, attribute
- applications that don't have any effect are allowed.
- - Characters at the end of the source text cannot be explicitly kept with no changes;
- if the change doesn't affect the last N characters, those "keep" ops must be left off.
- - In any consecutive sequence of insertions (+) and deletions (-) with no keeps (=),
- the deletions must come before the insertions.
- - The document text before and after will always end with a newline. This policy avoids
- a lot of special-casing of the end of the document. If a final newline is
- always added when importing text and removed when exporting text, then the
- changeset representation can be used to process text files that may or may not
- have a final newline.
-
- Attribution string:
-
- An "attribution string" is a series of inserts with no deletions or keeps.
- For example, "*3+8|1+5" describes the attributes of a string of length 13,
- where the first 8 chars have attribute 3 and the next 5 chars have no
- attributes, with the last of these 5 chars being a newline. Constraints
- similar to those affecting changesets apply, but the restriction about
- the final newline of the new document being added doesn't apply.
-
- Attributes in an attribution string cannot be empty, as in "(bold,)"; they should
- instead be absent.
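As a small illustration of the attribution string format, the sketch below splits one into runs of (attribute IDs, character count, newline count). The function name `attribution_runs` is hypothetical, and the code assumes well-formed input containing only insert ops.

```python
import re

# One run: zero or more "*I" attributes, an optional "|L" line count, then "+N".
RUN = re.compile(r"((?:\*[0-9a-z]+)*)(?:\|([0-9a-z]+))?\+([0-9a-z]+)")


def attribution_runs(attribution):
    """Return a list of (attribute_ids, char_count, newline_count) tuples."""
    runs = []
    for attrs, lines, count in RUN.findall(attribution):
        ids = [int(a, 36) for a in attrs.split("*") if a]
        runs.append((ids, int(count, 36), int(lines, 36) if lines else 0))
    return runs


runs = attribution_runs("*3+8|1+5")
total_length = sum(count for _, count, _ in runs)
```

For the example from the text, this yields one run of 8 characters with attribute 3 and one run of 5 characters with no attributes and one newline, for a total length of 13.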
-
-
-
-
-
- -------
- Considerations:
-
- - composing changesets/attributions with different pools
- - generalizing "applyToAttribution" to make "mutateAttributionLines" and "compose"
|