  1. Copied from the old Etherpad. Found in /infrastructure/ace/
  2. Goals:
  3. - no unicode (for efficient escaping, sightliness)
  4. - efficient operations for ACE and collab (attributed text, etc.)
  5. - good for time-slider
  6. - good for API
  7. - line-ending aware
  8. X more coherent (deleting or styling text merging with insertion)
  9. - server-side syntax highlighting?
  10. - unify author map with attribute pool
  11. - unify attributed text with changeset rep
  12. - not: reversible
  13. - force final newline of document to be preserved
  14. - Unicode bad:
  15. - ugly (hard to read)
  16. - more complex to parse
  17. - harder to store and transmit correctly
  18. - doesn't save all that much space anyway
  19. - blows up in size when string-escaped
  20. - embarrassing for API
  21. # Attributes:
  22. An "attribute" is a (key,value) pair such as (author,abc123456) or
  23. (bold,true). Sometimes an attribute is treated as an instruction to
  24. add that attribute, in which case an empty value means to remove it.
  25. So (bold,) removes the "bold" attribute. Attributes are interned and
  26. given numeric IDs, so the number "6" could represent "(bold,true)",
  27. for example. This mapping is stored in an attribute "pool" which may
  28. be shared by multiple changesets.
  29. Entries in the pool must be unique, so that attributes can be compared
  30. by their IDs. Attribute names cannot contain commas.
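
The interning described above can be sketched as a small pool class. This is a minimal illustration, not the Etherpad implementation: the class name `AttributePool` and the methods `intern` and `lookup` are hypothetical, and IDs are simply assigned in insertion order.

```python
class AttributePool:
    """Interns (key, value) pairs and hands out small numeric IDs.

    Hypothetical sketch: names and ID assignment are assumptions.
    """

    def __init__(self):
        self._id_by_attr = {}  # (key, value) -> numeric ID
        self._attr_by_id = []  # numeric ID -> (key, value)

    def intern(self, key, value):
        # Attribute names cannot contain commas.
        if "," in key:
            raise ValueError("attribute names cannot contain commas")
        attr = (key, value)
        # Entries are unique, so the same pair always maps to the same ID.
        if attr not in self._id_by_attr:
            self._id_by_attr[attr] = len(self._attr_by_id)
            self._attr_by_id.append(attr)
        return self._id_by_attr[attr]

    def lookup(self, attr_id):
        return self._attr_by_id[attr_id]


pool = AttributePool()
author_id = pool.intern("author", "abc123456")
bold_id = pool.intern("bold", "true")
# An empty value is its own entry; used as an instruction it means "remove bold".
unbold_id = pool.intern("bold", "")
```

Because entries are unique, two attributes can be compared by ID alone, without looking at the (key,value) strings.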
  31. A changeset looks something like the following:
  32. Z:5g>1|5=2p=v*4*5+1$x
  33. With the corresponding pool containing these entries:
  34. ...
  35. 4 -> (author,1059348573)
  36. 5 -> (bold,true)
  37. ...
  38. This changeset, together with the pool, represents inserting
  39. a bold letter "x" into the middle of a line. The string consists of:
  40. - a letter Z (the "magic character" and format version identifier)
  41. - a series of opcodes (punctuation) and numeric values in base 36 (the
  42. alphanumerics)
  43. - a dollar sign ($)
  44. - a string of characters used by insertion operations (the "char bank")
  45. If we separate out the operations and convert the numbers to base 10, we get:
  46. Z :196 >1 |5=97 =31 *4 *5 +1 $"x"
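
The separation above can be reproduced with a short parser. This is a sketch under assumptions: the function name `unpack` and the tuple layout `(opcode, chars, newlines, attribute_ids)` are invented for illustration, and no constraint checking is done beyond basic syntax.

```python
import re

# Header: magic "Z", then ":N", then ">N" or "<N", all numbers in base 36.
HEADER = re.compile(r"Z:([0-9a-z]+)([><])([0-9a-z]+)")
# One op: any number of "*I" attributes, an optional "|L", then +, - or = and N.
OP = re.compile(r"((?:\*[0-9a-z]+)*)(?:\|([0-9a-z]+))?([-+=])([0-9a-z]+)")


def unpack(changeset):
    """Split a changeset into (old_length, length_delta, ops, char_bank)."""
    # Everything after the first "$" is the char bank.
    body, _, bank = changeset.partition("$")
    header = HEADER.match(body)
    if header is None:
        raise ValueError("missing changeset header")
    old_len = int(header.group(1), 36)
    sign = 1 if header.group(2) == ">" else -1
    delta = sign * int(header.group(3), 36)

    ops = []
    pos = header.end()
    while pos < len(body):
        m = OP.match(body, pos)
        if m is None:
            raise ValueError("bad op at offset %d" % pos)
        attribs = [int(a, 36) for a in m.group(1).split("*")[1:]]
        newlines = int(m.group(2), 36) if m.group(2) else 0
        ops.append((m.group(3), int(m.group(4), 36), newlines, attribs))
        pos = m.end()
    return old_len, delta, ops, bank


old_len, delta, ops, bank = unpack("Z:5g>1|5=2p=v*4*5+1$x")
```

For the example changeset this yields a source length of 196, a delta of +1, and three ops: keep 97 characters spanning 5 newlines, keep 31 characters, and insert 1 character with attributes 4 and 5.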
  47. Here are descriptions of the operations, where capital letters are variables:
  48. ":N" : Source text has length N (must be first op)
  49. ">N" : Final text is N (positive) characters longer than source text (must be second op)
  50. "<N" : Final text is N (positive) characters shorter than source text (must be second op)
  51. ">0" : Final text is same length as source text
  52. "+N" : Insert N characters from the bank, none of them newlines
  53. "-N" : Skip over (delete) N characters from the source text, none of them newlines
  54. "=N" : Keep N characters from the source text, none of them newlines
  55. "|L+N" : Insert N characters from the bank, containing L newlines. The last
  56. character inserted MUST be a newline, but not the (new) document's final newline.
  57. "|L-N" : Delete N characters from the source text, containing L newlines. The last
  58. character deleted MUST be a newline, but not the (old) document's final newline.
  59. "|L=N" : Keep N characters from the source text, containing L newlines. The last character
  60. kept MUST be a newline, and the final newline of the document is allowed.
  61. "*I" : Apply attribute I from the pool to the following +, =, |+, or |= command.
  62. In other words, any number of * ops can come before a +, =, or | but not
  63. between a | and the corresponding + or =.
  64. If +, text is inserted having this attribute. If =, text is kept but with
  65. the attribute applied as an attribute addition or removal.
  66. Consecutive attributes must be sorted lexically by (key,value) with key
  67. and value taken as strings. It's illegal to have duplicate keys
  68. for (key,value) pairs that apply to the same text. It's illegal to
  69. have an empty value for a key in the case of an insertion (+); the
  70. pair should just be omitted.
  71. Characters from the source text that aren't accounted for are assumed to be kept
  72. with the same attributes.
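
To make the op semantics concrete, here is a minimal sketch that applies a changeset to plain text. It ignores attributes and newline counts entirely and does not validate the constraints; the function name `apply_to_text` is an assumption made for this example.

```python
import re

HEADER = re.compile(r"Z:([0-9a-z]+)([><])([0-9a-z]+)")
# Attributes ("*I") and line counts ("|L") are matched but not captured,
# because they do not affect the resulting plain text.
OP = re.compile(r"(?:\*[0-9a-z]+)*(?:\|[0-9a-z]+)?([-+=])([0-9a-z]+)")


def apply_to_text(changeset, text):
    """Return the text produced by applying `changeset` to `text`."""
    body, _, bank = changeset.partition("$")
    header = HEADER.match(body)
    if header is None or int(header.group(1), 36) != len(text):
        raise ValueError("changeset does not match the source text")

    out = []
    src = 0       # read position in the source text
    bank_pos = 0  # read position in the char bank
    for op in OP.finditer(body, header.end()):
        opcode, n = op.group(1), int(op.group(2), 36)
        if opcode == "+":    # insert n characters from the bank
            out.append(bank[bank_pos:bank_pos + n])
            bank_pos += n
        elif opcode == "=":  # keep n characters from the source
            out.append(text[src:src + n])
            src += n
        else:                # "-": skip over (delete) n source characters
            src += n
    # Source characters not accounted for by any op are kept.
    out.append(text[src:])
    return "".join(out)


result = apply_to_text("Z:c>1|1=6=2+1$x", "hello\nworld\n")
# result == "hello\nwoxrld\n"
```

The example changeset keeps the first line (6 characters, 1 newline), keeps 2 more characters, inserts "x" from the bank, and leaves the rest of the source text untouched.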
  73. Additional Constraints:
  74. - Consecutive +, -, and = ops of the same type that could be combined are not allowed.
  75. Whether combination is possible depends on the attributes of the ops and whether
  76. each is multiline or not. For example, two multiline deletions can never be
  77. consecutive, nor can any insertion come after a non-multiline insertion with the
  78. same attributes.
  79. - "No-op" ops are not allowed, such as deleting 0 characters. However, attribute
  80. applications that don't have any effect are allowed.
  81. - Characters at the end of the source text cannot be explicitly kept with no changes;
  82. if the change doesn't affect the last N characters, those "keep" ops must be left off.
  83. - In any consecutive sequence of insertions (+) and deletions (-) with no keeps (=),
  84. the deletions must come before the insertions.
  85. - The document text before and after will always end with a newline. This policy avoids
  86. a lot of special-casing of the end of the document. If a final newline is
  87. always added when importing text and removed when exporting text, then the
  88. changeset representation can be used to process text files that may or may not
  89. have a final newline.
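
The final-newline policy in the last point can be sketched as a pair of helpers. The names `import_text` and `export_text` are hypothetical; the only behavior assumed is what the text states, namely that a newline is always added on import and removed on export.

```python
def import_text(raw):
    """Turn arbitrary file text into document text, which always ends in a newline."""
    # Always add one newline, even if `raw` already ends with one.
    return raw + "\n"


def export_text(doc):
    """Turn document text back into file text by dropping the final newline."""
    if not doc.endswith("\n"):
        raise ValueError("document text must end with a newline")
    return doc[:-1]


with_newline = export_text(import_text("abc\n"))
without_newline = export_text(import_text("abc"))
```

Adding the newline unconditionally, rather than only when it is missing, is what makes the round trip lossless for files both with and without a trailing newline.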
  90. Attribution string:
  91. An "attribution string" is a series of inserts with no deletions or keeps.
  92. For example, "*3+8|1+5" describes the attributes of a string of length 13,
  93. where the first 8 chars have attribute 3 and the next 5 chars have no
  94. attributes, with the last of these 5 chars being a newline. Constraints
  95. similar to those affecting changesets apply, but the restriction on
  96. inserting the final newline of the new document doesn't apply.
  97. Attributes in an attribution string cannot have empty values, like "(bold,)"; they should
  98. instead be absent.
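
An attribution string can be decoded into runs with a few lines of code. This is an illustrative sketch: the function name `attribution_runs` and the `(length, newlines, attribute_ids)` tuple layout are assumptions, and the pool is not consulted, so empty-valued attributes are not detected.

```python
import re

# An attribution string contains only inserts: optional "*I" attributes,
# an optional "|L" newline count, then "+N". All numbers are base 36.
RUN = re.compile(r"((?:\*[0-9a-z]+)*)(?:\|([0-9a-z]+))?\+([0-9a-z]+)")


def attribution_runs(attribution):
    """Return a list of (length, newlines, attribute_ids) runs."""
    runs = []
    pos = 0
    while pos < len(attribution):
        m = RUN.match(attribution, pos)
        if m is None:
            # Deletions, keeps, or stray characters are not allowed here.
            raise ValueError("not an insert op at offset %d" % pos)
        attribs = [int(a, 36) for a in m.group(1).split("*")[1:]]
        newlines = int(m.group(2), 36) if m.group(2) else 0
        runs.append((int(m.group(3), 36), newlines, attribs))
        pos = m.end()
    return runs


runs = attribution_runs("*3+8|1+5")
total_length = sum(length for length, _, _ in runs)
```

For "*3+8|1+5" this gives one run of 8 characters with attribute 3 and one run of 5 characters containing 1 newline and no attributes, for a total length of 13.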
  99. -------
  100. Considerations:
  101. - composing changesets/attributions with different pools
  102. - generalizing "applyToAttribution" to make "mutateAttributionLines" and "compose"