module BatRope:Heavyweight strings ("ropes")sig
..end
This module implements ropes as described in Boehm, H., Atkinson, R., and Plass, M. 1995. Ropes: an alternative to strings. Softw. Pract. Exper. 25, 12 (Dec. 1995), 1315-1330.
Ropes are an alternative to strings which support efficient operations:
All operations are non-destructive: the original rope is never modified. When a
new rope is returned as the result of an operation, it will share as much data
as possible with its "parent". For instance, if a rope of length n
undergoes
m
operations (assume n >> m
) like set, append or prepend, the modified
rope will only require O(m)
space in addition to that taken by the
original one.
However, Rope is an amortized data structure, and its use in a persistent setting
can easily degrade its amortized time bounds. It is thus mainly intended to be used
ephemerally. In some cases, it is possible to use Rope persistently with the same
amortized bounds by explicitly rebalancing ropes to be reused using balance
.
Special care must be taken to avoid calling balance
too frequently; in the limit,
calling balance
after each modification would defeat the purpose of amortization.
The length of ropes is limited to approximately 700 Mb on 32-bit
architectures, 220 Gb on 64 bit architectures.
Author(s): Mauricio Fernandez
type
t
exception Out_of_bounds
exception Invalid_rope
val max_length : int
val empty : t
val of_latin1 : string -> t
val of_string : string -> t
of_string s
returns a rope corresponding to the UTF-8 encoded string s
.val to_string : t -> string
to_string t
returns a UTF-8 encoded string representing t
val of_ustring : BatUTF8.t -> t
of_string s
returns a rope corresponding to the string s
.
Operates in O(n)
time.val to_ustring : t -> BatUTF8.t
to_ustring r
returns the string corresponding to the rope r
.val of_uchar : CamomileLibrary.UChar.t -> t
of_uchar c
returns a rope containing exactly character c
.val of_char : char -> t
of_char c
returns a rope containing exactly Latin-1 character c
.val make : int -> CamomileLibrary.UChar.t -> t
make i c
returns a rope of length i
consisting of c
chars;
it is similar to String.makeval lowercase : t -> t
lowercase s
returns a lowercase copy of rope s
.
Note that, for some languages, the number of characters in
lowercase s
is not the same as the number of characters in
s
.
val uppercase : t -> t
uppercase s
returns a uppercase copy of rope s
.
Note that, for some languages, the number of characters in
uppercase s
is not the same as the number of characters in
s
.
val capitalize : t -> t
val uncapitalize : t -> t
val join : t -> t list -> t
BatRope.concat
val explode : t -> CamomileLibrary.UChar.t list
explode s
returns the list of characters in the rope s
.val implode : CamomileLibrary.UChar.t list -> t
implode cs
returns a rope resulting from concatenating
the characters in the list cs
.val is_empty : t -> bool
val length : t -> int
O(1)
).
This is number of UTF-8 characters.val height : t -> int
val balance : t -> t
balance r
returns a balanced copy of the r
rope. Note that ropes are
automatically rebalanced when their height exceeds a given threshold, but
balance
allows to invoke that operation explicity.val append : t -> t -> t
append r u
concatenates the r
and u
ropes. In general, it operates
in O(log(min n1 n2))
amortized time.
Small ropes are treated specially and can be appended/prepended in
amortized O(1)
time.val (^^^) : t -> t -> t
val append_char : CamomileLibrary.UChar.t -> t -> t
append_char c r
returns a new rope with the c
character at the end
in amortized O(1)
time.val prepend_char : CamomileLibrary.UChar.t -> t -> t
prepend_char c r
returns a new rope with the c
character at the
beginning in amortized O(1)
time.val get : t -> int -> CamomileLibrary.UChar.t
get r n
returns the (n+1)th character from the rope r
; i.e.
get r 0
returns the first character.
Operates in worst-case O(log size)
time.
Raises Out_of_bounds if a character out of bounds is requested.val set : t -> int -> CamomileLibrary.UChar.t -> t
set r n c
returns a copy of rope r
where the (n+1)th character
has been set to c
. See also BatRope.get
.
Operates in worst-case O(log size)
time.val sub : t -> int -> int -> t
sub r m n
returns a sub-rope of r
containing all characters
whose indexes range from m
to m + n - 1
(included).
Raises Out_of_bounds in the same cases as sub.
Operates in worst-case O(log size)
time.val insert : int -> t -> t -> t
insert n r u
returns a copy of the u
rope where r
has been
inserted between the characters with index n
and n + 1
in the
original rope. The length of the new rope is
length u + length r
.
Operates in amortized O(log(size r) + log(size u))
time.val remove : int -> int -> t -> t
remove m n r
returns the rope resulting from deleting the
characters with indexes ranging from m
to m + n - 1
(included)
from the original rope r
. The length of the new rope is
length r - n
.
Operates in amortized O(log(size r))
time.val concat : t -> t list -> t
concat sep sl
concatenates the list of ropes sl
,
inserting the separator rope sep
between each.val iter : (CamomileLibrary.UChar.t -> unit) -> t -> unit
iter f r
applies f
to all the characters in the r
rope,
in order.val iteri : ?base:int -> (int -> CamomileLibrary.UChar.t -> unit) -> t -> unit
iter
, but also passes the index of the character
to the given function.val range_iter : (CamomileLibrary.UChar.t -> unit) -> int -> int -> t -> unit
rangeiter f m n r
applies f
to all the characters whose
indices k
satisfy m
<= k
< m + n
.
It is thus equivalent to iter f (sub m n r)
, but does not
create an intermediary rope. rangeiter
operates in worst-case
O(n + log m)
time, which improves on the O(n log m)
bound
from an explicit loop using get
.
Raises Out_of_bounds in the same cases as sub
.val range_iteri : (int -> CamomileLibrary.UChar.t -> unit) ->
?base:int -> int -> int -> t -> unit
range_iter
, but passes base + index of the character in the
subrope defined by next to arguments.val bulk_iter : (BatUTF8.t -> unit) -> t -> unit
val bulk_iteri : ?base:int -> (int -> BatUTF8.t -> unit) -> t -> unit
val fold : ('a -> CamomileLibrary.UChar.t -> 'a) -> 'a -> t -> 'a
Rope.fold f a r
computes f (... (f (f a r0) r1)...) rN-1
where rn = Rope.get n r
and N = length r
.val bulk_fold : ('a -> BatUTF8.t -> 'a) -> 'a -> t -> 'a
BatRope.fold
but over larger chunks of data.val enum : t -> CamomileLibrary.UChar.t BatEnum.t
val bulk_enum : t -> BatUTF8.t BatEnum.t
val of_enum : CamomileLibrary.UChar.t BatEnum.t -> t
val of_bulk_enum : BatUTF8.t BatEnum.t -> t
Provided for convenience and speed.
val backwards : t -> CamomileLibrary.UChar.t BatEnum.t
val of_backwards : CamomileLibrary.UChar.t BatEnum.t -> t
val init : int -> (int -> CamomileLibrary.UChar.t) -> t
init l f
returns the rope of length l
with the chars f 0 , f
1 , f 2 ... f (l-1).val of_int : int -> t
val of_float : float -> t
val to_int : t -> int
Invalid_rope
if the rope does not represent an integer.val to_float : t -> float
val bulk_map : (BatUTF8.t -> BatUTF8.t) -> t -> t
map f s
returns a rope where all ropes c
in s
have been
replaced by f c
. *val map : (CamomileLibrary.UChar.t -> CamomileLibrary.UChar.t) ->
t -> t
map f s
returns a rope where all characters c
in s
have been
replaced by f c
. *val bulk_filter_map : (BatUTF8.t -> BatUTF8.t option) -> t -> t
bulk_filter_map f l
calls (f a0) (f a1).... (f an)
where a0..an
are
the UTF-encoded strings of l
. It returns the list of elements bi
such as
f ai = Some bi
(when f
returns None
, the corresponding element of
l
is discarded).val filter_map : (CamomileLibrary.UChar.t -> CamomileLibrary.UChar.t option) ->
t -> t
filter_map f l
calls (f a0) (f a1).... (f an)
where a0..an
are
the characters of l
. It returns the list of elements bi
such as
f ai = Some bi
(when f
returns None
, the corresponding element of
l
is discarded).val filter : (CamomileLibrary.UChar.t -> bool) -> t -> t
filter f s
returns a copy of rope s
in which only
characters c
such that f c = true
remain.val index : t -> CamomileLibrary.UChar.t -> int
Rope.index s c
returns the position of the leftmost
occurrence of character c
in rope s
.
Raise Not_found
if c
does not occur in s
.val index_from : t -> int -> CamomileLibrary.UChar.t -> int
Rope.index
, but start searching at the character
position given as second argument. Rope.index s c
is
equivalent to Rope.index_from s 0 c
.val rindex : t -> CamomileLibrary.UChar.t -> int
Rope.rindex s c
returns the position of the rightmost
occurrence of character c
in rope s
.
Raise Not_found
if c
does not occur in s
.val rindex_from : t -> int -> CamomileLibrary.UChar.t -> int
BatRope.rindex
, but start
searching at the character position given as second argument.
rindex s c
is equivalent to
rindex_from s (length s - 1) c
.val contains : t -> CamomileLibrary.UChar.t -> bool
contains s c
tests if character c
appears in the rope s
.val contains_from : t -> int -> CamomileLibrary.UChar.t -> bool
contains_from s start c
tests if character c
appears in
the subrope of s
starting from start
to the end of s
.Invalid_argument
if start
is not a valid index of s
.val rcontains_from : t -> int -> CamomileLibrary.UChar.t -> bool
rcontains_from s stop c
tests if character c
appears in the subrope of s
starting from the beginning
of s
to index stop
.Invalid_argument
if stop
is not a valid index of s
.val find : t -> t -> int
find s x
returns the starting index of the first occurrence of
rope x
within rope s
.
Note This implementation is optimized for short ropes.
Raises Invalid_rope
if x
is not a subrope of s
.
val find_from : t -> int -> t -> int
find_from s ofs x
behaves as find s x
but starts searching
at offset ofs
. find s x
is equivalent to find_from s 0 x
.val rfind : t -> t -> int
rfind s x
returns the starting index of the last occurrence
of rope x
within rope s
.
Note This implementation is optimized for short ropes.
Raises Invalid_rope
if x
is not a subrope of s
.
val rfind_from : t -> int -> t -> int
rfind_from s ofs x
behaves as rfind s x
but starts searching
at offset ofs
. rfind s x
is equivalent to rfind_from s (length s - 1) x
.val starts_with : t -> t -> bool
starts_with s x
returns true
if s
is starting with x
, false
otherwise.val ends_with : t -> t -> bool
ends_with s x
returns true
if the rope s
is ending with x
, false
otherwise.val exists : t -> t -> bool
exists str sub
returns true if sub
is a subrope of str
or
false otherwise.val trim : t -> t
val left : t -> int -> t
left r len
returns the rope containing the len
first characters of r
val right : t -> int -> t
left r len
returns the rope containing the len
last characters of r
val head : t -> int -> t
BatRope.left
val tail : t -> int -> t
tail r pos
returns the rope containing all but the pos
first characters of r
val strip : ?chars:CamomileLibrary.UChar.t list -> t -> t
val lchop : t -> t
val rchop : t -> t
val slice : ?first:int -> ?last:int -> t -> t
slice ?first ?last s
returns a "slice" of the rope
which corresponds to the characters s.[first]
,
s.[first+1]
, ..., s[last-1]
. Note that the character at
index last
is not included! If first
is omitted it
defaults to the start of the rope, i.e. index 0, and if
last
is omitted is defaults to point just past the end of
s
, i.e. length s
. Thus, slice s
is equivalent to
copy s
.
Negative indexes are interpreted as counting from the end of
the rope. For example, slice ~last:-2 s
will return the
rope s
, but without the last two characters.
This function never raises any exceptions. If the
indexes are out of bounds they are automatically clipped.
val splice : t -> int -> int -> t -> t
splice s off len rep
returns the rope in which the section of s
indicated by off
and len
has been cut and replaced by rep
.
Negative indices are interpreted as counting from the end of the string.
val fill : t -> int -> int -> CamomileLibrary.UChar.t -> t
fill s start len c
returns the rope in which
characters number start
to start + len - 1
of s
has
been replaced by c
.Invalid_argument
if start
and len
do not
designate a valid subrope of s
.val blit : t -> int -> t -> int -> int -> t
blit src srcoff dst dstoff len
returns a copy
of dst
in which len
characters have been copied
from rope src
, starting at character number srcoff
, to
rope dst
, starting at character number dstoff
. It works
correctly even if src
and dst
are the same rope,
and the source and destination chunks overlap.Invalid_argument
if srcoff
and len
do not
designate a valid subrope of src
, or if dstoff
and len
do not designate a valid subrope of dst
.val concat : t -> t list -> t
concat sep sl
concatenates the list of ropes sl
,
inserting the separator rope sep
between each.val escaped : t -> t
val replace_chars : (CamomileLibrary.UChar.t -> BatUTF8.t) -> t -> t
replace_chars f s
returns a rope where all chars c
of s
have been
replaced by the rope returned by f c
.val replace : str:t -> sub:t -> by:t -> bool * t
replace ~str ~sub ~by
returns a tuple constisting of a boolean
and a rope where the first occurrence of the rope sub
within str
has been replaced by the rope by
. The boolean
is true
if a substitution has taken place, false
otherwise.val split : t -> t -> t * t
split s sep
splits the rope s
between the first
occurrence of sep
.Invalid_rope
if the separator is not found.val rsplit : t -> t -> t * t
rsplit s sep
splits the rope s
between the last
occurrence of sep
.Invalid_rope
if the separator is not found.val nsplit : t -> t -> t list
nsplit s sep
splits the rope s
into a list of ropes
which are separated by sep
.
nsplit "" _
returns the empty list.val compare : t -> t -> int
Pervasives.compare
. Along with the type t
, this function compare
allows the module Rope
to be passed as argument to the functors
Set.Make
and Map.Make
.val icompare : t -> t -> int
module IRope:BatInterfaces.OrderedType
with type t = t
val print : 'a BatInnerIO.output -> t -> unit
val t_printer : t BatValue_printer.t