module BatUTF8: sig .. end
UTF-8 encoded Unicode strings.
This module defines UTF-8 encoded Unicode strings, implemented in
a manner comparable to native OCaml strings. It is provided
essentially for internal use and should be regarded as mostly
obsoleted by Rope.
Note: For type-safety reasons, the definition of type BatUTF8.t is
kept abstract. This may cause incompatibilities with the Camomile library.
Author(s): Yamagata Yoriyuki (Camomile), Edgar Friendly, David Teller
type t
exception Malformed_code
val validate : string -> unit
validate s succeeds if s is valid UTF-8, otherwise raises
Malformed_code. Other functions assume strings are valid UTF-8, so
it is prudent to test their validity for strings from untrusted origins.
val append : t -> t -> t
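As an illustration of the validate contract described above, here is a minimal stand-alone sketch that uses only the standard library. It is not the BatUTF8 implementation: the exception and the helpers lead_width, is_continuation and is_valid are hypothetical names, and the check is deliberately simplified (it does not reject every overlong form, surrogate, or code point above U+10FFFF).

```ocaml
(* Hypothetical stand-in for BatUTF8.Malformed_code. *)
exception Malformed_code

(* Number of bytes announced by a lead byte.
   Rejects continuation bytes, 0xC0, 0xC1 and anything above 0xF4. *)
let lead_width (c : char) : int =
  let n = Char.code c in
  if n < 0x80 then 1
  else if n >= 0xC2 && n < 0xE0 then 2
  else if n >= 0xE0 && n < 0xF0 then 3
  else if n >= 0xF0 && n < 0xF5 then 4
  else raise Malformed_code

(* Continuation bytes have the bit pattern 10xxxxxx. *)
let is_continuation (c : char) : bool = Char.code c land 0xC0 = 0x80

(* Simplified structural check: every lead byte must be followed by
   the announced number of continuation bytes. *)
let validate (s : string) : unit =
  let len = String.length s in
  let rec go i =
    if i < len then begin
      let w = lead_width s.[i] in
      if i + w > len then raise Malformed_code;
      for k = 1 to w - 1 do
        if not (is_continuation s.[i + k]) then raise Malformed_code
      done;
      go (i + w)
    end
  in
  go 0

(* Boolean wrapper, convenient for input from untrusted origins. *)
let is_valid (s : string) : bool =
  try validate s; true with Malformed_code -> false
```

With such a check, untrusted input can be screened once at the boundary, so that later calls may rely on the "valid UTF-8" assumption.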
val empty : t
val of_char : CamomileLibrary.UChar.t -> t
As String.of_char.
val make : int -> CamomileLibrary.UChar.t -> t
As String.make.
val of_string : string -> t
val to_string : t -> string
val enum : t -> CamomileLibrary.UChar.t BatEnum.t
As String.enum.
val of_enum : CamomileLibrary.UChar.t BatEnum.t -> t
As String.of_enum.
val backwards : t -> CamomileLibrary.UChar.t BatEnum.t
As String.backwards.
val of_backwards : CamomileLibrary.UChar.t BatEnum.t -> t
As String.of_backwards.
val sub : t -> int -> int -> t
As String.sub.
val get : t -> int -> CamomileLibrary.UChar.t
get s n returns the n-th Unicode character of s. The call requires
O(n) time.
val init : int -> (int -> CamomileLibrary.UChar.t) -> t
init len f returns a new string which contains len Unicode characters.
The i-th Unicode character is initialized by f i.
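The behaviour of init can be sketched with the standard library alone. The version below is an assumption-laden analogue, not the BatUTF8 code: it uses the stdlib Uchar.t instead of CamomileLibrary.UChar.t, returns a plain string instead of the abstract type t, and relies on Buffer.add_utf_8_uchar (OCaml 4.06 or later).

```ocaml
(* Build a UTF-8 encoded string of [len] characters, where the i-th
   character is [f i]. Stand-alone sketch on plain strings. *)
let init (len : int) (f : int -> Uchar.t) : string =
  let buf = Buffer.create (max 1 len) in
  for i = 0 to len - 1 do
    (* Each code point is appended in its UTF-8 encoding. *)
    Buffer.add_utf_8_uchar buf (f i)
  done;
  Buffer.contents buf

(* Greek lowercase alpha, beta, gamma: U+03B1 .. U+03B3. *)
let greek : string = init 3 (fun i -> Uchar.of_int (0x03B1 + i))
```

Here greek holds three Unicode characters but occupies six bytes, since each of these code points takes two bytes in UTF-8.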
val length : t -> int
length s returns the number of Unicode characters contained in s.
val length0 : int -> int
If c is the beginning of a UTF-8 encoded character, length0 c returns
the total number of bytes which must be read for the Unicode character
to be complete.
type char_idx = int
The location of the first character is 0. The location of the second is 1.
module Byte: sig .. end
type index = int
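Assuming that index counts bytes from the start of the string (the Byte.b_idx arguments of the functions that follow suggest this, but it is an assumption), the relationship between character positions and byte positions can be sketched as follows. The names width_of_lead and char_starts are hypothetical, and the code works on plain strings that are already known to be valid UTF-8.

```ocaml
(* Total byte width of a character, given the integer value of its
   first byte. A simplified analogue of length0. *)
let width_of_lead (c : int) : int =
  if c land 0x80 = 0 then 1            (* 0xxxxxxx *)
  else if c land 0xE0 = 0xC0 then 2    (* 110xxxxx *)
  else if c land 0xF0 = 0xE0 then 3    (* 1110xxxx *)
  else if c land 0xF8 = 0xF0 then 4    (* 11110xxx *)
  else invalid_arg "width_of_lead: not a lead byte"

(* Byte offsets at which each character of a valid UTF-8 string
   starts. The k-th element is the byte position of character k. *)
let char_starts (s : string) : int list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else go (i + width_of_lead (Char.code s.[i])) (i :: acc)
  in
  go 0 []
```

For the three-character string "a\xc3\xa9\xe2\x82\xac" (a, e with acute accent, euro sign), the character positions are 0, 1, 2, while the byte positions are 0, 1, 3.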
val look : t -> Byte.b_idx -> CamomileLibrary.UChar.t
look s i returns the Unicode character at location i in the string s.
val out_of_range : t -> Byte.b_idx -> bool
out_of_range s i tests whether i is a position inside of s.
val compare_index : t -> index -> index -> int
compare_index s i1 i2 returns a value < 0 if i1 is located before i2,
0 if i1 and i2 point to the same location, and a value > 0 if i1 is
located after i2.
val next : t -> index -> index
next s i returns the position of the head of the Unicode character
located immediately after i.
If i is inside of s, the function always succeeds.
If i is inside of s and there is no Unicode character after i,
a position outside s is returned.
If i is not inside of s, the behaviour is unspecified.
val prev : t -> index -> index
prev s i returns the position of the head of the Unicode character
located immediately before i.
If i is inside of s, the function always succeeds.
If i is inside of s and there is no Unicode character before i,
a position outside s is returned.
If i is not inside of s, the behaviour is unspecified.
val move : t -> index -> int -> index
move s i n returns the position of the n-th Unicode character after i
if n >= 0, or of the n-th Unicode character before i if n < 0.
If there is no such character, the result is unspecified.
val iter : (CamomileLibrary.UChar.t -> unit) -> t -> unit
iter f s applies f to all Unicode characters in s.
The order of application is the same as the order of the Unicode
characters in s.
val compare : t -> t -> int
compare s1 s2 returns a positive integer if s1 > s2, 0 if s1 = s2, and
a negative integer if s1 < s2.
val concat : t -> t list -> t
concat sep [a;b;c...] returns the concatenation of
a, sep, b, sep, c, sep ...
val join : t -> t list -> t
As concat.
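The separator placement described for concat can be illustrated on plain strings with the standard String.concat. This is a sketch, not a BatUTF8 call, and it assumes that the separator goes only between elements, as in the standard library. Joining valid UTF-8 strings with a valid UTF-8 separator at the byte level yields valid UTF-8 again.

```ocaml
(* " • " with U+2022 BULLET, written as explicit UTF-8 bytes. *)
let sep : string = " \xe2\x80\xa2 "

(* Greek alpha, beta, gamma as UTF-8 encoded strings. *)
let parts : string list = ["\xce\xb1"; "\xce\xb2"; "\xce\xb3"]

(* The separator appears between consecutive elements only. *)
let joined : string = String.concat sep parts
```

The result contains five Unicode characters plus four spaces, stored in 16 bytes.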
val uppercase : t -> t
val lowercase : t -> t
val init : int -> (int -> CamomileLibrary.UChar.t) -> t
As String.init.
val map : (CamomileLibrary.UChar.t -> CamomileLibrary.UChar.t) -> t -> t
As String.map.
val filter_map : (CamomileLibrary.UChar.t -> CamomileLibrary.UChar.t option) -> t -> t
As String.filter_map.
val filter : (CamomileLibrary.UChar.t -> bool) -> t -> t
As String.filter.
val index : t -> CamomileLibrary.UChar.t -> int
As String.index.
val rindex : t -> CamomileLibrary.UChar.t -> int
As String.rindex.
val contains : t -> CamomileLibrary.UChar.t -> bool
As String.contains.
val contains_from : t -> CamomileLibrary.UChar.t -> Byte.b_idx -> bool
val rcontains_from : t -> CamomileLibrary.UChar.t -> Byte.b_idx -> bool
val escaped : t -> t
module Buf: sig .. end
with type buf = Buffer.t
val print : 'a BatInnerIO.output -> t -> unit
val t_printer : t BatValue_printer.t