Module BatUTF8


module BatUTF8: sig .. end
UTF-8 encoded Unicode strings.

This module defines UTF-8 encoded Unicode strings, implemented in a manner comparable to native OCaml strings. This module is provided essentially for internal use and should be regarded as mostly obsoleted by Rope.

Note: For type-safety reasons, the definition of type BatUTF8.t is kept abstract. This may cause incompatibilities with the Camomile library.
Author(s): Yamagata Yoriyuki (Camomile), Edgar Friendly, David Teller




type t = private string 
UTF-8 encoded Unicode strings. If you coerce it to a string, modify it at your own risk. Call BatUTF8.validate to verify that the contents are still valid UTF-8.
exception Malformed_code
val validate : string -> unit
validate s succeeds if s is valid UTF-8, otherwise raises Malformed_code. Other functions assume strings are valid UTF-8, so it is prudent to test their validity for strings from untrusted origins.
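As an illustration of what such a check involves, here is a minimal sketch of a UTF-8 validator written against the standard library only. It is a hypothetical stand-in, not the BatUTF8 implementation: it follows the modern definition of UTF-8 (at most 4 bytes per character, no overlong forms, no surrogates), and BatUTF8.validate may accept or reject a slightly different set of inputs. The local exception Malformed_code only mirrors the name used above.

```ocaml
(* Local exception for this sketch; it mirrors the name used by BatUTF8. *)
exception Malformed_code

(* Hypothetical stand-in for a UTF-8 validity check, stdlib only.
   Returns unit on success and raises Malformed_code otherwise. *)
let validate (s : string) : unit =
  let n = String.length s in
  let byte i = Char.code s.[i] in
  (* true if position i exists and holds a continuation byte 10xxxxxx *)
  let cont i = i < n && byte i land 0xC0 = 0x80 in
  let rec go i =
    if i < n then begin
      let b = byte i in
      if b < 0x80 then go (i + 1)                      (* 1-byte (ASCII) *)
      else if b >= 0xC2 && b <= 0xDF then begin        (* 2-byte sequence *)
        if not (cont (i + 1)) then raise Malformed_code;
        go (i + 2)
      end
      else if b >= 0xE0 && b <= 0xEF then begin        (* 3-byte sequence *)
        if not (cont (i + 1) && cont (i + 2)) then raise Malformed_code;
        let b1 = byte (i + 1) in
        (* reject overlong forms (E0 80..9F) and surrogates (ED A0..BF) *)
        if (b = 0xE0 && b1 < 0xA0) || (b = 0xED && b1 > 0x9F) then
          raise Malformed_code;
        go (i + 3)
      end
      else if b >= 0xF0 && b <= 0xF4 then begin        (* 4-byte sequence *)
        if not (cont (i + 1) && cont (i + 2) && cont (i + 3)) then
          raise Malformed_code;
        let b1 = byte (i + 1) in
        (* reject overlong forms (F0 80..8F) and values above U+10FFFF *)
        if (b = 0xF0 && b1 < 0x90) || (b = 0xF4 && b1 > 0x8F) then
          raise Malformed_code;
        go (i + 4)
      end
      else raise Malformed_code   (* stray continuation or invalid lead byte *)
    end
  in
  go 0
```

A caller handling untrusted input would typically wrap the call in a try ... with Malformed_code -> ... handler before passing the string to any function that assumes validity.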
val append : t -> t -> t
Concatenate two UTF8 strings
val empty : t
The empty UTF8 string
val of_char : BatCamomile.UChar.t -> t
As String.of_char
val make : int -> BatCamomile.UChar.t -> t
As String.make
val of_string : string -> t
Adopt a string. Involves copying.
val to_string : t -> string
Return a UTF-8 encoded string representing this Unicode string.
val adopt : string -> t
Adopt a string without copying. Modifying the original string will modify this value, possibly breaking safety guarantees.
val enum : t -> BatCamomile.UChar.t BatEnum.t
As String.enum
val of_enum : BatCamomile.UChar.t BatEnum.t -> t
As String.of_enum
val backwards : t -> BatCamomile.UChar.t BatEnum.t
As String.backwards
val of_backwards : BatCamomile.UChar.t BatEnum.t -> t
As String.of_backwards
val sub : t -> int -> int -> t
As String.sub
val get : t -> int -> BatCamomile.UChar.t
get s n returns the n-th Unicode character of s. The call takes O(n) time.
val init : int -> (int -> BatCamomile.UChar.t) -> t
init len f returns a new string which contains len Unicode characters. The i-th Unicode character is initialized by f i
val length : t -> int
length s returns the number of Unicode characters contained in s
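The difference between the number of Unicode characters and the number of bytes can be sketched with the standard library alone. The helper below, count_code_points, is a hypothetical name and not part of BatUTF8; it assumes its argument is already valid UTF-8 and counts every byte that is not a continuation byte (10xxxxxx). Because the whole string must be scanned, a count of this kind takes time linear in the byte length.

```ocaml
(* Hypothetical helper, not part of BatUTF8: counts the Unicode characters
   of a string that is assumed to be valid UTF-8. Each character
   contributes exactly one byte that is not of the form 10xxxxxx. *)
let count_code_points (s : string) : int =
  let n = ref 0 in
  String.iter
    (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n)
    s;
  !n
```

For instance, the string "h\xc3\xa9llo" occupies 6 bytes (String.length) but holds 5 Unicode characters.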
val length0 : int -> int
UTF8 encoding often calls for the encoding of a Unicode character with several bytes. If c is the beginning of a UTF8 encoded character, length0 c returns the total number of bytes which must be read for the Unicode character to be complete.
Returns 1 if the character is complete, n >= 2 otherwise
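The relation between a leading byte and the length of its sequence can be sketched as follows. The function seq_length is a hypothetical name for illustration, restricted to the 1 to 4 byte sequences of modern UTF-8; how length0 itself treats continuation bytes or invalid leading bytes is not specified above, so raising Invalid_argument here is an assumption of the sketch.

```ocaml
(* Hypothetical illustration, not the BatUTF8 implementation: number of
   bytes in the UTF-8 sequence that starts with the byte value [lead]. *)
let seq_length (lead : int) : int =
  if lead < 0x80 then 1            (* 0xxxxxxx: complete on its own *)
  else if lead < 0xC0 then
    invalid_arg "seq_length: continuation byte"
  else if lead < 0xE0 then 2       (* 110xxxxx *)
  else if lead < 0xF0 then 3       (* 1110xxxx *)
  else if lead < 0xF8 then 4       (* 11110xxx *)
  else invalid_arg "seq_length: not a UTF-8 leading byte"
```

The argument is an integer byte value, so a caller starting from a char would pass Char.code c.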
type char_idx = int 
Positions in the string, indexed by character. The location of the first character is 0; the location of the second is 1.
module Byte: sig .. end
Positions in the string represented by the number of bytes from the head.
type index = int 
val look : t -> Byte.b_idx -> BatCamomile.UChar.t
look s i returns the Unicode character of the location i in the string s.
val out_of_range : t -> Byte.b_idx -> bool
out_of_range s i tests whether i is a position outside of s.
val compare_index : t -> index -> index -> int
compare_index s i1 i2 returns a value < 0 if i1 is located before i2, 0 if i1 and i2 point to the same location, and a value > 0 if i1 is located after i2.
val next : t -> index -> index
next s i returns the position of the head of the Unicode character located immediately after i. If i is inside of s, the function always succeeds. If i is inside of s and there is no Unicode character after i, a position outside s is returned. If i is not inside of s, the behaviour is unspecified.
val prev : t -> index -> index
prev s i returns the position of the head of the Unicode character located immediately before i. If i is inside of s, the function always succeeds. If i is inside of s and there is no Unicode character before i, a position outside s is returned. If i is not inside of s, the behaviour is unspecified.
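Stepping between character heads by byte position can be sketched on a plain OCaml string. The helpers next_pos and prev_pos are hypothetical names, not the BatUTF8 functions; the sketch assumes that s is valid UTF-8, that i is the byte position of the head of a character inside s, and that the "position outside s" is String.length s when moving forward and -1 when moving backward.

```ocaml
(* true if the byte at position i is a continuation byte 10xxxxxx *)
let is_continuation (s : string) (i : int) : bool =
  Char.code s.[i] land 0xC0 = 0x80

(* Hypothetical sketch: byte position of the head of the character that
   follows the one starting at i; String.length s if there is none. *)
let next_pos (s : string) (i : int) : int =
  let j = ref (i + 1) in
  while !j < String.length s && is_continuation s !j do incr j done;
  !j

(* Hypothetical sketch: byte position of the head of the character that
   precedes the one starting at i; -1 if there is none. *)
let prev_pos (s : string) (i : int) : int =
  let j = ref (i - 1) in
  while !j >= 0 && is_continuation s !j do decr j done;
  !j
```

With s = "a\xc3\xa9b" (the characters a, é, b), the character heads sit at byte positions 0, 1 and 3, so next_pos s 1 skips the two bytes of é and lands on 3.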
val move : t -> index -> int -> index
move s i n returns the position of the n-th Unicode character after i if n >= 0, or of the n-th Unicode character before i if n < 0. If there is no such character, the result is unspecified.
val iter : (BatCamomile.UChar.t -> unit) -> t -> unit
iter f s applies f to all Unicode characters in s. The order of application is the same as the order of the Unicode characters in s.
val compare : t -> t -> int
Code point comparison by lexicographic order. compare s1 s2 returns a positive integer if s1 > s2, 0 if s1 = s2, a negative integer if s1 < s2.
val concat : t -> t list -> t
concat sep [a;b;c...] returns the concatenation of a, sep, b, sep, c, sep...
val join : t -> t list -> t
As concat
val uppercase : t -> t
Return a copy of the argument, with all lowercase letters translated to uppercase.
val lowercase : t -> t
Return a copy of the argument, with all uppercase letters translated to lowercase.
val init : int -> (int -> BatCamomile.UChar.t) -> t
As String.init
val map : (BatCamomile.UChar.t -> BatCamomile.UChar.t) -> t -> t
As String.map
val filter_map : (BatCamomile.UChar.t -> BatCamomile.UChar.t option) -> t -> t
As String.filter_map
val filter : (BatCamomile.UChar.t -> bool) -> t -> t
As String.filter
val index : t -> BatCamomile.UChar.t -> int
As String.index
val rindex : t -> BatCamomile.UChar.t -> int
As String.rindex
val contains : t -> BatCamomile.UChar.t -> bool
As String.contains
val contains_from : t -> BatCamomile.UChar.t -> Byte.b_idx -> bool
val rcontains_from : t -> BatCamomile.UChar.t -> Byte.b_idx -> bool
val escaped : t -> t
module Buf: sig .. end  with type buf = Buffer.t
Buffer module for UTF-8 strings

Boilerplate code

val print : 'a BatInnerIO.output -> t -> unit
Printing

val t_printer : t BatValue_printer.t