2020-03-27
Unicode 13 support ([#179]).
No longer report zero width for category Sk ([#167]).
cmake support improvements ([#173]).
2019-05-10
Unicode 12.1 support ([#156]).
New -DUTF8PROC_INSTALL=No option for cmake builds to disable installation ([#152]).
Better make support for HP-UX ([#154]).
Fixed incorrect UTF8PROC_VERSION_MINOR version number in header and bumped shared-library version.
2019-03-30
Unicode 12 support ([#148]).
New function utf8proc_unicode_version to return the supported Unicode version ([#151]).
Simpler character-width computation that no longer uses GNU Unifont metrics: East-Asian wide characters have width 2, and all other printable characters have width 1 ([#150]).
Fix CHARBOUND option for utf8proc_map to preserve U+FFFE and U+FFFF non-characters ([#149]).
Various build-system improvements ([#141], [#142], [#147]).
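
The width rule described in [#150] can be illustrated with a minimal sketch. The function `sketch_charwidth` and its range table are hypothetical and cover only a handful of East-Asian wide blocks; the library's own tables are far more complete. Returning 0 for control characters is an assumption made here, since the entry only speaks about printable characters.

```c
#include <stddef.h>

/* Hypothetical, deliberately incomplete table of East-Asian wide ranges. */
struct sketch_range { long first, last; };

static const struct sketch_range sketch_wide[] = {
  {0x1100, 0x115F},  /* Hangul Jamo initial consonants */
  {0x3041, 0x3096},  /* Hiragana letters */
  {0x4E00, 0x9FFF},  /* CJK Unified Ideographs */
  {0xAC00, 0xD7A3}   /* Hangul Syllables */
};

/* Returns 2 for a code point in the table, 1 for any other printable
   code point, and 0 for controls (an assumption of this sketch). */
int sketch_charwidth(long c) {
  size_t i;
  if (c < 0x20 || (c >= 0x7F && c < 0xA0)) return 0;
  for (i = 0; i < sizeof sketch_wide / sizeof sketch_wide[0]; i++) {
    if (c >= sketch_wide[i].first && c <= sketch_wide[i].last) return 2;
  }
  return 1;
}
```

In real code, call `utf8proc_charwidth` rather than a hand-written table like this one.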
2018-07-24
Unicode 11 support ([#132] and [#140]).
utf8proc_NFKC_Casefold convenience function for NFKC_Casefold
normalization ([#133]).
UTF8PROC_STRIPNA option to strip unassigned codepoints ([#133]).
Support building static libraries on Windows (callers need to
#define UTF8PROC_STATIC) ([#123]).
cmake fix to avoid defining UTF8PROC_EXPORTS globally ([#121]).
toupper of ß (U+00DF) now yields ẞ (U+1E9E) ([#134]), similar to musl;
case-folding still yields the standard "ss" mapping.
utf8proc_charwidth now returns 1 for U+00AD (soft hyphen) and
for unassigned/PUA codepoints ([#135]).
2018-04-27
Fixed composition bug ([#128]).
Minor build fixes ([#94], [#99], [#113], [#125]).
2016-12-26:
New functions utf8proc_map_custom and utf8proc_decompose_custom
to allow user-supplied transformations of codepoints, in conjunction
with other transformations ([#89]).
New function utf8proc_normalize_utf32 to apply normalizations
directly to UTF-32 data (not just UTF-8) ([#88]).
Fixed stack overflow that could occur due to incorrect definition
of UINT16_MAX with some compilers ([#84]).
Fixed conflict with stdbool.h in Visual Studio ([#90]).
Updated font metrics to use Unifont 9.0.04.
2016-07-27:
Move -Wmissing-prototypes warning flag from Makefile to .travis.yml
since MSVC does not understand this flag and it is occasionally useful to
build using MSVC through the Makefile ([#79]).
Use a different variable name for a nested loop in bench/bench.c, and
declare it in a C89 way rather than inside the for to avoid "error:
'for' loop initial declarations are only allowed in C99 mode" ([#80]).
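
The C89-style pattern mentioned in [#80] looks roughly like the sketch below. The function `sketch_sum` is hypothetical and is not taken from bench/bench.c; it only shows loop counters declared at the top of the block, with a distinct name for the nested loop.

```c
#include <stddef.h>

/* Sums a rows-by-cols matrix stored in row-major order. */
long sketch_sum(const int *m, size_t rows, size_t cols) {
  size_t i, j;     /* declared up front, as C89 requires */
  long total = 0;
  for (i = 0; i < rows; i++) {
    for (j = 0; j < cols; j++) {   /* inner loop has its own counter */
      total += m[i * cols + j];
    }
  }
  return total;
}
```

Writing `for (size_t i = 0; ...)` instead would trigger the quoted error on compilers running in C89 mode.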
2016-07-13:
Bug fix in utf8proc_grapheme_break_stateful ([#77]).
Tests now use versioned Unicode files, so they will no longer break when a new version of Unicode is released ([#78]).
2016-07-13:
Updated for Unicode 9.0 ([#70]).
New utf8proc_grapheme_break_stateful to handle the complicated
grapheme-breaking rules in Unicode 9. The old utf8proc_grapheme_break
is still provided, but may incorrectly identify grapheme breaks
in some Unicode-9 sequences.
Smaller Unicode tables ([#62], [#68]). This required changes
in the utf8proc_property_t structure, which breaks backward
compatibility if you access this struct directly. The
functions in the API remain backward-compatible, however.
Buffer overrun fix ([#66]).
2015-11-02:
Do not export symbol for internal function unsafe_encode_char() ([#55]).
Install relative symbolic links for shared libraries ([#58]).
Enable and fix compiler warnings ([#55], [#58]).
Add missing files to make clean ([#58]).
2015-07-06:
Updated for Unicode 8.0 ([#45]).
New utf8proc_tolower and utf8proc_toupper functions, portable
replacements for towlower and towupper in the C library ([#40]).
Don't treat Unicode "non-characters" as invalid, and improved validity checking in general ([#35]).
Prefix all typedefs with utf8proc_, e.g. utf8proc_int32_t,
to avoid collisions with other libraries ([#32]).
Rename DLLEXPORT to UTF8PROC_DLLEXPORT to prevent collisions.
Fix build breakage in the benchmark routines.
More fine-grained Makefile variables (PICFLAG etcetera), so that
compilation flags can be selectively overridden, and in particular
so that CFLAGS can be changed without accidentally eliminating
necessary flags like -fPIC and -std=c99 ([#43]).
Updated character-width tables based on Unifont 8.0.01 ([#51]) and the Unicode 8 character categories ([#47]).
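
One way to read the validity change in [#35] is sketched below: a code point is accepted if it lies in the Unicode code space and is not a surrogate, so non-characters such as U+FFFE pass. The helper `sketch_codepoint_valid` is hypothetical and is not the library's implementation.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if c is a Unicode scalar value, else 0. */
int sketch_codepoint_valid(int32_t c) {
  if (c < 0 || c > 0x10FFFF) return 0;        /* outside the code space */
  if (c >= 0xD800 && c <= 0xDFFF) return 0;   /* UTF-16 surrogates */
  return 1;   /* non-characters such as U+FFFE and U+FFFF are accepted */
}
```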
2015-03-28:
Updated for Unicode 7.0 ([#6]).
New function utf8proc_grapheme_break(c1,c2) that returns whether
there is a grapheme break between c1 and c2 ([#20]).
New function utf8proc_charwidth(c) that returns the number of
column-positions that should be required for c; essentially a
portable replacement for wcwidth(c) ([#27]).
New function utf8proc_category(c) that returns the Unicode
category of c (as one of the constants UTF8PROC_CATEGORY_xx).
Also, a function utf8proc_category_string(c) that returns the Unicode
category of c as a two-character string.
cmake script CMakeLists.txt, in addition to Makefile, for
easier compilation on Windows ([#28]).
Various Makefile improvements: a make check target to perform
tests ([#13]), make install, a rule to automate updating the Unicode
tables, etcetera.
The shared library is now versioned (e.g. has a soname on GNU/Linux) ([#24]).
C++/MSVC compatibility ([#17]).
Most #defined constants are now enums ([#29]).
New preprocessor constants UTF8PROC_VERSION_MAJOR,
UTF8PROC_VERSION_MINOR, and UTF8PROC_VERSION_PATCH for compile-time
detection of the API version.
Doxygen-formatted documentation ([#29]).
The Ruby and PostgreSQL plugins have been removed due to lack of testing ([#22]).
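
The version constants above allow compile-time checks along the lines of the following sketch. The macro `SKETCH_UTF8PROC_AT_LEAST`, the function `sketch_meets_minimum`, and the 2.0.0 threshold are hypothetical. The fallback values 2, 4, 0 are placeholders so the sketch compiles without the library header; in real code the three constants come from the utf8proc header.

```c
/* Placeholder fallbacks (assumption): only used when the utf8proc
   header has not already defined the version constants. */
#ifndef UTF8PROC_VERSION_MAJOR
#define UTF8PROC_VERSION_MAJOR 2
#define UTF8PROC_VERSION_MINOR 4
#define UTF8PROC_VERSION_PATCH 0
#endif

/* Hypothetical helper: true when the API version is at least maj.min.pat. */
#define SKETCH_UTF8PROC_AT_LEAST(maj, min, pat) \
  (UTF8PROC_VERSION_MAJOR > (maj) || \
   (UTF8PROC_VERSION_MAJOR == (maj) && \
    (UTF8PROC_VERSION_MINOR > (min) || \
     (UTF8PROC_VERSION_MINOR == (min) && \
      UTF8PROC_VERSION_PATCH >= (pat)))))

/* Returns 1 when the compile-time version meets an example 2.0.0 minimum. */
int sketch_meets_minimum(void) {
#if SKETCH_UTF8PROC_AT_LEAST(2, 0, 0)
  return 1;
#else
  return 0;
#endif
}
```

Because the comparison happens in the preprocessor, code that depends on newer functions can be excluded entirely when building against an older header.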
2013-11-27:
c language name)
2009-08-20:
RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and
RSTRING()->len for ruby1.9 compatibility (and #define them, if not
existent)
2009-10-02:
2009-10-08:
2009-10-16:
2009-06-14:
2009-08-19:
README file
2008-10-04:
utf8proc_version returning a string containing the version
number of the library.
libutf8proc.dylib for MacOSX.
2009-05-01:
- PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
2007-07-25:
2007-06-25:
unistrip, which behaves like unifold,
but also removes all character marks (e.g. accents).
2007-07-22:
utf8proc_codepoint_valid to the C library.
Makefile from -g -O0 to -O2
utf8proc_data.c file, is now included in the distribution.
2007-03-16:
String#utf8chars).
2006-09-21:
Integer#utf8, which raises an exception, if the given
code-point is invalid because of being too high (this was missing yet)
2006-12-26:
2006-09-20:
Release of version 1.0.1
2006-09-17:
LUMP option, which lumps certain characters together (see lump.md) (also used for the PostgreSQL unifold function)
STRIPMARK option, which strips marking characters (or marks of composed characters)
String#char_ary in favour of String#utf8chars
2006-07-18:
2006-08-04:
CHARBOUND)
String#chars, which is returning an array of UTF-8 encoded grapheme clusters
NLF2LF transformation in postgresql unifold function
DECOMPOSE option, if you neither use COMPOSE or DECOMPOSE, no normalization will be performed (different from previous versions)
2006-06-05:
2006-06-20:
2006-06-02: initial release of version 0.1