Created by Uriah Eisenstein , 2010
This is a small Java™ Web Start application intended to provide easy SQL-based access to the information in the Unicode Han Database, version 6.0.0 (compatibility with Unihan 5.2.0 is still maintained). The application loads Unihan data from the original ZIP file into a database, and provides a simple SQL client to the database. Other SQL clients can of course be used if the database is persistent (rather than in-memory); the application does have the advantage of support for inserting and displaying characters by Unicode code point values.
The application can be useful for obtaining statistical information about Unihan contents (such as how many characters are listed for each radical-stroke count), checking the data for consistency, or programmatic access if a persistent database is created. For comprehensive information about single characters the Unihan Database Lookup is probably better.
The application was developed on Windows XP using JDK 1.6.0.16 and the Apache Derby 10.6.1.0 database.The application itself contains neither the Unihan.zip file nor a database driver. The former is available from the Unicode website here (version 6.0), and is not to be hosted elsewhere. A database driver may be optionally bundled with the application in later versions.
The application requires a JDBC-compliant driver to be locally installed and accessible to it. This usually means the relevant JAR file must reside in the lib\ext subfolder under the JRE (Java Runtime Environment) installation location; For Apache Derby, the file is derby.jar. Please refer to your database documentation for further information.
The application can be launched using the following link:
Fill in an appropriate database URL and browse for the Unihan.zip file, then press Connect to start loading Unihan data into the database. If you have loaded the data into a persistent database, you can instead mark the Don't load Unihan checkbox to directly access the existing data. Either way, allow the connection to complete and press Continue to be able to send SQL queries. Note that you may also create additional tables, indices etc. as well as change loaded Unihan data (though this of course will not be reflected in the Unihan source files).
The database driver is expected to be inferred from the database URL. For instance, the application defaults to jdbc:derby:memory:unihan;create=true, which locates the Apache Derby database and instructs it to create an in-memory database. This takes about a minute, depending on the database in use and computer speed.
If the application doesn't run properly, please refer to the Troubleshooting notes below.
NULL
values in a single table.CODE_POINT
column encoded
in UTF-16, that is - a single character or a surrogate pair.CODE_POINT, RADICAL_INDEX, RADICAL_SIMPLIFIED
and STROKES
have the same name in
many tables, in order to facilitate NATURAL JOIN/JOIN USING
clauses. Other columns have the field name as
their name or prefix._ord
suffix indicate the ordinal position of the given record, which is sometimes significant;
for instance, kMandarin pronunciations are ordered by frequency.CREATE FUNCTION HEX(I INT) RETURNS VARCHAR(8) LANGUAGE JAVA PARAMETER STYLE JAVA EXTERNAL NAME 'java.lang.Integer.toHexString' DETERMINISTIC NO SQL RETURNS NULL ON NULL INPUT
SELECT CODE_POINT, HEX((kGB0/100 + 160)*256 + (MOD(kGB0, 100) + 160)) FROM kGB0
HAN_BLOCK
view lists Unicode blocks related to CJK characters (see below).UNIHAN_OBJECTS
table is created listing all tables, views etc. created by the current version of the application,
along with their type and creation order. This can be useful for generating SQL statements to manipulate all such database objects -
see example below.SQL_STATEMENTS
table from the previous version has been removed.UNIHAN_OBJECTS
table can be used to configure all tables back into row-locking mode. Retrieve all radical-stroke indices from kRSKangXi (no simplified radicals)SELECT CODE_POINT, Radical, Strokes FROM kRSUnicode NATURAL JOIN CJKRadicals
Usage ofSELECT CODE_POINT, Radical, Strokes FROM kRSKangXi NATURAL JOIN KangxiRadicals
HAN_BLOCK
: retrieve any definitions given for Extension C characters
Find all assymetric Z-variant relationsSELECT CODE_POINT, kDefinition FROM kDefinition JOIN HAN_BLOCK ON CODE_POINT BETWEEN BLOCK_FIRST AND BLOCK_LAST WHERE BLOCK_NAME = 'CJK Unified Ideographs Extension C'
Find all characters which have only a J-Source (and thus are probably only used in Japan)SELECT CODE_POINT, kZVariant FROM kZVariant Z1 WHERE NOT EXISTS (SELECT * FROM kZVariant Z2 WHERE Z1.CODE_POINT = Z2.kZVariant AND Z1.kZVariant = Z2.CODE_POINT)
Generate statements to remove all database objects created by the application in the current databaseSELECT CODE_POINT FROM kIRG_JSource J WHERE 'kIRG_JSource' = ALL( SELECT SOURCE FROM IRGSource_any WHERE CODE_POINT = J.CODE_POINT)
SELECT 'DROP ' || TYPE || ' ' || NAME FROM UNIHAN_OBJECTS ORDER BY CREATION_ORDER DESC
org.sqlite.JDBC
(case-sensitive) loads the SQLite driver. Consult the
database documentation to find the JDBC driver class if necessary.