Unihan SQL Browser

Created by Uriah Eisenstein [e-mail], 2010

This is a small Java™ Web Start application intended to provide easy SQL-based access to the information in the Unicode Han Database, version 6.0.0 (compatibility with Unihan 5.2.0 is still maintained). The application loads Unihan data from the original ZIP file into a database, and provides a simple SQL client to the database. Other SQL clients can of course be used if the database is persistent (rather than in-memory); the application does have the advantage of support for inserting and displaying characters by Unicode code point values.

The application can be useful for obtaining statistical information about Unihan contents (such as how many characters are listed for each radical-stroke count), checking the data for consistency, or programmatic access if a persistent database is created. For comprehensive information about single characters the Unihan Database Lookup is probably better.

The application was developed on Windows XP using JDK 1.6.0.16 and the Apache Derby 10.6.1.0 database.
Note: Java DB (Apache Derby) versions prior to 10.6.1.0 do not support NATURAL JOIN, which is used in the definitions of several views, so they should be upgraded.
SQLite has also been used for testing; the JDBC driver for it can be found here.
Icons are from the FatCow collection.

Running the Application

The application itself contains neither the Unihan.zip file nor a database driver. The former is available from the Unicode website here (version 6.0), and is not to be hosted elsewhere. A database driver may be optionally bundled with the application in later versions.

The application requires a JDBC-compliant driver to be locally installed and accessible to it. This usually means the relevant JAR file must reside in the lib\ext subfolder under the JRE (Java Runtime Environment) installation location; For Apache Derby, the file is derby.jar. Please refer to your database documentation for further information.

The application can be launched using the following link:

Launch JNLP file

Fill in an appropriate database URL and browse for the Unihan.zip file, then press Connect to start loading Unihan data into the database. If you have loaded the data into a persistent database, you can instead mark the Don't load Unihan checkbox to directly access the existing data. Either way, allow the connection to complete and press Continue to be able to send SQL queries. Note that you may also create additional tables, indices etc. as well as change loaded Unihan data (though this of course will not be reflected in the Unihan source files).

The database driver is expected to be inferred from the database URL. For instance, the application defaults to jdbc:derby:memory:unihan;create=true, which locates the Apache Derby database and instructs it to create an in-memory database. This takes about a minute, depending on the database in use and computer speed.

If the application doesn't run properly, please refer to the Troubleshooting notes below.

Database Schema

The exact definition of the tables and views created by the application is available in its menu through Info → DDL Statements. The following are general guidelines.

Example Queries

Retrieve all radical-stroke indices from kRSUnicode
SELECT CODE_POINT, Radical, Strokes 
FROM kRSUnicode NATURAL JOIN CJKRadicals
Retrieve all radical-stroke indices from kRSKangXi (no simplified radicals)
SELECT CODE_POINT, Radical, Strokes 
FROM kRSKangXi NATURAL JOIN KangxiRadicals
Usage of HAN_BLOCK: retrieve any definitions given for Extension C characters
SELECT CODE_POINT, kDefinition FROM kDefinition
JOIN HAN_BLOCK ON CODE_POINT BETWEEN BLOCK_FIRST AND BLOCK_LAST
WHERE BLOCK_NAME = 'CJK Unified Ideographs Extension C'
Find all assymetric Z-variant relations
SELECT CODE_POINT, kZVariant 
FROM kZVariant Z1
WHERE NOT EXISTS (SELECT * FROM kZVariant Z2
		WHERE Z1.CODE_POINT = Z2.kZVariant
		AND Z1.kZVariant = Z2.CODE_POINT)
Find all characters which have only a J-Source (and thus are probably only used in Japan)
SELECT CODE_POINT FROM kIRG_JSource J
WHERE 'kIRG_JSource' = ALL(
	SELECT SOURCE FROM IRGSource_any
	WHERE CODE_POINT = J.CODE_POINT)
Generate statements to remove all database objects created by the application in the current database
A newer application version may change the DB schema, so this is sort of a cross-database way of storing the current table set.
SELECT 'DROP ' || TYPE || ' ' || NAME 
FROM UNIHAN_OBJECTS
ORDER BY CREATION_ORDER DESC

Changelog

Oct 16, 2010: version 0.20 Sep 30, 2010: version 0.10

Troubleshooting