LISTSERV 16.5 - CODE4LIB Archives

However, if you are using non-US QWERTY keyboards, Key Curry may not
be appropriate. At the moment we develop keyboard layouts using Key Curry,
KeymanWeb, and a web-based input system that is part of Wikipedia.

These approaches tend to fall over on mobile devices, but there are some
interesting developments going on in that area.

Andrew

On Friday, 20 April 2012, Peter Noerr <[log in to unmask]> wrote:
> On the matter of character input, the following email from the
> Unicode list may be of interest to those working with or developing a
> web UI. Note that Key Curry is a work in progress and has received a fair
> bit of "it doesn't have..." criticism on the Unicode list, but it is a
> good basis.
>
> Peter
> ------------------------------
> From:   [log in to unmask] on behalf of Ed Trager [[log in to unmask]]
> Sent:   Tuesday, April 17, 2012 2:41 PM
> To:     Unicode Mailing List
> Subject:        Key Curry : Attempting to make it easy to type world languages and orthographies on the web
>
> A long time in the making, I am finally making "Key Curry" public!
>
> "Key Curry" is a web application and set of web components that allows
> one to easily type many world languages and specialized orthographies
> on the web. Please check it out and provide me feedback:
>
> http://unifont.org/keycurry/
>
> In addition to supporting major world languages and orthographies, I
> hope that "Key Curry" makes it easy for language advocates and web
> developers to provide support for the orthographies of minority
> languages -- many of which are not currently supported (or are only
> poorly supported) by the major operating system vendors.
>
> Under the hood, the software uses a JavaScript user interface
> framework that I wrote called "Gladiator Components" along with the
> popular "jQuery" JavaScript library as a foundation. I have used HTML5
> technologies such as localStorage to implement certain features.
>
> Currently, Key Curry appears to work well in the latest versions of
> Google Chrome, Firefox, and Safari on devices with standard QWERTY
> keyboards (e.g. laptops, desktop computers, netbooks, etc.). Recent
> versions of Opera and Internet Explorer version 9 appear to have bugs
> which limit the ability of Key Curry to operate as designed. The app
> is not likely to work well on older versions of any browser. I have
> not yet tested IE 10 on Windows 8.
>
> Although Key Curry appears to load flawlessly on the very few Android
> and Apple iOS tablet and/or mobile devices that I have "dabbled" with,
> the virtual keyboards on those devices are very different from
> physical keyboards, and I have not yet investigated that problem area
> at all, so don't expect it to work on your iPad or other mobile
> device.
>
> Constructive criticism and feedback are most welcome. I have many
> additional plans for Key Curry "in the works", but I'll leave further
> commentary to another day!
>
> - Ed
> -----------------------------------------
>
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Robert Haschart
>> Sent: Thursday, April 19, 2012 2:23 PM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about ISO_2709 and MARC21
>>
>> On 4/18/2012 12:08 PM, Jonathan Rochkind wrote:
>> > On 4/18/2012 11:09 AM, Doran, Michael D wrote:
>> >> I don't believe that is the case.  Take UTF-8 out of the picture, and
>> >> consider the MARC-8 character set with its escape sequences and
>> >> combining characters.  A character such as an "n" with a tilde would
>> >> consist of two bytes.  The Greek small letter alpha, if invoked in
>> >> accordance with ANSI X3.41, would consist of five bytes (two bytes
>> >> for the initial escape sequence, a byte for the character, and then
>> >> two bytes for the escape sequence returning to the default character
>> >> set).
>> >
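To make those byte counts concrete, here is a minimal Python sketch. Python has no built-in MARC-8 codec, so the MARC-8 bytes are written out by hand, assuming the ANSEL combining tilde 0xE4 (which precedes its base letter in MARC-8) and the MARC-8 "Greek symbols" set (designated with ESC g, alpha at 0x61, ESC s to return to ASCII). The variable names are illustrative only.

```python
# Hand-written MARC-8 byte sequences (Python ships no MARC-8 codec).
ESC = 0x1B
ANSEL_COMBINING_TILDE = 0xE4   # in MARC-8 the diacritic comes before its base letter

# "n" with tilde: combining tilde, then "n" -> 2 bytes
n_tilde_marc8 = bytes([ANSEL_COMBINING_TILDE, ord("n")])

# Greek small alpha via the Greek symbols set:
# ESC g (switch to Greek symbols), 0x61 (alpha), ESC s (back to ASCII) -> 5 bytes
alpha_marc8 = bytes([ESC, ord("g"), 0x61, ESC, ord("s")])

# The same characters in UTF-8, for comparison
n_tilde_utf8 = "\u00f1".encode("utf-8")   # precomposed U+00F1
alpha_utf8 = "\u03b1".encode("utf-8")     # U+03B1

print(len(n_tilde_marc8), len(alpha_marc8))   # 2 5
print(len(n_tilde_utf8), len(alpha_utf8))     # 2 2
```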
>> > ISO 2709 doesn't care how many bytes your characters are. The
>> > directory and offsets and other things count bytes, not characters.
>> > (which was, in my opinion, the _right_ decision, for once with marc!)
>> >
>> > How bytes translate into characters is not a concern of ISO 2709.
>> >
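A sketch of the point that ISO 2709 counts bytes: the hypothetical `directory_entry` helper below builds one directory entry using MARC 21's layout (3-character tag, 4-digit length, 5-digit starting position). It treats the field as plain text and ignores indicators and subfield delimiters. The recorded length is the byte length of the encoded field plus its terminator, not a character count.

```python
FIELD_TERMINATOR = b"\x1e"


def directory_entry(tag: str, data: bytes, start: int) -> str:
    """Build one 12-character directory entry: tag (3), length (4), start (5).

    Length and start are byte counts of the encoded field, including its
    field terminator; they say nothing about how many characters it holds.
    """
    length = len(data) + len(FIELD_TERMINATOR)
    return f"{tag}{length:04d}{start:05d}"


title = "Año del dragón"                 # 14 characters
utf8 = title.encode("utf-8")             # "ñ" and "ó" take 2 bytes each -> 16 bytes
print(len(title), len(utf8))             # 14 16
print(directory_entry("245", utf8, 0))   # 245001700000
```

The same title encoded in MARC-8 would also be 16 bytes (one diacritic byte plus one base letter for each accented character), so the directory would look identical even though the bytes differ.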
>> > The majority of non-7-bit-ASCII encodings will have chars that are
>> > more than one byte, either sometimes or always. This is true of MARC-8
>> > (some chars), UTF-8 (some chars), and UTF-16 (all chars).
>> > (It is not true of Latin-1, though, I believe.)
>> >
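The claim about which encodings use more than one byte per character can be checked with Python's standard codecs. This sketch uses `utf-16-le` so that the 2-byte byte-order mark added by the plain `utf-16` codec does not inflate the counts. MARC-8 is omitted because Python ships no codec for it.

```python
# a, é, α, and U+1D11E (musical G clef, outside the Basic Multilingual Plane)
samples = ["a", "\u00e9", "\u03b1", "\U0001D11E"]


def width(ch: str, encoding: str):
    """Bytes needed for ch in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for enc in ("latin-1", "utf-8", "utf-16-le"):
    print(enc, [width(ch, enc) for ch in samples])
# latin-1 [1, 1, None, None]
# utf-8 [1, 2, 2, 4]
# utf-16-le [2, 2, 2, 4]
```

Latin-1 stays at one byte per character but simply cannot represent α; UTF-16 never uses fewer than two.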
>> > ISO 2709 doesn't care what char encodings you use, and there's no
>> > standard ISO 2709 way to determine what char encodings are used for
>> > _data_ in the MARC record. ISO 2709 does say that _structural_
>> > elements like field names, subfield names, the directory itself,
>> > separator chars, etc., all need to be (essentially, over-simplifying)
>> > 7-bit ASCII. The actual data itself is application-dependent; 2709
>> > doesn't care, and 2709 doesn't give any standard cross-2709 way to
>> > determine it.
>> >
>> > That is my conclusion at the moment, helped by all of you all in this
>> > thread, thanks!
>>
>> The conclusion that I came to in the work I have done on marc4j (which
>> is used heavily by SolrMarc) is that for any significant processing of
>> MARC records the only solution that makes sense is to translate the
>> record data into Unicode characters as it is being read in. Of course,
>> as you and others have stated, determining what the data actually is,
>> in order to translate it correctly to Unicode, is no easy task. The
>> leader byte that merely indicates "is UTF-8" or "is not UTF-8" is wrong
>> often enough in the real world that it is of little value when it
>> indicates "is UTF-8" and of even less value when it indicates "is not
>> UTF-8".
>>
>> Significant portions of the code I've added to marc4j deal with trying
>> to determine what the encoding of that data actually is, and trying to
>> translate the data correctly into Unicode even when the data is
>> incorrect.
>>
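A rough sketch of the kind of encoding check described above, assuming MARC 21's Leader/09 convention (`a` for UCS/Unicode, blank for MARC-8). This is not marc4j's actual logic: `make_leader` and `guess_encoding` are hypothetical helpers, and the heuristic (strict UTF-8 validation, with ESC bytes taken as a MARC-8 signature) is only one plausible approach. It relies on the fact that a MARC-8 diacritic byte in the 0xE0 to 0xFE range followed by an ASCII letter is never valid UTF-8.

```python
def make_leader(coding_scheme: bytes) -> bytes:
    """Build a dummy 24-byte MARC 21 leader; only position 09 matters here."""
    # 00-04 length, 05-08 "nam ", 09 coding scheme, 10-11 "22",
    # 12-16 base address, 17-19 blanks, 20-23 entry map "4500"
    return b"00000nam " + coding_scheme + b"2200000   4500"


def guess_encoding(record: bytes) -> tuple[str, bool]:
    """Guess the data encoding; return (guess, whether Leader/09 agrees).

    Leader/09 is b"a" for UCS/Unicode and b" " for MARC-8, but it is
    only compared against the guess, never trusted on its own.
    """
    leader_says_unicode = record[9:10] == b"a"
    if b"\x1b" in record:
        guess = "marc-8"              # escape sequences only make sense in MARC-8
    elif all(b < 0x80 for b in record):
        return "ascii", True          # pure ASCII reads the same either way
    else:
        try:
            record.decode("utf-8")    # strict: MARC-8 diacritics such as 0xE4 fail here
            guess = "utf-8"
        except UnicodeDecodeError:
            guess = "marc-8"          # assumption: could also be Latin-1 or another set
    return guess, leader_says_unicode == (guess == "utf-8")


# "España" stored as MARC-8 (tilde 0xE4 before "n") and correctly flagged
marc8_rec = make_leader(b" ") + b"Espa\xe4na"
# "España" stored as UTF-8 but flagged as MARC-8 in the leader
mislabelled = make_leader(b" ") + "España".encode("utf-8")

print(guess_encoding(marc8_rec))     # ('marc-8', True)
print(guess_encoding(mislabelled))   # ('utf-8', False)
```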
>> You also argued in another message that cataloger entry tools should
>> give feedback to help the cataloger avoid creating errors. I agree. I
>> think one possible step towards this would be that the editor must work
>> in Unicode, irrespective of the data format that the underlying system
>> expects. If the underlying system expects MARC-8, then the "save as"
>> process should be able to translate the data into MARC-8 on output.
>>
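A minimal sketch of such a "save as MARC-8" step, assuming the editor holds text as Unicode. It covers only ASCII letters plus five combining marks mapped to their ANSEL codes, and handles the main structural difference: Unicode places a combining mark after its base letter, while MARC-8 places it before. `to_marc8` and `COMBINING_TO_ANSEL` are hypothetical names; a real converter needs the full MARC-8 tables and escape-sequence handling.

```python
import unicodedata

# Illustrative subset: Unicode combining marks -> MARC-8 (ANSEL) bytes.
COMBINING_TO_ANSEL = {
    "\u0300": 0xE1,  # grave
    "\u0301": 0xE2,  # acute
    "\u0302": 0xE3,  # circumflex
    "\u0303": 0xE4,  # tilde
    "\u0308": 0xE8,  # diaeresis (umlaut)
}


def to_marc8(text: str) -> bytes:
    """Convert Unicode text to MARC-8 for ASCII letters plus the marks above."""
    out = bytearray()
    pending_base = b""            # last base character, not yet written
    pending_marks = bytearray()   # ANSEL codes for marks that follow it in Unicode
    # NFD splits precomposed letters such as "ñ" into base + combining mark
    for ch in unicodedata.normalize("NFD", text):
        if ch in COMBINING_TO_ANSEL:
            pending_marks.append(COMBINING_TO_ANSEL[ch])
        elif ord(ch) < 0x80:
            out += pending_marks + pending_base   # MARC-8: marks before base
            pending_base, pending_marks = ch.encode("ascii"), bytearray()
        else:
            raise ValueError(f"no MARC-8 mapping in this sketch for {ch!r}")
    out += pending_marks + pending_base
    return bytes(out)


print(to_marc8("España"))   # b'Espa\xe4na'
print(to_marc8("Müller"))   # b'M\xe8uller'
```

Raising on unmapped characters, rather than silently dropping them, is the kind of feedback to the cataloger the message argues for.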
>> -Robert Haschart
>

-- 
Andrew Cunningham
Senior Project Manager, Research and Development
Vicnet
State Library of Victoria
Australia

[log in to unmask]
[log in to unmask]