Searching Alma/Primo for CJK languages
Description | Alma and Primo do not always return the correct results for CJK languages. For example, a search in simplified Chinese may not return records cataloged in traditional Chinese, and vice versa. |
Reported by | CDL, UCLA, and UCSD (UCB also has a ticket which looks related but a little different?) |
Salesforce Ticket(s) | 06405399 - CDL for Alma; 05318883 - UCB for Alma; 06276425 - UCLA for Primo; 06406147 - UCSD for Alma |
Current Status | For CDL’s and UCSD’s tickets, Ex Libris says we can configure Alma for “special character search”, but it can only be set up for one language; the ability to handle additional languages would be an enhancement request. We are currently setting up a meeting, which campus RM reps are invited to attend, with the Tier 2 support person responsible for CDL’s ticket. |
Below are excerpts from the tickets we’ve submitted to Ex Libris.
CDL - 06405399 - Unicode Cross-Mapping Traditional and Simplified Chinese
6/8/2022 - Initial Ticket Description
It appears that Alma and Primo are not mapping the equivalent simplified and traditional Chinese characters. This is preventing our users from finding all the resources we own. We believe this is similar to a case UCLA opened (#06276425) and possibly UC Berkeley's case (#05318883). Is there a setting we can change to get this to work correctly?
For example, 群島 (traditional) and 群岛 (simplified) should retrieve the same record. However, they don't. If I search for 群岛 in Alma, I correctly retrieve MMS ID 9917107805006531 as the first result.
If I do the same search with the equivalent traditional characters, 群島, I don't get the correct record.
This works correctly in WorldCat. For example, compare 群島 vs 群岛 and you can see the results are the same.
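One way to see why the two searches diverge: 群島 and 群岛 differ in their second character, which Unicode encodes as two separate code points, and none of the standard Unicode normalization forms merges them. The Python sketch below illustrates this using only the standard library. It is an illustration of the character data, not a description of how Alma or Primo index records internally; the helper names `code_points` and `normalized_forms` are made up for this example.

```python
import unicodedata

traditional = "群島"
simplified = "群岛"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def code_points(text):
    """Return the Unicode code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def normalized_forms(text):
    """Return text under each of the four standard normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


print(code_points(traditional))
print(code_points(simplified))

# True only if some normalization form makes the two spellings identical.
trad_forms = normalized_forms(traditional)
simp_forms = normalized_forms(simplified)
unified = any(trad_forms[form] == simp_forms[form] for form in FORMS)
print(unified)
```

Because normalization alone does not unify the two spellings, a search system that should treat them as equivalent needs an explicit traditional/simplified mapping applied to both the index and the query.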
6/9/2022 Ex Libris Response
My understanding is that Alma supports the ability for traditional/simplified Chinese transliteration, but this must be specially configured in order to be enabled for searching:
I do want to note that while this can be configured, only one language for special characters can be defined at a time.
The process requires a scheduled re-indexing by our Tier 2 team, so I'll share this case with my colleagues to assist further!
6/9/2022 CDL Question
Thanks for linking to this doc! I'm looking at it and it says "Alma's handling of special characters is relevant for searching in the institution zone only." We're a Network Zone. Does this mean it won't work for us?
Also, I'm concerned that it only allows one language for special characters. As you can see from the other tickets, the UC system needs the ability to search in multiple languages.
6/15/2022 Tier 2 Ex Libris Response
1 - The Alma search should be configured separately for Network and for institution(s).
The reindexing should be scheduled, as it takes time and is recommended to run during off hours.
Please let me know if you would like this configuration, maybe on your Sandbox first.
2 - The Primo configuration should probably be done for Network. Please see https://knowledge.exlibrisgroup.com/Primo/Product_Documentation/020Primo_VE/Primo_VE_(English)/030Primo_VE_User_Interface/Configuring_Display_Languages_for_Primo_VE
UI language is used for search time language recognition. Primo VE automatically detects the following languages, based on comparing the words of the record and the query with a dictionary.
For Chinese: If the character is Chinese and the locale of Primo VE is Japanese or Korean, Primo VE uses the locale of the selected language.
For more details please open a case to Primo Support
6/15/2022 CDL Question
Hi, thank you as well for linking the document. However, the issue is that the CJK languages share Chinese characters (traditional, simplified, and variant forms), because Unicode treats East Asia as one group.
The Unicode FAQ (https://unicode.org/faq/han_cjk.html) describes CJK characters as "the Chinese (= Han) ideographs used in the writing systems of the Chinese and Japanese languages, occasionally for Korean, and historically in Vietnam. The Unicode Standard supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X 0221, or JIS X 0213, for example, and many more. This is true no matter which encoding form of Unicode is used: UTF-8, UTF-16, or UTF-32. Unicode supports over 80,000 CJK characters right now, and work is underway to encode further additions. The International Standard ISO/IEC 10646 and the Unicode Standard are completely synchronized in repertoire and content. And that means that Unicode has the same repertoire as GB 18030, since that also is synchronized with ISO 10646, although with a different ordering and byte format."
Why is the Ex Libris configuration set to allow only one language at a time? Does it mean that, in order to have cross-mapping work for all CJK languages, Ex Libris will need to process Chinese characters three times: once for Chinese, once for Japanese, and once for Korean? Would we also need to run HKIUG TSVCC separately for each of the three languages in order to get Chinese variants cross-mapped?
I have one more question about the re-indexing job: does it have to run after each language is configured, or can it run once after all of the languages have been configured?
6/16/2022 Ex Libris Response
Only one language can be defined at a time.
This is from the documentation:
https://knowledge.exlibrisgroup.com/Alma/Product_Documentation/010Alma_Online_Help_(English)/010Getting_Started/050Alma_User_Interface_%E2%80%93_General_Information/Searching_in_Alma
"Contact Ex Libris to enable the following enhanced search options:
Search in traditional Chinese/Kanji or simplified Chinese/Kanji and return results in both traditional Chinese/Kanji and simplified Chinese/Kanji.
Search in Hangul (Korean) and return results in both Hangul and Hanja.
Search in Hiragana (Japanese) and return results in both Hiragana and Katakana."
Changing this behavior would be a request for enhancement. Please submit to the Ideas exchange.
Regarding the character sets, I would encourage you to post this question to the Alma discussion group (alma@exlibrisusers.org) - or please let me know and I will inquire with our APAC colleagues.
UCLA - 06276425 - Kanji character mapping in Primo VE
2/10/2022 Initial Ticket Description
Related case from UC Berkeley: #05318883
When searching for records using Japanese script, equivalent kanjis will not retrieve records correctly.
For example, 亜 is a new kanji form and 亞 an old form of the same kanji. But using one in a Primo VE search will not retrieve results that contain the other.
Example: 戸 and 戶
Searching 江戸版本聚葉 will not retrieve any results. Searching 江戶版本聚葉 will retrieve https://search.library.ucla.edu/permalink/01UCS_LAL/17p22dp/alma9994114343606533
Example: 説 and 說:
Searching 物語小説 will not retrieve any results. Searching 物語小說 will retrieve https://search.library.ucla.edu/permalink/01UCS_LAL/17p22dp/alma9994345633606533
This is not a problem when searching WorldCat:
Title 亜細亜 (format: book; language: Japanese): 790 hits https://www.worldcat.org/search?q=ti%3A%E4%BA%9C%E7%B4%B0%E4%BA%9C&fq=x0%3Abook+%3E+ln%3Ajpn&qt=advanced&dblist=638
Title 亞細亞 (format: book; language: Japanese): 790 hits https://www.worldcat.org/search?q=ti%3A%E4%BA%9E%E7%B4%B0%E4%BA%9E&fq=x0%3Abook+%3E+ln%3Ajpn&qt=advanced&dblist=638
亜 is a new kanji form and 亞 an old kanji form of the same kanji.
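The two WorldCat URLs above differ only in the percent-encoded title term, which reflects the fact that 亜 and 亞 are distinct code points with distinct UTF-8 byte sequences. The short Python sketch below reproduces the encoded terms with the standard library's `urllib.parse.quote`. It only shows how the query strings are encoded; how WorldCat treats the two forms as equivalent on its side is not something this example can show.

```python
from urllib.parse import quote, unquote

new_form = "亜細亜"  # title term using the new kanji form
old_form = "亞細亞"  # title term using the old kanji form

# quote() percent-encodes the UTF-8 bytes of each character.
encoded_new = quote(new_form)
encoded_old = quote(old_form)

print(encoded_new)
print(encoded_old)

# Positions at which the two search terms use different characters.
differing_positions = [
    i for i, (a, b) in enumerate(zip(new_form, old_form)) if a != b
]
print(differing_positions)

# Decoding returns the original vernacular strings unchanged.
print(unquote(encoded_new) == new_form)
```

The encoded values match the `ti%3A...` terms in the two URLs, so the browser is sending two different queries and any equivalence has to come from the search system.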
Some additional information about the different forms of kanji:
https://www.asahi-net.or.jp/~ax2s-kmtn/ref/old_chara.html
https://www.sljfaq.org/afaq/new-old-kanji.html
Impact example: Both 江戸 and 江戶 mean Edo, as in the Edo period of Japan, which is one of the most used terms in Japanese studies. In the UCLA Library Catalog (excluding articles), a title search for 江戸 gets 47 hits, while the same search for 江戶 retrieves 1,372 hits, about 29 times as many. Primo users who search with 江戸 will find a mere 47 titles held in the UCLA Library. Vernacular search is critical to discovery.
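The behavior being requested can be sketched as variant folding: map each kanji variant to a single form before comparing, and apply the same mapping to both the indexed text and the query. The Python sketch below is a minimal illustration under stated assumptions. The table `VARIANT_FOLD`, the functions `search_key` and `find`, and the list `indexed_strings` are hypothetical; the table holds only the three pairs quoted in this ticket, and the choice of the old form as the target is arbitrary. It is not Ex Libris's implementation.

```python
# Hypothetical variant table: new kanji form -> old kanji form.
# Only the three pairs mentioned in this ticket are included; a real
# table would be far larger and would need review by language experts.
VARIANT_FOLD = str.maketrans({
    "亜": "亞",
    "戸": "戶",
    "説": "說",
})


def search_key(text):
    """Fold known variants so equivalent spellings compare equal."""
    return text.translate(VARIANT_FOLD)


def find(query, strings):
    """Return the strings whose folded form contains the folded query."""
    key = search_key(query)
    return [s for s in strings if key in search_key(s)]


# Stand-ins for indexed text, using the spellings that currently
# retrieve records in the examples above.
indexed_strings = ["江戶版本聚葉", "物語小說"]

# Queries typed with the new kanji forms now match as well.
print(find("江戸版本聚葉", indexed_strings))
print(find("物語小説", indexed_strings))
```

Folding both sides to the same key keeps the original text in the record intact while letting either spelling retrieve it, which is why the mapping is applied at indexing and at query time and not by rewriting the catalog data.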
2/28/2022 - Transferred to Tier 2 Support
6/30/2022 - Most recent Ex Libris response (no useful response prior to this date either)
I apologize for the continued delays in communication in this case. I wanted to let you know that I am still working on this issue, and hope to have an update for you next week.
UCSD - 06406147 - Cross-mapping simplified and traditional Chinese
6/9/2022 Initial Ticket Description
Hello! We would like to report that we are experiencing the same lack of functionality in the cross-mapping of simplified and traditional Chinese in both Alma and Primo that has been reported by UC Berkeley, UC Los Angeles, and CDL.
(See CDL’s ticket above for exact description.)
I would just like to add that UCSD has a very large collection of Chinese materials that are of tremendous importance to various stakeholders, not only on our campus but in the broader San Diego community. Being able to find all the materials we have is therefore of great significance to us.
7/8/2022 Ex Libris Response
Per some of the information you might have found in the other cases you mentioned in your original case comment, Alma supports the ability for traditional/simplified Chinese transliteration, but this must be specially configured in order to be enabled for searching:
Please note that while this can be configured, only one language for special characters can be defined at a time. The process requires a scheduled re-indexing by our Tier 2 team, so I'll share this case with my colleagues to assist further!
7/11/2022 Tier 2 Ex Libris Response
This change involves reindexing of all the records.
Would you like to try the configuration first on your Premium Sandbox?
UCB - 05318883 - Japanese language author/title search
4/15/2021 Initial Ticket Description
It appears that neither title nor author browse searches in Primo and Alma are returning expected/accurate results when searching in Japanese. The user was searching on the 880 fields.
Example searches:
author: 村上春樹
author: 夏目漱石
title: 吾輩は猫である
This was originally submitted in Basecamp: https://3.basecamp.com/3765443/buckets/15553579/messages/3641607354#__recording_3670132245
4/26/2021 Ex Libris Response
Document says: Alternate Graphic Representations (880 fields) are now indexed and searchable for the following bib fields: 700, 710, 711, 260, 490, 830. Browsing bib headings with non-Roman text from these fields is already possible – make sure that in the "Authorizing Fields" mapping table, the line with target code "Name_Headings", Source1 = "880", Source2 = "880" is enabled.
I enabled 880 in the "Authorizing Fields" mapping table for Name_Headings.
The 880 field can be configured as an access point with all the related capabilities of an access point. This KB article explains how to do this.
4/28/2021 UCB Question
Thank you for configuring the 880 field as an access point. If the 700, 710, 711, 260, 490, and 830 fields are indexed and searchable, will I need to configure 1xx and 245 fields as access points? Also, would this be something our other UC partners would need to update as well in their individual instances?
7/29/2021 Ex Libris Response
Primo browse search is now returning 880 fields, as is the bibliographic heading search in Alma.
Checking the example search that previously didn't work, I am able to see results now.
Example searches:
author: 村上春樹
author: 夏目漱石
title: 吾輩は猫である
Please take a look and let me know if we can set this case to be resolved.
2/4/2022 UCB Question
We are still seeing this behavior, and UCLA has reported it as well in their instance. They may open a case and I will include their case number here if they do.
. . .
I have added my colleague to this ticket since she reported the issue and has the language expertise to advise. She has included several examples from UCLA of the behavior because the 880 examples provided do not illuminate the issue at hand:
江戸版本聚葉 (will not retrieve). Search by 江戶版本聚葉: https://search.library.ucla.edu/permalink/01UCS_LAL/17p22dp/alma9994114343606533
物語小説 (will not retrieve). Search by 物語小說: https://search.library.ucla.edu/permalink/01UCS_LAL/17p22dp/alma9994345633606533
7/19/2022 Status
No further updates from Ex Libris have been received.