Legend: NOT STARTED | IN PROGRESS | STALLED | DECIDED
Revised recommendation (7/30/2020)
Based on strong recommendations from Ex Libris and careful consideration by the group, we recommend:
Load all SFX CDL data into the NZ
Load CDL 360 data into the NZ
Do NOT load any CDL SCP records into the NZ
Load CDL SCP records into a separate, non-linked 'SCP IZ'
CDL SCP records will go through the P2E process in the SCP IZ
CDL will identify packages that are unique to SCP and test adding them to the NZ once the Vanguard environment is open
CDL will work with UC San Diego to test strategies for deduping the records that result from loading UCSD SFX data and UCSD e-resources through the P2E process (a rough match-key sketch follows this list)
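One possible way to size that duplication ahead of the dedup testing is to compare match keys across the two data sets. The sketch below is a minimal illustration only: it assumes both the SFX-derived bibs and the SCP bib extract are available as MARC files and that OCLC numbers (035 $a) and ISSNs (022 $a) are the match points. The file names and normalization rules are assumptions, not an agreed workflow.

```python
# Hypothetical sketch: estimate SFX vs. SCP duplication by overlapping match keys.
# File names and normalization rules are illustrative assumptions.
import re
from pymarc import MARCReader

def match_keys(record):
    """Yield candidate match keys: normalized OCLC numbers (035 $a) and ISSNs (022 $a)."""
    for field in record.get_fields("035"):
        for value in field.get_subfields("a"):
            if "OCoLC" in value:
                yield "oclc:" + re.sub(r"\D", "", value)
    for field in record.get_fields("022"):
        for value in field.get_subfields("a"):
            yield "issn:" + value.replace("-", "").strip().upper()

def key_set(path):
    keys = set()
    with open(path, "rb") as f:
        for record in MARCReader(f):
            keys.update(match_keys(record))
    return keys

sfx_keys = key_set("sfx_cdl_export.mrc")   # assumed export of SFX-derived bibs
scp_keys = key_set("scp_bib_extract.mrc")  # assumed SCP bib extract
print(f"Overlapping match keys (potential duplicates): {len(sfx_keys & scp_keys)}")
```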
Reasoning for the revised recommendation (7/30/2020)
Ex Libris strongly recommended forgoing the load of SCP data into the NZ due to the high number of duplicate records it would cause (likely over 1 million). The concerns raised were:
Time during the Vanguard would be spent troubleshooting duplicates rather than learning the system
Duplicates would filter down to all IZs and into all Primo instances, causing a very negative user experience
SFX data will provide matches into the CZ, while SCP data cannot
SFX data will create 'Available For' and inventory groups, while SCP data cannot
CDL 360 data has the potential to match to the CZ (Ex Libris estimates around a 60% match rate), which may lessen the duplicates from this data set
Recommendation (6/26/2020)
PPC Recommendation (6/26): Load as much data as we have to see what happens and learn from the experience (90% of SCP bibs and 100% of SFX CDL data). For the remaining 10% of SCP data, CDL/SCP recommends the test cases listed below, for PPC review and approval by the 6/26 meeting.
Other test cases (10% of SCP bibs):
· EEBO (~130K SCP bibs): excluded from SCP bib extracts & P2E; test to see the bib records that come from the SFX load
· Dacheng brief records, which lack an OCLC# (4,700 SCP bibs): excluded from SCP bib extracts & P2E; test post-migration import of vendor MARC records
· Small collections (~1,200 SCP records): excluded from SCP bib extracts & P2E; send the bib file separately to ExL to test ExL's option of importing SCP records after the SFX load
· Karger online monographs (855 records): monographs may be renewed each year, and the CZ collection is organized by year; test how the load would match SFX and SCP records
· Karger online journals (167 records): a few records lack an ISSN
· Open access resources selected by the UC Libraries: Karger online journals (14 records)
· Loeb Classical Library online monographs (220 records): about 20 lack an ISBN, and we have set-level records for multi-volume works while the CZ may have records for each volume; test how the load would match SFX and SCP records (a sketch for flagging records without ISBN/ISSN follows this list)
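Several of these test cases hinge on records that lack a standard number (ISSN/ISBN) and so are unlikely to match the CZ on those numbers. A small check like the sketch below could flag them in advance; it is a minimal illustration, and the input file name and the 020/022-only check are assumptions.

```python
# Hypothetical sketch: flag SCP bibs that lack both an ISBN (020 $a) and an
# ISSN (022 $a). The input file name is an illustrative assumption.
from pymarc import MARCReader

missing = []
with open("scp_test_collections.mrc", "rb") as f:
    for record in MARCReader(f):
        has_isbn = any(field.get_subfields("a") for field in record.get_fields("020"))
        has_issn = any(field.get_subfields("a") for field in record.get_fields("022"))
        if not (has_isbn or has_issn):
            f001 = record["001"]
            missing.append(f001.data if f001 is not None else "(no 001)")

print(f"{len(missing)} records lack both ISBN and ISSN")
for rec_id in missing[:20]:
    print(rec_id)
```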
Historical info:
[Previous PPC Recommendation (6/19): Load as much data as we have to see what happens and learn from the experience. (100% of SCP bibs and 100% of SFX CDL data). If SCP or CDL feels they have some capacity to do some other testing, they can recommend some particular ways of doing the extract - they can make that recommendation to PPC for review by the 26th meeting.]
Background
Alma's "first-in" approach to promoting a record as the master in the Network Zone has implications for SCP records, since participation varies by record (some are "all UCs," others are a mixture of 4-9 campuses, and in some UCSD is not a participant).
When we started, we were considering the question of how to reduce the number of duplicates between the SFX data and the SCP data. There was a question of which, if any, SCP records to remove from the bib data extract in order to reduce duplicates. Now (6/19), PPC is unsure whether it is worth cherry-picking records; we can leave it up to the subgroup to recommend.
Vanguard considerations
Given that this is our first test of the NZ, and that we don't yet know a lot, what is the most valuable thing to test?
We need to tell the campuses what they have, how to access it, how much they owe for it, and what they can do with it (e.g., ILL, course reserves).
What are the functions that we currently have at CDL that would live in the Network Zone?
SCP bib records are MARC bib data for shared electronic resources, created once for multiple consumers (some include links to vendor sites, plus participant data).
SFX activations give us the OpenURL resolver. They include links out to vendor sites (licensed material) and participant data (who has access to which pieces).
Shared Acq pays for the resources, recharges the campuses, maintains vendor data / contracts, and gets the title lists of what we actually license.
Campuses consume the bib data and access data, and have a financial relationship with Shared Acq (sending money).
What are the pieces of data that we currently have?
SCP bib records for shared electronic resources (Tier 1 and Tier 2) (live in UCSD III ILS)
SFX activations managed by CDL
SFX data managed by campuses locally is handled separately…
Shared Acq workflows, financial info…
There is not a clear migration path for some of these data sources (e.g. Excel sheets or shadow systems)
This is a local CDL team decision about working with Ex Libris to see what path is feasible here. This might be a post-Vanguard action.
CDL License terms, vendor data = 360 RM database?
Campus copies of SCP records: is there an edited local version of the SCP record, with different info?
Which of these pieces should get tested now? Why / why not?
SCP bibs, SFX, and campus SCP records can be tested for the Vanguard; we need to know now how these will interoperate and how many duplicates / cleanup tasks will be created.
It’s harder to test legacy data related (ultimately) to analytics (financial data, etc.)
Recommendations for testing the data:
All the data CDL has will load through different processes:
SCP (MARC bib records in UCSD) = P2E process for NZ
SFX data managed by CDL = automatic load from Ex Libris into NZ
Local campus SFX instance data will be migrated separately (by Ex Libris) into each IZ (?) (see the questions for Ex Libris below)
Licensing & vendor data = ERM process from 360 RM?
Ordering info (e.g. Excel sheets) = local CDL decision, working with ExL
Campus copies of SCP data = see: Preparation of Campus SCP Data - Vanguard
Repercussions of the order in which data is moved
SFX data will not match with SCP records = there will be dupes = post-migration cleanup? (post-migration analysis of duplication). They want to work with us to recommend options to reduce the number of duplicates. These may not be available for the Vanguard, but there may be options in the future, and things could go wrong now; that will provide more info for future decisions. BRING THE PAIN.
What are the ramifications for local campus SCP records if they can't remove the data? (see the Preparation of Campus SCP Data - Vanguard page)
Questions for Ex Libris:
How will campus SFX instance data be handled? Verify whether there is any linking between the NZ and IZ during that migration.
What actually needs to happen
SCP needs to extract the bib data from III and complete the migration form / mapping form
SCP needs to prepare the P2E files
Note: these lists (the bib data extract and the P2E list) should be the same list / number of records, since all SCP records are electronic (a consistency-check sketch follows this list)
Deadline for giving to ExL = July 17th
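A minimal sketch of the consistency check mentioned in the note above: it assumes the P2E list is a simple text file with the record ID in the first comma-separated column and that the bib IDs sit in the 001 field. Both are assumptions about local file layouts, not the Ex Libris specification, so adjust to the actual migration-form format.

```python
# Hypothetical sketch: confirm the P2E list and the SCP bib extract cover the
# same record IDs. File names and column layouts are illustrative assumptions.
from pymarc import MARCReader

with open("scp_p2e_list.csv") as f:
    p2e_ids = {line.split(",")[0].strip() for line in f if line.strip()}

bib_ids = set()
with open("scp_bib_extract.mrc", "rb") as f:
    for record in MARCReader(f):
        f001 = record["001"]
        if f001 is not None:
            bib_ids.add(f001.data.strip())

print(f"In bib extract but not in P2E list: {len(bib_ids - p2e_ids)}")
print(f"In P2E list but not in bib extract: {len(p2e_ids - bib_ids)}")
```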
Questions to consider (updated 6/19)
Will SCP records be loaded separately from the rest of the UC records?
Yes (see Order of Network Zone Record Loading - Vanguard )
Will campuses that are not UCSD be extracting and loading their SCP records or will they only be included as part of the UCSD extraction and load?
If SCP records will be included from all campuses, then we would want to consider loading SCP/UCSD first into the NZ since the SCP catalog contains the original form/complete set of records.
Update: this is not how it works now (see Order of Network Zone Record Loading - Vanguard).
Ultimately, what does SCP (and potentially CDL electronic resources as a whole) look like in the Network Zone? (This is the big question!)
In process….
What about what's in 360 Resource Manager? How does that relate and get migrated over?
We will hopefully learn this at the 6/25 meeting!
What about PIDs? They are part of this ecosystem and should be considered somehow… They are part of CDL-managed data. Do we have to flip the PIDs in SCP data to target URLs before or after migrating? (A hedged sketch of flipping 856 $u PID links appears after this block.)
Hopefully some answers will come out of the 5/18 and 5/21 migration meetings (as of 6/19, still unknown). Can we pass this to Discovery FG to think about? Do we still have to run a PID server? Can ExL processes take care of this? What is the fundamental purpose of the PIDs? Also, PIDs might not be compatible with the OpenURL [stuff…]; there may be access issues depending on where a user is coming from.
This may be something we need to reconsider as a service. → CDL / Discovery FG
How can we test this?
There are also local PIDs that get used from other places, not just SCP records.
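If the PIDs do need to be flipped on the MARC side before extract, it might look something like the sketch below. This assumes a PID-to-target-URL mapping can be exported from the PID server and that the links sit in 856 $u; it also assumes pymarc 4.x, where a field's subfields are a flat code/value list. All file names are illustrative assumptions, not an agreed process.

```python
# Hypothetical sketch: rewrite CDL PID URLs in 856 $u to their resolved target
# URLs before migration. Assumes a mapping file "pid_targets.csv" with columns
# pid_url,target_url and pymarc 4.x; all names are illustrative assumptions.
import csv
from pymarc import MARCReader, MARCWriter

with open("pid_targets.csv", newline="") as f:
    pid_map = {row["pid_url"]: row["target_url"] for row in csv.DictReader(f)}

with open("scp_bibs.mrc", "rb") as infile, open("scp_bibs_flipped.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        for field in record.get_fields("856"):
            # subfields is a flat list: [code, value, code, value, ...]
            for i in range(0, len(field.subfields) - 1, 2):
                if field.subfields[i] == "u" and field.subfields[i + 1] in pid_map:
                    field.subfields[i + 1] = pid_map[field.subfields[i + 1]]
        writer.write(record)
    writer.close()
```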
How much of our data or processes are duplicative? (Things might be technically duplicative, e.g. SCP bib records vs. the data in SFX, but there may be qualitative differences that matter for us.)
In process…
Notes from 5/11 chat
What data is in which place, and how do we connect them? → CDL should be able to indicate what data lives where, in what format, and how the pieces are related.
How to match records would come out of that data inventory.
Which data is best? We want to use that.
Dedupe as part of the P2E process (or else we will end up with duplicate records)
What if we don't want to use the CZ? Then we will create everything in the NZ as UC local resources; we still don't want dupes there either, and we'll lose the admin changes in the CZ. Can we use the Vanguard to test taking advantage of what we want from the CZ (admin changes, coverage updates, etc.) while keeping the bib quality we want from our NZ records?
Can we try different types of workloads for the Vanguard?
What data lives where?
CDL/SCP resource records: some are bibs, some are SFX, some bibs have some SFX data but not all, entitlements live in a separate place?
Can we create cases that we are able to poke at in the Vanguard? Let's start making decisions for the Vanguard test.
And create test cases for the Vanguard so that we can compare how things work with our own data
Title-level cataloging for serials may not be that important with PCI in Primo. Let's rethink how we're doing this.
Dependencies
Outcomes of the 5/25 Migration meeting may shed some light on how we handle SFX, MARC, and 360M data
SUB-Decisions → separate page
Preparation of Campus SCP Data (full decision page with notes about this topic)
Action Log
| Action/Point Person | Expected Completion Date | Notes | Status |
|---|---|---|---|
| Review by PPC - decided to take this to a subgroup for breaking up into multiple questions. | | Caitlin, Lisa, Liz M, and Xiaoli | DONE |
| Ask CDL to do a data inventory / relationship diagram (Alison Ray, CDL) | ? | This seems like a necessary starting point to identify what we have and where duplicates might be. | DONE |
| ERM/Acq and Resource Management to take a pass at answering some of these questions | week of 5/11 | RM to take on the ILSDC sub-decision above as well. | DONE |
| Q&A with ExL | | | DONE |
| This decision page gets reviewed, updated, and recommendations get made by CDL Data group | | | |
| Recommendations get reviewed and approved / revised by PPC | | | |
| Decisions get communicated out to CDL / local campuses for action. | | | |
| (Test in the Vanguard!) | | | |