Lies, Damn Lies…

This week, I’m getting excited about statistics! Well, I need something down to earth to balance out the amazing experience of being at APAN29 in Sydney.

Just before I started at JISC, we had some long and detailed conversations about statistics as part of the ANGEL project. Usage statistic work has mumbled on in the background but there hasn’t been any significant work in this area… until now. Like buses, JISC usage statistic projects all come at once.

Something I am very happy to see funded, particularly as I saw the birth of the project idea whilst walking on a very hot day in San Antonio, is the RAPTOR project at Cardiff University. At the moment, Shibboleth Identity Providers can produce very useful access logs for institutions, but in a format that is not particularly friendly or helpful to the needs of librarians who need to be able to quickly review and assess resource usage. RAPTOR will produce a toolkit to not only provide this functionality but also to integrate these statistics with EZProxy logs – a joined up approach which I’m sure will be appreciated.

Hand in hand with this, the UK federation are planning on producing a portal to allow institutions to upload appropriately anonymised statistics… possibly using the outputs from RAPTOR if we are smart about it. This will give us an interesting national view of resource usage, useful for both JISC and JISC Collections in focusing attention on the requirements of our community.

At the other end of the picture, it is equally important that we look at Service Provider statistics to provide the more detailed view of user behaviour beyond the authentication point. JISC Collections have been examining the potential of a usage statistics portal that will aggregate statistics from COUNTER compliant reports provided by publishers. Again, the point here is to reduce the amount of time librarians are forced to spend aggregating this information.

To complete the picture, the PIRUS project is looking at usage statistics right down at the article level across both publisher resources and repositories. More information is available in this post from Ben Wynne. PIRUS has produced a review of what information would be required to provide article level statistics. My only concern about this report is the ‘who’ section and the options described for identifying unique users. eduPersonTargetedID and eduPersonPrincipalName seem obvious candidates for potential unique identifiers but are missing from the report. The challenge here will be any suggestion that looks at tracking the same user across multiple Service Providers. Obviously this is useful information for institutions, publishers and authors, but the privacy issues and management of Personally Identifiable Information (PII) will have to be carefully examined.
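To make the cross-provider point concrete, here is a rough sketch of why the two attributes behave so differently. It assumes a targeted identifier is derived from the user, the Service Provider's entityID and a secret salt held by the institution; the HMAC construction, the salt, the scope and the entityIDs below are all made up for illustration and are not how any particular Identity Provider actually computes the value.

```python
import hashlib
import hmac

# Hypothetical secret known only to the institution's Identity Provider.
SALT = b"institution-secret-salt"


def targeted_id(user: str, sp_entity_id: str, salt: bytes = SALT) -> str:
    """Illustrative opaque identifier, stable per (user, Service Provider) pair.

    This is an assumed construction for demonstration, not a real IdP algorithm.
    """
    message = f"{sp_entity_id}!{user}".encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


def principal_name(user: str, scope: str = "example.ac.uk") -> str:
    """Illustrative scoped username: the same value is released to every provider."""
    return f"{user}@{scope}"


# Hypothetical Service Provider entityIDs.
sp_a = "https://publisher-a.example.org/shibboleth"
sp_b = "https://publisher-b.example.org/shibboleth"

if __name__ == "__main__":
    # Same user, same provider: the opaque value is stable between visits.
    print(targeted_id("jsmith", sp_a) == targeted_id("jsmith", sp_a))
    # Same user, different providers: the opaque values do not match.
    print(targeted_id("jsmith", sp_a) == targeted_id("jsmith", sp_b))
    # The scoped username is identical wherever it is released.
    print(principal_name("jsmith"))
```

In other words, a per-provider opaque identifier lets a single publisher count repeat visits without learning who the user is, but it cannot be joined across publishers. A scoped username can be joined everywhere, which is exactly where the PII questions start.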

So that is your usage stats round-up – certainly lots of good stuff to keep an eye on.

Ross MacIntyre

Nicole,
In the posting above you attribute the JISC Usage Statistics Review Report to PIRUS; however, the PIRUS report is a different document.
I agree with you that the eduPerson attributes are appropriate candidates should you be wishing to identify an individual user’s usage. But we were/are coming at it more from the general aggregation end – ‘how much in total?’.
The Berlin meeting, the genesis for the JISC Report, went quite deeply into the various national Data Protection implications of IP addresses being used (an ongoing debate). It didn’t restrict discussion to where federated access management was operational. I do promise to uphold the UK federation perspective where appropriate.

I would take issue with your opening salvo that “…usage statistic work has mumbled on in the background but there hasn’t been any significant work in this area…”! The development (started via PALS) and subsequent global adoption of COUNTER for Journal, DBs, eBooks etc and latterly SUSHI represent significant work and are IMHO more than mumbles ;^)

Cheers,
Ross

Thanks Ross, I meant only from the authentication end rather than statistical review at the resource end. Should have made that clearer ;-)