Life will find a way

I’ve been following Dave White’s Visitors and Residents work for a while, as it fits in well with the identity provisioning debates we’ve had endlessly in REFEDS and Shibboleth circles. I was really pleased to see his latest piece on the Learning Black Market, as it highlighted concerns I have had for some time about the way we present resources to students and expect them to discover and use them.

I know that finding stuff out there on the scary interweb is a hard hard thing, but isn’t part of being a student about embracing the wide wide knowledge of the world and navigating an intelligent way through it? I’m fairly certain that scholars in the Library of Alexandria didn’t ask to be directed to a small subset of the vast knowledge contained within, and they had a much harder job, as Google had not digitised the scrolls for them.

What we are now continually doing instead is trying to contain the internet in chunks of knowledge, which we then wall off, saying ‘here good, there bad’. We are defining anything outside that area as illegitimate, and as Dave points out in his article, any engagement with it becomes the learning black market. As someone who quoted the lyrics from ‘Kill the Beast’ in Disney’s Beauty and the Beast in my dissertation, I find this worrying.

I think there are three main behaviours at fault here: the reading list problem, library discovery services and access via proxies. Dave has covered the influence of academics on this; I’d like to talk about the influence of libraries.

Discovery systems are very popular at the moment and they quite frankly bewilder me – why would anyone want to repeat what Google already does so well, in an inevitably smaller and more contained way? Yes, I know all the arguments about ease for users etc. etc., but I remain concerned. I’m often told that statistics show that use of resource x increases tremendously when a discovery service is used, but I think that is a false metric. If a supermarket decides to sell oranges as its only fruit, sales of oranges will inevitably increase. This doesn’t mean people don’t want to eat apples or pears; it just means oranges are the only option they are being offered. During rationing in World War II, sales of delicacies such as Spam also went up. I’m pretty sure this was not because people preferred it to roast beef, but because it was the only sustenance on offer. As a process it is both spoon-feeding and walling. I’m really hoping that things like the new JISC Collections KB+ work will look at how libraries can do more to filter information to aid discovery rather than wall it. It’s looking promising, but it will have to overcome massive cultural change issues to work.

The next problem is proxies – obviously something quite close to my heart! Proxies route all access to paid-for resources through, again, one walled place, and say the best way to access them is to pretend to be somewhere you are not. Again I hear the argument ‘it’s easier this way’, but easier for whom? I fear that their use is often driven by a need to catalogue and maintain links to things rather than to facilitate access on the open web, which is more natural user behaviour. If someone tells me that I have to read an article on Fruit Spot on Jamaican Bananas in Transactions of the British Mycological Society, I am far more likely to Google some combination of those words than to think ‘I’ll go and log in to my institutional ezproxy!’

I’m just wondering what would happen if we put a bit more time and effort into supporting discovery on the open web, rather than trying to wall and contain knowledge as a subset of ‘legitimate’ resources. Things like the recently announced MDUI work from the UK federation could certainly use a bit more support and championing, by librarians in particular, and really do significantly improve the discovery process out there in the scary open world. I’m also concerned that ‘legitimate’ in this context is simply another word for ‘paid for’ – the self-perpetuating story that if a lot of institutional money has been spent on a resource, effort must be expended to increase its use to justify said expense.

When I think about approaches to containing knowledge, I can’t help but think about Jurassic Park. John Hammond wanted to create a safe and managed environment for people to view his dinosaur collection, so he put in place every wall he could to make this happen: electric fences, breeding the dinosaurs as all female, and engineering them to be ‘lysine deficient’ so they could not survive without human intervention. All of these precautions fail – the science doesn’t work. As Dr Ian Malcolm so succinctly puts it in this exchange:

Henry Wu: You’re implying that a group composed entirely of female animals will… breed?
Dr. Ian Malcolm: No, I’m, I’m simply saying that life, uh… finds a way.

The Customer Grift

I think we are all quite accepting now of the fact that ‘the customer is always right’ has been replaced with ‘the computer is always right’. All around the world, companies are struggling to offer even basic customer service (I buy stuff, you send stuff) as confused ‘support’ staff battle to make computers do what we all know is the logical response to ‘buy stuff’ – i.e. send me what I ordered, the way I ordered it, and not something entirely different to a different place at a different time 🙂 We fight, we shout, we cry, we hit our heads on our desks, and sometimes we get lucky and, after expending hours of time and energy, actually get what we want. Other times we just give up. I’ve tried billing a company for the time I have spent being my own customer support – yeah, that doesn’t work.

So even in this basic, clearly customer-led environment, we can’t cope with being a customer. It gets even more confusing when our notion of customer is toyed with, turned on its head and cast as an illusion. There is a simple name for what happens when you don’t understand your role in a transaction – it’s called a grift... or a confidence trick.

Two of my favourite cultural discoveries of the year both deal with grifting – American Gods and Zombieland. Both detail a confidence trick that involves confusing the mark about the value of something and about their ownership of that something – a violin and an engagement ring respectively. In these examples, the actual objects put into play in the grift are effectively useless. A different approach is required when dealing with an object that has real value to the mark. In these scenarios, simplicity is key.

In American Gods, Wednesday steals significant sums of money from people with three simple techniques: 1) hanging an out-of-order sign on a deposit box, 2) wearing a guard’s uniform and 3) having a (fake) character witness at the end of the phone. People literally queued up to give away their money.

It’s something that we replicate every day.

I’ve always said that academic publishing is the biggest confidence trick ever run. I always imagine trying to explain academic publishing to an alien: universities fund people to carry out research, those researchers then give it away to publishers for free, other researchers vet, judge and rate it for free, publishers make it available for a fee... and universities buy it back. It’s Wednesday’s con in a different guise, although the publishers take it further and make you buy your valuables back. Brilliant! I’m not the only person to marvel at this business model.

Another classic con is the process of offering something to someone seemingly for free... as long as you just do x, y and z first. This is the approach used by the 419 scammers, and yes, people have fallen for these scams enough times to make it big business. We might laugh at the badly worded emails fronting these scams, but we fall for it in other ways every day by signing up to the plethora of free services that the social media world has to offer.

“If you’re not the customer, you’re the product” may already have become a hackneyed phrase... but for me it rings quite true. Services no longer have to sell you something to make money; instead they take something from you that adds value to their offering and sell that on to other people: data, status and reputation. If a company can 1) convince you to sign up and give them all of your very valuable information, 2) get you to shout about your use of that service and 3) get you to build up a profile within the service that will attract offers, the confidence job is done. All they then need to do is get you to pay a monthly fee for that service at a later date and they match the academic publisher in grifting skill.

I am of course being purposefully cynical here. I use loads of social networking sites and enjoy them immensely. I do it, however, with my eyes wide open. I’ve been amused lately by the outcry over Facebook’s recent changes to the way it displays news feeds and the way it uses cookies. ‘Let’s lobby them to change it!’ is not going to help. That’s a customer response. You’re not a customer.

So will we see a mass migration of people over to Google+ / The Next Big Social Thing? For that to work, you are going to need to persuade the things that add value to Facebook for you to move over too... and that’s people, not features. Once there, you can join in another identity battle – the ‘nym’ wars, with Google refusing to allow people to use pseudonyms but having a really awful way of judging whether something is, or is not, a real name.

Where’s the real fight?

The Open Access movement is of course trying to tackle the academic publishing con. In the identity space, though, championing the non-open is good: identity should not be an open commodity. People are already trying to fight for this cause under the banner of a ‘personal data ecosystem’ (a phrase that makes me shudder; ‘protect my s**t’ would probably resonate with more people!). You’ll see things like UMA from Kantara, Mydex in the UK and various commercial offerings. The challenge faced by these attempts to take back ownership of personal data? The people giving it away just don’t care enough.

So what can we do about it? In terms of the identity problem, I think you have to either care about it and take responsibility for managing it, or decide you don’t care. If the former, it’s not enough to just move to another platform that you think might be more caring and cuddly about your identity information; you need to engage with something that allows you to actively manage your identity data. It’s a trade-off, and it will restrict the services you get to use... but is that worth the price?

If the latter, then don’t complain if the service isn’t to your liking – you are not the customer, you are the product... you will be assimilated.

#IRISC Session 2: Andrew Lyall

I couldn’t live blog Andrew Lyall’s session because it took me a long time to work out what the question was. Basically, the ELIXIR project moves huge amounts of sensitive clinical data around and has issues doing so. At the moment a lot of what they do involves anonymised data that can be made openly available (as long as they get the anonymisation right), but they have one core system that does require authentication.

The problem is delegated authorisation. Much of the data used is so sensitive that the community has established committees, Data Access Committees, who decide whether or not you are allowed access to a resource. Within a federated infrastructure, this means the authorisation does not come from an IdP or an SP, but from a third party. A system needs to be put in place that allows this authorisation to be both granted and revoked in a trustworthy manner.

This sounds like a typical virtual organisation set-up, but we haven’t seen a lot of adoption of this sort of architecture within the federated landscape. Time for us to revisit these requirements at REFEDS? I think so.
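To make the scenario a bit more concrete, here is how I picture it: a small third-party registry sitting beside the IdP and SP, where the committee grants or revokes access and the SP asks the registry once the IdP has authenticated the user. This is a minimal, in-memory Python sketch under those assumptions only. AccessRegistry, Grant and all the identifiers are hypothetical, not anything ELIXIR has described, and a real deployment would need signed assertions, audit trails and persistent storage rather than a dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class Grant:
    """One authorisation decision made by a (hypothetical) Data Access Committee."""
    committee: str
    granted_at: datetime
    revoked_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        return self.revoked_at is None


@dataclass
class AccessRegistry:
    """Hypothetical third-party registry: neither the IdP nor the SP owns these decisions."""
    # Which committee is responsible for which dataset.
    committees: Dict[str, str]
    # Grants keyed by (federated user identifier, dataset identifier).
    grants: Dict[Tuple[str, str], Grant] = field(default_factory=dict)

    def _check_committee(self, committee: str, dataset: str) -> None:
        # Only the committee responsible for a dataset may change access to it.
        if self.committees.get(dataset) != committee:
            raise PermissionError(f"{committee} is not responsible for {dataset}")

    def grant(self, committee: str, user: str, dataset: str) -> None:
        self._check_committee(committee, dataset)
        self.grants[(user, dataset)] = Grant(committee, datetime.now(timezone.utc))

    def revoke(self, committee: str, user: str, dataset: str) -> None:
        self._check_committee(committee, dataset)
        existing = self.grants.get((user, dataset))
        if existing is not None and existing.active:
            # Keep the record, but mark when access ended.
            existing.revoked_at = datetime.now(timezone.utc)

    def is_authorised(self, user: str, dataset: str) -> bool:
        # Called by the SP after the IdP has authenticated the user.
        g = self.grants.get((user, dataset))
        return g is not None and g.active


registry = AccessRegistry(committees={"dataset-42": "DAC-A"})
registry.grant("DAC-A", "researcher@idp.example.org", "dataset-42")
print(registry.is_authorised("researcher@idp.example.org", "dataset-42"))  # True
registry.revoke("DAC-A", "researcher@idp.example.org", "dataset-42")
print(registry.is_authorised("researcher@idp.example.org", "dataset-42"))  # False
```

Marking a grant as revoked rather than deleting it is just one design choice here, so there is still a record of when access ended.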

As I mentioned at the beginning, I’m not sure I got these requirements right, so please let me know if I am barking up the wrong tree with this scenario.

#IRISC Session 1 – Barend Mons

Nanopublications are on the agenda now – the concept that the smallest unit of research data should be publishable, and of course you need to be able to attach identifier information to it. This creates a massive noise of data, a massive noise of concepts... and these concepts need to be identifiable.

For the first time in the session, someone highlights that people and objects are no different – people are just a concept that can have an identity tag associated with them, as are bits of data. This brings another role into play: the person who is identified not just as an author, but as being most linked to a concept and therefore most likely to be an expert in that subject area. Mons refers to this as a ‘concept profile’.

This approach to nano-publishing creates its own anatomy, which does not necessarily map onto the approaches used in traditional author identifier schemes. Mons then goes on to explain how all of these identifiers can be used in tweets, blogs, wikis etc... again, the first speaker to move outside the traditional publishing space. Mons believes this can only be done through the VIVO approach, whereas ORCID has more application in sorting out the problem of accurate impact citations in the traditional publishing space.

#IRISC Session 1 Brian Lowe

OK, Brian Lowe needs to breathe a couple of times if I am going to have *any* chance of catching what he is saying. He’s talking about VIVO, which is an identifier system primarily funded by the NIH in the US.

VIVO focuses on a linked data approach – one URI: public profiles for humans, with data for machines. The URIs are assigned by the institution providing the VIVO instance and are structured to serve linked data, but no semantics are imposed... although it is assumed that they will sit within some part of an institutional domain space, e.g. http://vivo.cornell.edu/somestuff.
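The ‘one URI’ idea (a public profile for humans and data for machines from the same address) is commonly achieved on the web with HTTP content negotiation on the Accept header. The Python below is a minimal sketch of that selection step, assuming that mechanism; it is not VIVO’s code, and the REPRESENTATIONS table and negotiate function are hypothetical.

```python
# Assumed set of representations served from one person URI.
REPRESENTATIONS = {
    "text/html": "public profile page for humans",
    "application/rdf+xml": "linked data for machines",
    "text/turtle": "linked data for machines",
}


def negotiate(accept_header: str, default: str = "text/html") -> str:
    """Choose which representation of a single URI to serve.

    Simplified content negotiation: honours q-values and */*, but ignores
    partial wildcards such as text/* and any non-q media-type parameters.
    """
    best, best_q = default, 0.0
    for entry in accept_header.split(","):
        pieces = [p.strip() for p in entry.split(";")]
        media_type, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type == "*/*":
            media_type = default  # anything goes: fall back to the human view
        # Keep the highest-q type we can serve; ties go to the earlier entry.
        if media_type in REPRESENTATIONS and q > best_q:
            best, best_q = media_type, q
    return best


# A browser and a linked data client asking for the same (hypothetical) URI:
print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"))  # text/html
print(negotiate("text/turtle"))                                # text/turtle
```

When nothing in the header matches, this sketch falls back to the human-readable page; a stricter server might answer 406 Not Acceptable instead.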

VIVO assumes there will definitely be multiple URIs for people as they move between institutions. This immediately raises the question... why the affiliation-based approach, and what value does it add?

VIVO has a core ontology that focuses on mapping people to organisations, using existing frameworks where they exist, e.g. FOAF and BIBO. This can then be extended locally with institution-specific semantics. VIVO includes information not only about how stuff is related but also when... assuming that time context is specific and relevant to the author identifier space. They focus on moving beyond the publisher space, arguing that publishing is just one application of an author identifier. Is this approach the right one? I’m sure it will come up for debate. VIVO also relies quite heavily on local systems to populate it... which gets back to the consent question and how this should be managed.

Question from the floor on who normally takes the role of supporting the VIVO node – the answer is that it differs. This is a fairly typical response for IdM approaches and one we are familiar with in the REFEDS space.

#IRISC Session 1 Martin Fenner

Martin Fenner is also from ORCID, and is going to talk about ORCID itself. He starts off with some assumptions: we all agree that authors need identifiers, he doesn’t want to talk about technology, and he doesn’t want to talk about the business model. His talk will focus on how to make this stuff work, and why we are getting it wrong so far.

Fenner lists the following issues:

  • A successful identifier has to be used across many, many different systems under different jurisdictions.
  • A successful identifier needs to be used. A perfect identifier cannot be launched as a ‘big bang’ where everyone changes.
  • A successful identifier needs to be built up within the community.

The proposed approach from ORCID?

  • ORCID is discipline neutral and is being used in multiple countries.
  • More importantly ORCID interacts with other author identifier systems but does not try to replace them…a lot of these are owned by a specific publisher.
  • ORCID wants to be open in the following contexts: anyone should be able to apply for an ORCID identifier, all ORCID data is openly available, and the ORCID software is open source.

Fenner ends by talking about consent, which sends ripples through the REFEDS folks in the room. Does ORCID empower people to say ‘I permit you to use my dataset in association with this publication’? I’m not so sure about the level of choice involved in this process. Publish or die does not really create a consent-based policy approach – some more food for thought.

Excellent chairing by the beepy bot and a creative commons slideset bring us to the end of this session with happy smiles. ORCID registrations will start in the next 12 months or so.

#IRISC Session 1 Geoffrey Bilder

First up at #IRISC2011 is Geoffrey Bilder from CrossRef and ORCID. Apparently Identity and Identifiers is second only to tax in terms of boring subjects – Geoffrey has obviously never hung out with the REFEDS folks before; we can talk about this stuff for *days*.

The Scholarly Record focuses on the long term, with the need to focus on trust, verification and provenance. This might be obvious, but Bilder then goes on to relate this to brand, and the importance of trust, verification and provenance associated with a publisher’s brand. It’s an interesting perspective, given that I typically consider verification of the author via an identifier, not trust through brand. This idea of stewardship, or a trust framework in which identifiers are used, echoes the trust frameworks built up by federations.

Bilder focuses on another element of trust: given a blanked-out research paper, we can easily identify all the parts, as the format is known. We trust an article because of its shape and structure.

So what challenges do we have? Bilder describes one of the primary problems as:

Language – identity means something different when we talk about access control than when we talk about the knowledge discovery problem. Important aspects of this in the discovery space are the problems of name duplication, variation, changes and transcription. Interestingly, Bilder only focuses on real name examples here. I will get back to this later.

Bilder rattled off a lot of debate points, and I only had time to capture some of them. These were:

  • Persistence of identity is a social issue, not a technological one.
  • Persist-able does not equal persistence.
  • Persistent does not equal stable.
  • Distributed begets centralised.

Questions:

Central authorities are evil and unresponsive, so we distribute to make ourselves feel better. However, technology rarely works in a non-central way; there tends to be a central piece built in somewhere. Interestingly, Bilder uses software forking as an example of how the community can make sure that centralised models do not become complacent or evil. I wonder if CrossRef works to such a model, where it is *possible* to fork...

Another question was around the need for human-readable identifiers. Human-readable nearly always means that the identifier system will break: mixing semantics and ontology with persistence creates problems.

Hosting a logo – Simples?

You *may* have heard me mention, once or twice, that we really want to use your logos as part of the new discovery process for the UK federation. On the face of it, this seems like a really simple request... send us a link to a logo on an https page, send us the preferred name of your service / institution, and send us a short description if you are a service provider (100 characters – that’s roughly 15 to 20 words max). Simples, right?

Believe me, I get how difficult this seemingly benign request is. Suddenly you are probably negotiating with your marketing department, library, IT and goodness knows who else to try and get agreement, and because it is not the clearest of areas, people will be reluctant to say yes.

One of the main bits of feedback we get is that giving us a logo is easy enough, but the https hosted bit is more difficult. We are reluctant to store them centrally as it creates another single point of failure and, if your logo needs updating, you are going to have to keep sending us this information. A link to something on your site is more likely to be updated automagically.

A quick way around this is to look on your login page. If you are following good design guidelines, you probably have a logo on there. So for example, the LSE login page has this logo on it, and an easy link to send us: https://gate.library.lse.ac.uk/idp/images/LSElogo.jpg. Tell us the words you want to appear below that logo and Bob’s your Uncle.

Also, if you can’t manage both a logo and an icon just now, please just send us the link to the logo – it’s a great start for us! I appreciate that the icons are difficult for people.

It’s good for Service Providers too – for example JSTOR have this logo on their site: https://www.jstor.org/templates/jsp/_jstor/images/jstor_logo.jpg. Again – just fine on the requirements for us in terms of hosting (although perhaps not the best on size in this example).
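The three things we ask for (an https logo link, a preferred name and, for service providers, a description of no more than 100 characters) are easy to sanity-check before you send them over. Here is a minimal Python sketch, assuming those are the only requirements; check_submission and its messages are made up for illustration, and it checks the form of the submission only, not whether the logo actually loads or is a sensible size.

```python
from typing import List, Optional
from urllib.parse import urlparse

MAX_DESCRIPTION_CHARS = 100  # limit asked for on service provider descriptions


def check_submission(logo_url: str, display_name: str,
                     description: Optional[str] = None,
                     is_service_provider: bool = False) -> List[str]:
    """Return a list of problems with a (hypothetical) logo submission; empty means it looks fine."""
    problems = []
    parsed = urlparse(logo_url)
    if parsed.scheme != "https":
        problems.append("logo must be served over https")
    if not parsed.netloc:
        problems.append("logo URL has no host")
    if not display_name.strip():
        problems.append("preferred name is missing")
    if is_service_provider:
        if not description:
            problems.append("service providers need a short description")
        elif len(description) > MAX_DESCRIPTION_CHARS:
            problems.append(f"description is {len(description)} characters; "
                            f"the maximum is {MAX_DESCRIPTION_CHARS}")
    return problems


# The LSE login page logo from above ("LSE" is just an illustrative name):
print(check_submission("https://gate.library.lse.ac.uk/idp/images/LSElogo.jpg", "LSE"))
# []
print(check_submission("http://www.example.org/logo.png", "Example Service",
                       "Journal archive", is_service_provider=True))
# ['logo must be served over https']
```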

For Service Providers that don’t use the WAYF, you might not think this is important to you. However, with the new embedded approach to discovery, IdPs can bring your branding onto their login pages if you provide us with this information, so it is very useful. Here’s an example of this in action at Dundee College:

Embedded Discovery at Dundee College

I hope that is helpful.

Personally I’d really love to just go and hoover these all up for y’all and use them, but we really do need permission. I’m happy to suggest something for you to use, though, if you need help! Just let me know.