Sunday, May 27, 2012

A thought about predictability of user requests

I am not by any stretch a customer service guru or expert, but one thing I noticed about libraries is that user needs tend to come in cycles and often can be easily anticipated in advance.

After working in an academic library for the last 5 years and monitoring emails, tweets, chats and search logs, you see the same patterns repeat themselves over and over again.

For example, just before the exam results are released there will be a spike of questions or searches about fines payment, because a university department will typically send out an email warning students that amounts outstanding to NUS, be it hostel fees, course fees or even library fines, might bar them from receiving their exam results.

Given that this need for information is predictable in advance, one can post existing FAQs on fines payment etc at just the right moment on Twitter, Facebook and sit back and watch the information get shared and reshared.

The same thing can be seen with users searching for past year exam papers near the exam period, freshmen trying to figure out their passwords during orientation, honours-year and graduate students trying to figure out how to format their thesis near the end of the term, or even certain hours before a day when most term assignments are due (for us there is a big spike in usage of e-resources just a couple of hours before the end of the one-week term break).

This has made me wonder if the library should take into account such usage patterns and perhaps deploy more help such as lengthening online chat references (up to midnight?) for identifiable periods where users desperately need help.

That's just one of the ways a library can be truly adaptable to user needs, by shifting manpower to areas of concern to users, though the drawback is that such a flexible, adaptable system with many exceptions would be quite confusing to users and staff...

Nothing earth-shattering, just a thought.



Tuesday, May 8, 2012

How is Google different from traditional Library OPACs & databases?

It's a truism in library circles today to say that Google and web search engines (I will use "Google" as a stand in for web search engines) have changed the way users search which in turn affects what they expect from searches in the library.

Libraries have two ways to react: try to change user behavior through information literacy, or alter the library catalogues and databases to fit user expectations. If one goes by the famous "User is not broken" meme, the latter seems to be the way to go.

That said how is Google different from typical library catalogues or even library databases?

In particular, Web Scale Discovery products, which are pretty much Google-like search engines, have the potential to be exactly like Google. But what exactly are we talking about, and if it's possible, should we do it?

I was inspired by a recent exchange on a Summon mailing list between librarians and many library IT people (so-called Shambrarians) on how exactly Summon should work, which led to a digression on what additional feature sets should be in there to support searching.

It seems to me we are discussing two different matters, and that most traditional library catalogues and databases differ from Google in two main ways.

  • Firstly the default search works differently from Google. 
  • Secondly they have additional options not available in Google. 

My thesis is that increasingly there is much less debate over the first type of difference (though there are still some holdouts for some features) but still plenty of disagreement about the value of the second as an option. In particular, many of the defaults we take for granted in library catalogues and databases today have *already* been impacted by Google.

It's a long blog post so here's the summary

Library OPACS and databases generally already

  • rank using relevancy by default
  • do "implied AND" by default

They are slowly moving towards

  • autostemming (for databases)
  • covering full-text (for Web scale discovery services)
But unless a miracle happens they will never


  • do a "soft AND", where occasionally search terms might be dropped

In short, the further away your library search is from these characteristics, the more difficult your users will find the search to use due to different expectations. Trained by Google, they create searches on the expectation that such features are built in; lacking any one of them will result in difficulties and poor quality results.

Of course implementing these features means losing control and predictability of searches. Librarians don't want to be surprised, and for sure they don't want to see a result they can't explain. Being able to do a precise, controlled search enables a searcher to be *sure* he has done the exhaustive search that he wants.

Google works on the opposite paradigm, aiming at getting a few good relevant results never mind if the searcher can't explain how it got that result, so there's a tension here.

Even if a library search ever achieves all five points, a second battlefront opens... what about adding more advanced options for librarians and power users? This is a pretty contentious issue....

But for now let's discuss the differences in default searches. Let's start with features that are totally accepted.


1. Traditional Library catalogues and databases do not rank by relevance even for keyword searching

This is an area where Google web search engines have completely won. The idea that you could get results without any relevance ranking at all is one that is hard to wrap your head around in this day and age.

Even newer librarians find this concept alien. I remember explaining to a colleague in 2007 that traditionally Boolean searches did not rank results by relevancy as in theory all results can be considered equally relevant as they meet the search criteria but she didn't believe me.

The older OPACs merely ranked in system order (the date the record was entered) or, if you were lucky, by author, title or publication date.

Of course today the newer OPACs and next generation catalogues have relevancy ranking as the default (relevancy ranking is one of the signature characteristics of a next gen catalogue).

Still there are some older OPACs out there that don't do relevancy ranking at all for keyword searching.

An example would be the National Library Board's ageing classic catalogue (note: they also have Primo). According to the help file:

"The default sorting order for the list of titles is in reverse chronological order, that is, the most current items first. If your retrieval set is less that 300, you may also choose to sort your list by author, title, call number, date, or original order." - no mention of relevancy ranking at all.

Even our classic catalogue Webpac pro, IIRC turned on relevancy ranking by default fairly late in 2007.


Still, doing advanced field searches like (a:twain) and (t:huck*) disables relevancy ranking as an option (probably for a technical reason), something that surprises even librarians sometimes.



In general though I think the idea of ranking results by relevancy has met with very little resistance by librarians (if they even knew of a time before relevancy rankings), most of whom are happy with somewhat unpredictable rankings as long as they can explain why a result is found.

Still there is some loss of control to the searcher: you might be able to explain every result (assuming strict Boolean) but you can't explain why this result is on top and not another.

As such I know of librarians who are a little unhappy with the limited amount of information available on how relevance ranking works.

In the case of OPACs and next generation catalogues, like our own Encore it is fairly easy to figure out.

There are 5 levels only -
  • "Most relevant" - exact phrase match in title $a subfield, 
  • "Highly relevant" - exact phrase match in title but other subfields
  • "Very relevant" - exact phrase match in any field
  • "Relevant" - No phrase match but all terms appear in title field
  • "Other relevant" - No phrase match but all terms appear, but not all in title field 
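To make the five levels concrete, here is a minimal Python sketch of a tiered ranker. It is only an illustration of the levels listed above, not Encore's actual implementation; the function names and the record keys (`title_a`, `title_other`, `other`) are assumptions made up for this example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def has_phrase(tokens, phrase):
    """True if the token list `phrase` occurs contiguously inside `tokens`."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase
                         for i in range(len(tokens) - n + 1))

def relevance_tier(query, record):
    """Return 1 (most relevant) to 5 (other relevant), or None for no match.

    `record` is a dict with hypothetical keys: 'title_a' (title $a subfield),
    'title_other' (other title subfields) and 'other' (all remaining fields).
    """
    q = tokenize(query)
    title_a = tokenize(record.get("title_a", ""))
    title_other = tokenize(record.get("title_other", ""))
    other = tokenize(record.get("other", ""))
    title = title_a + title_other

    if has_phrase(title_a, q):
        return 1  # exact phrase in title $a
    if has_phrase(title_other, q):
        return 2  # exact phrase in another title subfield
    if has_phrase(other, q):
        return 3  # exact phrase in any other field
    if q and all(term in title for term in q):
        return 4  # no phrase match, but all terms in the title
    if q and all(term in title + other for term in q):
        return 5  # all terms present, but not all in the title
    return None

records = [
    {"title_a": "History of Singapore", "other": "politics"},
    {"title_a": "Singapore history", "title_other": "a reader"},
]
# Sorting by tier puts the exact title phrase match first.
ranked = sorted(records, key=lambda r: relevance_tier("Singapore history", r))
```

Within a tier the records are left in their original order, which is part of why such a scheme is easy to explain.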

Then again, with the limited amount of information you have in OPACs, you probably can't get more sophisticated than that.

But when it comes to databases that consider full text it obviously gets complicated (here's EBSCOhost's explanation) and all you get is a sketch. Typically it will state which fields are heavily weighted (usually title and subject heading; if you are lucky it might order the fields by importance), note that ranking is weighted toward exact phrase matches, and give some general explanation of tf-idf.

Librarians like myself, whose knowledge of relevance ranking extends just to tf-idf weighting, probably wouldn't understand everything even if they bothered to explain it all, not that they would. :)
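For readers who have not met tf-idf, a bare-bones Python sketch follows. It is a textbook-style illustration only: no vendor has confirmed that their ranking looks like this, the smoothed idf formula is just one common variant, and the function name is made up here.

```python
import math

def tf_idf_scores(query, docs):
    """Score each document against the query with a simple tf-idf sum.

    tf  = share of the document's words that are the term
    idf = smoothed log of (number of docs / docs containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in query.lower().split():
            tf = tokens.count(term) / len(tokens) if tokens else 0.0
            df = sum(term in other for other in tokenized)
            idf = math.log((1 + n_docs) / (1 + df)) + 1
            score += tf * idf
        scores.append(score)
    return scores

docs = ["library search", "library library catalogue", "cooking recipes"]
scores = tf_idf_scores("library", docs)
```

The document that mentions the term more often (relative to its length) scores higher, and a term that appears in fewer documents counts for more. Real systems layer field weights and phrase boosts on top of something like this.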

Still that doesn't stop librarians from asking if they can alter the relevancy rankings for Web Scale Discovery tools. This is usually not possible; typically you can ask to "boost" the results from your local collections or institutional repositories but nothing beyond that, except maybe in Primo Central (self-hosted version only?).

Though one wonders if most know what they are asking for.



2. Traditional Library catalogues and databases do not do implied AND

Again this is one area where Google has slowly chipped away at what traditional databases used to do. In Google, when you enter a search, say Singapore History, it is implied that you are also happy with results that include Singapore AND History, instead of just Singapore History as a phrase.

Correct me if I am wrong but in the past, most databases did not do implied AND. If you typed in Singapore History the database expects that you want articles that have the two terms in that order next to each other aka a "phrase search".
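The difference between the two behaviours can be sketched in a few lines of Python. This is a toy matcher written for illustration, with made-up function names; it does not reproduce how any particular database tokenizes or matches.

```python
import re

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def phrase_match(query, text):
    """Old-style behaviour: the query words must appear adjacent and in order."""
    q, t = tokens(query), tokens(text)
    return bool(q) and any(t[i:i + len(q)] == q
                           for i in range(len(t) - len(q) + 1))

def implied_and_match(query, text):
    """Google-style behaviour: every query word must appear somewhere."""
    q = tokens(query)
    return bool(q) and set(q) <= set(tokens(text))

title = "A history of modern Singapore"
phrase_hit = phrase_match("Singapore History", title)
and_hit = implied_and_match("Singapore History", title)
```

Here the title is missed by the phrase search but found by implied AND, which is exactly the gap a Google-trained user would not expect.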

But now pretty much every library database, from Scopus to Web of Science to JSTOR, does implied AND.

I am hard pressed to find examples where implied AND is not on by default. So far, for us, I have found that LexisNexis Academic does not do implied AND.

Of course, some databases have various modes and/or allow you to set the default to whatever you want. On EBSCOhost platforms you can in fact choose between a pure "Boolean/Phrase" mode (so if you don't enter AND it assumes a phrase search) or a "Find all my search terms" mode, which does implied AND.

For databases with multiple modes, it's common for the "basic mode" to do implied AND while the advanced/Boolean mode does implied phrase search.

Still, I know the lack of implied AND was very common in library databases in the past, as I can still find mentions of how "some databases do implied AND" or similar statements in older books on database searching. My thesis is that implied AND is now more the norm than the exception.

In short like relevancy rankings this is another area where web search engines like Google have impacted how library systems work.

Again, similar to relevance ranking, my impression is that this change has been accepted and absorbed by the library community without much resistance. Sometimes you do need an exact search, but phrase search using quotes is generally sufficient and a worthwhile price to pay for the gain. Thanks to relevance rankings that prioritize phrase matches, you seldom need even that.

This change involves a little loss of control to the searcher since the search tries to be "clever", but you can gain control back by adding quotes (or the + operator, until Google changed it due to Google+).



3. Traditional Library catalogues and databases generally do not do stemming by default

Relevance ranking and implied AND have been generally absorbed into library databases and OPACs without much dispute, but what about auto-stemming? Here I use stemming loosely to mean including word stems (e.g. run finds running). Closely related is the ability of a search engine to include synonyms and other related words (e.g. automobiles finds cars).

From what I observe auto-stemming by default is currently still in the minority but the tide is turning.

Traditional databases like Scopus/JSTOR generally give you exactly what you search for. If you search for librarian, it won't give you librarians (note extra "s"), library or even information professional. In this sense it is totally predictable.
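As a rough illustration of exact matching versus stemmed matching, here is a deliberately crude Python sketch. The suffix-stripping rule is a toy made up for this post, far simpler than a real stemmer such as Porter's, and it says nothing about how any database actually stems.

```python
def crude_stem(word):
    """Strip '-ing' or a plural '-s'; a toy stand-in for a real stemmer."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Collapse the doubled consonant left behind (runn -> run).
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word

def exact_match(term, text):
    """Traditional behaviour: the term must appear exactly as typed."""
    return term.lower() in text.lower().split()

def stemmed_match(term, text):
    """Auto-stemming behaviour: compare stems instead of surface forms."""
    stem = crude_stem(term)
    return any(crude_stem(word) == stem for word in text.split())

title = "Librarians and change"
exact_hit = exact_match("librarian", title)
stemmed_hit = stemmed_match("librarian", title)
```

The exact search misses the plural while the stemmed search finds it. Note that this toy would still not connect librarian with library or information professional; that takes synonym or related-term expansion, which is a separate feature.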

But cracks are appearing. The new Web of Science platform "automatic searches for over 7,000 spelling variations such as British/US English (colour / color) as well as name variants (mice / mouse)".

OvidSP's basic search includes related words by default ("include related terms"), typically plurals but occasionally synonyms.

EBSCOhost databases have the option "Apply related words", though it isn't turned on by default on my version.

Lastly Summon itself does autostemming unless you add Boolean operators in which case it does exact searches. I believe Summon also has a list of proper names so it will by default search for those as well.

From the reaction seen on the Summon list, autostemming by default is more controversial.

Add autostemming without a way around it such as adding quotes or a Verbatim mode in Google? You will have a librarian revolt on your hands.

Google of course does autostemming for word variants, but according to Googleguide (which is not official) it does not find related terms (cars vs automobiles) unless you add the tilde (~) operator. Personally I think the latter portion is a simplification.

If you look at what Verbatim mode actually turns off, you can see Google does more than just match word stems.

Personally I think if you have a choice it is good to turn on stemming by default, particularly if stemming is of the form that searches the root words. In most cases, a searcher looking for car is also looking for cars. It is just a bit cruel to make him use truncation, wildcards or even the Boolean OR just to achieve this. In the rare case he isn't, he can just add quotes.

I am a bit more on the fence if it is of the form that throws in additional synonyms. On one hand, I agree with Iris that the greatest difficulty students face when searching is what she calls "term economy", where adding the wrong keywords gets you bad or even no results, so if the search can help by adding appropriate related terms to expand the search it would be great.

As with all the cases where the search tries to be "intelligent" (and hence less predictable), how useful this feature is depends on how good the system is at throwing in the right synonyms.

Our library databases and OPACs generally rely on authority records and thesauri, and in very technical areas, where the right term, lingo or technical term can make or break the search, this can be useful. Google probably has an even more sophisticated system to find related words, but either way, whether to turn this on by default can be decided by empirical tests.

Regardless of whether autostemming improves results on average, turning it on by default means losing more control. 

Typically Librarians would prefer for such features to be an option. Even more control can be had if searchers get to choose what synonyms & related words to add.

Something like OVIDSP's "Map Term to Subject Heading" or Ebscohost's "Suggest Subject Terms" is what would be ideal since one can decide what extra search terms to add, instead of immediately adding keywords without any control.

OVIDSP's "include related terms"  in basic search mode automatically dumps all related terms but at least the search isn't opaque and you can understand why a certain result is found.





4. Traditional Library catalogues and databases generally do exact searches ("hard implied AND") and will not drop search terms (except stopwords)

Google is generally very unpredictable, despite the impression given by http://www.googleguide.com/

For example, in a long search query, it may randomly drop a search term if that term causes the number of results to drop drastically. In the UK, Phil Bradley and Karen Blakeman have done quite a bit of experimentation on Google searches; in particular, a comment by a Googler at the end of this blog post is enlightening.

 "When you do a multi-term query on Google (even with quoted terms), the algorithm sometimes backs-off from hard ANDing all of the terms together. It’s a kind of “soft” backoff. Why? Because it’s clear that people will often write long queries (with anywhere from 5 to 10 terms) for which there are no results. Google will then selectively remove the terms that are the lowest frequency to give you some results (rather than none)."

That's just one oddity among others due to the various optimizations done by google, for now though databases and web scale discovery services are still not at this level of opacity and unpredictability.
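The "soft" backoff described in the quoted comment can be sketched roughly as follows. This is a guess at the general idea only, written in Python with made-up names; Google's actual algorithm is not public and is certainly far more involved.

```python
def soft_and_search(query_terms, docs):
    """Hard-AND the terms; if nothing matches, drop the rarest term and retry.

    Returns (indexes of matching docs, list of dropped terms).
    """
    doc_tokens = [set(doc.lower().split()) for doc in docs]
    terms = [term.lower() for term in query_terms]
    dropped = []
    while terms:
        hits = [i for i, toks in enumerate(doc_tokens)
                if all(term in toks for term in terms)]
        if hits:
            return hits, dropped
        # Document frequency: how many docs contain each remaining term.
        df = {term: sum(term in toks for toks in doc_tokens) for term in terms}
        rarest = min(terms, key=lambda term: df[term])
        terms.remove(rarest)
        dropped.append(rarest)
    return [], dropped

docs = [
    "singapore colonial history",
    "history of singapore trade",
    "malaya colonial trade",
]
hits, dropped = soft_and_search(["singapore", "history", "zeppelin"], docs)
```

In this example the strict AND finds nothing, so the term that appears in the fewest documents is quietly dropped and the remaining two terms return results. The searcher gets something rather than nothing, but can no longer assume every result contains every term.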

The closest function to this I know of in library databases is EBSCOhost's SmartText Searching, which allows you to enter a chunk of text (up to 5000 characters) and tries to match the best articles, which may not contain all the words.

Google also tends to "autocorrect" your searches and will automatically without prompting give you a search it thinks you want, though it does tell you about it and gives you a choice to change back. And this happens even if your inputted search has some results!





In terms of library system analogues, I know of some library catalogues that, if they find no results, will automatically switch to ORing the terms and show those results, but that's the closest I can think of.
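That catalogue behaviour is simple enough to sketch in Python. The function name and return shape are assumptions for illustration; no specific catalogue's logic is being reproduced here.

```python
def search_with_or_fallback(query, docs):
    """Try AND across all terms; if nothing matches, fall back to OR.

    Returns the mode actually used ("AND" or "OR") and the matching docs,
    so the interface can tell the user that the search was loosened.
    """
    terms = query.lower().split()
    token_sets = [set(doc.lower().split()) for doc in docs]
    and_hits = [doc for doc, toks in zip(docs, token_sets)
                if all(term in toks for term in terms)]
    if and_hits:
        return "AND", and_hits
    or_hits = [doc for doc, toks in zip(docs, token_sets)
               if any(term in toks for term in terms)]
    return "OR", or_hits

docs = ["singapore history", "history of malaya", "cooking"]
mode, results = search_with_or_fallback("malaya trade", docs)
```

Unlike the soft backoff sketched earlier, the fallback only kicks in on zero results and the switch is all or nothing, so it stays fairly easy to explain.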

If relevancy ranking and implied AND were accepted without much dispute and auto-stemming caused some grumbling, unpredictable searches similar to doing "Soft AND" would be the end of the world. If Summon or library databases started to work like Google where you couldn't tell most of the time why a search result was retrieved, all control is thrown out of the window and the library apocalypse would be here.

Still, while the typical librarian or serious searcher would care, I wonder if, as the comment in Karen's blog indicates, 99% of people wouldn't care as long as the results were relevant. Again we are back to control/predictability vs quality of results.


5. Traditional Library catalogues & databases which used to be I&A do not have full-text

Similar to relevancy ranking, the fact that OPACs search terms just cover the bibliographic record and not the full-text is something that is very alien to today's searchers thanks to Google. The library database version of this is of course Indexing & Abstract databases, like Scopus and Web of Science, and I often field queries from users who are stunned to realize that Scopus does not have the full-text.

Somewhat related, and also confusing to users, is the idea of indexing and searching only certain fields ("Keyword search" in many OPACs/next generation OPACs/databases searches a subset of available fields), or worse, things like inverted author names.

The idea of full-text search reigns over all, so concepts like index browse, pre-coordination of subjects or controlled vocab are totally alien to users. Even if explained to them, they will probably just think it's a strange idea.

The fact that Web Scale Discovery services actually

a) allows full-text search for articles (some of them anyway)

b) allows full-text search for books (some of them anyway)

is in my opinion the biggest gain of Web Scale Discovery systems. It perfectly aligns with what users are thinking of and is the main reason why they are a big hit.

I have often helped students who tell me they can't find anything in the catalogue. If it's an article they are looking for, it's fine to explain we don't have article titles in there (though I still get weird looks sometimes), but what if they are searching for a book?

It's perfectly possible of course that they are searching on very specific topics that aren't in books, but often what the user actually wants is a textbook on a specific statistical technique, say a specific type of linear regression.

A search in the library catalogue typically yields nothing, so I will ask them to do a Google Books search and lo and behold a ton of books appear, most of which we have in the library! I then have to explain to users that our catalogue doesn't have the full text (though occasionally we have tables of contents), so even though the technique might be covered in a chapter or a few pages, our library catalogue fails.

Web Scale Discovery systems obviously include ebooks full text if agreement exists but even in cases where all you have is a print version and not ebook, it can still help indicate a print book is relevant because it knows from the ebook the term you are looking for is in there. Essentially you get something similar to Google books.

I would argue many cases of people having problems with OPACs stem from the fact that searchers are just used to searching full-text. In Google, they can happily search for very specific terms and still get hits, but they fail in OPACs since we just include limited book information.

While searching over full-text doesn't mean choosing the right search terms is not important, the fact that you have a bigger set of text to match over makes it more likely you will get something.

I have also seen many users struggling with "subject search" in OPACs, thinking it is searching over full-text. Of course, no one but a cataloguer would ever start from Library of Congress Subject Headings or other controlled vocab (though one could of course use pearl growing techniques from a relevant item).

In general the ability to search full-text for books and break down silos across databases for articles is still quite a new experience to librarians, so I am unsure how librarians are reacting to it, though I guess most librarians subscribe to the view that more is better?

Of course, searching over full-text means a lot more possible hits for broad general search terms, which puts greater stress on the relevancy ranking algorithm. Thus far I have seen a request on the Summon list to put in an option to search just metadata and not full text, similar to what EBSCO Discovery Service has.

Another issue specific to Summon is that while it tells you that a result is obtained due to a match in the full-text, you can't quite verify it easily since, unlike Google Books, it doesn't have a "snippet view" to see the words in context.



 Showing that a result is matched via full-text but not exactly what is matched


Even if one can agree on what defaults are best, and that we should basically set defaults to simulate what users expect (aka Google), what about the need to add options for more advanced/librarian-like features?

Here's a list of such features,

What's not in Google but in Library OPAC and databases

1. Boolean operators, proximity operators, truncation/wildcards

Google does have a lot of advanced operators including OR, the minus operator which is equivalent to NOT, and quotes for exact phrase search. Still, that pales compared to a typical database.

For one thing, Google doesn't have a documented proximity operator, though there is an undocumented AROUND function and there are hacks to sort of achieve that using the asterisk, but neither takes term order into account.

E.g. EBSCOhost has Near (N) and Within (W) operators.

While Google uses the * option it is not a truncation feature like most library databases.

In fact most databases differentiate between wildcards typically ?  (replace one character or sometimes zero or one) and truncation typically * (replace unlimited number of characters or in some cases a fixed number).
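One way to picture the wildcard/truncation distinction is to translate both symbols into a regular expression. The Python sketch below assumes the strictest common convention (? is exactly one character, * is any number of trailing characters); as noted above, individual databases vary, and the function names are made up for this example.

```python
import re

def pattern_to_regex(pattern):
    """Translate a database-style pattern into a compiled regex.

    Assumed convention: '?' = exactly one character,
    '*' = zero or more characters (truncation).
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """True if the whole word fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(word) is not None

wildcard_hits = [w for w in ["woman", "women", "womn"] if matches("wom?n", w)]
truncation_hits = [w for w in ["library", "libraries", "librarian", "libel"]
                   if matches("librar*", w)]
```

Under this convention wom?n picks up woman and women but not womn, while librar* picks up every word that starts with librar.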

That said, with tons of intelligent searching built-in including autostemming, does Google really need truncation or exact boolean operators?


2. Plenty of search modes and field searches

Google has one advanced search and it is pretty comprehensive in terms of search fields available, including the ability to narrow by language, region, last update, the site or domain it is on, file type etc.

Google "field searches" are limited to title of the page, URL of page, in links to the page (anchor text) and of course text of the page.

With library databases, on the other hand, we have tons of field searches. In Business Source Premier on EBSCOhost I see 19 fields you can search; with PsycINFO on OvidSP I count over 70 searchable fields! With so many fields available to search, it is pretty common for databases to offer multiple search fields connected with Boolean operator pull-down menus, like the multi-field search in OVIDSP below.






 Some of 70+ fields in PsycInfo on OVIDSP in multifield search mode


Other databases with similar layouts include EBSCOhost, Scopus, Web of Science etc. This design encourages the use of nested Boolean operators, with each "nest" consisting of one concept combined with OR to pick up synonyms.
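The nested pattern is mechanical enough to generate. Here is a small Python sketch that builds such a query string from lists of synonyms; the function name is made up, and the exact syntax a given database accepts (quoting, operator case) may differ.

```python
def build_nested_query(concepts):
    """Build '(a OR b) AND (c OR d)' from a list of synonym lists.

    Each inner list is one concept; multi-word terms are quoted as phrases.
    """
    groups = []
    for synonyms in concepts:
        quoted = [f'"{term}"' if " " in term else term for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

query = build_nested_query([
    ["librarian", "information professional"],
    ["training", "education"],
])
# query is: (librarian OR "information professional") AND (training OR education)
```

Each row of a multi-field search form corresponds to one of the inner lists, and the pull-down menus between rows supply the AND.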

To be fair, Google mostly indexes webpages, which have a lot less metadata, but there is some. I suspect though that even if there were more metadata and fields, Google probably wouldn't trust them due to spamming (or the nicer term, SEO).

Of course we librarians love our advanced searches, and to "honour us" Google gave us this awesome advanced search on April 1 2012. Just kidding!




3. Subject/Author etc browsing & Thesauri

Related to the above, many "serious" library databases have subject thesauri that allow you to browse and "Explode" concepts. Examples include PubMed, PsycINFO etc. Even the most humble OPAC allows you to browse by subjects. AFAIK Google doesn't have anything close.




4. Combination of search history or sets

Many databases like OVIDSP, Scopus, Web of Science, Ebscohost also keep track of your past searches and allow you to combine them using Boolean (what else)!
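Combining search sets is really just set arithmetic on record identifiers. A minimal Python sketch follows, with a made-up class name and numbering scheme; it is not modelled on any particular platform's search history screen.

```python
class SearchHistory:
    """Keep numbered result sets and combine them with Boolean operators."""

    def __init__(self):
        self.sets = []

    def add(self, record_ids):
        """Store a result set and return its 1-based set number."""
        self.sets.append(frozenset(record_ids))
        return len(self.sets)

    def combine(self, left, op, right):
        """Combine two earlier sets by number; the result becomes a new set."""
        a, b = self.sets[left - 1], self.sets[right - 1]
        if op == "AND":
            result = a & b
        elif op == "OR":
            result = a | b
        elif op == "NOT":
            result = a - b
        else:
            raise ValueError(f"unknown operator: {op}")
        return self.add(result)

history = SearchHistory()
s1 = history.add({101, 102, 103})      # e.g. results for one concept
s2 = history.add({103, 104})           # e.g. results for another concept
s3 = history.combine(s1, "AND", s2)    # set #3 = #1 AND #2
```

Because every combination is stored as a new numbered set, a searcher can build up a complex search step by step and see the size of each intermediate set along the way.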


Conclusion

Going through the differences I can see that the major difference between Google and typical library search systems is one of control and predictability.  In most cases modern library systems and newer systems can in fact duplicate a lot of the functions turned on by default in Google, but the modern library systems generally require you explicitly turn on the option.

The typical library search systems also place a lot of focus on Boolean operators. Leaving aside the functionality to combine search history sets, a pure library system that does not help users by stemming or adding other related terms will require Boolean operators to help get higher quality results.

Conversely, the "smarter" a system is at helping the user, the less he needs Boolean and/or truncation/proximity due to a combination of relevancy ranking, stemming and "soft AND" that knows to exclude search terms.

That said, Google's target audience is often people who just want to find a few relevant pages, while serious academic researchers want exhaustiveness, and this requires a high level of control.

A big debate is now brewing over the ultimate goal of Web Scale Discovery systems. One school of thought is that such systems, including Summon, should aim to cater only to undergraduates and shouldn't aim to be more. Dave Pattern is just one of several who hold this view (see this blog post and comments).

If I understand correctly, this school of thought feels that serious researchers should still use normal library databases, and while Summon can still be useful to them, it's just one option out of many.

They oppose any attempt to make Web Scale Discovery tools closer to typical databases by adding more library-type functions like the ability to combine search history, more options to control the search and more powerful advanced searches. (It must be noted though that while Summon is not much like a typical library database, some others like EBSCO Discovery Service are far closer.)

They fear the clean UI will be messed up and might even confuse users.

Another school of thought, which includes Marshall Breeding, feels that such services should evolve to support all users including advanced users. Given such systems will tend to be the default search, the inability to support more than one class of users seems to be a waste...

EDIT May 10 2012. As noted in the comment below and by others including Dave Pattern himself, the disagreement here between Marshall Breeding and Dave Pattern may be overstated. Sincere apologies for misrepresenting the two.

Which side of the debate do you stand on?







Tuesday, May 1, 2012

Posting historical photos of your library - Facebook milestones, Dipity, Historypin and more

I must admit, local history is not much of a specialty of mine, but I happen to work for a university whose history goes back a fair distance to 1905, and as a library unit we have collections that go back almost as far, making us the oldest academic library in Singapore.

As such, we do have some old photo collections showing the rich past history of the library, and when Facebook pages moved to timeline view and began encouraging organizations to post milestones, I started to think about whether we could put up these old photos and how best to display them.

Many libraries including ours also have collections of photos, photo archives etc. Is there a better way to expose some of them?

I currently have four ideas: Dipity, Facebook page milestones, Historypin and the Singapore Memory Project (Singapore only) mobile apps.



Dipity

It occurred to me then that we had already posted our milestones using Dipity, a timeline-based lifestreaming tool that I have mentioned several times in the past (as early as 2009). Much later I mentioned the Dipity project of library milestones, where the widget is actually placed on our official library homepage.






Facebook Milestones in Facebook Page


It was pretty straightforward to transfer or create Facebook milestones on our Facebook page.

As you enter more milestones, you can see the extended timeline on the right (see below).


Some further examples





Below shows what you see when you enter milestones.






It's unclear how popular the milestones are or how often they are viewed, since Facebook Insights doesn't seem to track this, but we did get fairly good comments.





Historypin 


The awesome Historypin launched in Nov 2011. Many other librarians, such as Justin Hoenke, have blogged about the potential of Historypin, but this post entitled "DIY augmented reality…finally" is perhaps the earliest and most complete post explaining the features and what libraries can do.





Essentially you can pin photos, videos or audio recordings to a specific location (though photos are probably the majority posted), after which you can search for them in a map view, which is basically similar to what you see in online maps with crowd-sourced photos.

What is most interesting of course is that it comes with Android and iOS apps that use augmented reality to direct you to the closest photo, or to show you at a location how a certain place or building looked in the past, similar to apps like Layar.

Historypin is very full featured. One can set up channels (similar to YouTube channels) linked to your Google account, as Google is a partner. A channel can be set up as one of the following:

  • Individuals
  • Libraries, Archives or Museums
  • Business or Company
  • Schools
  • Community groups

You can obviously upload manually (up to 5 MB at one shot) but there is also a bulk uploader.

The manual uploader is quite detailed. You can enter Title, Description, Tags etc.




Interestingly you can define the license for uploading. As shown above by default it is set to Copyright (c) all rights reserved but you can click on change to change it to a very comprehensive list of licenses. You can also add details including a link back to the original photo if you have it in a repository say.






As shown in the earlier picture, you can't post if you don't indicate a date or time. But for date you can select "I'm not sure". 

The partnership with Google shows, with the question about putting it into Google Street View and searching for the location via Google Maps (see below).




Once it is posted, you can then view the photo you pinned on the map.





As creator you can edit or delete the photo. Others can Fave, Report or Dispute. The last leads to an online form to "Dispute", essentially for when someone disagrees with details such as the date or location of the photo you posted.

I can go on about creating collections of photos you faved or posted but perhaps the most interesting function is the ability to create tours.




Obviously this has great potential for University or Library Orientation tours, as students walk around using their iOS and Android devices to see what certain locations looked like in the past.

In effect this allows you to duplicate NCSU's well known WolfWalk app.

It is clear that Historypin is a really well thought out project, with many details and advanced features for those who need them, such as detailed copyright options and links back to the original photo, and is clearly designed to court both individuals and organizations.

Hence it is no surprise that there is now a long list of libraries and museums on Historypin, including channels by the New York Public Library, the Library of Congress etc.

Mobile app


The mobile app is fairly capable, allowing you to browse collections, search the map and of course post a photo etc.


Browsing the collection.



The search is fairly capable, though unlike the desktop version, which allows you to search by location and refine by date and keyword, the mobile app only does location search refined by date.

Below is the map.




The arrow on the bottom left brings you to your current location, and "Cam view" brings up the augmented reality view, telling you where to turn if you want to move towards the closest photos.






I was however most curious about uploading photos via the app. As already mentioned, you need a Google account to log in. Most of it is standard, except the date has an interesting default where it assumes you don't know the exact date.





As you can see above, by default it just puts the year and then you select "Give or take x years", interesting.... As far as I can tell, most of the other advanced options in the desktop version, in particular the copyright options, cannot be set in the app, and once the photo is posted can't be changed later?

Worse yet, if you go to the next step for location, it automatically pinpoints your current location, which, unlike in the desktop version, you can't change by searching. So the Historypin app cannot be used to upload photos unless you are at the exact spot, it seems.


Singapore Memory Project App 

Despite the number of months since the launch of Historypin, I could not find that many photos of locations in Singapore which was to be expected.

Coincidentally in Nov 2011, news broke that the National Library Board in Singapore was launching the Singapore Memory Project. (Some clarification: my current place of work is the National University of Singapore, which is a separate and distinct institution, though of course we do work together in the library community.)

The official site is at http://www.singaporememory.sg/ . It is a national project aiming to collect five million memories by 2015. It seems similar to Historypin in some respects but is of course restricted to Singapore, and it aims to collect memories, which might include not just photos and videos but also stories in text.

It seems to be a major undertaking, with the project being mentioned by the Prime Minister at the National Day Rally Speech in 2011. The Singapore Memory Project (SMP) team includes some well known names in the Singapore library field, including Gene Tan, who is Director of the project and current Library Association of Singapore President (see his interview in TWIL last year), as well as Ivan Chew, Assistant Director, better known to many as the Rambling Librarian, who has blogged about the project several times, including this "behind the scenes look".

I am just scratching the surface of what this national project is doing, and with all this firepower, it's not surprising that I can find quite a lot of memories collected. At the time of writing there are close to 65,000 memories collected.

While the Singapore Memory Project is superficially similar to Historypin, it's quite a different beast. It's in many ways simpler than Historypin, which is fine given the objectives.

For example the memories submission form is simpler.

SMP submission format - desktop version

It seems to me that, unlike Historypin, the form is designed for individuals only. While the project does collaborate with institutions, it probably does so behind the scenes. This explains the relatively simpler form and the lack of a bulk uploader.

But then again, the Singapore Memory Project needs to handle only Singapore organizations which is a lot fewer than Historypin which needs to be versatile enough to handle organizations from around the world with different requirements hence the complexity of Historypin submissions.

Probably the biggest difference between the project and Historypin is the difference in mandatory fields. While Historypin insists on both date and location being mandatory, for the SMP only the date is mandatory.

While this speeds up the process and increases the chances that someone will not give up midway when submitting, it also means not all memories have a location. I am unsure if the reason was to encourage submissions, or perhaps it was felt that unlike photos or videos, which have a specific location that can be pinpointed, memories can be a lot more broad ranging and cannot be pinned to a given location easily.

For example, if I posted a memory of going house to house visiting relatives during Chinese New Year in a text story, where should I locate the memory besides in the broad sense of Singapore?

Mobile app 

                                                     SMP startup screen - iOS

The mobile app is currently available for iOS. Compared to Historypin, its starting screen makes a very strong pitch to submit memories. I would speculate that the SMP app, more so than Historypin's, is designed for submitting memories rather than browsing them, for several reasons.

Firstly the search capabilities of the mobile app seem to be very weak.

While the desktop version does allow you to search memories by location, the iOS app can only show memories that are nearby, with no search facilities. You have to awkwardly move across the map manually to look at other locations.


SMP map view showing nearby memories -  iOS


Browsing is no better: you can see the most popular or most recent memories submitted, but only 10 or so of each.



SMP browsing - iOS


As it now stands, the Historypin mobile app appears to be more fully featured, as the Singapore Memory Project lacks augmented reality functions.

That, coupled with the complete lack of a search function in the SMP mobile app, makes it seem to me an app more for adding memories than for looking for memories (except those nearby). Still, one wonders how often one uses a mobile app, whether Historypin's or SMP's, to search for content.

SMP also unlike Historypin lacks the concept of a "channel" for each person or the ability to curate one's own collection.

The desktop version lists clusters like "my school days" and "Toa Payoh", which I guess are managed by the staff around special themes. The iOS app does not have access to these.

One area where the desktop version of the Singapore Memory Project one-ups Historypin is that one can log in using Facebook, Google, Yahoo, Windows Live or a National Library Board account instead of just Google, though the mobile app asks you to log in using Facebook when posting memories.

The fact that it is neutral also plays to its advantage compared to Historypin, as the app offers you the option of posting to Facebook whenever you post a memory.




SMP submission of memory - iOS




The interesting bit about posting memories via the app is that, unlike Historypin, one can indeed change the geolocation of the memory! This is done by selecting the arrow next to the geolocation listed.
Below shows a search for a location to geotag. As mentioned before, this can't be done when submitting in the Historypin mobile app.


                                   SMP changing location of submitted memory - iOS

Given that both are using Google Maps for geolocation, I suspect that Historypin's inability to change the location when posting in the app is an oversight rather than a technical limitation.


Historypin mobile app vs SMP mobile app

The ability to search locations in Historypin, coupled with the tour and augmented reality functions, seems to imply that Historypin is designed as a discovery tool. While the SMP mobile app does allow you to look at memories posted nearby, the lack of a search function and the browse functions limited to the 10 most popular and 10 most recent submissions seem to make it less ideal for finding memories.

The more aggressive startup page urging users to log in and submit memories, the ability to change the geotagged location when posting (strangely lacking in Historypin's mobile app) and the ability to share your postings on Facebook together suggest to me that the SMP app was designed more to encourage posting of memories than browsing or searching for them.

Of course, either app could and probably will evolve.

An idea I heard was that, given that we are posting memories or old pictures, it might be nice if the app could offer filters similar to Instagram...

Conclusion 

I just mentioned four possibilities for posting pictures of old library buildings. Which one you choose will depend on rights management/licenses and the popularity of the platform. At this stage it's unclear how popular any one of them is, and I would argue none really are.





Wednesday, April 18, 2012

How a "Facebook for researchers" platform will disrupt almost everything

I recently attended a talk about the Mendeley institutional version (powered by Swets). I am fairly familiar with Mendeley, Zotero and other reference managers (though my main usage is with EndNote) but had not looked at the institutional version yet.

You can read about the exact features of the service and also here, but more importantly, while looking at the features during the talk I finally grasped how powerful and disruptive a real and dominant "Facebook for researchers" is going to be.

Of course, the road to such a goal has been strewn with many failures, including Elsevier's 2collab, Labmeeting etc (check a 2008 report on such tools and see how many still stand). Attempts have been or could be made from the social bookmarking/reference management angle (e.g. CiteULike/Connotea/Mendeley), the discovery/search angle (potentially web scale discovery/next generation catalogues with social features) or, even more directly, straightforward identity management (e.g. ResearcherID).

But no matter who wins, how would a dominant "Facebook for researchers" platform affect academic research and hence academic libraries? What areas would it disrupt?

Note: I am going to mostly use the Mendeley Institutional edition as a stand-in for this dominant hypothetical "Facebook for researchers" platform. I actually haven't used the institutional service beyond looking at brochures. I am not saying that Mendeley will eventually succeed either.


Disrupt search including webscale discovery tools

There is a reason why Google is so worried about Facebook coming after them in search and desperately trying to force people into their own version of Facebook. Simply put the more the system knows about you the better recommendations you can get and potentially much better search results.

In the academic/research world, the advantages are perhaps less but still considerable.

Mendeley, CiteULike etc are already starting to show hints of this. When you search you can see how many people have put a certain article in their reference libraries, and that itself could be a strong signal of quality. Think of it like having articles ranked by times cited, except you don't have to wait a year or so for the paper to be cited. You don't necessarily cite everything that is in your reference library of course, but studies are starting to show there is a strong correlation between these two measures.

And that's just the beginning. One could imagine Mendeley or similar tools allowing you to restrict searches to take into account only people in your institution, your specific groups, your friends etc, or using collaborative filtering techniques for recommendations based on researcher profile characteristics (see Mendeley's version) and more, i.e. "researchers like you have read this".
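To make "researchers like you have read this" a bit more concrete, here is a minimal sketch of one simple form of collaborative filtering. It is purely illustrative and not how Mendeley actually does it: the researcher names, paper IDs and the `recommend` function are all made up, and similarity is just the Jaccard overlap between two reference libraries.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two reference libraries (sets of paper IDs)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target, libraries, top_n=3):
    """Suggest papers held by researchers whose libraries resemble the target's."""
    mine = libraries[target]
    scores = Counter()
    for other, theirs in libraries.items():
        if other == target:
            continue
        similarity = jaccard(mine, theirs)
        if similarity == 0:
            continue  # nothing in common, so ignore this researcher
        # Papers the target does not have yet, weighted by how similar the owner is.
        for paper in theirs - mine:
            scores[paper] += similarity
    return [paper for paper, _ in scores.most_common(top_n)]


# Hypothetical reference libraries, keyed by researcher.
libraries = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p4"},
    "carol": {"p2", "p3", "p4", "p5"},
    "dave": {"p9"},
}

print(recommend("alice", libraries))
```

Restricting the recommendations to "only people in your institution" or "your groups" would then just be a matter of filtering `libraries` before calling `recommend`.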

Mendeley currently claims (Jan 2012) to have 150 million unique items available when you search it: "This makes it, according to Victor Henning, the company’s CEO and co-founder, the world’s largest research database."

Depending on how one defines research database, this is probably false. Web scale discovery systems like Serials Solutions' Summon, OCLC's WorldCat Local etc have more items. Currently Summon, for example, has 249 million items, and WorldCat Local has 663 million articles (943 million items in total).

Still, it's clear Mendeley is catching up, and I could be wrong but they probably have partnerships with publishers pulling in metadata, as I doubt crowd sourcing alone is likely to get so much so quickly. In fact, crowd sourcing would be a distinct advantage, since one could find items like data sets and reports that would not typically be found in a traditional discovery product.

Currently Mendeley gets you to full-text using OpenURL, very similar to Summon, and provides an option to upload your library holdings. While I am not sure what uploading your library holdings does currently, I would guess it wouldn't be impossible to use that to eventually allow "search within your subscriptions" options, or at least use it to show the OpenURL button only when an item exists, like in Google Scholar now.
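For those who have not looked under the hood, an OpenURL is essentially the citation packed into a query string and sent to your library's link resolver. Below is a rough sketch, assuming the OpenURL 1.0 (Z39.88-2004) key/value format for journal articles. The resolver address, the citation, the ISSN and the holdings table are all made up for illustration, and the `has_access` check is only my guess at how uploaded holdings could be used to decide whether to show the button.

```python
from urllib.parse import urlencode

# Hypothetical link resolver address; every library has its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation, base=RESOLVER_BASE):
    """Pack a journal article citation into an OpenURL 1.0 style link."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for key in ("atitle", "jtitle", "issn", "date", "volume", "issue", "spage"):
        if citation.get(key):
            params["rft." + key] = citation[key]
    return base + "?" + urlencode(params)


def has_access(citation, holdings):
    """True if the citation's year falls within the uploaded coverage for its ISSN."""
    coverage = holdings.get(citation.get("issn"))
    if coverage is None:
        return False
    first_year, last_year = coverage  # last_year of None means "to present"
    year = int(citation["date"][:4])
    return first_year <= year and (last_year is None or year <= last_year)


# Made-up citation and holdings, for illustration only.
citation = {
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "date": "2010",
    "volume": "12",
    "spage": "34",
}
holdings = {"1234-5678": (2005, None)}

if has_access(citation, holdings):
    print(build_openurl(citation))
```

Whether the button appears is then decided entirely by the holdings data, which is why inaccurate coverage dates matter so much.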

I suppose though it will never completely replace your ILS, as I doubt such a platform will want to take that function (though who knows?) but perhaps discovery layers that sit atop ILS might be disrupted.

Disrupt unique author identifier rivals

I don't know much about this area but I know there is probably no leading contender in this area yet.

I know of attempts like ResearcherID by Thomson Reuters, Elsevier has a Scopus Author ID, and there's an attempt at a standard with ORCID.

But just as Facebook Connect is pretty much making OpenID irrelevant, could a Facebook for researchers platform make efforts like ORCID irrelevant?

Mendeley provides a researcher profile and if it becomes as dominant for researchers as Facebook is for common networking it would be the one ID to rule them all.

Mendeley Institutional Version also claims to allow you to "track your members publications", "view the reach of your publications" etc.


Provide better analytics

Imagine being able to see what papers, articles, or entries your researchers are downloading and putting in their libraries. You might think, so what? We have usage stats for downloads (COUNTER stats or not), so we already know what is used.

Not quite. What about items that are Open Access and that researchers download directly, say via Google Scholar? What about items they find from searching Google etc that are not traditionally in databases you track? But perhaps you don't care about those. But what about items you don't own, for which they never get around to requesting document delivery because they get them via other methods?




One can imagine the degree of tracking available with signed in users would be considerable and one could get in theory all sorts of user behavior during the research process.

One wonders with the collaboration with Swets whether this will eventually lead to linkage to backend systems but that's a long way off.



Replacing your library website

Everyone knows about the finding that practically zero percent of library users start from the library website, and I have written before wondering, if this is the case, how much effort we should spend on it versus trying to reach users outside the portal. But assuming this "Facebook for researchers" takes off, it is likely going to be as sticky as the real Facebook, and the amount of time spent there while doing research is likely going to be very high.

Add the fact that it is likely going to have a superior search experience (see above), and it will become the first stop for research (perhaps even giving Google and Google Scholar a run for their money; the CEO already claims people are using Mendeley to search instead of Google Scholar), further displacing library portals.

No big deal, right? Users weren't coming to us after all, right? If this is the case, should libraries try to put our offerings and services into this platform?

Mendeley Institutional version is starting to move in that direction, with the ability to upload an A-Z list (to allow direct linking to eresources), "Have teachers set up course packs to direct students to important content" (presumably this is just linking to eresources the library subscribes to, not scanning of hardcopy material??) etc.

What else would a user really need from the library website if he can search for articles from the platform and get access to full text via his library's subscription?

Not much really, perhaps he might want to find a way to contact librarians to ask questions on research or policy issues? Or perhaps the library would like to "push" important news and events to users? I suspect the latter is more of a want by the library than of researchers though :)


Targeted marketing

So say you want to market something on this Facebook for researchers platform.

I suppose a liaison librarian could create research groups in Mendeley and invite all researchers into the group to communicate with them (the equivalent of Facebook pages/groups), or link up using the librarian's personal Mendeley account (the equivalent of friending people on Facebook with your personal account), but are there other ways to reach them?

Well... if this was actually Facebook, you could buy an ad :)

In Why Google Is Terrified Of Facebook, there is a nice screenshot showing the amazing amount of granular targeting one could do.

Check out the image here

Now imagine if libraries could do this. Target specific library news and events at the specific people in your university most likely to be interested, instead of blindly mass emailing everyone in the university, or even in a department. Say you have a speaker on an exotic topic coming....and you could immediately target only researchers who might be interested based on their profiles, or better yet based on the papers they put in their library. So say it might notice you have plenty of papers by researcher X in your library....

Of course, if one had really top notch liaisons who had their finger on the pulse of every researcher's research and interests, one could sort of already do this, but realistically speaking, for large universities that would be very hard. Imagine a system where you could automatically maximise the chances of reaching the ones most interested.

I envision a system where you would still push news on your normal broadcast channels like blogs, postings on your portal etc, but researchers specially targeted would see events streaming on their platform as they did their work.

Is this going too far, I wonder? What about privacy? Are librarians too noble to use such marketing tactics? I don't know. I have heard of libraries experimenting with Google AdWords and Facebook ads to target users, so this isn't quite unheard of...

Google AdWords seemed to work but Facebook ads didn't: "They found that Facebook advertising was not effective because that is not where students are spending their time when they’re in research mode". But you won't have that problem with this hypothetical "Facebook for researchers" platform.


Issues

Of course, I am just wildly speculating. There are many major differences between social networking/social media in general and using it for academic research, and some of the network effects that work for Facebook will probably be a lot weaker in the academic world.

For one thing, it's unclear if researchers want to share what is in their library either with each other or with the library. Aggregated stats are probably okay, I suspect, but that would reduce the effectiveness of some of the social signals.

It's still an open question whether a network for researchers will work best by operating more like Twitter's default open model with asymmetric links, like Facebook, which is default closed with symmetric links, or some combination.

Unlike Facebook, it is also unclear if there will be one dominant winner. In social networking sites the network effect leads to one solution winning out, as you want to be on the same network as your friends.

In the academic world, if most of your "friends" are in the same institution you would by definition be on the same supported platform. Or would the desire to collaborate across institutions online push towards one dominant solution?

Still, even without one dominant platform used by all researchers, the platform your institution adopts would still have a lot of power over your institution's users, as all the eyeballs would be there.....


Libraries positioning

Let's say I am even half right and eventually such a platform will come to dominate the research world (or perhaps just locally on institutional level). What should libraries do?

Firstly, one has to recognise that Mendeley and its cousins should not be treated as just another reference manager like RefWorks or EndNote. They have far bigger ambitions. How it performs as a reference manager versus your existing solution, while important, is not the only or even the major point.

In fact their whitepaper makes their ambitions very clear. There are ample references to discovery, Facebook likes and Twitter trending, and it makes pretty much the same arguments as this blog post. Then there is this passage...

"Many researchers have welcomed social media into their workflow, using Twitter, LinkedIn and Facebook to organise groups and share information. However these all-purpose platforms do not always have the unique functionality that researchers need, and involve them stepping out of their workflow to login, post a link or make a contribution"

followed by their determination to be in the digital workflow of all stages of research, pretty clear isn't it that Mendeley wants to be that platform....

Indeed Reference Managers are a very good base to build a crowd sourcing/social platform around because 1) there is value in using them even when used alone so early adopters still benefit 2) It does not require the researcher to do anything extra on top of what they do normally.

Mendeley's strategy seems to be to give the product away free to personal users to build brand awareness, and now that there is a sufficient user base that Mendeley isn't a complete stranger to most librarians, they are going after libraries and institutions. This phase seems to have been announced by partnering with Swets.

I was a bit puzzled at first by the tag line "Institutional edition powered by Swets". In what sense is it powered by Swets, particularly when all (?) of the technology is Mendeley's?




But then I realized Swets was brought in as a partner more for its marketing and sales arms, which have the relationships with libraries that Mendeley lacks.

The fact that Mendeley made this move is a compliment to the power and influence academic libraries have over users' choice of reference managers. While many researchers will end up trying and learning on their own, sizable numbers will be taught by librarians in their honours year or post grad year and might end up using that tool for life, so it makes sense for Mendeley and Swets to court libraries.

But I guess the question from the library's point of view is: what is in it for us? Cynical as it seems, frankly speaking my opinion currently is that while some reference managers are better than others, the differences aren't really great enough to be worth the switching costs.

*paranoid mode on*

If libraries start supporting one platform together we could potentially end up creating a powerful entity that would make the library even more invisible in the research workflow and would tip the balance of power away from us. Once they are dominant will they use their power against us?

* paranoid mode off*

I guess that's the same argument some librarians make against being on Facebook, the fear of giving them even more information and power, but to be fair librarians were hardly the ones who gave Facebook its power...

This is not so for citation managers. Hate to sound cynical but at this stage such services still need us more than we need them I think, so while our bargaining position is strong we should make a stand and not give the store away at least not without quid pro quo.

At the very least, switching will mean that in the end the librarians are the ones who will bear the cost of training, handling difficult troubleshooting queries on cite while you write etc, so it's not a small thing.

But what we should ask for in exchange for support, I leave it as an exercise to the reader.


Looks like the parody below about PubMed, in the style of the movie trailer for "The Social Network", could be redone for Mendeley :)



Notes


1. I am not the first to see how disruptive Mendeley can be


"Mendeley has often been mentioned as a potential industry disruptor. With its presence as a resource manager, database, search tool, social network and now, thanks to the partnership with Swets, its integration with library holdings and provision of usage analysis to libraries, it’s not hard to see why."

http://www.researchinformation.info/news/news_story.php?news_id=879


2. Again I reiterate that while I use Mendeley as an example here, it could be a stand-in for any service that has similar ambitions to be a Facebook for researchers platform. So Mendeley supporters, please don't take it that I am targeting Mendeley.







Friday, April 13, 2012

Different ways of finding a known article - Which is best?

As a fresh graduate from library school with little practical experience, I used to think that known item searches, i.e. finding an article or book when you already knew the title etc, were relatively trivial and that the difficulty was with the other type of search, subject/topical searches.

(BTW I am well aware that there is quite a bit of disagreement over what actually counts as a known item search  (more academic piece) but for simplicity , I am going to take known item here to mean finding an article if you know the article title at least and perhaps even the whole citation.)

But as time went by, after answering question after question on how to find out if the library has a certain known article, I realized that known item searches for articles, while not as hard as subject searches, are usually no piece of cake for users either.

It's not that users didn't ask about finding known titles of books. Some do, particularly if they got the title wrong or in cases where they were looking for textbooks with common titles and dozens of editions like "Financial Accounting" (a failure of the "identify" task in FRBR terms).

Still in general they were dwarfed by users asking if a known article exists, either because they
  • were following up from a reference in a paper (online or print)
  • were looking for a paper that cited the one they were reading
  • found it in an abstracting and indexing database
  • or had it mentioned to them by a professor/colleague/friend.

Why is it so difficult? Several reasons
  • Users are used to searching by keywords in article titles thanks to web search engines, but a complete article index that covers everything accessible doesn't exist
  • Difficulties in maintaining a clean knowledgebase/source of journal titles mean that even if the user does a search by journal name he may still get misleading results
  • Increasing amounts of Open Access or free material are not in the usual library silos/databases/OPACs

Today libraries support a bewildering list of options for finding known articles, from searching
  • OPACs
  • Next generation catalogues e.g. Encore (that do not list article titles)
  • Journal A-Z listings - e.g. Serials Solutions E-Journal Portal, EBSCO A-to-Z
  • Article finders/citation linkers (OpenURL) - e.g. WebBridge/SFX/360 Link
  • Article index search engines like Google Scholar, the new web scale/unified discovery products like Summon or EBSCO Discovery Service, or Pubget-like services


Classic Library Catalogue - ASU Libraries

Serials Solutions EJ Portal - ASU Libraries

Citation Linker (SFX) - University of York

Summon - University of Queensland

Google Scholar - with Harvard 


Pubget


Which method is the best? In general they divide into 2 main classes: searching by source/journal title first and then locating the article, versus searching by article title directly in a search engine that indexes article titles.

Of the two methods, users instinctively do an article title search unless first trained by a librarian. But we as librarians know that if we want to be sure whether an article is available, the source title approach of searching for the journal title is the best method, because search engines that index article titles don't cover everything we own.

Warning: What follows is overly complicated over-thinking that holds little value. Feel free to skip to the comments and post what method you use yourself or teach others when they ask you how to find a known article.

A. Finding a known item using Journal Title first

This is the method that was taught to me when I first joined my current library and the method I used when I was at library school. Still there are 2 options at least if one uses this method.

1. Searching using OPAC/ILS

At my current work place, the official method we teach involves searching by Source Title aka Journal Title in either our classic catalogue, Innovative Interfaces' WebPAC Pro, or the next generation catalogue, Encore. Both give the same information, though a title browse in WebPAC Pro works better than a default keyword search because it cuts out possibilities.

Then of course you hope that you do see the journal title, and that there is an online version with the right coverage or, failing that, a print copy. Assuming there is an online copy, you click through, and once you are in the right platform or database, you either browse by issue or just do a search by article title. (I personally prefer the former, which while slower is surer, as a search by article title might fail due to special characters like commas fouling the search, or copy and pasted space characters causing problems).
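As an aside on why a pasted article title can fail to match: punctuation, curly quotes and odd space characters (a non-breaking space, say) make two titles that look identical differ as strings. Below is a minimal sketch of the kind of clean-up one could apply to a title before comparing or searching. The `normalize_title` function is my own illustration, not something any particular platform is known to do.

```python
import re
import unicodedata


def normalize_title(title):
    """Reduce a title to lowercase words so punctuation and odd spaces don't matter."""
    # NFKC folds compatibility characters, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", title)
    # Replace anything that is not a letter, digit, underscore or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Lowercase aggressively and collapse runs of whitespace.
    return " ".join(text.casefold().split())


pasted = "Libraries,\u00a0users  and \u201csearch\u201d: a study"
typed = 'Libraries, users and "search": a study'

print(normalize_title(pasted) == normalize_title(typed))
```

Until platforms do something like this consistently, browsing by issue remains the surer route.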

Phew! I am so used to doing this that I can almost do it instinctively, but in fact there are many pitfalls, some specific to our system, some not. Below are some that are fairly common to most systems.

Cons

First the obvious. For this method to be reasonably accurate, the library has to have a policy of cataloguing all journals subscribed to, even those subscribed to via a database or aggregator. While many libraries do, some don't and simply create a MARC record for the database. E.g. there are libraries that have a record for Business Source Premier, but don't separately catalogue (or upload to the OPAC the titles of) each journal in it.

Depending on the size of the collection this can be a huge undertaking. Assuming this is done, there are other issues to do with user error.


1) Difficulty getting the journal title to search for 

This could be because the user only has an article title for whatever reason. Of course, one can usually find the source title by googling or using Google Scholar, but for some titles, particularly older ones, that may fail to yield anything. This scenario typically happens when a student is told of some article title (which may be slightly off) by his supervisor.

2) Abbreviations of journals

Some citations/references have very obscure abbreviations. That in itself may not be a problem, depending on the quality of the journal title cataloguing. For many institutions the cataloguing of abbreviations may not be very good; in our case we tend to recommend that users find the full journal name rather than try the abbreviation. Finding the full name of a journal from an abbreviation, however, is not always a simple matter.

3) Very generic Journal names 

Journal names like Nature or Science can often lead to dozens of records. How confusing this gets depends on whether the library practices a single record approach (combining print and online journals into one record), a separate records approach or a hybrid, and on whether the user was smart enough to restrict the search to journals.

4) Inaccurate journal holding or coverage dates

Electronic resource management is a big bugbear for all libraries. This approach presumes that journal holdings are accurate, and often they are not, with wrong coverage dates. As we will see later, some article-title-first search approaches might actually give access even if the holdings are wrong.

5) Many OPAC systems may not cover free/open access journals unless special pains are taken to load them. This is a subset of #4.

6) Time consuming

This is the biggest factor of all. While in theory this can be the most exhaustive method to confirm existence of an article in the subscription assuming no problems with #4 and #5 , it can be extremely time consuming.

You need to navigate two different systems, the OPAC first, and once you reach the ejournal/database platform you will need to hunt around for the right way to access the article by either searching or browsing.

Add to this the fact that there are so many platforms out there and they are constantly changing. Even an experienced reference librarian, if sent to some unfamiliar interface, may have to spend minutes looking for the right place to browse by issue and figuring out where to click to download.

Pros

1) Covers print and online - unlike other approaches, this method catches both print and online. If the library practices a single record approach for print and online, this is an even bigger advantage since you can see everything in one view.


Classic Catalogue - Single record approach


2) Depending on workflow for Journals subscriptions, may be the most accurate method

This varies from library to library. For my place of work, this is definitely true. I have no idea if an OPAC-centric journal collection is still the rule for most, or whether libraries focus more on their Ejournal A-Z lists (see below).


2. Searching using A-Z Ejournal lists
I am pretty new to this class of products, though I remember using them back in library school when I relied a lot on my library's  EBSCO A-to-Z  list. Currently I am playing with Serials Solutions A-Z Ejournal portal.


My understanding is that such lists are generally meant for ejournals (though it is not unknown for libraries to load up print holdings). Librarians manage holdings or lists of ejournals by selecting default packages or by selecting specific journal titles and, if necessary, customizing coverage dates. The main thing they don't do, as compared to an ILS/OPAC, is load or create MARC records.

In many ways finding a known item using this method is very similar to using the OPAC as it starts by searching for the journal title.

It has some advantages over searching OPAC in that

1) It covers only journals so you get less irrelevant results from books etc

To be fair, one could always restrict by default to journal collection in OPACs to get around this

2) It allows an easy way to browse by A-Z 

3) May or may not be more accurate journal holdings and coverage holdings, and probably has better journal information (e.g. title, alternative titles, issns, eissns) than inhouse cataloguing of Journals.

Many Journal A-Z lists are backed by strong authority records managed centrally. For example, Serials Solutions' products are backed by KnowledgeWorks, so once you indicate a journal or package is owned by your library, any changes needed to journal names, alternative names, ISSN, e-ISSN, merging/splitting of journals etc. will all be handled automatically by Serials Solutions.

With the economies of scale that come from mistakes being corrected for everyone once found, this can lead to a far more accurate journal search by title or ISSN than any one library can manage. This makes searches by journal abbreviations etc. more likely to work.

While Serials Solutions can handle authority control of journals centrally, one thing they cannot handle is holdings. The onus is on you to update these yourself when your subscriptions and packages change (particularly if you use lots of customized packages).

Depending on your library workflow, the Journal A-Z listings may have more or less accurate data than the OPAC, depending on where your priorities lie and where the source data comes from.

Some libraries push data from the Journal A-Z listings to the OPACs, some do the reverse, and yet others keep and update two independent systems.

I have always wondered if one could also maintain ejournal holdings in the A-Z listing (like Serials Solutions 360Core), maintain print-only MARC records of journals in the OPAC, then combine both in a web scale discovery product like Summon.

It gets even more complicated if you use SFX link resolver with Summon so you need to maintain two knowledgebases on top of the OPAC?  

The disadvantages of A-Z listings are often similar to those of using the OPAC to search/browse by journal title, including

1) Time consuming and unintuitive

2) Usually does not include print journals


B. Finding known item using Article Title

The main problem with searching by journal title is that it's so indirect and extremely slow. In essence one must do FRBR's 4 user tasks of Find-Identify-Select-Obtain almost TWICE, once for the journal title, then again for the article title, which explains why it is slow.

What if we could just enter the article title, click on the result, authenticate  and get access?  

If I were writing this prior to 2007, I would probably talk about how one can use federated search for this. But in fact I wouldn't have bothered back then, as federated search would have been a non-starter: most library federated search systems did not provide enough coverage to make this method worth trying, and would be too slow anyway.

Of course now we have Google Scholar and Web scale discovery products like Summon that cover typically 90% of most academic libraries collections in a unified article index so it's worth a shot to see if it might be worth doing an article title search.


3. Searching by article title using Web scale discovery products

I am most familiar with Summon but Ebsco Discovery Service and others are pretty much similar. You enter the article title. With any luck you see the article you want. You click on it, and it brings you to the full-text via OpenURL linking or direct linking (via some sort of agreement with the provider).



Ebsco Discovery Service, Nanyang Technological University Library


Pros

1) Fast, quick and efficient - If it works, it gives you an experience akin to Google, though you may have to go past a link resolver page and, of course, authentication.

2) No need to figure out journal title name, abbreviations etc


Cons

1) Problems with known item searches - discovery products, at least currently, struggle with known item searches. Often it is not that the article title is missing from the discovery index, but that it isn't surfaced because it is buried on the 2nd or a later page! This might be improving but could still be problematic for article titles with very generic or common words.

2) Inaccurate holdings - This is similar to the problem in the A-Z listings. In the case of Summon, it draws from the same holdings that populate the E-Journal A-Z listings. So the same problems apply here: depending on the workflow, the holdings here might be less accurate than the OPAC's/ILS's.

3) Article index does not cover article - Even if inaccurate holdings is not an issue, searching by article title in Summon and its cousins often fails. This is because, Summon does not yet have the article metadata (much less full text indexed) so searching by article name fails, where searching the ejournal A-Z listing by journal title first succeeds.

While most discovery services boast over 90% coverage of typical collections, this may vary from subject area to subject area. For example, Summon is almost certainly weaker in Chinese and law than in science areas, so if you tried searching for law articles in Summon you would get far below 90%.

4) Access to full-text is sometimes not stable (for varying reasons: wrong metadata from the source, a wrong resolver knowledge base, provider target URL translation errors etc.) - Even if the article is correctly listed in Summon, clicking through to the full-text might fail, as OpenURL is typically used for access, which is usually less stable than a direct link to the journal title in OPACs or Journal A-Z lists.

In particular, in some cases the target does not allow OpenURL linking to the article level and drops the user at the journal level, which of course is almost the same as first searching by journal title!

5) Similar to other approaches, article level searches work only for online articles, but again it is possible to upload your print collection, as the University of Huddersfield does.


University of Huddersfield A-Z EJ Portal showing print holdings



4. Searching by article title using Google Scholar  




In many ways, Summon and similar products were designed to compete with Google Scholar, and hence they are very similar: fast and quick, with article level searching.

In fact, some libraries have evaluated Web scale discovery products and opted to go for Google Scholar due to costs.

How then does one get to the full-text via Google Scholar? Typically the library opts into the Google Scholar Library Links program, which uses the library's OpenURL resolver but also requires that the library provide holdings to Google Scholar, so that the search results page in Google Scholar is "smart" and shows the OpenURL link only if necessary.

A lesser but still fairly popular option is using a proxy bookmarklet.

Using either method to access full-text leads to the same pros as Discovery products including

1) Fast, quick and efficient

2) No need to figure out journal title name, abbreviations etc

Google Scholar also handles open access and free stuff very well as a bonus.


The disadvantages are similar as well

1) May not be in the Google Scholar index (it's notorious that nobody knows what is inside)

2) Inaccurate holdings given to Google Scholar

3) Access via OpenURL may be unstable etc.


To complicate matters, one can bypass OpenURL and the Google Scholar Library Links programme by using a proxy bookmarklet.

That can sometimes get around inaccurate holdings and OpenURL linking failures, since it blindly applies the EZproxy stem to see if access is available.

So even if the journal holding coverage is wrong in the OPAC or the A-Z Journal listing, it doesn't matter: you will be brought to the article page via Google Scholar, and the proxy applied might work.
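What the proxy bookmarklet does is really just string manipulation on the current page's address. Real bookmarklets are written in JavaScript, but the idea can be sketched in Python as below. The proxy host `ezproxy.example.edu` is a placeholder, and I am assuming the common `login?url=` style of EZproxy starting point URL; some institutions set their proxy up differently.

```python
# Placeholder proxy host; substitute your institution's EZproxy server.
PROXY_PREFIX = "https://ezproxy.example.edu/login?url="


def proxied(url: str) -> str:
    """Prefix a publisher URL with the proxy login stem, as the bookmarklet does."""
    if url.startswith(PROXY_PREFIX):
        return url  # already proxied, leave as is
    return PROXY_PREFIX + url


print(proxied("https://www.example.com/article/123"))
```

Note that nothing here consults the library's holdings, which is exactly why the method is "blind": the proxy is applied and you find out on the publisher's page whether you have access.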

In our institution this is a hugely popular method. Needless to say, this method can fail, because without OpenURL to solve the appropriate copy problem, Google Scholar's first choice of where to send you for the full text might be wrong.

So you may have access via a subscription agent like SwetsWise or an aggregator like EBSCOhost, but Google Scholar would not know this and would send you direct to somewhere else, typically the publisher's version ("Publisher's full-text, if indexed, is the primary version"), where you have no access.

However, in our institution this is not so common. Most science and social science users are perfectly happy with the proxy bookmarklet method, since we usually do buy direct from the publisher if we have access. I estimate this method works correctly, in ideal conditions, better than 8 times out of 10.

Add the fact that our current OpenURL implementation is quite new, and relying on the proxy bookmarklet seems to be the best balance of speed and accuracy. More about this later.


5. Pubget - Even faster?


You might think the article-title-first approach is the fastest possible way to get a known article in terms of number of clicks, but you can in fact speed this up.

To repeat, searching using Google Scholar or Web scale discovery involves

1. Typing article title in search box
2. Scan results list (hopefully only one) and clicking on result
3. Authenticate here or after step 4
4. Scan Link resolver page for results and click on appropriate result that brings you to the article page
5. Scan article page and click on download pdf link or button


If you are using Google Scholar with a link resolver or a WebScale Discovery product like Summon you will usually see the link resolver page (#4). But is that screen really necessary?

Of course, Google Scholar + proxy bookmarklet avoids #4 but that has drawbacks already stated since it doesn't take into account the library's collection.

The link resolver page can be bypassed if you turn on one-click functionality in Serials Solutions' 360 Link (OpenURL resolver), so that it always sends you to the first available option when your library happens to have the article in multiple places.




One click option from 360 link bypasses link resolver screen


The one-click option is nice, but OpenURL linking is well known to be unstable sometimes, so Serials Solutions tries to handle this problem with a "helper window", actually an iframe, so users can go to the link resolver screen if the direct link fails (see above).

Besides this option, depending on the Discovery platform you use there may be "direct linking" options that don't rely on OpenURL at all.  Both ways you don't see the additional OpenURL screen.

In fact a study has shown that 23% of students tested actually got stuck at the link resolver screen! So perhaps it would be good to bypass that screen if possible. So let's say you do that.

Still, is the following the fastest (in terms of clicks)?


1. Typing article title in search box
2. Scan results list (hopefully only one) and clicking on result
3. Authenticate if necessary, which brings you to the article page (with helper window for one-click OpenURL)
4. Scan article page and click on download pdf link or button


Surprisingly no.

You can actually one up this by skipping #4 and offering downloading of the PDF from the search engine page . This is in fact the selling point of Pubget.

As shown below, you search for the article in  Pubget  and you don't even need to go to the ejournal page, there is a "Find PDF" button that will automatically get you the pdf so you don't even see the original Ejournal article page.





I am not familiar with the inner workings of Pubget, but I assume it's somewhat like an OpenURL resolver, except it "knows" for a certain journal/platform the correct way to access the pdf directly with a constructed url, so you don't even need to land on the ejournal page.

You might think this isn't a huge improvement to bringing you to the article page and then clicking download, but many users just want the pdf to download, they don't care to go to the ejournal page to struggle with the diverse and varying user interface to hunt for the link to the pdf etc. 



Using article finder/citation linkers (OpenURL) - e.g. Webbridge/SFX/360link







I was of two minds about placing this under either A. Using Journal title or B. Using Article title.

But since this method usually requires at least a journal title or ISSN, it should be the former. In any case I left it for last.

This is actually using OpenURL with the user manually providing the data needed in a form. So there is no article level index, and theoretically this should outperform article level indexes even if both are using the same journal holdings or knowledge base.

The html form, typically called an article finder or citation linker, can lead the user to the full text via OpenURL even when searching by article title in Summon fails because the article is not indexed in Summon.

This took me a while to grasp, particularly since you can enter an article title in the citation linker as well, which confused me.

But basically the article finder/citation linker does not rely on an article index. It relies on 360Core, which is a journal level index. Given the journal and certain other metadata such as author, issue, date and starting page, the OpenURL resolver can "guess" the correct url to construct that will get the user to the full-text.

The OpenURL resolver does not need to know if the article actually exists, but knows that for that platform, if an article with such and such characteristics did exist, the url would be such.

In comparison, Summon etc. need to have the article title indexed before it can be found.

If this method works flawlessly, the user enters sufficient metadata and the OpenURL resolver either brings him directly to the article (assuming one-click is on) or shows the resolver screen with options.
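To make the citation linker idea concrete, here is a rough Python sketch of how form fields could be turned into an OpenURL 1.0 (Z39.88-2004) key/value request. The `rft.*` keys are the standard journal-format ones; the resolver address and the function name `build_openurl` are placeholders of mine, and a real resolver such as 360 Link or SFX has its own base URL and may expect extra parameters.

```python
from urllib.parse import urlencode

# Placeholder resolver address; each library has its own base URL.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation: dict) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) key/value query from citation fields."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Only send the fields the user actually filled in.
    for key in ("jtitle", "issn", "atitle", "aulast", "date",
                "volume", "issue", "spage"):
        value = citation.get(key)
        if value:
            params["rft." + key] = value
    return RESOLVER_BASE + "?" + urlencode(params)


# A made-up citation, for illustration only.
print(build_openurl({
    "jtitle": "Journal of Examples",
    "date": "2012",
    "volume": "5",
    "issue": "2",
    "spage": "101",
}))
```

Notice that no article index is consulted anywhere: the request carries only the citation, and it is up to the resolver's knowledge base to work out where such an article would live.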

The article finder has many weaknesses of course including

1) The citation needs to be pretty complete and accurate for direct linking to article. Lacking some information means you can be dropped down only to journal level which can be frustrating.

2) For users it's often unclear how much or how little citation data to give. 

3) It's strictly OpenURL based and hence subject to all the problems of  OpenURL already mentioned

4) Time consuming since you need the whole citation to maximize chances



Bonus method - DOI resolution


Articles have unique identifiers like DOIs and PMIDs that can bring you to an article. The main problem with using them is that not all articles have one! Another problem is the failure to cope with the appropriate copy problem unless paired with an OpenURL solution.
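For what it's worth, turning a DOI into a link is trivial, as this small Python sketch shows (the function name `doi_url` is mine). It points at the public doi.org resolver, which sends you to whatever location is registered for the DOI, typically the publisher's site, and that is the appropriate copy problem in a nutshell.

```python
from urllib.parse import quote


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a link to the public DOI resolver."""
    doi = doi.strip()
    # Drop a leading "doi:" label if the citation included one.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Keep "/" readable; percent-encode anything unusual in the suffix.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_url("doi:10.1000/182"))
```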



What is the best method?

If one were looking for pure accuracy and cannot tolerate false negatives (i.e missing full text for a journal title when there is one), what is the best method?

If the same knowledge base and holdings are used, accuracy in descending order (online only) seems to be

1. Journal A-Z list/ OPAC - Browse by Journal
2. Citation Linker
3. Discovery service eg. Summon

#2 may fail to get articles that manually browsing with #1 would find, because of problems with OpenURL linking. #3, even if informed by the same knowledge base and journal holdings, additionally depends on the article being indexed, on top of the issues related to linking via OpenURL (usually).

For my library the most accurate method typically involves the following algorithm 

1. Search journal title in OPAC (Our OPAC is loaded with the most accurate journal holdings)

2. Only if print and online are not available there do we try searching Google Scholar for a free copy, and as a lark we might even try applying the proxy bookmarklet to online copies just in case it works. Sometimes an article might just happen to be free for a short period.

This method is often very cumbersome, and perhaps only library staff engaged in checking document delivery requests would do it. Typically we just tell users to do the first step.

But is that necessarily the most efficient way? Let's make it simple and assume we are looking only for online articles (a very common scenario for users who want quick access only). Let's also assume that the OPAC holdings are 100% correct and if you can't find it using that, it isn't available.

So which is on average faster?

A) Search Google Scholar by article title and use proxy to access; if that fails, re-search by journal title in OPAC

B) Search by journal title in OPAC

Say you knew for example that searching Google Scholar first, then applying the proxy worked 90% of the time to find the full text. This would take on average 30 seconds.

While in 10% of the cases you would fail to find full text and need to search the catalogue by journal title to confirm if it exists. Say that takes 10 mins on average.

Simple maths should show it is more efficient on average to search using Google Scholar first then re-search in OPAC only if it fails, than to always try searching OPAC.

Mean time method A using Google Scholar first + re-search if necessary = (0.9*30 + 0.1*(30+600)) = 90s

Mean time method B using library catalogue first and only = (1*600) = 600s

Of course, I just pulled all the numbers from the air. The average time taken for each method could be gotten by time studies which isn't particularly hard I think.
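The same arithmetic can be written as a small Python sketch so that other guesses for the success rate and timings can be plugged in. The function names are mine and the numbers are the same made-up ones used above. The second function gives the break-even point: method A wins whenever its success rate is higher than the ratio of the fast time to the slow time.

```python
def mean_time_scholar_first(p_success: float, t_fast: float, t_slow: float) -> float:
    """Expected seconds for method A: quick search first, OPAC only on failure."""
    return p_success * t_fast + (1 - p_success) * (t_fast + t_slow)


def break_even_probability(t_fast: float, t_slow: float) -> float:
    """Success rate above which method A beats always using the OPAC."""
    # A is faster when t_fast + (1 - p) * t_slow < t_slow, i.e. p > t_fast / t_slow.
    return t_fast / t_slow


method_a = mean_time_scholar_first(0.9, 30, 600)  # Google Scholar first
method_b = 600                                    # OPAC only
print(round(method_a), method_b, break_even_probability(30, 600))
```

With these numbers the break-even success rate is only 5%, so the Google Scholar first approach would remain faster on average even if it worked far less often than 90% of the time.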

In any case I would estimate from fastest to slowest to complete/terminate one search (terminate could mean successfully find the article or failure)

1. Pubget

2. Summon + OpenUrl (one click enabled or direct linking) or Google Scholar + OpenUrl (one click enabled), Google Scholar + proxy bookmarklet

3. Summon + OpenUrl (normal) or Google Scholar + OpenUrl (normal)

4. Journal title search using Journal A-Z

5. Journal title search using OPAC

But let's get back to the example where we compare searching Google Scholar first and a re-search in OPAC with journal title vs searching OPAC with journal title only.

It's fairly easy to estimate the average time for each step. Somewhat harder to estimate is the probability of getting a hit so that you stop the first time. This would be a function of a) how big the article index is (for cases where you search by article title in Summon or Google Scholar) and b) how large your subscription/collection is (for both article-title-first and source-title-first approaches).

A) is intuitively true: the larger the article index you are searching, the more likely the step will terminate with success. B) is true as well if you think about it. The larger your subscription/collection, the more likely you won't have to do a re-search.

E.g. suppose there are only 1,000 articles in the universe, and Library A *really subscribes* to 990 while Library B *really subscribes* to 10. No matter what method you use, the search will tend to terminate the first time, with fewer re-searches, for Library A. Library B will almost always have to re-search and still fail anyway.


Better methods?

Perhaps a hybrid method might work? 

This article suggests enhancing citation linkers with article level indexes using Ajax. So if you enter an article title in the article finder and it matches, the system would find the match without you even clicking the search button.

Think of Google's auto-complete/auto-suggest: as you type in the article title, it would suggest the closest matches drawn from its known article index. Of course you could do this to assist in entering other fields too, say Author or Journal name in the citation linker.
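A toy version of the suggest-as-you-type idea can be sketched in Python with a sorted list of titles and a binary search for the typed prefix. This is only my own illustration of the general technique, not how the article mentioned above or any discovery product implements it, and the titles in `INDEX` are invented stand-ins for a real article index.

```python
from bisect import bisect_left


def suggest(prefix: str, sorted_titles: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` indexed titles that start with what was typed so far."""
    key = prefix.casefold()
    # sorted_titles must already be casefolded and sorted.
    start = bisect_left(sorted_titles, key)
    matches = []
    for title in sorted_titles[start:start + limit]:
        if not title.startswith(key):
            break
        matches.append(title)
    return matches


# Invented titles standing in for a real article index.
INDEX = sorted(t.casefold() for t in [
    "Known item searching in discovery systems",
    "Known unknowns of link resolvers",
    "Library catalogues and their users",
])

print(suggest("Known i", INDEX))
```

A real service would need something more forgiving than exact prefix matching (typos, dropped subtitles), but even this much shows how a match could be offered before the user clicks search.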

Talking about Google, how about a Google Instant version? Yes, I know it could technically be tough to display the full article due to authentication, though I wonder if it could just show the brief record with metadata.

Or how about a discovery search that could identify known item searches and if it fails to match will automatically suggest a journal name browse method. 

Conclusion

I am not sure if anyone made it all the way to the bottom given this overly complicated post. But if you did, I thank you.

I am not a system or even eservices librarian, so my understanding of systems relating to journals might be off-base, if so do let me know, I am always trying to learn. 





























