Wikipedia references for LibraryThing works
Talk: New features
1timspalding
I've introduced a new feature, showing all the Wikipedia pages that contain citations to LibraryThing works.
Basically, I parsed Wikipedia looking for books, and then connected them to LibraryThing works. I'd done this once before, back in 2007 (see http://www.librarything.com/blog/2007/02/wikipedia-citatons-with-feed.php). Back then it covered some 90,136 pages. Now it covers some 251,911. It's a little tough to calculate the works covered, but I believe it blew up similarly. It covers some 227,852 works—a very high percentage of works that aren't singletons. All told, there were 540,000 citations found.
Some examples:
*D-Day: June 6, 1944 http://www.librarything.com/work/40642#references
*An Inquiry into the Nature and Causes of the Wealth of Nations http://www.librarything.com/work/12307#references
*The Ship of the Line http://www.librarything.com/work/963894#references
Questions:
1. So, what do you think is the winner, the work cited on the most pages? It's no longer a Pokemon. I've given you number 6 (The Ship of the Line). When you hear it, you'll hit your head in recognition.
2. Where should this feature go? It's related to Get this Book—it's a link. But it's different.
3. Can you see some other cool things we could do?
3timspalding
>2 jmnlman:
Contemplated.
Also, it's a fairly good, if boring, recommendation engine: what books are cited by pages that cite a given book.
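The recommendation idea in this post amounts to a co-citation count. The following is only a minimal Python sketch of that idea, not LibraryThing's actual code; the `citations` mapping, the page titles, the work ids and the function name `co_cited_works` are all invented for illustration.

```python
from collections import Counter

def co_cited_works(citations, work_id):
    """Rank works by how many pages cite them alongside work_id.

    citations: dict mapping a Wikipedia page title to the set of work ids
    cited on that page (a hypothetical structure for this sketch).
    """
    counts = Counter()
    for works_on_page in citations.values():
        if work_id in works_on_page:
            # Every other work cited on the same page is a candidate.
            counts.update(w for w in works_on_page if w != work_id)
    # Most frequently co-cited works first.
    return counts.most_common()

# Toy data: page title -> work ids cited there (all values invented).
example_citations = {
    "Page A": {"work-1", "work-2"},
    "Page B": {"work-1", "work-2", "work-3"},
    "Page C": {"work-3"},
}
recommendations = co_cited_works(example_citations, "work-1")
```

In the toy data, "work-2" shares two pages with "work-1" and "work-3" shares one, so "work-2" ranks first. A real version would presumably need to discount reference works that are cited on thousands of pages.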
4asalamon
Great feature.
1. Have you parsed only English Wikipedia?
2. Is it a one-time parse, or will some kind of automation keep it up to date?
Thx,
Sala
5generalising
Awesome!
It would be interesting to know if you could use a similar parser to generate a link to the article *about* the work, rather than the ones referencing it - it's good to be able to go from an author page to the Wikipedia article, and a link to the article on the book would be equally interesting.
It'd be a lot more work, though; you'd need to find some way of identifying an ISBN or OCLC (etc) code in the article in such a way that it was unambiguously not a reference to a different work. I'll have a think about this.
Oh, and one small bug report: it garbles links with a slash in them, so the link to Wikipedia:Reliable sources/rewrite on the Wealth of Nations page is broken, at least for me in Firefox 3.0.
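The cause of the slash bug is not stated in the thread. One plausible guess is that the page title is percent-encoded in full, which turns "/" into "%2F" and breaks links to subpages. A minimal Python sketch under that assumption follows; `wikipedia_url` is a hypothetical helper, not LibraryThing's code.

```python
from urllib.parse import quote

def wikipedia_url(title, lang="en"):
    """Build an article URL from a page title (hypothetical helper)."""
    # Wikipedia URLs use underscores in place of spaces.
    path = title.replace(" ", "_")
    # Keep "/" and ":" literal so subpages and namespaces survive encoding.
    return f"https://{lang}.wikipedia.org/wiki/{quote(path, safe='/:')}"

url = wikipedia_url("Wikipedia:Reliable sources/rewrite")

# For comparison: encoding with no safe characters escapes the slash.
over_encoded = quote("Reliable_sources/rewrite", safe="")
```

Here `url` keeps the slash intact, while `over_encoded` shows the escaped form that a link builder might produce if it treated the title as a single path segment.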
6Edward
I love this feature. Thank you for bringing it back!
With regard to where it should go, I think of it as something like Recommendations: "If you like this, here's some more stuff you can look at." But it's probably not an important enough feature to go above Reviews or Common Knowledge by default.
7TheoClarke
Surely the most cited work on Wikipedia is the 1911 edition of Encyclopaedia Britannica? Or at least, it should be, given that it was imported wholesale.
This is a great feature. Thank you.
8legallypuzzled
1) I'm going to guess the Bible.
2) I agree with Edward; the link should probably go below "Editions" (actually, I could see a separate grouping of "Common Knowledge" and "Your Books on Wikipedia" as user-contributed knowledge).
9prosfilaes
The number one work is not the Bible, because it's not one work in LibraryThing, and most citations in Wikipedia aren't of the same form as book citations.
As for non-English Wikipedias, there's about 30 large ones, which could get pretty overwhelming. On the other hand, a lot of non-English books would get links if the Wikis in their languages were searched.
10bugaderes39
What about Wikipedia in other languages?
11jjwilson61
It looks like most of those Ship of the Line pages would be removed as not notable enough if someone were to report them. If that were done then LT would have dead links, so I hope the harvesting would be done on a periodic basis.
12lquilter
Well, I love this; and I love this kind of feature -- more intertextual links! LT could *so* become an awesome research resource of its own.
14timspalding
>13 jbd1:
Nope. Another reference work, but one that's frequently updated.
"Well, I love this; and I love this kind of feature -- more intertextual links! LT could *so* become an awesome research resource of its own."
Keep going. Someone's going to hit on the idea I've wanted to add for years.
15WalkerMedia
>14 timspalding:
Well, we can't add works cited by another work until we've added works within a work. :) (Tonight, Brain, we're going to take over the world...)
17lquilter
#1 > Q1: I was going to guess something like Joseph Campbell, The Hero with a Thousand Faces -- but that only shows 5 hits in Wikipedia. So clearly that's not it but I'm going to go ahead and mention it because surely the search algorithm is missing a lot of cites? Is it for instance picking up things listed in "reference" tags?
#14 > As for something you might have wanted to add for years -- the thing that leaps to my mind as the most useful in terms of research would be interlinking books with references. Book A references Book B on page 251; Book B references Book C on pages 11 and 49; etc. Tied into some kind of scholarly literature database (or maybe several) and it would be *incredibly* useful as a research tool. Tie into personal fields -- "where did you find out about this book" (Book A, recommendation by UserName, LT discussion group Y thread Z, etc.) -- and it would be incredibly useful as a personal cataloging tool, and probably as some kind of meta-research tool, too. .... So what is it? Don't leave us hanging!
18nperrin
"Keep going. Someone's going to hit on the idea I've wanted to add for years."
Hmm, is it the one I've wanted for years--links to books about the book? Especially, links to criticism of the book?
ETA: What lquilter said too.
20_Zoe_
"As for something you might have wanted to add for years -- the thing that leaps to my mind as the most useful in terms of research would be interlinking books with references. Book A references Book B on page 251; Book B references Book C on pages 11 and 49; etc. Tied into some kind of scholarly literature database (or maybe several) and it would be *incredibly* useful as a research tool. Tie into personal fields -- 'where did you find out about this book' (Book A, recommendation by UserName, LT discussion group Y thread Z, etc.) -- and it would be incredibly useful as a personal cataloging tool, and probably as some kind of meta-research tool, too."
Yes.
I hope we can start with a basic version of this: Book A references Book B. Pages mean proper editions and probably another year of development before we could see anything.
I'm actually more excited about the thought of the personal fields than the scholarly-reference fields, though. It has the potential to make the site so much more interconnected and fun to browse.
21 235711
17, 20: Amazon used to do that, but they don't seem to anymore. It would be interesting to know the reason.
(Reading post #20 brought back to me how I first found a certain book via another book through Amazon's referencing system. I must have been in the dark about it for years.)
22jlelliott
Very nice, I've always thought that a work level section for external links would be a great addition. Is there any chance that it would ever be possible for users to add more relevant links (like on the author pages)?
23lquilter
Most popular cited reference:
It doesn't seem to be things like Will & Ariel Durant, or Edith Hamilton, or OED (not that I can easily tell which OED "work" would be the best to check), or Encyclopaedia Britannica, or Aristotle, etc., etc. Based on the example that Tim gave us that had so very many cites, it was an encyclopedia of ships that would appeal to some history buff. So I'm going to go with some encyclopedic reference about a super-popular topic. Playboy models? Pokémons?
25VisibleGhost
BibliographyThing? A compendium of bibliographies?
26 235711
23: (not that I can easily tell which OED "work" would be the best to check)
Me neither. It may have a shot if you ignore LT works. The "what links here" page on Wikipedia itself is huge, but many of the references are talk pages, so it's difficult to get a good idea of how many actual wiki articles there are. (And my earlier guess was an answer to the wrong question.)
24: No, that one only has 11. The number is on the work page.
27timspalding
I think I'll eat a hotdog now. Maybe two.
29jjwilson61
Not even close. Ship of the Line, which Tim said is #6, has 764, and The Baseball Encyclopedia only has 81. Just look at the work page; it will tell you the number.
30MMcM
There are around 2K references to the Guinness Book of Records in Wikipedia, but it's a mess on the LT end because of multiple works (as was pointed out for other titles above).
31timspalding
Ding ding ding!
33 235711
*groans*
But how appropriate that the Guinness Book of Records should be the one holding the record. Does it say so anywhere in the book itself?
34TheoClarke
Doh!
35timspalding
"There are around 2K references to the Guinness Book of Records in Wikipedia, but it's a mess on the LT end because of multiple works (as was pointed out for other titles above)."
How is this known? I'm starting to worry about my data not finding everything.
36timspalding
I blogged it, but then moved the post behind SantaThing, because I'm questioning whether the process worked correctly.
See http://www.librarything.com/blog/2009/11/books-of-wikipedia.php
37lquilter
You know what it's not finding, Tim? It's not picking up things like bibliographies on the author's pages. Probably because the author's name is not side-by-side with the titles. See, e.g., Octavia E. Butler and Mind of My Mind and Wild Seed. In neither page is the author's page picked up.
This is probably (certainly?) not all the missing mass but it's one source, anyway.
38aethercowboy
I'm sure it's been said before, Tim, but you == theMan always returns true.
To answer 2: Links to Wikisource! or Wikibooks.
39PeterCapek
This seems like an interesting exercise, but I wonder if it doesn't say more about the interests of Wikipedia contributors than about anything else.
41skgoetz
Is this feature meant to pick up books that have their own Wikipedia page, or have I misunderstood? (If it is thus meant, I've just noticed that http://en.wikipedia.org/wiki/Autobiography_of_Mark_Twain is not picked up by http://www.librarything.com/work/109448/commonknowledge/72414772, but I don't see a way to add it.)
42jjwilson61
At one point a few years ago Tim ran a process that found Wikipedia pages that corresponded to LT works and added links for them. As far as I know that process hasn't been run since, and there has never been a way to manually add links to Wikipedia, so any Wikipedia pages added in the last few years won't be linked to.
43lampbane
>>41 skgoetz:
>>42 jjwilson61:
It picks up books that have been cited on Wikipedia by ISBN or OCLC. To add the book here, you have to make sure the book is cited on Wikipedia with one of those two numbers. I've seen more recent citations appear here on LT, so the process does get run more often than this thread would indicate.
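For anyone wondering what "cited by ISBN" might involve, here is a rough Python sketch. The real matching process is not described in this thread, so the regular expression, the function names and the sample wikitext are all assumptions, and OCLC numbers are left out entirely.

```python
import re

# Matches "isbn=..." inside a citation template or a bare "ISBN ..." mention.
ISBN_PATTERN = re.compile(r"isbn\s*=?\s*([0-9][0-9Xx-]{8,16})", re.IGNORECASE)

def normalize_isbn(raw):
    """Strip hyphens; return None unless the check digit is valid."""
    digits = raw.replace("-", "").upper()
    if len(digits) == 10 and re.fullmatch(r"\d{9}[\dX]", digits):
        # ISBN-10: weights 10..1, "X" counts as 10, sum divisible by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(digits))
        return digits if total % 11 == 0 else None
    if len(digits) == 13 and digits.isdigit():
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))
        return digits if total % 10 == 0 else None
    return None

def extract_isbns(wikitext):
    """Return the valid ISBNs found in a page, in order, without repeats."""
    found = []
    for match in ISBN_PATTERN.finditer(wikitext):
        isbn = normalize_isbn(match.group(1))
        if isbn and isbn not in found:
            found.append(isbn)
    return found

# Invented sample text; the last ISBN has a bad check digit on purpose.
sample = ("{{cite book |title=Example |isbn=0-306-40615-2}} "
          "See also ISBN 978-0-306-40615-7 and isbn=1234567890.")
isbns = extract_isbns(sample)
```

Checking the check digit is one cheap way to skip mistyped numbers, which would otherwise fail to match any work or, worse, match the wrong one.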
45timspalding
Yes, I was thinking I'd run it again. Not top priority, but potentially fun.
46brightcopy
45> Is it something that requires a lot of care and feeding? If not, would be cool if there was a "recalculate wikipedia reference" for books that don't have one. I'm assuming there must be some reason why a feature allowing users to add/edit wikipedia references isn't desirable.
47fugitive
Tim, this sounds like something you might want to give the community an advance "heads up" about, so that those of us who are Wikipedians have a little time to check whether our favorite books have the necessary ISBN and/or OCLC numbers in the Wikipedia article and update them, so they would be caught when you run your bot/script.
48timspalding
"45> Is it something that requires a lot of care and feeding? If not, would be cool if there was a 'recalculate wikipedia reference' for books that don't have one. I'm assuming there must be some reason why a feature allowing users to add/edit wikipedia references isn't desirable."
No, it's just a matter of getting and downloading the latest Wikipedia dump. They're ENORMOUS so I always have to check where I'm going to put it.
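Because the dumps are so large, one common approach is to stream the compressed file rather than unpack it. The thread does not say how the dump is actually processed, so the following Python sketch is only an illustration; `iter_pages` and the tiny in-memory sample are invented for the example.

```python
import bz2
import io
import xml.etree.ElementTree as ET

def iter_pages(dump_file):
    """Yield (title, wikitext) pairs from a bz2-compressed XML dump.

    dump_file may be a filename or an open binary file object.
    """
    with bz2.open(dump_file, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Tags carry an XML namespace prefix, so compare local names only.
            if elem.tag.rsplit("}", 1)[-1] == "page":
                title = text = None
                for child in elem.iter():
                    name = child.tag.rsplit("}", 1)[-1]
                    if name == "title":
                        title = child.text
                    elif name == "text":
                        text = child.text or ""
                yield title, text
                # Drop the finished page's contents before reading the next.
                elem.clear()

# Tiny in-memory stand-in for a real dump file (contents invented).
sample_xml = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>A</title><revision><text>isbn=1</text></revision></page>"
    b"<page><title>B</title><revision><text>hello</text></revision></page>"
    b"</mediawiki>"
)
pages = list(iter_pages(io.BytesIO(bz2.compress(sample_xml))))
```

With a real dump you would pass the path to the downloaded .bz2 file and loop over the generator instead of building a list, so only one page is held in memory at a time.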
49Cynfelyn
As a newbie still finding my way around LT, and looking at the answers to message 41 above, am I right to think there is no way to edit links to Wikipedia? What I have in mind is 'Nancy Blackett : under sail with Arthur Ransome', http://www.librarything.com/work/1661183/book/83403290, where both Nancy Blackett links go to Nancy Blackett the Amazon pirate, rather than one of them heading off to Nancy Blackett the yacht, http://en.wikipedia.org/wiki/Nancy_Blackett_%28cutter%29. Thanks.
50gangleri
Hi Tim! Is it possible to run your scripts based on some LT characteristics:
a) LT authors with links to http://nobelprize.org/
b) LT authors with links to http://www.nndb.com/people/
c) on some results for LC CK search
I looked at Muller, Herta Nobel Prize (Literature ∙ 2009)
51geyser
This may be off topic, but I found an author bio on wikipedia and I would like to add it to the reference section for that author.
http://www.librarything.com/author/burtonkatherine-1
https://en.wikipedia.org/wiki/Katherine_Burton
How could this be done? Thank you.
52Lyndatrue
>51 geyser: I did this for you. Here are the steps.
Up in the corner of the author page (viewer's right) is a small area called "Links" and that's the correct place to put things like the "Wikipedia author page" (which is what it is). There are multiple types of links, of course, and the sources are richer for some genres (such as Science Fiction) and for better known authors.
I'm a bit surprised that you weren't aware of this, considering how long you've been on LT, but there's so many nooks and crannies here that I'm always being surprised myself.