Posts Tagged ‘access’
“I get slightly obsessive about working in archives because you don’t know what you’re going to find. In fact, you don’t know what you’re looking for until you find it.”*…
An update on that remarkable treasure, The Internet Archive…
Within the walls of a beautiful former church in San Francisco’s Richmond district [the facade of which is pictured above], racks of computer servers hum and blink with activity. They contain the internet. Well, a very large amount of it.
The Internet Archive, a non-profit, has been collecting web pages since 1996 for its famed and beloved Wayback Machine. In 1997, the collection amounted to 2 terabytes of data. Colossal back then, you could fit it on a $50 thumb drive now.
Today, the archive’s founder Brewster Kahle tells me, the project is on the brink of surpassing 100 petabytes – approximately 50,000 times larger than in 1997. It contains more than 700bn web pages.
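A quick check on that "50,000 times" figure, as a minimal sketch: it assumes decimal (SI) units, where one petabyte is 1,000 terabytes, and the function name `growth_factor` is invented here for illustration.

```python
# Assumption: decimal (SI) units, so 1 petabyte = 1,000 terabytes.
TB_PER_PB = 1_000


def growth_factor(start_tb: float, end_pb: float) -> float:
    """Return how many times larger end_pb (petabytes) is than start_tb (terabytes)."""
    return (end_pb * TB_PER_PB) / start_tb


# 2 TB in 1997 versus roughly 100 PB today, per the figures quoted above.
factor = growth_factor(start_tb=2, end_pb=100)
print(f"{factor:,.0f}x")  # prints "50,000x"
```

With binary units (1 PB = 1,024 TB) the ratio would come out slightly higher, at 51,200, so "approximately 50,000" holds either way.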
The work isn’t getting any easier. Websites today are highly dynamic, changing with every refresh. Walled gardens like Facebook are a source of great frustration to Kahle, who worries that much of the political activity that has taken place on the platform could be lost to history if not properly captured. In the name of privacy and security, Facebook (and others) make scraping difficult. News organisations’ paywalls (such as the FT’s) are also “problematic”, Kahle says. News archiving used to be taken extremely seriously, but changes in ownership or even just a site redesign can mean disappearing content. The technology journalist Kara Swisher recently lamented that some of her early work at The Wall Street Journal has “gone poof”, after the paper declined to sell the material to her several years ago…
A quarter of a century after it began collecting web pages, the Internet Archive is adapting to new challenges: “The ever-expanding job of preserving the internet’s backpages” (gift article) from @DaveLeeFT in the @FinancialTimes.
###
As we celebrate collection, we might recall that it was on this date in 2001 that the Polaroid Corporation– best known for its instant film and cameras– filed for bankruptcy. Its employment had peaked in 1978 at 21,000; its revenues, in 1991 at $3 billion.
“I rather think that archives exist to keep things safe – but not secret”*…
Brewster Kahle, founder and head of The Internet Archive, couldn’t agree more, and for the last 25 years he’s put his energy, his money– his life– to work trying to make that happen…
In 1996, Kahle founded the Internet Archive, which stands alongside Wikipedia as one of the great not-for-profit knowledge-enhancing creations of modern digital technology. You may know it best for the Wayback Machine, its now quarter-century-old tool for deriving some sort of permanent record from the inherently transient medium of the web. (It’s collected 668 billion web pages so far.) But its ambitions extend far beyond that, creating a free-to-all library of 38 million books and documents, 14 million audio recordings, 7 million videos, and more…
That work has not been without controversy, but it’s an enormous public service — not least to journalists, who rely on it for reporting every day. (Not to mention the Wayback Machine is often the only place to find the first two decades of web-based journalism, most of which has been wiped away from its original URLs.)…
Joshua Benton (@jbenton) of @NiemanLab debriefs Brewster on the occasion of the Archive’s silver anniversary: “After 25 years, Brewster Kahle and the Internet Archive are still working to democratize knowledge.”
Amidst wonderfully illuminating reminiscences, Brewster goes right to the heart of the issue…
Corporations continue to control access to materials that are in the library, which is controlling preservation, and it’s killing us….
[The Archive and the movement of which it’s a part are] a radical experiment in radical sharing. I think the winner, the hero of the last 25 years, is the everyman. They’ve been the heroes. The institutions are the ones who haven’t adjusted. Large corporations have found this technology as a mechanism of becoming global monopolies. It’s been a boom time for monopolists.
###
As we love librarians, we might send carefully-curated birthday greetings to Frederick Baldwin Adams Jr.; he was born on this date in 1910. A bibliophile who was more a curator than an archivist, he was the director of the Pierpont Morgan Library in New York City from 1948 to 1969. His predecessor, Belle da Costa Greene, was responsible for organizing the results of Morgan’s rapacious collecting; Adams was responsible for broadening– and modernizing– that collection, adding works by Virginia Woolf, E. M. Forster, Willa Cather, Robert Frost, E. A. Robinson, among many others, along with manuscripts and visual arts, and for enhancing the institution’s role as a research facility.
Adams was also an important collector in his own right. He amassed two of the largest holdings of works by Thomas Hardy and Robert Frost, as well as one of the leading collections of writing by Karl Marx and left-wing Americana.

“The love of learning, the sequestered nooks, / And all the sweet serenity of books”*…
Righteous recycling: garbage collectors in Ankara started “reclaiming” discarded books and ended up opening a library….
It all started when sanitation worker Durson Ipek found a bag of cast-off books when he was working and then it snowballed from there. Ipek and other garbage men started gathering the books they found on the streets that were destined for landfills and as their collection started to grow, so did word of mouth. Soon, local residents started donating books directly.
The library that originally contained 200 books is located in the Cankaya district of the capital city in a previously vacant brick factory at the sanitation department headquarters. The library was initially available only to the sanitation employees and their families to use but as the collection grew, so did public interest and the library was opened to the public in December 2017…
All the books that are found are sorted and checked for condition, if they pass, they go on the shelves. In fact, everything in the library was also rescued including the bookshelves and the artwork that adorns the walls…
Today, the library has over 6,000 books that range from fiction to nonfiction and there’s a very popular children’s section that even has a collection of comic books. An entire section is devoted to scientific research and there are also books available in English and French…
The full story at: “Turkish Garbage Collectors Open a Library from Books Rescued from the Trash”
* Henry Wadsworth Longfellow
###
As we check it out, we might spare a thought for James Billington; he died on this date in 2018. A historian at Harvard and Princeton, who went on to hold the directorship of the Woodrow Wilson International Center for Scholars, Billington is probably best remembered for his final post, Librarian of Congress, a position he held from 1987 to 2015.
The Library of Congress, the oldest federal cultural institution in the U.S., is the nation’s de facto national library. As Librarian, Billington oversaw that resource, appointed the U.S. poet laureate, and awarded the Gershwin Prize for Popular Song each year. During his tenure, he worked to broaden and deepen public access to the LoC’s remarkable holdings, introducing a series of no-fee access services.
As Librarian, he also oversaw the granting of exemptions under the Digital Millennium Copyright Act (DMCA). In 2010, Billington’s decision to open new DMCA loopholes resulted in his being described as “the most important person you never heard of.”
“In a time of deceit, telling the truth is a revolutionary act”*…
For years, the Internet Archive has been acquiring books (their goal is every book ever published) and warehousing them and scanning them. Now, these books are being “woven into Wikipedia” with a new tool that automatically links every Wikipedia citation to a print source to the exact page and passage from the book itself, which can be read on the Internet Archive.
Citations to print materials are both a huge potential strength and weakness for Wikipedia: a strength because there’s so much high-quality, authoritative information in print; and a weakness because people can make up (or discount) print citations and bamboozle other Wikipedians who can’t see the books in question to debate their content, context, or whether they should be included at all.
Archive founder Brewster Kahle kicked off the initiative after a discussion with Wikimedia’s executive director Katherine Maher, who was “worried that truth might fracture.”
Wikipedia is a key battleground in the war against disinformation, and the Internet Archive’s measures — which were presented to Congressional staffers yesterday — are a huge advance on the state of the art.
“I want this,” said Brewster Kahle’s neighbor Carmen Steele, age 15, “at school I am allowed to start with Wikipedia, but I need to quote the original books. This allows me to do this even in the middle of the night.”
For example, the Wikipedia article on Martin Luther King, Jr cites the book To Redeem the Soul of America, by Adam Fairclough. That citation now links directly to page 299 inside the digital version of the book provided by the Internet Archive. There are 66 cited and linked books on that article alone.
Readers can see a couple of pages to preview the book and, if they want to read further, they can borrow the digital copy using Controlled Digital Lending in a way that’s analogous to how they borrow physical books from their local library.
Via Boing Boing: “The Internet Archive’s massive repository of scanned books will help Wikipedia fight the disinformation wars”; for more details, read The Internet Archive’s announcement here.
“Together we can achieve Universal Access to All Knowledge, one linked book, paper, web page, news article, music file, video and image at a time.”
– Mark Graham, Director of the Internet Archive’s Wayback Machine
* George Orwell
###
As we accelerate access, we might send insightfully-humorous birthday greetings to William Penn Adair Rogers; he was born on this date in 1879. A stage and motion picture actor, vaudeville performer, cowboy, humorist, newspaper columnist, and social commentator, he traveled around the world three times, made 71 films (50 silent films and 21 “talkies”), and wrote more than 4,000 nationally syndicated newspaper columns. By the mid-1930s Rogers was hugely popular in the United States, its leading political wit and the highest paid of Hollywood film stars. He died in 1935 with aviator Wiley Post when their small airplane crashed in northern Alaska.
Known as “Oklahoma’s Favorite Son,” Rogers was a Cherokee citizen, born to a Cherokee family in Indian Territory (now part of Oklahoma).
“I am not a member of an organized political party. I am a Democrat.”
– Will Rogers
“Knowledge, like air, is vital to life. Like air, no one should be denied it.”*…

Belgian information activist Paul Otlet (1927)
More than a century ago, Belgian information activist Paul Otlet envisioned a universal compilation of knowledge and the technology to make it globally available. He foresaw, in other words, some of the possibilities of today’s Web.
Otlet’s ideas provide an important pivot point in the history of recording knowledge and making it accessible. In classical times, the best-known example of the knowledge enterprise was the Library of Alexandria. This great repository of knowledge was built in the Egyptian city of Alexandria around 300 BCE by Ptolemy I and was destroyed between 48 BCE and 642 CE, supposedly by one or more fires. The size of its holdings is also open to question, but the biggest number that historians cite is 700,000 papyrus scrolls, equivalent to perhaps 100,000 modern books…
Any hope of compacting all we know today into 100,000 books—or 28 encyclopedic volumes—is long gone. The Library of Congress holds 36 million books and printed materials, and many university libraries also hold millions of books. In 2010, the Google Books Library Project examined the world’s leading library catalogs and databases. The project, which scans hard copy books into digital form, estimated that there are 130 million existing individual titles. By 2013, Google had digitized 20 million of them.
This massive conversion of books to bytes is only a small part of the explosion in digital information. Writing in the Financial Times, Stephen Pritchard notes that humanity generated almost 2 trillion gigabytes of varied data in 2011, an amount projected to double every two years, forming a growing trove of Big Data available on about 1 billion websites… Search engines let us trek some distance into this world, but other approaches can allow us to explore it more efficiently or deeply. A few have sprung up. Wikipedia, for instance, classifies Web content under subject headings…
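To get a feel for what "double every two years" implies, here is a rough sketch. It takes the 2011 figure as an even 2 trillion gigabytes and assumes smooth exponential growth; the function `projected_gigabytes` is invented for illustration, and its output is an extrapolation of the quoted projection, not measured data.

```python
# Figures from the passage above; the smooth-doubling model is an assumption.
BASE_YEAR = 2011
BASE_GIGABYTES = 2e12          # "almost 2 trillion gigabytes" in 2011
DOUBLING_PERIOD_YEARS = 2


def projected_gigabytes(year: int) -> float:
    """Extrapolate the data generated in a given year under steady doubling."""
    doublings = (year - BASE_YEAR) / DOUBLING_PERIOD_YEARS
    return BASE_GIGABYTES * 2 ** doublings


for year in (2011, 2013, 2021):
    print(year, f"{projected_gigabytes(year):.1e} GB")
```

Five doublings over a decade multiply the starting figure by 32, which is why even a mass book-scanning effort is only a small part of the total.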
But there is a bigger question: Can we design an overall approach that would reduce the “static” and allow anyone in the world to rapidly pinpoint and access any desired information? That’s the question Paul Otlet raised and answered—in concept if not in execution. Had he fully succeeded, we might today have a more easily navigable Web.
Otlet, born in Brussels, Belgium, in 1868, was an information science pioneer. In 1895, with lawyer and internationalist Henri La Fontaine, he established the International Institute of Bibliography, which would develop and distribute a universal catalog and classification system. As Boyd Rayward writes in the Journal of Library History, this was “no more and no less than an attempt to obtain bibliographic control over the entire spectrum of recorded knowledge.”…
The remarkable story in full at: “The internet before the internet: Paul Otlet’s Mundaneum.”
* Alan Moore, V for Vendetta
###
As we try to comprehend comprehensiveness, we might recall that it was on this date in 1985 that the first .com Internet domain, symbolics.com, was registered by Symbolics, a now-defunct Massachusetts computer company.