A few weeks ago on the Code4Lib Slack channel, a computer science student asked:
Hi! I'm only a bachelor's student in CompSci but I'd be
interested to learn more about library specific software. What
are some small beginner friendly projects to get into? I learn
better creating a product.
I replied with the following, which I'm sharing because I believe it is relevant to many of us here:
More or less, libraries and librarianship are traditionally about
the collection, organization, preservation, and dissemination of
data, information, and knowledge. Beginner friendly projects?
There are quite a few, listed below in no particular order:
* Create a library catalog - Download and install a program
called Koha. Bring together a large handful of your books. Use
Koha to describe your books. Make the resulting catalog
temporarily available on the Web. For extra credit, search things
like the Library of Congress for descriptions of books (known as
MARC records), and add them to your catalog. All of this can be
done within Koha using both GUI and programmatic interfaces.
* Build a collection of scholarly journals - Articulate a topic
of personal interest, but don't be too specific. Peruse a
directory of scholarly journals called the Directory of Open
Access Journals (DOAJ). Look for titles matching your interest
and note the URL pointing to each title's OAI-PMH data root. (This
is the hardest part.) Use either Perl or Python toolkits
implementing the OAI-PMH protocol, and collect the bibliographics
of all the articles in a given title. For extra credit collect
the actual articles, not only the bibliographics.
* Index the content of a relational database - Draw an entity
relationship diagram illustrating the layout of a set of
data/information you want to collect. The data/information can be
any number of things: your books, your CDs, your DVDs, cool
websites, journals articulated from the previous project, etc.
Use SQLite to implement the layout, and fill the database with
content. Finally, use SQLite's fulltext/freetext indexing feature
to make the database searchable, and write a command-line shell
tool to query the index. For extra credit, create a Web-based
interface to the index.
* Archive content - Identify websites of interest to yourself.
Become familiar with the robots.txt convention. Use a
command-line tool called wget to crawl the websites of interest,
and use wget's WARC feature to create long-lasting snapshots of
the sites. Use these WARC files as fodder for the library catalog
or relational database project.
* Create a website - Sign up for a free Amazon Web Services
account. Spin up a tiny instance with two cores, and the
tiniest bits of RAM and disk space. All of this is still free.
Install Apache -- an HTTP server -- on the instance. Write the
tiniest of HTML pages and save it at the root of your Apache
server. Finally, configure the instance to accept connections
from the world. For extra credit, write a .htaccess file limiting
access to the site via username/password combinations, and lock
down access to the site to only your friends and family.
* Practice with REST - Enumerate things of interest to yourself.
Become familiar with the Internet Archive's REST interface for
searching its collection. Articulate queries to search the
Archive's collection using its REST implementation, and manifest
the queries using a tool called curl. The results of the queries
will be JSON streams. Use another tool -- jq -- to read, parse,
and filter the result. In the end you will get links to PDF, plain
text, image, and descriptive (MARC) files of the Archive's
content. Use your new skills against other websites with REST
interfaces. For extra credit, use this content as fodder for some
of the other projects.
* Bind a book - Identify a classic work of literature that piques
your interest. Download a PDF version of the book. Print it. Use
a binding technique called the Japanese stab stitch to bind the
book. Read the book and while you do so, write in the margins.
Alternatively, use a comb binder or something similar to bind the
book. For extra credit, understand that a PDF version of a book
is not a book. Instead, it is a file. To really make the PDF file
a book, it behooves you to impose the pages to create signatures,
bind the signatures into a book block, and finally encase the
book block between covers. By far, the most difficult part of
this process is the imposition, and a program called Fitplot works
very well for me in this regard. Like the creation of WARC files,
the binding of books is a type of preservation process.
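The harvesting step in the journal project above can be sketched with nothing but Python's standard library. This is only an illustration, not a replacement for a full OAI-PMH toolkit: the function names are hypothetical, the metadata format is assumed to be unqualified Dublin Core (oai_dc), and a real harvest would also need to follow resumption tokens.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request against a journal's OAI-PMH data root."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"

def extract_titles(xml_text):
    """Return the Dublin Core titles found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter(OAI + "record"):
        for title in record.iter(DC + "title"):
            titles.append(title.text)
    return titles
```

Fetching the URL returned by list_records_url (with urllib.request, for example) and passing the response body to extract_titles would give the article titles for one page of results.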
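For the relational database project above, the indexing and querying steps might look something like the following. It is a minimal sketch, assuming the local SQLite build includes the FTS5 extension (most recent Python distributions do); the table name, column names, and sample rows are made up for illustration.

```python
import sqlite3

def build_index(rows):
    """Create an in-memory database with a full-text index over (title, notes) rows."""
    db = sqlite3.connect(":memory:")
    # FTS5 virtual table; this fails if SQLite was compiled without FTS5
    db.execute("CREATE VIRTUAL TABLE items USING fts5(title, notes)")
    db.executemany("INSERT INTO items (title, notes) VALUES (?, ?)", rows)
    db.commit()
    return db

def search(db, query):
    """Return the titles of rows matching a full-text query, best matches first."""
    cursor = db.execute(
        "SELECT title FROM items WHERE items MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in cursor]

if __name__ == "__main__":
    db = build_index([
        ("Walden", "Thoreau's account of life in the woods"),
        ("Moby Dick", "A whaling voyage and an obsession"),
    ])
    print(search(db, "woods"))
```

Reading the query from sys.argv instead of hard-coding it would turn this into the command-line tool the project calls for.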
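Before crawling for the archiving project above, it can help to see how a robots.txt file is interpreted. The sketch below uses Python's standard urllib.robotparser module rather than wget itself; the helper name and the sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, url, user_agent="*"):
    """Report whether the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed(rules, "https://example.org/index.html"))
    print(allowed(rules, "https://example.org/private/notes.html"))
```

The crawl itself stays with wget, whose --warc-file option writes the WARC snapshot and which honors robots.txt by default when retrieving recursively.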
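The REST project above uses curl and jq, but the same query-then-filter pattern can be sketched in Python. This is an illustration under stated assumptions: it targets the Internet Archive's advancedsearch.php endpoint, it assumes matches arrive under "response" and then "docs" in the JSON output, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"

def search_url(query, fields=("identifier", "title"), rows=10):
    """Build a search URL that asks the Archive for JSON output."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]  # fields to return
    params += [("rows", rows), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

def identifiers(json_text):
    """Pull item identifiers out of a JSON search response."""
    docs = json.loads(json_text)["response"]["docs"]
    return [doc["identifier"] for doc in docs]
```

The URL from search_url can be handed to curl unchanged, and identifiers does roughly what a short jq filter over the same response would do.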
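To make the imposition step in the book-binding project above less mysterious, here is a small sketch of the page ordering for a single signature. It assumes the simplest case, where each sheet carries two pages per side and is folded once down the middle; it is not how Fitplot works internally, and the function name is hypothetical.

```python
def impose_signature(page_count):
    """Return, per sheet, the (front, back) page pairs for one folded signature.

    Sheets are listed from the outermost to the innermost. Each pair gives
    the left-hand and right-hand page numbers as printed on that side.
    """
    if page_count <= 0 or page_count % 4 != 0:
        raise ValueError("page_count must be a positive multiple of 4")
    sheets = []
    for i in range(page_count // 4):
        front = (page_count - 2 * i, 2 * i + 1)
        back = (2 * i + 2, page_count - 2 * i - 1)
        sheets.append((front, back))
    return sheets

if __name__ == "__main__":
    for number, (front, back) in enumerate(impose_signature(8), start=1):
        print(f"sheet {number}: front {front}, back {back}")
```

For an eight-page signature, the outer sheet carries pages 8 and 1 on the front and 2 and 7 on the back, and the inner sheet carries 6 and 3, then 4 and 5. Nesting the folded sheets in that order puts the pages in reading sequence.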
Underlying all of these projects is one thing: libraries are not
about books. Books manifest the data, information, and knowledge,
and nowadays, data, information, and knowledge are increasingly
manifested in digital forms.
Fun with librarianship!
--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame