
Tokyo DH Symposium 2011

These are my notes on a symposium organized by Tokyo University on "The Establishment of a Knowledge Infrastructure for the Next Generation and the Mission of Digital Humanities." This was organized by Masahiro Shimoda.

Note: these are being written live and therefore are full of typos and inaccuracies. I am also depending on simultaneous translation for the Japanese talks.

The summary of the symposium sent to me reads,

The ongoing revolution in the realm of digital technologies and the rapid permeation of the World Wide Web have served in combination to achieve the liberation to readers around the world a broad variety of knowledge preserved (or "trapped") in each individual institute, university, library, museum etc. This is being done by means of transforming both the manner of the preservation of knowledge and the manner of its transmission. In order to pass down to the next generation an intellectual heritage from which to develop new knowledge, it is our urgent responsibility to understand the distinctive characteristics of each form and genre of traditional (analog) knowledge, as well as the functions of the new digital medium. In this way, we can digitize each form of knowledge in a way that takes best advantage of its own characteristics. To rightly accept these challenges in the disciplines of literature, the arts, and sociology, Digital Humanities has been established as a distinct new field of research in North America and Europe. This symposium, by comparing the current status of Digital Humanities in the West with the present situation in Japan, aims to clarify the lurking issues that serve to obstruct to the construction of our knowledge infrastructure and to explore the directions in which we should proceed.

Introduction: Masahiro Shimoda

Professor Shimoda opened the conference. He talked about what he learned creating a digital database of Buddhist works. The digitization was a process. The team realized that humanities scholars who know the specifics of the content need to be closely involved in the creation of digital archives. They can't leave it to informatics specialists, but need to engage them. Humanists who know the idiosyncrasies of their field have to contribute to the proper digitization. This symposium tries to bring together humanists and people in the fields of informatics, digital archives and computer science.

The Japanese Association of Digital Humanities was just started this year and the symposium brought people like John Unsworth to talk about the field.

Expectations from Humanities Scholars for the Handling of Humanities Materials in the Digital Archives of the National Diet Library: Makoto Nagao (President, National Diet Library)

Nagao started by talking about the digital library the National Diet Library is developing. Assiduous readers of my conference reports will note he also spoke on a similar subject at DH-JAC 2011. Their slogan is "Through Knowledge we Prosper." They are making available online Diet minutes and Prime Ministerial communications. They have digitized journals, dissertations, and other materials from the Meiji period to now. They have identified 9.5 million documents important to digitize of which about 2 have been digitized.

However there are copyright issues that have to be handled. Much of what they have digitized they can't provide online. Only about 300,000 documents are out of copyright.

He showed photographs that one can browse like photos of the German embassy which was near where the Diet is now located. He showed maps, manuscripts and writings that have been scanned.

They are also participating in the World Digital Library which was proposed by UNESCO and the Library of Congress. It is designed to help bridge the digital gap making materials available about the entire world. He showed examples of what they have provided the WDL like images of wooden pagodas and images of the Imperial Diet.

Currently they are digitizing at 400 dpi and hope to go higher, but they are worried about storage and budget constraints.

He talked about some of the image quality issues they have, like bleed-through on manga scans that has to be processed out. They are also trying to gather materials from the earthquake, digital photographs and so on. He talked about gathering movement data about Maiko dancers - the sort of stuff that Hachimura is trying to do with Noh. He mentioned gathering government and university web sites, but this poses metadata issues.

He mentioned the need for standards and metadata guidelines. They are using Dublin Core for metadata, but would like to automate the creation of metadata. They currently get about 70% accuracy - it needs to be better.
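To make the metadata point concrete, here is a minimal sketch of a flat Dublin Core record and one possible way to score automatically generated metadata against a hand-made record. The sample records, the function names and the field-by-field definition of accuracy are all my assumptions for illustration. I don't know how the NDL actually measures its 70% figure.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core_xml(record):
    """Serialize a flat dict of Dublin Core elements to an XML string."""
    root = ET.Element("record")
    for element, value in record.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


def field_accuracy(automatic, manual):
    """Share of hand-catalogued fields that the automatic record matches.

    This is an assumed definition of accuracy, not the NDL's.
    """
    if not manual:
        return 0.0
    matches = sum(
        1 for key, value in manual.items() if automatic.get(key) == value
    )
    return matches / len(manual)


# Hypothetical records with invented values (not NDL data).
manual = {"title": "Map of Edo", "creator": "Unknown", "date": "1850", "type": "Image"}
automatic = {"title": "Map of Edo", "creator": "Unknown", "date": "1805", "type": "Image"}

xml_text = to_dublin_core_xml(manual)
accuracy = field_accuracy(automatic, manual)
print(xml_text)
print(f"field accuracy: {accuracy:.2f}")
```

In this invented case the automatic record gets the date wrong, so three of four fields match.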

A challenge they would like to address is to have the ability to search across image databases so you could find photos of a particular person. Can fingerprint image searching be applied to other purposes? Can image recognition algorithms be applied to generating descriptions of the content? Nagao described a number of challenges for researchers that would help the Diet Library. Nonetheless he expects that they will continue to need textual metadata.

It struck me that the NDL probably has a much higher percentage of images (prints, manuscripts, writings, drawings and video ...) which is why Nagao is so interested in automatic image processing. In a visual culture like Japan the generation of metadata for millions of records that are not textual is an issue.

Nagao then started talking about digitizing video (films, anime, video) and using automated processes to link scenes. He mentioned that NHK (the Japanese national TV broadcaster) has lots of dramas. John Unsworth on Twitter wondered about archiving games.

He then talked about search, the problems of Google Search, and enabling multi-lingual search. He mentioned a Wisdom system that they will implement. I didn't catch what it would do. He wondered if it is possible to aggregate across the web on subjects like the earthquake and the nuclear issues that followed. They have a project that will search all relevant information around the world (across languages) in order to provide new world histories. He wants to avoid Eurocentric histories with these aggregations on different subjects. The library will be developing new global histories that presumably gather and translate documents from sources other than English or Western sources. He also mentioned linking the world's libraries for these new global libraries. (I'm not sure I got this right.)

John Unsworth asked what the relationship was between the WDL and his vision of global histories. It sounds like he sees something more than archives of images of cultural objects. More like interlinked rich resources including contemporary research from many countries.

The breadth of activities of the NDL is impressive. I was interested in how they seem to be articulating research challenges they need solved and then working with researchers to solve them. In many ways I think his talk was designed to alert people to the upcoming problems they see. What I didn't hear was a discussion of using crowdsourcing for metadata and linking.

Panel Discussion: Chaired by A. Charles Muller

The panel started with short (20 minute) presentations and then there was some discussion.

Hideaki Takeda (Professor, National Institute of Informatics) talked about "Digital Archive of Knowledge for Sharing and Re-using." He started with the life-cycle of information and how its value changes from its creation to collection. Information is valuable if it is used. To be used it has to be distributed. He argued that archiving is the dead end of information. They need to publish and share what is in the archive.

The answer is linked data. Many libraries are now making linked data available, including the National Diet Library. Museums are shifting too, as Europeana shows.

He then talked about the project he is involved in, the LODAC project (Linked Open Data for ACademia) - see also [] (site in Japanese). They are trying to aggregate their data and then republish as linked data. The key is original IDs. The value is that they can then associate information. He showed a mashup of information about Yokohama. Another interesting application they have developed allows users to annotate data. They may have only images of cultural artefacts, but users can provide extra information.
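Here is a minimal sketch of why keeping original IDs matters when aggregating and republishing as linked data. Everything in it is hypothetical: the base URI, the two museum sources, the records and the function names are mine, not LODAC's. The idea is only that a URI built from the source name and the original ID stays traceable, and that records from different sources can then be associated through shared values.

```python
# Hypothetical base URI for the republished resources.
BASE = "http://example.org/resource/"


def mint_uri(source, original_id):
    """Build a URI that preserves the source and its original identifier."""
    return f"{BASE}{source}/{original_id}"


def to_triples(source, records):
    """Turn {original_id: {property: value}} into (subject, property, value) triples."""
    triples = []
    for original_id, fields in records.items():
        subject = mint_uri(source, original_id)
        for prop, value in fields.items():
            triples.append((subject, prop, value))
    return triples


def group_by_value(triples, prop):
    """Associate subjects from any source that share a value for `prop`."""
    groups = {}
    for subject, p, value in triples:
        if p == prop:
            groups.setdefault(value, []).append(subject)
    return groups


# Invented sample data from two imaginary institutions.
museum_a = {"obj-12": {"label": "Harbour print", "place": "Yokohama"}}
museum_b = {
    "0042": {"label": "Port photograph", "place": "Yokohama"},
    "0043": {"label": "Temple bell", "place": "Kyoto"},
}

triples = to_triples("museum-a", museum_a) + to_triples("museum-b", museum_b)
by_place = group_by_value(triples, "place")
print(by_place["Yokohama"])
```

A real system would use proper RDF vocabularies and a triple store. This only shows the association step that makes something like the Yokohama mashup possible.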

What is the impact for researchers? He argued that this is a change of research practices from researching with your own data to researching with shared data. This allows a broader coverage where you use distributed data. It's happening in the scientific domain.

Toshinori Egami (International Research Center for Japanese Studies, Nichibunken) talked about the challenges of transmitting Japanese culture. He is a librarian, but sits at the counter helping people. He supports scholars of Japanese studies from abroad. Foreign scholars access the collections on site and online.

Who accesses Japanese literature and culture? The people who access this information don't always know Japanese. People who don't know Japanese may still want to access things like maps. He showed a database map entry that was all in Japanese.

He then talked about the Google Ngram Viewer and how Japan seems to be disappearing from the global stage. Holdings of Japanese e-books and e-journals in North America are low, even compared to Korean ones. The problem is that Japanese publishers don't make e-books and e-journals available. (I wonder if that includes all the manga available.)

He closed with some solutions. Materials need to be written in English and be put on the web. He also talked about how someone who finds a digital resource often needs help understanding the context - they need to be able to contact the archive. Digital archives in Japan need to have international help and should have mechanisms to help people negotiate licenses for those who want to publish Japanese materials.

Requests have no boundaries. He pointed out the North American Coordinating Council On Japanese Library Resources and their efforts to help.

Michael Moss and James Currall (University of Glasgow) gave a talk titled "Mind the Gap". We had a handout with lots of gaps like the gap between handicraft projects and industrial scale digitization. There is a gap between perfection and adequacy. They talked about the dangers of joining/aggregating materials because there can be subtle semantic differences that lead to problems.

One of their concerns is the isolation of institutions and projects. Projects are frequently isolated from each other, even projects at the same institution. Often these big projects spend a lot of money and don't have much impact. There is also a serious problem with sustainability. We are in a time of budget cuts which will mean digital humanities demolition.

There is a growing consensus in Europe. We are moving from small handicraft projects to industrial scale. At the industrial scale one can bring together all sorts of materials for research. The value of Google Books is not just access, but things like their Ngram Viewer that let you ask questions you couldn't ask of smaller isolated projects.

They talked about family history/genealogy resources that have been put together by amateurs and companies. Scholars look down on amateur genealogists, but one can use their data to do larger scale research. The customer base of genealogists and family historians has influenced the government to digitize birth and death records in the UK which can be used with imagination by scholars. What scholars want digitized isn't always what gets done by crowdsourcing or for amateurs, but we should be able to use it.

They then showed a Zooniverse project that lets people transcribe weather logs from ships in their lunch hour, which might give us a really large dataset to understand climate change.

They then shifted to talking about digital literacy. As we put stuff up we need to worry about how it will be interpreted. The issue is context. In large systems there is no context. To provide context you need to circumscribe a set of data (turn it into a silo) and add lots of relevant materials.

The key is getting the granularity right.

I find myself bristling at the silo argument. More and more information doesn't make us any wiser. There has been too much to read for a long time. Bigger and bigger datasets don't necessarily help us think things through. Silos protect materials (grain) - they are the right size to do that for a community. Highly contextualized hand crafted datasets provide a similar protection against interpretation for a community.

Currall and Moss had a good answer to a question I posed about the limitations of more and more information. They asked if it was any different in the analogue world. They talked about how we need to understand the way outfits like Google privilege certain things and we need to make space for reflection.

Digitization and Humanities Scholarship: John Unsworth

Digitization implies the production of a digital surrogate for something not digital. There are important questions about the surrogacy, as digital libraries mostly make available digital representations. When can a digital surrogate stand in for its source? When can it replace its source? What risks are there to digital surrogates?

An extreme view sees digital imaging as only for access. There can be risks of loss to the original by scanning or loss of information. Here is where libraries and archives may have different missions. Digital surrogates may not offer all aesthetic pleasure or information of an original, but may help with library issues like access.

A risk is that with digital surrogates libraries may discard the originals and therefore decrease the chance that the original is preserved.

For materials that are rare digitization can help preserve the original by standing in and reducing access to the original. Most people will be comfortable with digital surrogates, though we probably don't know yet what is the best way to digitize which means that we probably will have to redigitize later.

For materials that are rare and infrequently accessed digitization can change the access rates by exposing materials to a broader audience. For that matter, fashions change and something not accessed today may prove popular later.

When can a digital surrogate stand in for its source? When it serves the needs of users. What is the cost of producing and maintaining digital surrogates? Maintenance costs are to some extent unknown, which is a danger.

What risks are there to digital surrogates replacing originals? Digital surrogates provide a partial view that seems complete. Digital surrogates are decontextualized in a way that could be misunderstood.

Any scholarship already involves some use of digital resources from a library catalogue to stuff on the web. Scholarship is a continuum. What new opportunities for scholarship are presented by digitization? Digital primary resources present humanists with new materials they never would have seen. Digital resources present search tools that let us ask questions we couldn't before. Large digital collections make it possible to try new computational methods. The process of digitization also provokes the humanists involved to think about what they are digitizing or what is being represented. Digitization itself raises all sorts of questions of interest to the scholar.

Digitization externalizes interpretation - scholars are forced to make choices that then become visible. This is why humanists need to be involved.

Unsworth talked about the HathiTrust, formed partly from the materials given by Google to the libraries they drew from. The HathiTrust is perhaps one of the best examples of what cyberinfrastructure might look like. The next step is to build something that will provide computational access to the whole textbase (including the part that can't be accessed directly due to copyright issues). This reminds me of the project at the Cornell Theory Center led by Arms that is providing computational access to the Internet Archive. John then pointed out that humanists, even if they have access to cyberinfrastructure like the HathiTrust, will need support and that will come from IT specialists and librarians.

He talked about "nonconsumptive" use of copyrighted materials. They have shown that users can run queries on copyrighted materials for research without getting access to the records.
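As I understand the idea, a query can run over the full text while only derived results such as counts come back to the researcher. Here is a minimal sketch under that assumption. The class name, the volume IDs and the sample texts are invented, and this is not how the HathiTrust actually implements it.

```python
import re
from collections import Counter


class NonConsumptiveIndex:
    """Holds word counts per volume and never hands back the text itself."""

    def __init__(self, volumes):
        # Only the derived counts are kept; the full text is not stored.
        self._counts = {
            volume_id: Counter(re.findall(r"\w+", text.lower()))
            for volume_id, text in volumes.items()
        }

    def term_counts(self, term):
        """Return {volume_id: count} for volumes that contain the term."""
        term = term.lower()
        return {
            volume_id: counts[term]
            for volume_id, counts in self._counts.items()
            if counts[term] > 0
        }


# Invented stand-ins for copyrighted volumes.
volumes = {
    "vol-1": "The archive holds the archive of the temple.",
    "vol-2": "A harbour print.",
}

index = NonConsumptiveIndex(volumes)
result = index.term_counts("Archive")
print(result)
```

The researcher learns which volumes use a word and how often, which is enough for many large-scale questions, without ever reading the records.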

Unsworth talked about how digitization will change not only what we study, but what questions we think are important. With computational access to large textbases we will begin to ask different questions. He mentioned the importance of preserving context from schemas to documents about choices. E-texts should come dressed with information so that we can understand their histories.

I asked about how to fund cyberinfrastructure in these times of budget restraint. He sees an important role for public/private partnerships. The HathiTrust had Google as a partner. There are risks - Michigan (which houses the HathiTrust) has been sued. He also mentioned the dangers of relying on commercial entities. We have very little of the history of American film because of commercial interests. (The film itself was melted down for the chemicals.) We need both public and private partners.

Closing Discussion Panel

The speakers then gathered for a final panel discussion. Professor Yoshimi gave some comments about the effects of the explosion of printing in the 16th century. We may be seeing effects similar to the changes of print technologies. Digital humanities may be the field that addresses this change. He then talked about barriers like copyright and publishers. He argued there are barriers between the following:

  • Existing humanities fields and informatics
  • Universities and libraries
  • Global barriers between Japan and other countries

Then I was addressed by the chair and given a chance to make a few comments. I talked about some of the dangers and challenges I heard:

  • The dangers of isolated centers. There is danger that Japanese centres may compete rather than collaborate.
  • The importance of cooperating regionally.
  • I drew attention to the way digitization changes the questions asked and the evidence.
  • I mentioned the digitization of intangible cultural heritage, which Japan leads on.
  • And, I mentioned the preservation of games.

There were then questions from the floor (actually gathered on paper and articulated by Dr. Muller, the chair.)

  • There was a discussion about digitization and tenure. It doesn't hurt to work on digital projects, but should scholars without jobs invest time in these activities? Currall argued that we need to have more hybrid roles where people combine IT and humanities.
  • There was a discussion of the importance of statistics. Statisticians are pattern recognizers, as are humanists, they just use different tools.
  • There are methodological conflicts as the traditional humanities in Japan have inherited methods and assumptions that make it hard to shift to digital methods. What does it mean to give up a traditional way of doing research? Should we preserve the traditions of scholarship that are endangered?
  • John asked an interesting question about how we can preserve linked data over the long term. Humanities projects go on for decades - how can that be supported? Data storage is cheap, data management is expensive.

Professor Ishida, the Dean of the Interfaculty Initiative in Information Studies, closed the symposium with a few words. He talked about the University of Tokyo's initiatives and how important the digital humanities is.




Page last modified on November 29, 2011, at 02:09 AM