
Tools For Data-Driven Scholarship

I was at the Tools for Data-Driven Scholarship meeting funded by the NEH, NSF, and IMLS and hosted by MITH and the Center for History and New Media in Maryland, October 22nd - 24th.

Note: These notes are being written while we discuss issues. They are biased and only what I had the time to type.


Dan Cohen of Zotero introduced the two keynotes.

Chris Blizzard, Mozilla

Chris talked about Mozilla and the Open Web. He talked about the generative effect of Mozilla - how it enables the generation of new things (extensions and plug-ins). He talked about the importance of marketing and localization. He talked about how there is an underlying XUL runtime that can be used to create internet applications like the Flickr uploader. Conversely, we can begin to think about a web page becoming an application that someone goes to on a cell phone and uses as an application.

"The web is much more fluid than people realize."

Maura Marx, Open Knowledge Commons

Maura talked about the challenge of a public digital library about the history and literature of North America (by which I think she actually meant the USA) and how to connect with a broader public. She talked about a project where they had great tools and all the small archives ended up just using Flickr. She talked about how content + tools = knowledge.

Brett Bobley and Challenges

Brett introduced the first full day and talked about the background of the event. He talked about the idea of a directory of tools that got this started. Neil Fraistat then introduced the questions:

  1. How do we connect tools with people?
  2. How do we connect tools with each other?
  3. How do we connect tools with collections?

We broke out into groups and gathered challenges.

  • Data: how do people find data that's available for use with tools?
  • People: how do we find the people we need to build our tools and how do we coordinate them?
  • Need: how do people with data but no programmers communicate their needs to people with developers?
  • User Adoption: how to help users understand and use tools? We are an evangelical group and forget how many tools are being used from the web to word-processors.
  • Local vs Global: how do you find a balance between development for local problems and global use? How do you move money across borders? How do you make sure initiatives are capable of being international and multi-lingual from the beginning?
  • Management: how to avoid depending on Google for management of data and tools? What sorts of management tools do we need? How to manage projects when humanists are not trained to manage? We need a neutral ground. How to manage dependencies?
  • Roadmap: how to be transparent about what you are doing and where you are going?
  • Taxonomy of Tools: how do we define tools? How do you understand their variety and application?
  • Tool Silos: how to create a scholarly social network that can leverage participation and sustain it? It is easier to build a community around a tool than a broad area.
  • Discoverability: who is the audience and how will they discover and use tools over time? How do we market and reach out from projects?
  • The Future: what do we know about what has been done? How can projects die usefully? How can we cull and combine tools? How can we capture the useful knowledge in tools so others can learn and we can create a sustainable culture of methods and tools? How can we learn from other disciplines like science? Where do tool projects imagine they will be?
  • Training: how do we train the next generation? How do we push our tools into the curriculum? Training is one of the assets we have.
  • Vision: there is the idea that there is a "killer-app" and a mythology around it. Do we need a "myth" to explain? How do we bridge the gap between the myth and a culture?
  • Usability: how to make tools usable without help on the other side of the world?
  • Ground truths: a) there are more problems and less money in the humanities. b) we need to think about how tools impact scholarly productivity. Is it utopian to try to develop tools that are generally useful across the humanities?
  • Models: scholar-centric models vs. technology-centric vs. content-centric. Do we need to replace the word "tools" with "workflows"? There is also a lack of models of use and data.
  • Domain Specificity: how to move tools across disciplines? Can we support different media - can ideas for tools work across media?
  • Data Driven: are there tools that are not data driven?
  • Outcomes: what are the outcomes of computer-assisted research? Is it always the book that is going to get rewards?
  • Software development practices: how do we weave in and train in development practices so that what is developed is useful? How do we create a sustaining culture?
  • Problems: do we have large challenges? Is that the rhetoric that we want to adopt?
  • Success: what would success look like? We don't have the rhetoric to discuss success and we don't have the means to measure use.

Discussion of Challenges

We then had a general discussion of challenges and ways forward.

  • Watering Hole: there are communities that form around places and content. We need to find ways of working with these watering holes. Part of this is bringing tools together with content providers (or other watering holes) that have content.
  • Who needs what? There are two emerging communities: a) the community of tool developers who need support and rewards, and b) the community of users who need support from service outfits (and developers) to do their research. This second group is probably well represented by academic support units and their faculties.
  • What might the future look like? When we were asked what things might look like in the future we came up with two broad paradigms: a) the disappearing tool embedded in content, which includes embedding processes in publications as we tried in Now Analyze That; b) large frameworks that do everything or enable smaller developers to build extensions.


The second round of breakout groups after lunch looked at strategies to overcome the challenges. One of the outcomes will be a report from the meeting. We looked at questions about audience and content. Some of the solutions that emerged were:

  • A contest or exchange like MIREX or T-Rex that runs annually
  • A journal or online journal where developers can publish about tools and methods
  • Training books that are tied to the tool. A sort of Using TACT - perhaps we need an imprint in this area. William Turkel has a wiki called the Programming Historian that sort of does this.
  • Cookbooks for code (pseudocode and algorithms) and search bars on specific areas.
  • A Wired-like magazine that highlights innovation or a column in the Chronicle. Or could it be viral video and blogs? SciVee is an example of this video idea.
  • Merlot is an example of a review community where learning objects are registered and reviewed.
  • How about a concerted assault on Slashdot, Wikipedia, Facebook, and YouTube?
  • Go off campus and create a consortium that can help with commercialization. There is a role for an entity that can act as a pass-through for IP.

A Tool Consortium or Registry

A model that brought a lot of these together emerged in the discussion group I was part of: a consortium that institutions pay into, which would run the Subversion repository, the discussion lists, the discovery environment, the outreach, the recipe cookbook and possibly the review process. Some of the things such a consortium could do:

  • Provide a code repository where tool developers can drop off code or actually use it for their project from the start
  • Provide basic development management tools, from team management tools to wikis to bug tracking
  • Provide an outreach function that explains tools, methods and practices
  • Provide a discovery function so tools can be found by naive users
  • Provide documentation and encourage standardization of documentation
  • Run contests and exchanges
  • Provide discipline specific textbooks, recipes, cookbooks and review
  • Run training seminars and develop training materials
  • Lobby on behalf of tool development, open access to content (for tools), and innovative methods

The case for not creating one project was discussed. If the problem of silos can't be overcome perhaps we want to create a field where competition and collaboration can take place.

Other issues raised were:

  • Trust: trusting the data and tools to survive over time and to work the same.
  • Mission: what are the missions of human institutions and where do they stop and interface with others?
  • Incentives: how do we provide incentives to tool developers to deposit an archivable version of a tool and how do we also let them explore experimental versions?
  • Material Culture (Stuff): what sorts of tools are needed by those who collect stuff?
  • Does it matter: Maybe people don't care about tools, perhaps they only care about the content.
  • What is a collection? What if libraries and museums thought of collections of machine-actionable content instead of end-user materials? What if you create a test-bed for innovative processing?
  • Best Practices: how do we share the best practices in tool development, management, sustainability or use?
  • Openness: we need to talk about multiple facets of openness. How can we make practices open, not just code or content? Given the intellectual property issues, can we find forms of openness that don't involve giving up a corpus - i.e. open analytical interfaces?

Words that keep coming up: curating, tools, stuff, recipes, embedding, management, development

Examples of what works came up. What are the paradigms?

  • Zotero
  • Word Cloud
  • Large science tool registries like the gene-sequence registries
  • Merlot

Funding Agencies

Funding agencies (NEH, JISC, IMLS, NSF and Mellon) were there. The issue of what their role was came up. What could the funding agencies do to advance the field? One idea is to issue RFPs for different services, registries, and standards. Funders want to increase the visibility of what is already funded - what can they do?


Throughout the discussions we kept returning to the question of who is the audience for an initiative and what do they want? Some different audiences are:

  • Learner - people learning research practices that could use tools need to be able to find them, try them, and understand their interpretative implications. Think of graduate students in research methods and humanities computing courses.
  • "Naive" Researcher - researchers who don't think about tools still use them in different ways. They use things like Google; they use tools inside content like the search engine of a digital archive. They use word-processors and citation tools. How do researchers hear about tools, decide if they are appropriate, learn how to use them, and weave them into their practices? They sometimes hear about tools and try them. They might publish with embedded tools.
  • Developers - developers want to know who is out there; they want code cookbooks, they want channels to share tools and get recognition. Of course, they also want to
  • Service Providers - deans and directors of computing centres have a responsibility to make sure their researchers and students have access to appropriate tools at a reasonable cost. They need to know what they can support and what the costs of supporting tools are.
  • Content Providers - e-text projects need to find and adapt the appropriate text management and search tools. If they are an editorial team then they need editing tools and project management tools. If they are a library they need to be able to weave tools into their environments.
  • Readers - readers of research that makes claims based on analysis using tools and content need to understand the claims and be able to question the claims, evidence, and computer-assisted processes. Such readers might want to recapitulate the results and play with the parameters.

Other Projects and Imagining an Initiative

We began the last day hearing about related projects.

Robil Malitz talked about the CARPET project in Germany that is looking at as a model for e-publishing tools. They are funded to develop a social tool tagging environment. Long term support isn't clear.

Malcolm Read of JISC talked about what they are doing. He started by explaining why AHDS was cut. The reason was that the AHRC didn't want to continue funding their part of AHDS. They have been funding a number of Virtual Research Environments (VREs). They don't want to fund particular tool projects for particular people - they want to shift to generic and scalable tools. He feels that initiatives should be ambitious. One problem is that you have a lot of scholars spending lots of time acting as very poor computer scientists. It would be useful to have professional teams building generic tools.

David Greenbaum talked about Bamboo, which is developing a consortium through a planning process. How can we advance arts and humanities research through the development of shared technology services? They are trying to understand the practices, directions and commonalities in the humanities. They are trying to move towards generic tools and services that can be supported. They want to evolve a stable organizational partnership. Who is the "we"? Bamboo is in the middle of a community design process so they don't know what the outcome will be. One thing that stands out about their leadership is that it is made up of deans, CIOs and directors, not the usual suspects. One unanticipated idea was the importance of telling stories. Bamboo will have various phases, starting with a planning phase that supports and funds a bunch of demonstrator projects. In an implementation phase they might develop social tool environments, but they are trying to figure out long term generic services. Bamboo is a highly flexible organic material.

Imagining an Initiative

We then had breakout sessions imagining possible initiatives and issues. Some of the suggestions were:

  • Reconceptualize as a publication rather than directories - we need an initiative that is more of a publication than a directory. Imagine a dynamic online publication that has formal and informal features.
  • Internationalize the project from the very start. Set up everything from the beginning to be multi-lingual.
  • Target two audiences: a) insiders, developers and service folk, and b) the other 95% who don't know much about this area. We had a discussion about how you actually reach the 95%. Do you do it through graduate students, through deans, through people who would read stuff, or through disciplinary venues? The audience and strategies for reaching new audiences were clearly a major issue.
  • Have a curated guide that looks at methods
  • Cover collections and projects not just tools - comprehensive project that
  • Funding is necessary from the start, not just for technical infrastructure but also for training and curating
  • We need an exit strategy for when it is no longer funded - possibly a consortial model that
  • Do the writing up better so it is accessible
  • Eduforge is an open educational sourceforge type project that might be a model
  • We need to experiment with new types of tool publication
  • We asked whether aspects of any idea have been done before.
  • What would the goals of funding agencies be? Funding agencies have needs too.
  • Tools connected to content - is it possible to imagine an initiative that encourages the development of tools within content? Model content affordances rather than tools.
  • How does one avoid initiatives from single institutions? An "invisible college" with low membership rules might open this to a wide community. Conversations across multiple players are emerging that are interesting. An invisible college could stage outreach and prototypes.
  • Not invented here syndrome is strong. What would turn our colleagues on? Can you only work on tools if you are good at IT or could we imagine "meta-tool" development by everyone?
  • Would funders support curriculum development to get tools and tool building into the hands of undergrads?
  • We discussed a craft model where we develop a craft community for developers.
  • We discussed the issue of what the developers really want, and a registry wasn't really that exciting. They want a) rewards, credit, and review; b) long term funding models to really do tool development right; and c) a methods and recipes exchange that brings atomic tools together around research practices
  • We want ways to pass on knowledge from projects when they get to the end of their life
  • We need more surveys and lightweight projects
  • There were concerns about the level of funding and the duration of funding needed to succeed.

Funders Last Word

Then we had closing words from Brett Bobley, who funded this meeting. Brett thanked MITH and the Center for History and New Media for organizing the event.

Post-meeting reflections

Why tool developers are part of the problem

I started meditating on how I and people like me are part of the problem:

  • The not invented here syndrome - academics are in the credit (fame) game so they are unlikely to want to give up a project and subsume it under another if it means losing credit.
  • Staying on the cutting edge - even if we succeeded in developing generic tools or a tool development environment for others, would I stop doing work and return to Plato? Probably not - developers get addicted to development and would just push the envelope out and start imagining new new tools.
  • It is to the advantage of the academic developers to keep the field small so they get the credit and the grants. Why would we want to make it easy or think of solving it - that would put us out of development and force us back to the traditional work we left.

What are some things we should do?

  • Group blog on how to do things. A blog and wiki where people can post recipes and ideas. It could have a cookbook.
  • A tools competition and exchange page
  • A curated site of tools and projects
  • A matching service that gets collections/content projects together with tools projects
  • A peer-review mechanism for tools
  • A hardware hacks and toys commercial venture that packages and sells stuff for exhibits, public history, and digital play
  • The simple things project that encourages only simple tools that will work with a framework that has an API. It might be simple things for Eclipse or Zotero. You bring people in with simple things and simple hacks.
  • What if there was a MAME project to preserve old tools?

What are the projects that we should look at:




Page last modified on October 24, 2008, at 12:56 PM - Powered by PmWiki