Digital Scholarship / Digital Humanities

Text Mining or Text Analysis

Marti Hearst, professor at the UC Berkeley School of Information, describes text mining as "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. A key element is the linking together of the extracted information together to form new facts or new hypotheses to be explored further by more conventional means of experimentation... In text mining, the goal is to discover heretofore unknown information, something that no one yet knows and so could not have yet written down" (What Is Text Mining?).

  • Words are data to be counted and analyzed
    • A concordance is one example of this: an alphabetical list of the words present in a text, often with citations of the passages where they are found
  • Not a replacement for close reading! It is a way to reexamine your understanding of a text and to find word usage or patterns that you did not catch
  • A starting point for deeper research and analysis: it could inspire a hypothesis, but it should be combined with other methods
  • Enables scholarship that a human could not perform at scale: analysis of large bodies of text in a short period of time
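
To make the idea of "words as data" concrete, here is a minimal Python sketch that counts words and builds a simple concordance for a two-line sample. The sample text and the function names (`tokenize`, `word_counts`, `concordance`) are made up for illustration, and the tokenizer is deliberately naive; real tools such as Voyant handle punctuation, stop words, and languages far more carefully.

```python
import re
from collections import Counter, defaultdict


def tokenize(line):
    """Lowercase a line and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", line.lower())


def word_counts(lines):
    """Count how often each word appears across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def concordance(lines):
    """Map each word, in alphabetical order, to the line numbers where it appears."""
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # set() records each line once per word, even if the word repeats in it
        for word in set(tokenize(line)):
            index[word].append(number)
    return dict(sorted(index.items()))


# Hypothetical sample text, invented for this example
sample = [
    "The sea was calm and the sky was clear.",
    "She watched the sea from the shore.",
]

counts = word_counts(sample)
index = concordance(sample)

print(counts.most_common(3))
for word, line_numbers in index.items():
    print(word, line_numbers)
```

The line numbers play the role of the "citations of the passages" in a printed concordance. For a whole book, you could swap them for page or chapter references.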

Voyant Tools

Open-source, web-based platform led by Geoffrey Rockwell, University of Alberta: https://voyant-tools.org/

  • Perform lightweight text analysis: most frequent words, contexts, vocabulary density, trends, correlations, words per sentence, and distinctive words
  • Upload text, link to URL, or work with Shakespeare’s plays or Austen’s novels
  • No account creation required
    • Data and visualizations can be exported or downloaded
    • The corpus is cached on the Voyant server, so bookmark your unique URL to return to it
    • Do not upload private data!
  • Learn more here: https://voyant-tools.org/docs/#!/guide/about
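
Two of the measures listed above can be approximated by hand, which helps in interpreting what Voyant reports. The Python sketch below assumes that vocabulary density means unique words divided by total words, and that sentences end at a period, exclamation mark, or question mark. Both are simplifying assumptions, and Voyant's own calculations may differ. The function names and sample sentence are invented for illustration.

```python
import re


def vocabulary_density(text):
    """Unique words divided by total words (assumed definition; 0.0 for empty text)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def average_words_per_sentence(text):
    """Average word count per sentence, splitting naively on ., !, and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    total_words = sum(len(re.findall(r"[A-Za-z']+", s)) for s in sentences)
    return total_words / len(sentences)


# Hypothetical sample text, invented for this example
sample = "The cat sat. The cat ran away!"

print(round(vocabulary_density(sample), 3))
print(average_words_per_sentence(sample))
```

A density close to 1 means few words are repeated. Longer texts tend to score lower simply because common words recur, so compare texts of similar length.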

Voyant Tools tutorial video

This recorded webinar is a workshop with Voyant Tools co-creator Dr. Geoffrey Rockwell. He discusses why you might want to use Voyant and what it can and cannot do, and he provides a detailed look at the various tools on the platform.

JSTOR Text Analyzer

Available through the JSTOR library database or online at https://www.jstor.org/analyze/

  • Upload a text (a useful article, your paper, anything!) to find other sources in JSTOR about the same topics
  • The tool uses algorithms to define topics and determine which are most important in your uploaded text
  • Adjust the results by adding, removing, or adjusting the importance of the prioritized terms
  • JSTOR does not retain your uploaded text, so you do not need to worry about privacy!
  • The Text Analyzer is still in development, or "beta," so results may not be perfect!
    • The tool also relies on algorithms and machine learning to arrive at its results
    • Use your judgment and combine this tool with your own knowledge and understanding, rather than treating its results as definitive!
  • Learn more here: https://www.jstor.org/analyze/about

JSTOR Text Analyzer How-to video

This brief video provides an overview of how to use the JSTOR Text Analyzer.

HathiTrust

HathiTrust Digital Library (HTDL) - non-profit collaborative library for digital preservation: https://www.hathitrust.org/

HathiTrust Research Center (HTRC) - "Supports large-scale computational analysis of the works in the HathiTrust Digital Library to facilitate non-profit and educational research": https://analytics.hathitrust.org/

  • HTDL: register as a guest (with a Google, Facebook, LinkedIn, or similar account) to save datasets for use in HTRC
  • HTRC: register with an academic email address
  • Algorithms:
    • Most frequent words
    • Topic modeling (results change with each run because the method is probabilistic)
    • “Named entity recognizer” (identifies names of people, places, and organizations)

HathiTrust 101: Search

This lengthy video describes in detail how to use various search functions in the HathiTrust Digital Library.

Introduction to the HathiTrust Research Center

A brief introduction to the HathiTrust Research Center's text and data mining tools.