dianoigo blog

Wednesday, 23 November 2016

Word counts by book for Septuagint, New Testament, Apostolic Fathers and Justin Martyr

Religious studies meets data science. The result:

[Image: word counts by book for the Septuagint, New Testament, Apostolic Fathers and Justin Martyr.]

The texts used for this exercise were as follows. The LXX text comes from the freely available text files of the Center for Computer Analysis of Texts at the University of Pennsylvania. The NT text comes from the freely available text files of the SBL GNT maintained by James Tauber. The Apostolic Fathers text comes from the Logos software edition of Michael W. Holmes' critical text.[1] Justin Martyr's writings come from online Greek texts that are in turn based on Goodspeed's 1915 critical text (at least for the Apologies; no attribution is given for the Dialogue with Trypho). The text mining to obtain the word counts was done using R statistical software.
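For anyone who wants to try something similar, here is a minimal sketch of the counting step. It is in Python rather than the R actually used for this post, and it is not the original code: the function names (`count_words`, `word_counts_by_book`) and the tokenisation rule (a word is a run of Unicode letters) are assumptions made for illustration. A different rule for elisions, numerals or punctuation would shift the totals a little.

```python
import re
import unicodedata

# Assumption: a "word" is a maximal run of Unicode letters. The post does not
# state its tokenisation rule, so counts from this sketch may differ slightly.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(text: str) -> int:
    """Count the words in a Greek (or any Unicode) string."""
    # NFC-normalise first so accents and breathings are part of the letter
    # instead of separate combining marks that would split a word in two.
    normalised = unicodedata.normalize("NFC", text)
    return len(WORD_RE.findall(normalised))


def word_counts_by_book(books: dict[str, str]) -> dict[str, int]:
    """Map each book title to the word count of its text."""
    return {title: count_words(text) for title, text in books.items()}


# Tiny illustrative input; a real run would load one text file per book.
sample = {
    "John 1:1a": "Ἐν ἀρχῇ ἦν ὁ λόγος",
    "Genesis 1:1 (LXX)": "ἐν ἀρχῇ ἐποίησεν ὁ θεὸς τὸν οὐρανὸν καὶ τὴν γῆν",
}
counts = word_counts_by_book(sample)
print(counts)
```

In practice most of the work lies in cleaning the source files (stripping verse references, morphological tags and editorial sigla) before any counting happens, and that step depends entirely on the format of each source.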

A few fun facts:

  • We have more words of Justin Martyr preserved (69741) than the entire Apostolic Fathers corpus (64757), thanks to the truly massive size of his Dialogue with Trypho.
  • Justin Martyr and the Apostolic Fathers combined (134498) are only slightly shorter than the New Testament (137554).
  • The Gospels and Acts make up over 60% of the New Testament by word count. The Pauline corpus makes up "only" 23.5%.
  • The whole of the LXX consists of 589013 words (based on the texts used here). Of this, 82% comes from books considered canonical by Protestants (albeit in Hebrew). An additional 13% (77806 words) comes from books considered canonical by Roman Catholics but not Protestants (1-2 Maccabees, Wisdom of Solomon, Sirach, Judith, Tobit, Baruch, Epistle of Jeremiah, Bel and the Dragon, Susanna).[2] The other 5% comes from books not considered canonical by Protestants or Roman Catholics (1 Esdras, 3-4 Maccabees, Odes, Psalms of Solomon).
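The arithmetic behind these facts is easy to check. The short Python sketch below uses only the totals quoted in the list above; the variable names are my own labels, not anything from the original analysis.

```python
# Word counts as reported in the post.
justin_martyr = 69741
apostolic_fathers = 64757
new_testament = 137554
lxx_total = 589013
lxx_catholic_only = 77806  # canonical for Roman Catholics but not Protestants

# Justin Martyr and the Apostolic Fathers, separately and combined.
combined = justin_martyr + apostolic_fathers
justin_lead = justin_martyr - apostolic_fathers

# Shares expressed as percentages.
combined_vs_nt = 100 * combined / new_testament
catholic_only_share = 100 * lxx_catholic_only / lxx_total

print(f"Justin + Apostolic Fathers: {combined}")
print(f"Justin's lead over the Apostolic Fathers: {justin_lead}")
print(f"Combined as % of NT: {combined_vs_nt:.1f}")
print(f"Catholic-only share of LXX: {catholic_only_share:.1f}")
```

The combined figure reproduces the 134498 quoted above, and the Catholic-only share rounds to the 13% given in the last bullet.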

A couple of caveats. Where two quite divergent text families exist for a single book (e.g., Joshua, Judges, Daniel, Susanna, Bel and the Dragon, Tobit), I've counted only one of the texts. Some of the texts also have lacunae (Epistle to Diognetus; Dialogue with Trypho) or lost endings (Gospel of Mark; Didache), so the original word count would have been larger than the one reported here. Other texts have portions extant only in Latin (Polycarp's Epistle to the Philippians; Shepherd of Hermas), which will also have slightly affected the word count since, for example, Latin has no article. For the Martyrdom of Polycarp I've included only chapters 1-20, since the epilogues in chapters 21-22 are obviously added by later hands.


  • [1] Michael W. Holmes, The Apostolic Fathers: Greek Texts and English Translations (Grand Rapids: Baker, 2007).
  • [2] The Greek additions to Esther, also considered canonical by Roman Catholics, are not included here since I didn't go to the trouble of counting these words separately.