Word Analysis: The Bible


During the election, I spent a lot of time analyzing the word usage of speeches and debates, and I found it interesting how much you could glean just by looking at the most commonly used words. As I was doing those analyses, I began to wonder what I could learn by performing a similar analysis on the scriptures of various religious traditions. As I’ve noted previously in my posts, I’ve always been very interested in the world’s different religions: their histories, traditions, belief systems, differences, and similarities. So much of this comes from each faith’s scriptures that I thought analyzing word usage at a macro level could be particularly insightful. So, I’m going to start a series of posts (and visualizations) analyzing the scriptures of the world’s religions, including both western and eastern traditions.


The Bible
I’m going to start the series with the scriptures of Christianity, the Bible. The Bible, as most of us know, is broken into two major Testaments, which Christians refer to as “Old” and “New”. The Old Testament is also, more or less, the same set of books as the Hebrew Bible (though some interpretations are a bit different and the books are ordered differently), so this analysis will largely encompass the Jewish scriptures as well. The Bible is also generally broken up into eight subdivisions, the first five of which fall within the Old Testament.

  1. The Law (Pentateuch) – Genesis, Exodus, Leviticus, Numbers, Deuteronomy
  2. Historical Books – Joshua, Judges, Ruth, 1st Samuel, 2nd Samuel, 1st Kings, 2nd Kings, 1st Chronicles, 2nd Chronicles, Ezra, Nehemiah, Esther
  3. Poetical Books – Job, Psalms, Proverbs, Ecclesiastes, Song of Solomon
  4. Major Prophets – Isaiah, Jeremiah, Lamentations, Ezekiel, Daniel
  5. Minor Prophets – Hosea, Joel, Amos, Obadiah, Jonah, Micah, Nahum, Habakkuk, Zephaniah, Haggai, Zechariah, Malachi
  6. Gospels and Acts – Matthew, Mark, Luke, John, Acts
  7. Epistles of Paul – Romans, 1st Corinthians, 2nd Corinthians, Galatians, Ephesians, Philippians, Colossians, 1st Thessalonians, 2nd Thessalonians, 1st Timothy, 2nd Timothy, Titus, Philemon
  8. General Epistles and Revelation – Hebrews, James, 1st Peter, 2nd Peter, 1st John, 2nd John, 3rd John, Jude, Revelation

I decided it would be best to work with the New International Version (NIV) of the Bible, a modern English translation of the Protestant Bible (the Catholic Bible includes some additional Old Testament texts) which dispenses with the thee’s and thou’s of the more traditional King James Version. I was fortunate enough to find a copy of the NIV text on www.godoor.net. From there, I broke the text into each of the sixty-six books. I then used a tool from www.writewords.org to count the occurrences of each word in each book, and compiled the results in Excel. From here, I was able to begin my analysis.
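The counting step could also be scripted. The following is a minimal Python sketch, not the writewords.org tool itself. The function name `count_words`, the tokenizing rule, and the two one-sentence sample "books" are all assumptions for illustration.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter of lowercase word occurrences in text."""
    # Assumed tokenizing rule: a word is a run of letters, optionally
    # joined by internal apostrophes (straight or curly).
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(words)

# Hypothetical sample; in practice each value would hold a full book's text.
books = {
    "Genesis": "In the beginning God created the heavens and the earth.",
    "John": "In the beginning was the Word, and the Word was with God.",
}

# One Counter per book, mirroring the per-book counts described above.
counts_by_book = {name: count_words(text) for name, text in books.items()}

for name, counts in counts_by_book.items():
    print(name, counts.most_common(3))
```

Because the text is lowercased first, “Lord” and “lord” land in the same bucket, which matches how the counts are combined later in this post.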

Unfortunately, as was the case with my previous analyses of political debates and speeches, the most common words were ones like “the” and “and”, which give virtually no insight into the meaning of the text. (I’d be willing to bet that “the” is the most common word in pretty much every single text ever written in or translated to English.) But, of course, these are not the only common words that would rob the analysis of its insight, so I needed some way to remove them. As a starting point, I obtained a list of the 5,000 most common words from www.wordfrequency.info. I then did some of my own cleansing, taking words that are common but still very meaningful back off the list so they would stay in the analysis. In the end, I was able to create a fairly comprehensive list of words to be removed, which mostly consisted of pronouns (he, she, it), prepositions (at, from, after), conjunctions (and, but, so), and determiners (this, the, those). Of course, this process is a bit subjective and I may have made choices others would not have, but the result, I think, is much better than simply leaving all of these commonly used words in the analysis. (If interested, I’d be happy to share the list of words I’ve excluded.)
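The filtering itself is simple to express in code. Here is a rough Python sketch under stated assumptions: the `STOP_WORDS` set below is a tiny made-up stand-in built from the example words above, not the actual exclusion list, and `remove_stop_words` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical stand-in for the exclusion list; the real list was derived
# from a 5,000-word frequency list and then trimmed by hand.
STOP_WORDS = {
    "he", "she", "it",        # pronouns
    "at", "from", "after",    # prepositions
    "and", "but", "so",       # conjunctions
    "this", "the", "those",   # determiners
}

def remove_stop_words(counts, stop_words=STOP_WORDS):
    """Return a new Counter without the words found in stop_words."""
    return Counter({w: n for w, n in counts.items() if w not in stop_words})

# Made-up raw counts for illustration.
raw = Counter({"the": 3, "and": 1, "god": 1, "earth": 1, "from": 1})
filtered = remove_stop_words(raw)
print(filtered.most_common())
```

Keeping the list as a plain set makes the subjective part easy to adjust: adding or removing a word changes the results without touching the counting logic.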

Once I had excluded these common words, I began to visualize the word usage in Tableau (you can find the full visualization here). I started by analyzing the most commonly used words in the Bible as a whole. The following visualization shows the Top 10 words.


Top 10 Words in the Bible (List starts on the left and goes down)

The number 1 word is “Lord” (this includes both the upper case “Lord”, which refers to God, and the lower case “lord”, which would refer to someone’s superior), with 7,759 occurrences, followed by “God”, at number 2, with 3,977 occurrences. Interestingly, “Jesus” makes the list (ranked #10 with 1,273 occurrences), despite the fact that the New Testament accounts for less than 23% of the 274,000+ words. (Remember, this is a filtered list of words; without filters, the total word count is over 725,000, but the Old and New Testament percentages remain roughly the same.)

I also created a Word Cloud, which allows you to select a specific Testament, Division, or Book. For example, here’s the cloud for the book of Job.



Working with this view of the data and experimenting with different filters can be quite insightful.

Finally, I created a single visualization showing bubble charts of each book’s word usage.



I think this is an interesting view of the data because it shows you the most commonly used words while also giving you some perspective on the number of words in each book relative to the others. For example, you can see that the Epistles of Paul are, more or less, organized by size, with the largest book, Romans, appearing first, and the smallest, Philemon, appearing last.

But other insights can be gained here as well. For example, the term “Christ” is incredibly common in the Epistles of Paul and Acts (typically ranked first or second), but appears far less often in the other books of the New Testament. In Hebrews, for example, “Christ” is only the 12th most commonly used word. In James, which is traditionally attributed to Jesus’s brother, “Christ” is the 84th ranked word, appearing only twice.
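A word's rank within a book can be computed directly from that book's counts. The sketch below is one way to do it in Python, with made-up counts. How ties were handled in the visualization is not stated here, so the tie rule used (words with equal counts share the same rank) is an assumption, as is the helper name `word_rank`.

```python
from collections import Counter

def word_rank(counts, word):
    """Return the 1-based frequency rank of word, or None if it is absent.

    Assumed tie rule: rank is 1 plus the number of words with a strictly
    higher count, so words with equal counts share a rank.
    """
    if word not in counts:
        return None
    target = counts[word]
    return 1 + sum(1 for n in counts.values() if n > target)

# Made-up counts for a single book, for illustration only.
book_counts = Counter({"faith": 5, "works": 4, "god": 4, "christ": 2})

for w in ("faith", "god", "christ", "temple"):
    print(w, word_rank(book_counts, w))
```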

There is likely a lot more insight to be gained, but I’ll leave it there for now. Please take a look at the visualization and let me know if you discover anything interesting. Again, you can find the Tableau visualization, which includes separate tabs for each of the visualizations above, here.

Stay tuned. I’ll be back soon with another analysis of the scriptures of one of the world’s religions.

Ken Flerlage, December 12, 2016
 

15 comments:

  1. excellent, thank you. very helpful

  2. I was just thinking about doing this for a workshop I’m facilitating, and thought “some smart person must’ve done it by now!”
    Nice work. I will play around to see what insights jump out and may reach out for the underlying data if you don’t mind sharing.

    Kudos and thanks!

    Replies
    1. Sure. If you send me an email, I will send the data to you. flerlagekr@gmail.com

  3. I too love word studies.
    Here's one I (laboriously!) put together on Romans: https://www.charleswaugh.com/RomansWordFrequency.pdf
    It's very interesting to see the flow of thought and the shift of emphasis from start to end.

  4. This is really great. Thanks for doing this. I love Tableau! Are you able to allow it to filter by the words as well? I want to type a word and see the results. I'm interested in the words friend and friendship (I saw friends already). I think it would be helpful to search by the words too...just a thought. Thanks again for this work.

    Replies
    1. That's a great idea. A friend of mine has a great site with all kinds of capabilities far beyond what I have here. You may want to check it out: https://viz.bible

  5. dude u are amazing this must have taken forever u have a gift for this creativity

  6. Thank you! I plan to use this information to create visuals for the most common words in order to teach my autistic stepson Biblical vocabulary!

    Replies
    1. That's great. If you'd like the raw data, send me an email. flerlagekr@gmail.com

  7. Is there another way to compare different authors’ writing styles? I’m trying to find out whether there is an easy way to show that the various books had different authors or whether they were all written by the same one.

    Replies
    1. Yes, but that's much more involved than what I've done here. Take a look at Stylometry. From a technical standpoint, you'd probably need to employ machine learning.

  8. This is actually fantastic! Thank you so much for creating it!

  9. Curious if the simple word "in" has ever been counted in the New Testament?

    Replies
    1. I have that data but I removed common "stop" words from my analysis.

