Six Degrees of Kevin Bacon (with a Graph Database)


Just about everyone from my generation (I was an 80’s kid) knows the “Six Degrees of Kevin Bacon” game. The basic concept is that every single actor can be connected to Kevin Bacon through six or fewer “hops.” Players challenge each other by calling out a name and everyone races to find a connection to the iconic actor. For example, if I said “Jennifer Lawrence”, you could explain that “Jennifer Lawrence was in The Burning Plain with Charlize Theron, who was in Trapped with Kevin Bacon.” This is a two-hop connection, resulting in a “Bacon Number” of 2. I used to play the game quite often with my wife in the late 1990’s and early 2000’s and almost always used A Few Good Men as my primary linking movie since it connects Kevin Bacon to many other heavily-connected actors, including Tom Cruise, Demi Moore, Jack Nicholson, and Kiefer Sutherland.


According to Wikipedia, the spark that led to the game was a 1994 interview in which Kevin Bacon commented that he had worked with everyone in Hollywood or someone who has worked with them. Shortly after that, three Albright College students, Craig Fass, Brian Turtle, and Mike Ginelli, created the game. I doubt any of them had any idea how it would take off from there.

Analysis & Visualization
I always had fun with the Six Degrees of Kevin Bacon game and thought it might be interesting to do some analysis and/or visualization on it. I was able to find some fairly comprehensive data sets with movies and their actors, but when I started to analyze the data, I quickly ran into some problems, specifically the fact that traditional relational databases and tools like Excel aren’t great at understanding and analyzing relationships. This is where graph databases come into play. Graph databases are a type of NoSQL database (essentially, a non-relational database, as opposed to relational databases like Oracle, SQL Server, MySQL, and PostgreSQL), but they are specifically designed to understand relationships between data and the nature of those relationships. Neo4j is probably the most popular graph database today. Its makers define graph databases as follows:

We live in a connected world. There are no isolated pieces of information, but rich, connected domains all around us. Only a database that embraces relationships as a core aspect of its data model is able to store, process, and query connections efficiently. While other databases compute relationships expensively at query time, a graph database stores connections as first class citizens, readily available for any “join-like” navigation operation. Accessing those already persistent connections is an efficient, constant-time operation and allows you to quickly traverse millions of connections per second per core.

Independent of the total size of your dataset, graph databases excel at managing highly connected data and complex queries. Armed only with a pattern and a set of starting points, graph databases explore the larger neighborhood around the initial starting points, collecting and aggregating information from millions of nodes and relationships, leaving the billions outside the search perimeter untouched.

As an analytics professional, I was familiar with the basics of graph databases, but I had never actually worked with one. So, I decided to play with Neo4j with a goal of building a “Six Degrees of Kevin Bacon” graph database. I downloaded and installed the software—a very easy process—and set out to learn the basics of the platform. As I started to experiment with it, I found that one of the sample databases is none other than one based on the Kevin Bacon game. What luck!! The database included with the software is actually a trimmed down version, but the Neo4j developer site has a more complete database, which I began to explore.

With some relatively simple commands, I was able to very quickly analyze and visualize things like:
  • All of Kevin Bacon’s movies and his relationship to them (actor, director, etc.)
  • All connections to other people through those movies (you can specify the number of “hops” to analyze).
  • Shortest path between actors (the basis of the game).
 For example, I asked for the shortest path from Kevin Bacon to John Malkovich.
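To make the idea of a shortest path concrete, here is a minimal Python sketch. It is not the Neo4j query used in this post; it is a plain breadth-first search over a tiny, hand-built cast list limited to films mentioned above, and the names CASTS, build_costars, shortest_path, and bacon_number are made up for illustration.

```python
from collections import deque

# Tiny hand-built cast list (an assumption for illustration): only films
# and actors named in this post, not a real movie database.
CASTS = {
    "A Few Good Men": ["Kevin Bacon", "Tom Cruise", "Demi Moore",
                       "Jack Nicholson", "Kiefer Sutherland"],
    "Trapped": ["Kevin Bacon", "Charlize Theron"],
    "The Burning Plain": ["Charlize Theron", "Jennifer Lawrence"],
}


def build_costars(casts):
    """Map each actor to {co-star: one film the two share}."""
    costars = {}
    for film, actors in casts.items():
        for a in actors:
            for b in actors:
                if a != b:
                    costars.setdefault(a, {}).setdefault(b, film)
    return costars


def shortest_path(casts, start, goal):
    """Breadth-first search from start to goal.

    Returns [actor, film, actor, ...] or None if there is no connection.
    """
    costars = build_costars(casts)
    previous = {start: None}  # actor -> (earlier actor, linking film)
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == goal:
            break
        for other, film in costars.get(actor, {}).items():
            if other not in previous:
                previous[other] = (actor, film)
                queue.append(other)
    if goal not in previous:
        return None
    # Walk backwards from the goal, then flip the result.
    path = [goal]
    step = previous[goal]
    while step is not None:
        actor, film = step
        path += [film, actor]
        step = previous[actor]
    path.reverse()
    return path


def bacon_number(casts, actor, center="Kevin Bacon"):
    """Number of hops (shared films) between center and actor."""
    path = shortest_path(casts, center, actor)
    return None if path is None else len(path) // 2


path = shortest_path(CASTS, "Kevin Bacon", "Jennifer Lawrence")
print(" -> ".join(path))
print(bacon_number(CASTS, "Jennifer Lawrence"))  # 2
```

Because breadth-first search visits actors in order of distance, the first route it finds to the target is a shortest one. Passing a different center value gives the same game for any other actor.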



But John Malkovich is easy. I wanted to push the theory (and the database) a little further by choosing someone who might not have an obvious link to Kevin Bacon. Agnès Jaoui is a French actor who stars almost exclusively in French-language films. Could she possibly be linked to Kevin Bacon? Yes and, interestingly, it only takes three hops.


Okay, but how about actors from previous eras? Surely, Kevin Bacon cannot be so easily linked to someone who starred in films in the 20’s and 30’s. Janet Gaynor, for example, was an Academy Award-winning actor who starred in films such as Sunrise (1927), 7th Heaven (1927) and A Star Is Born (1937). What’s her Bacon Number?


It turns out her number is only 4, which is pretty amazing, considering her last movie was almost 60 years ago!

Six Degrees of Helen Mirren?
After thinking about this theory a bit further, I came to the conclusion that it would probably work for just about any actor who, like Bacon, has appeared in quite a few movies. So, what about “Six Degrees of Helen Mirren”? Well, let’s start again with John Malkovich…Oooh, oooh, I got this one!! Helen Mirren has a direct connection with John Malkovich as they starred in “Red” together…But, what’s the database say?


The database also shows a one-hop connection, but chooses The Hitchhiker’s Guide to the Galaxy as its link. What about Agnès Jaoui and Janet Gaynor?



Their “Mirren Numbers” are 3 and 4 respectively, just like their Bacon Numbers.

Other Graph Database Use Cases
Of course, this very simple analysis barely scratches the surface of a graph database’s capabilities. Want to see every movie Kevin Bacon has acted in or directed and every other actor or director related to those movies? No problem, a simple query will give you just that. But with all these connections, the result set grows very quickly. For example, here are the results when querying all actors within 3 hops of Kevin Bacon (since it’s difficult to view on this post, I’ve made the full image available here).
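For a rough feel of what an “everyone within N hops” query does, here is another small Python sketch. Again, this is not the Neo4j query behind the results in this post. It assumes a hop means one shared film, as in the game, and it uses a tiny, hand-built cast list drawn from films mentioned earlier; CASTS and within_hops are hypothetical names.

```python
from collections import deque

# Hand-built sample data (an assumption for illustration), limited to
# films and actors named in this post.
CASTS = {
    "A Few Good Men": ["Kevin Bacon", "Tom Cruise", "Demi Moore",
                       "Jack Nicholson", "Kiefer Sutherland"],
    "Trapped": ["Kevin Bacon", "Charlize Theron"],
    "The Burning Plain": ["Charlize Theron", "Jennifer Lawrence"],
}


def within_hops(casts, center, max_hops):
    """Return {actor: hop count} for everyone within max_hops of center."""
    # Index the films each actor appears in.
    films_of = {}
    for film, actors in casts.items():
        for actor in actors:
            films_of.setdefault(actor, []).append(film)

    hops = {center: 0}
    queue = deque([center])
    while queue:
        actor = queue.popleft()
        if hops[actor] == max_hops:
            continue  # do not expand past the hop limit
        for film in films_of.get(actor, []):
            for other in casts[film]:
                if other not in hops:
                    hops[other] = hops[actor] + 1
                    queue.append(other)

    del hops[center]  # the center is not one of its own connections
    return hops


for limit in (1, 2):
    print(limit, len(within_hops(CASTS, "Kevin Bacon", limit)))
```

Even on this toy data the count grows with each extra hop; on a real movie data set every new hop pulls in the full casts of every film reached so far, which is why the result set gets large so quickly.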



As you can see, the power and capabilities of graph databases are second to none when it comes to analyzing relationships. You can easily begin to see the value of such platforms for use in social media, but there are a number of other use cases as well. Neo4j talks about seven major use cases. Here are just a few examples:
  • Network and IT Operations – Multiple, interconnected layers of networks, applications, etc.
  • Recommendation Engines – “Connect the dots of seemingly unrelated interests and relationships to make recommendations that balance fresh with familiar.”
  • Fraud Detection – Analysis of relationships to help detect fraud rings or other scams.
  • Social Networks – Perhaps the most obvious use case is to understand how people are related to one another within social networks such as Facebook. LinkedIn uses this type of analysis when it shows you “How You’re Connected”.
So, if you encounter such a use case, be sure to check out a graph database. It might just prove to be your best option.

SixDegrees.org
After over a decade of dealing with “Six Degrees of Kevin Bacon”, Mr. Bacon decided to parlay the game into something that could have a positive impact on the world. Thus, in 2007, SixDegrees.org was born. As the website states, “SixDegrees.org is social networking with a social conscience,” allowing people and celebrities to connect to and help with a variety of different causes. For more information or to get involved, check out http://www.sixdegrees.org/.

Header Image from The Wisdom Daily

Ken Flerlage, April 2, 2017
 
