Zepfanman’s mega spreadsheet of movie ratings

Some people like to write reviews on films. I prefer to crunch movie numbers…. I will update this post from time to time with the latest Google Doc mega spreadsheet of movie ratings. Read on (below the spreadsheet) for an explanation. Bookmark this page: zepfanman.com/mymovies

Movies spreadsheet

Last Google upload: 2015 July 22 (3,488 entries). Formats: Web-based, comment-able Google Docs file (embedded below). I keep the frequently-updated XLSX on Dropbox here for download.



It’s best to download the XLSX file to browse the spreadsheet more easily, but here is a brief explanation of some of the column headers. I’ve attached comments to some of the header cells, too.

  • My: My IMDb vote (1 to 10)
  • Calculated: A recent method I’ve adopted for rating movies based on performance, cinematography, script, plot, and mood (this one counted twice) to give an average score out of 100.[1]
  • Source: The medium I’ve used—or want to use—to watch the film (in theater, Blu-ray, Netflix, etc.)
  • AVG rating: An average of the IMDb, Rotten Tomatoes, and Metacritic scores.
  • Bechdel: Assigned a 1, 2, or 3 according to the film’s Bechdel Test for Women in Movies.
  • LGBTQ: Scale of relevance to LGBTQ issues. Similar to Bechdel, there is now a Vito Russo Test.
  • LFPL: Louisville library system DVD availability. The next column shows the Wild & Woolly video store availability.
  • MIDZ: A unique identifying number for each film in my spreadsheet.

Explanation

Over the years, I’ve used various methods[2] to track and rate the films I’ve seen and want to see. Likely the only thing that will remain consistent are my IMDb lists (which I’ve been using since 2000), but I’m going to give this more detailed method a go.

I got to thinking about doing this when my friend Priscilla expressed interest in watching all of Lars von Trier’s films (which I posted about in April). We then did Miyazaki’s films. Next, I thought it would be good to dig a little further for more obscure films by directors in marginalized groups (women, LGBT, foreign, etc.). Clearly, a spreadsheet or database is the best way to organize these kinds of films. I started by using the handy IMDb list export (to CSV) on my “Seen” and “WantToSee” lists. With some tweaking, I was able to come up with a single file mega spreadsheet of “Zepfanman movies”. It will be a challenge to keep it updated, but this makes it easy for me to sort through and filter to find the movies I’m looking for.

Here are a couple of interesting stats off the top of my head:

  • I’ve seen 229 of the 1001 Movies You Must See Before You Die (actually about 1056 after 10 editions of the book). And NO, I do not plan on seeing all 1001!
  • I don’t have an exact count, but it looks like about 30 movies of the 1001 are directed by women. I’m pretty sure NONE of the IMDb Top 250 are directed by women. This is disturbing.

1 The inspiration for my rating system comes from Jack Lessard at Filmsquish.com; see his categories in blue on that page for details. He also hosts the 1001 Movies You Must See Before You Die Blog Club (notice my “1001 Must See…” column). Another member of the club, Chip Lary, posts lots of Lists from Chip similar to my spreadsheet, but separated by topic.

2 Here are a few notable movie-list posts on Zepfanman.com:

  • eric

    Thank you for this! Using it in a statistics lab at my university if I have permission to do so :) no data will be shared or published

    • Feel free to use this however you like. What kind of statistics are you running?

  • HappyFeet

    Nice list. Great job. I’m a newbie to big data but big fan of movies. I’ve been looking for a way to extract all the RT/MC/IMDB ratings data. Your additional information is quite astonishing. Never considered the additional characteristics.
    Just wondering, for the Rotten Tomatoes rating, did you use Critic or Audience? Have you considered importing both?
    Your movie selection, is it a pull of movies for a given period or did you select those movies and then pull the data?
    Now i just need to figure out how to use API (i am new to programming but an excel pro). If you posted a blog on something like this, i’m sure it would be a huge hit. A how-to for dummies (like me)… :)

    • Thank you for your kind words! Sorry for the late reply, but I want to make sure I take the time to give you a proper response. Not sure it would be worth the effort to do a whole blog post since the answer to most of your questions is that there’s no way to exactly extract the data from these various sources. Stay tuned and I’ll give you a more detailed response. Have you started building your own spreadsheet yet?

  • JoeBlowYo

    Please man, update to 2015, please please…. Guess I’ll get to work where you left off. Got to add about a hundred movies or so

    • The films from the 2015 edition are already in there if you download the latest xlsx file. Is that what you mean by “update to 2015”? I have been meaning to do a roundup of the movies I’ve watched this past year since it’s been two years now since I started the project.

      • JoeBlowYo

        You’re the man.

        • No, I’ve found Excel to be the easiest to customize and it’s more universal. Have you used Ant or MBase? And you’re welcome to use my spreadsheet however you like. Please send me a link to Drive – I don’t know what that is. tech@zepfanman:disqus.com

      • JoeBlowYo

        Hey by the way I updated all of your IMDb view counts and ratings since a lot of them have changed, some significantly so. I deleted a few columns I didn’t care about and added a lot of movies but if memory serves there is a number sort option so transferring the new numbers to your spreadsheet should be easy if you wanted to do so and if you haven’t been updating films view count etc. I have not checked since most recent DL.

        I have a completely different spreadsheet for about 2000 films I have seen or feel are top 2k if you’d like to check it out, it’s in .xlsx format and has a similar nature to yours with a double multiplier for my vote. Though MetaCritic is dumping on some of my favorite movies so when I sort from “high to low” on “Avg” rating some of the movies that come up are definitely not my favorites. I think I might need to add a condition for entries with under 10,000 IMDb votes as well, as some of those could ‘possibly’ be gamed up. Maybe Ill add a 3x multiplier to my personal vote as I only vote 1-100 overall I don’t account for categories like cinematography, soundtrack, etc like you do, you certainly add much more detail. I’m anxious to see if it’s been filled out more!

        I hope you don’t mind me using your list I just use it for personal movie selection and used it to base my own spreadsheet off of, no copy paste. I use Drive. I’ll send a link if you want to check it out. I have to put a little more time into getting all the RT and MC ratings into the entries though.

  • Ruchir Mathur

    Why dont i see Television series Freinds in the Excel?

    • I have a few shows in there, but these are mainly for movies.

  • Phillip McSween

    Amazing list. 2 Things: 1. Do you plan on updating this list anytime soon with the rest of 2015 as well as 2016? 2. I have a similar list of my own that includes all Roger Ebert movies that got three stars or higher. If you would like to get ahold of this list, I would love to share.

    • Thanks for the feedback. I need to make it more clear that the xlsx file on Dropbox gets updated several times a month. It doesn’t have all the blockbusters, but if you sort by MIDZ or Modified, you can see more recent entries, including the sneak peek of 2015 1001 Movies that will be officially published in October-ish. Please email tech@zepfanman.com if you’d like to send that Ebert list! For the record, Chip Lary also has an Ebert spreadsheet: https://listsfromchip.blogspot.com/2012/05/critics-lists.html

      • Phillip McSween

        Thanks for explaining about the two lists. I default to Google Sheets since that is what I use primarily. As for my Roger Ebert list, I will share it with you once I get caught up to 2016. I am still in 2012 now so I have about a week’s worth of data updates.