What I ended up forging was a matrix of champion matchup win-rates based on every professional game from 2018-19 across all major regions and a few minor regions. Professional League of Legends players love nothing more than to troll their opponents with riot-inciting hovers, pick off-meta Champions that they’ve only practiced on their 95th alt in SoloQ, and, of course, bring back the historic pocket picks after months, and sometimes years, of dormancy. Before we get neck-deep in data, let’s take a step back and look at the bigger picture.

From the sound of it, we can utilize one of Machine Learning's most well-loved binary classifiers, the Logistic Regression. Arguably the largest factor in Pick/Ban, and the one I've decided to base this project on, is going to be Matchups, or how any given Champion plays into and against team compositions. How do you quantify a matchup? Not too bad for a quick day-long challenge! After correcting for the accidental oversight when creating the win-rate matrix, I ran just the Win/Loss Result prediction again and arrived at what I think is still an interestingly high accuracy of 73.52%.
I told you I felt like my intuition behind the math was a solid B+. All that's left to do is fit the model, and test it out! To my surprise, on the blue sum and red sum features alone, I had an 84.9% accuracy rating for predicting the binary outcome of a match! After I had my 5 scores per team, I added those together to get the blue team sum and red team sum features. I downloaded all the match data from 2018-19, and through some creative data engineering, I created our second data set, which consists of the following features: first blood, first tower, first baron, and the result (0-1 scaled floats). My intuition when deciding on this method was that giving each Champion a score based on the 5 Champions it's opposing would give an accurate representation of not only the positional matchup, but also of the team play abilities towards the mid-late game.
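To make the scoring idea concrete, here is a minimal sketch of how per-Champion scores could be turned into the blue sum and red sum features. The post does not say exactly how the five matchup win-rates are combined into one score, so averaging them is an assumption here, and the champion names, the `win_rate` values, and the function names are all invented for illustration (teams of two are used only to keep the toy data short).

```python
# Toy matchup matrix: win_rate[a][b] is the historical win rate of champion a
# when facing champion b. All names and numbers are invented placeholders.
win_rate = {
    "ChampA": {"ChampC": 0.6, "ChampD": 0.5},
    "ChampB": {"ChampC": 0.4, "ChampD": 0.7},
    "ChampC": {"ChampA": 0.4, "ChampB": 0.6},
    "ChampD": {"ChampA": 0.5, "ChampB": 0.3},
}


def champion_score(champion, opponents, matrix, default=0.5):
    """Average win rate of one champion against every opposing champion.

    Averaging is an assumption; matchups missing from the matrix fall back
    to a neutral 0.5.
    """
    rates = [matrix.get(champion, {}).get(opp, default) for opp in opponents]
    return sum(rates) / len(rates)


def team_sum(team, opponents, matrix):
    """Add up the per-champion scores to get one number for the team."""
    return sum(champion_score(champ, opponents, matrix) for champ in team)


blue_team = ["ChampA", "ChampB"]
red_team = ["ChampC", "ChampD"]

blue_sum = team_sum(blue_team, red_team, win_rate)
red_sum = team_sum(red_team, blue_team, win_rate)
print(f"blue_sum={blue_sum:.2f} red_sum={red_sum:.2f}")
```

With real drafts each team would have five Champions, so each sum would fall between 0 and 5 under this averaging assumption.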
One thing you don't have, at least not yet, is a built-in computer in your brain that can run through thousands of epochs of training data and give you more than those recency-based intuitions. What we're doing is determining the outcome of a game based only on factors from the pre-game. I calculated the win-rate matrix with the entirety of my data, not just the training data. These single numbers are what the Logistic Regression uses to predict the outcome of a match. Now that we have our data set, let's quantify the matchups! In fact, around that same time, I learned just how much the Pick/Ban phase can alter the course of a match, which leads me to our topic: Can you predict the outcome of a match solely based on data from the Pick/Ban phase?
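The oversight with the win-rate matrix is a form of data leakage: if the matrix is computed from every game, the games later used to measure accuracy have already influenced the features. A rough sketch of the leak-free version follows, where the matrix is built only from the games set aside for training. The tuple layout `(blue_team, red_team, blue_won)`, the 80/20 split, and the toy games are assumptions made for illustration, not the format of the actual data set.

```python
from collections import defaultdict


def build_win_rate_matrix(games):
    """Build matrix[a][b] = share of games champion a won when facing b.

    `games` is an iterable of (blue_team, red_team, blue_won) tuples,
    which is a hypothetical layout chosen for this sketch.
    """
    wins = defaultdict(lambda: defaultdict(int))
    played = defaultdict(lambda: defaultdict(int))
    for blue_team, red_team, blue_won in games:
        for blue in blue_team:
            for red in red_team:
                played[blue][red] += 1
                played[red][blue] += 1
                if blue_won:
                    wins[blue][red] += 1
                else:
                    wins[red][blue] += 1
    return {
        a: {b: wins[a][b] / count for b, count in row.items()}
        for a, row in played.items()
    }


# Invented toy games with two champions per team to keep the example short.
games = [
    (("A", "B"), ("C", "D"), True),
    (("A", "C"), ("B", "D"), False),
    (("C", "D"), ("A", "B"), True),
    (("A", "D"), ("B", "C"), True),
    (("B", "C"), ("A", "D"), False),
]

# Split first, then build the matrix from the training games only.
split = int(len(games) * 0.8)
train_games, test_games = games[:split], games[split:]

matrix = build_win_rate_matrix(train_games)       # no test games involved
leaky_matrix = build_win_rate_matrix(games)       # what the oversight amounts to

print(matrix["A"]["C"], leaky_matrix["A"]["C"])
```

The two printed values differ because the held-out game changes the A-versus-C record, which is exactly how test information can inflate an accuracy figure.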

Thank you to all the Redditors who pointed out an oversight on my part. There's a wealth of new ways to look at the game of League of Legends that we've only begun to discover. Having only been in the scene since the start of 2018, I didn't catch on to these occurrences until Spring 2019, a year after I started with Team Liquid's League of Legends team. My aforementioned data sets can be found here (Champion Win-rate Matrix), and here (Match Data). Thanks to scikit-learn, I didn't have to make my own Logistic Regression class (although I did earlier in the day for fun), and I was able to implement the algorithm with a few simple lines of code. That being said, I will be putting this data up for download alongside the win-rate matrix! With this matrix, we can now start quantifying team compositions as they relate to the opposing team. Whether it's a comment here or an email, I'd love to hear from you! While there is a certain level of entropy that we could, and eventually should, account for, today we're going to be focusing on one of the more obvious features: win-rates! What are we doing here, and what information do we have readily available that we can use after Pick/Ban and before the kind voice of the announcer says "Welcome to Summoner's Rift"? We're going to have to be creative data engineers and make the most of our less-than-ideal situation. What information, or features, do we have access to when it comes to pre-game data? There is a lot that we know before we load into Summoner's Rift, and maybe one day with Elon's Neuralink, we can process this kind of data without recency and confirmation biases, but until that day, I urge teams to look into building out their Data Science and Analytics programs. For some time now I've been procrastinating, and it's time I finally create this data set.
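For readers curious what a hand-rolled Logistic Regression might look like, below is a small from-scratch sketch trained on the two features discussed here, blue sum and red sum. It is not the code used for the results in this post (those came from scikit-learn), and the learning rate, epoch count, function names, and the eight toy rows are all assumptions chosen so the example stays short and deterministic.

```python
import math


def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias with plain batch gradient descent on log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            pred = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            error = pred - label
            for j in range(n_features):
                grad_w[j] += error * row[j]
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias):
    """Return 1 (blue win) when the predicted probability is at least 0.5."""
    return [
        1 if sigmoid(bias + sum(w * x for w, x in zip(weights, row))) >= 0.5 else 0
        for row in X
    ]


# Invented toy rows of [blue_sum, red_sum]; label 1 means blue side won.
X = [
    [2.8, 2.2], [2.6, 2.4], [2.9, 2.1], [2.7, 2.3],
    [2.2, 2.8], [2.4, 2.6], [2.1, 2.9], [2.3, 2.7],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(X, y)
predictions = predict(X, weights, bias)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(weights, bias, accuracy)
```

On this toy data the learned weight for blue sum comes out positive and the one for red sum negative, which matches the intuition that a higher matchup score for your own side should raise the predicted chance of winning. In practice the accuracy would be measured on held-out games rather than on the rows used for fitting.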
When you're a professional League of Legends analyst, you have an intuition about these data points, like which side is better to be on based on whether you want to counter-pick position x or y against your opponent, or how your Jungle-Mid synergy works in comparison to the opposing team's. I know it's not 100% sound logic, but I feel like I'm floating around a B+ with it. There are two possible outcomes of a game: winning or losing. I decided to use those same features to produce predictions for our other target features, such as First Tower, First Blood, and First Baron. So…not much, but that's not to say we don't have some meaningful information already at our fingertips! The process of condensing 12 rows of data per game down to 1 with just the features I wanted was no easy feat, and I must admit this is what took the majority of my time on this project. How many times have you been watching a hyped rivalry match in the LEC and out of nowhere, during Pick-and-Ban of all times, the crowd starts going off as if a game-changing gank was …

