Developing a Practical Algorithm
I developed an algorithm to optimize my daily fantasy soccer lineups. I discuss my approach to the problem and my eventual solution.
Abigail Keenan / Unsplash
By Rob Schwartz
December 1, 2020
I’ve participated in daily fantasy soccer on DraftKings for a few years. In an effort to increase my profitability, I recently began working with a statistician to develop a model that predicts player performances. Our model has been performing well, but I’ve had to painstakingly sift through the model’s output to actually select lineups; keep in mind, the statistical model merely predicts the number of fantasy points a player will earn for a given match. It does not advise on how to select from the players, even if we now have insight into their likely performances.
Before I get into any more technical details, though, I ought to explain how DraftKings soccer works.
Each match, you’re asked to choose 6 players from the 22 who are on the field. Those 6 players, over the course of the match, will accumulate points through certain actions. For example, scoring a goal earns 10 points. Saving a shot, as a goalkeeper, is worth 2 points. Your objective is to choose 6 players that will perform better than the players that other DraftKings participants choose, as payouts only go to lineups above a certain percentile of the competition. Did I mention this is gambling?
As you can imagine, some players are more likely to perform well than others. Cristiano Ronaldo, for example, will probably be the highest scorer in a given game. So why doesn’t everyone choose Ronaldo to be in their lineup? Because every player “costs” a certain amount, and he’s probably the most “expensive” player available. The price of a player generally reflects his likelihood of high performance, and Ronaldo, a reliably exceptional player, will cost an outsize portion of your budget. You have to determine whether the price of a player is worth his possible point output. That’s the name of the game: maximizing the points you can get from your lineup, given a fixed budget.
Accurately predicting how players will perform, then, is a huge aspect of success in DraftKings. If you can predict with reasonable confidence that Ronaldo will outperform his average against, say, the worst team in his league, then maybe his steep price will be worthwhile. This is the value of my statistical model: it predicts with reasonable confidence how each player will perform. But predicting performance isn’t the whole picture. You still have to choose the players.
After spending hours over the past weekends hand-selecting my lineups (as advised by the model), I decided I needed to refocus my efforts on one of the more difficult tasks I’d set before myself: optimizing the lineup.
This may not sound tricky. Why can’t you just test every lineup, right? And choose the one that gets you the most points while remaining below the spending limit? Unfortunately, checking every option isn’t really a possibility. In case you don’t believe me, check out the rice and chessboard proverb.
So I’ve known for a while that, if I want to really streamline this whole DraftKings thing and potentially automate the entire process of setting a lineup, I’ll need to tackle lineup optimization. I could probably approximate an ideal lineup by taking players with high value/price ratios, or something like that, but I want to solve this thing for real. Luckily, I studied a similar problem in a college algorithms course: the knapsack puzzle.
Consider a thief in a department store. He sneaks in without tripping the alarms, and he discovers a vast array of items before him, spanning multiple floors. But he has only one backpack to fill. The thief, of course, is planning on selling the items he takes from the store, so he wants to fill his backpack with valuable items. How does he choose which items to take? His backpack has finite volume, and each item in the store has an associated value and volume. For instance, a piece of hanging art may be quite valuable, but it’s also likely to take up the majority of space in his bag. Pebbles from the koi pond, though they may be small, are worthless.
Ideally, I think, he’d head over to the jewelry section and stuff his bag with diamond rings, which each have high value and take up very little space, so he can fit more valuable items into the bag.
The knapsack algorithm, which solves the knapsack puzzle, is meant to determine, efficiently and accurately, how to select items given a “value” and “weight” (volume in the last example; price for DraftKings) associated with each item and a cap on the total “weight.” If you’re mathematically inclined, check out this resource from a Data Structures and Algorithms course at the University of Nebraska-Lincoln. It explains the premise well.
At any rate, the standard knapsack algorithm involves a technique called dynamic programming. I promise it’s not as daunting as it sounds.
Imagine you want to count to 10 using as many vowels as are possible in your numbers, but each number “costs” the value of that number. And say you only have 10 points to spend. For instance, “eight” has two vowels, and it costs, well, 8 points.
Yeah, this would be a nuisance to count out. But we can approach it strategically. We can count up from the bottom and keep track of the highest-vowel combinations for each point threshold using a simple table.
Let’s start with one. It has 2 vowels. So the maximum value we can obtain from using 1 point is 2 vowels. Two only has 1 vowel, and since we can’t repeat numbers, our maximum value from 2 points is 1 vowel. But when we get to three, we have a choice: to choose the combination of one and two or to choose three. One and two, combined, cost 3 points, and they have 3 vowels between them. Three, on the other hand, only has 2 vowels at the same cost of 3 points. So we choose the one-two combo and jot it down. Now, whenever we need to know the maximum value of 3 points, we can simply look it up in the table, rather than calculate it again unnecessarily. We can continue in that fashion — calculating values by using past calculations — until we’ve essentially calculated every combination and arrive at the answer (7 vowels).
For that problem and other traditional knapsack problems, you can take as many items as you want; there’s no hard cap on quantity; only on weight. The DraftKings lineup optimization problem involves a limit on the number of items you can choose. There’s an additional wrinkle I haven’t addressed: the captain of your lineup. When you’re constructing a DraftKings lineup, you select one player to be your “captain”. The captain earns 1.5x points (a goal is worth 15, rather than 10), but he also costs 1.5x his normal price. Choosing the correct captain can often determine whether your lineup makes or loses money.
So there are two dimensions of this problem that are beyond the standard knapsack algorithm: a 6-player limit and the captaincy. After some digging online, I concluded that this would not be a copy-and-paste sort of algorithm, as there was no clear solution on typical forums (StackOverflow, etc.). Given the complexity of the problem, I want to warn non-technical readers: the rest of this article may not be very digestible.
I wrestled first with the capacity limit. I spent considerable time trying to tally the item count in my tracking matrix, like the one I described in the vowel-counting example. Instead of populating my matrix with simple values, though, I tried populating it with [value, num_items] tuples, thinking that I could stop adding new items once num_items reached the item limit. But that didn’t work. Any players toward the end of the list were excluded, even if they would be included in an optimal solution.
I eventually realized the solution: dynamic programming. Yeah, the technique we’ve been using this whole time.
Just like I’ve been using dynamic programming to calculate values for standard knapsack problems, I can use it to calculate the best value for each weight and item number combination, for each item limit (i.e. add a dimension). Meaning, I could start by creating a new matrix, for which each cell would have an item limit of 1. I populate that table, incorporating my prior [value, num_items] data structure, and set the limit of num_items that comprise each cell value to be 1. So if I’m considering adding to a value that already has incorporated the maximum value of items for that matrix, I won’t add it. Then, for the next matrix, I’ll increase the item limit by one, eventually creating a matrix of matrices, with one matrix for each number 1…Q, where Q = the lineup size.
And instead of simply choosing the prior iteration’s value, as one normally would when the other potential prior value already boasts the item limit, I can check against the prior matrix. The values in the prior matrix are limited to one fewer item than the current matrix’s limit, so if the sum of that value and the current player’s value exceeds the prior iteration’s value, I’ll have found extra value that would otherwise have been missed. This aspect of the algorithm — matrix jumping — is integral to the solution.
The final cell in the matrix set (matrixSet[Q-1][n][W]) will contain the maximum possible value. With that value and matrix set in hand, I can employ the standard 0-1 knapsack backtracking technique, modified to hop to prior matrices as necessary, to determine which players comprise the solution.
If you’re code-literate and want to see the algorithm, please check out the source code. The program outputs an Excel file, which contains the optimal matrix set, although you have to install the OpenPyXl package.
After tackling the limit problem, I resolved to take a slightly different approach for the issue of captaincy. I decided to reformulate the matrix set n times, and each time I would designate a different player captain (i.e. multiply his value and and weight by 1.5). I would keep track of the solution that yielded the highest point total and the captain of that matrix set. After iterating through all possible captains, I’ll have found the optimal captain and lineup, given that captain. When backtracking, I reestablish that player as captain in the data set.
Finally, the algorithm is complete.
For each potential captain, create Q matrices, and run a somewhat standard 0-1 knapsack algorithm for each matrix. Because the algorithm utilizes dynamic programming to build up to Q players, it occasionally backtracks to a prior matrix to improve on the value of that iteration/weight cell.
For the techies out there, the algorithm has a time complexity of O(nWQ), where n = number of players, W = weight limit, and Q = quantity limit. When selecting a captain, the time complexity increases to O(n2WQ), as the algorithm cycles through each player one additional time.
Still, there are variations of this problem to solve. Notably, other DraftKings soccer lineups require certain positions, such as Defender or Midfielder, to be filled. I’ll update this post once I’ve completed that variation of the algorithm, but I may not get to it for a bit.
Before I get into any more technical details, though, I ought to explain how DraftKings soccer works.
Each match, you’re asked to choose 6 players from the 22 who are on the field. Those 6 players, over the course of the match, will accumulate points through certain actions. For example, scoring a goal earns 10 points. Saving a shot, as a goalkeeper, is worth 2 points. Your objective is to choose 6 players that will perform better than the players that other DraftKings participants choose, as payouts only go to lineups above a certain percentile of the competition. Did I mention this is gambling?
As you can imagine, some players are more likely to perform well than others. Cristiano Ronaldo, for example, will probably be the highest scorer in a given game. So why doesn’t everyone choose Ronaldo to be in their lineup? Because every player “costs” a certain amount, and he’s probably the most “expensive” player available. The price of a player generally reflects his likelihood of high performance, and Ronaldo, a reliably exceptional player, will cost an outsize portion of your budget. You have to determine whether the price of a player is worth his possible point output. That’s the name of the game: maximizing the points you can get from your lineup, given a fixed budget.
Accurately predicting how players will perform, then, is a huge aspect of success in DraftKings. If you can predict with reasonable confidence that Ronaldo will outperform his average against, say, the worst team in his league, then maybe his steep price will be worthwhile. This is the value of my statistical model: it predicts with reasonable confidence how each player will perform. But predicting performance isn’t the whole picture. You still have to choose the players.
After spending hours over the past weekends hand-selecting my lineups (as advised by the model), I decided I needed to refocus my efforts on one of the more difficult tasks I’d set before myself: optimizing the lineup.
This may not sound tricky. Why can’t you just test every lineup, right? And choose the one that gets you the most points while remaining below the spending limit? Unfortunately, checking every option isn’t really a possibility. In case you don’t believe me, check out the rice and chessboard proverb.
So I’ve known for a while that, if I want to really streamline this whole DraftKings thing and potentially automate the entire process of setting a lineup, I’ll need to tackle lineup optimization. I could probably approximate an ideal lineup by taking players with high value/price ratios, or something like that, but I want to solve this thing for real. Luckily, I studied a similar problem in a college algorithms course: the knapsack puzzle.
Consider a thief in a department store. He sneaks in without tripping the alarms, and he discovers a vast array of items before him, spanning multiple floors. But he has only one backpack to fill. The thief, of course, is planning on selling the items he takes from the store, so he wants to fill his backpack with valuable items. How does he choose which items to take? His backpack has finite volume, and each item in the store has an associated value and volume. For instance, a piece of hanging art may be quite valuable, but it’s also likely to take up the majority of space in his bag. Pebbles from the koi pond, though they may be small, are worthless.
Ideally, I think, he’d head over to the jewelry section and stuff his bag with diamond rings, which each have high value and take up very little space, so he can fit more valuable items into the bag.
The knapsack algorithm, which solves the knapsack puzzle, is meant to determine, efficiently and accurately, how to select items given a “value” and “weight” (volume in the last example; price for DraftKings) associated with each item and a cap on the total “weight.” If you’re mathematically inclined, check out this resource from a Data Structures and Algorithms course at the University of Nebraska-Lincoln. It explains the premise well.
At any rate, the standard knapsack algorithm involves a technique called dynamic programming. I promise it’s not as daunting as it sounds.
Imagine you want to count to 10 using as many vowels as are possible in your numbers, but each number “costs” the value of that number. And say you only have 10 points to spend. For instance, “eight” has two vowels, and it costs, well, 8 points.
Yeah, this would be a nuisance to count out. But we can approach it strategically. We can count up from the bottom and keep track of the highest-vowel combinations for each point threshold using a simple table.
Let’s start with one. It has 2 vowels. So the maximum value we can obtain from using 1 point is 2 vowels. Two only has 1 vowel, and since we can’t repeat numbers, our maximum value from 2 points is 1 vowel. But when we get to three, we have a choice: to choose the combination of one and two or to choose three. One and two, combined, cost 3 points, and they have 3 vowels between them. Three, on the other hand, only has 2 vowels at the same cost of 3 points. So we choose the one-two combo and jot it down. Now, whenever we need to know the maximum value of 3 points, we can simply look it up in the table, rather than calculate it again unnecessarily. We can continue in that fashion — calculating values by using past calculations — until we’ve essentially calculated every combination and arrive at the answer (7 vowels).
For that problem and other traditional knapsack problems, you can take as many items as you want; there’s no hard cap on quantity; only on weight. The DraftKings lineup optimization problem involves a limit on the number of items you can choose. There’s an additional wrinkle I haven’t addressed: the captain of your lineup. When you’re constructing a DraftKings lineup, you select one player to be your “captain”. The captain earns 1.5x points (a goal is worth 15, rather than 10), but he also costs 1.5x his normal price. Choosing the correct captain can often determine whether your lineup makes or loses money.
So there are two dimensions of this problem that are beyond the standard knapsack algorithm: a 6-player limit and the captaincy. After some digging online, I concluded that this would not be a copy-and-paste sort of algorithm, as there was no clear solution on typical forums (StackOverflow, etc.). Given the complexity of the problem, I want to warn non-technical readers: the rest of this article may not be very digestible.
I wrestled first with the capacity limit. I spent considerable time trying to tally the item count in my tracking matrix, like the one I described in the vowel-counting example. Instead of populating my matrix with simple values, though, I tried populating it with [value, num_items] tuples, thinking that I could stop adding new items once num_items reached the item limit. But that didn’t work. Any players toward the end of the list were excluded, even if they would be included in an optimal solution.
I eventually realized the solution: dynamic programming. Yeah, the technique we’ve been using this whole time.
Just like I’ve been using dynamic programming to calculate values for standard knapsack problems, I can use it to calculate the best value for each weight and item number combination, for each item limit (i.e. add a dimension). Meaning, I could start by creating a new matrix, for which each cell would have an item limit of 1. I populate that table, incorporating my prior [value, num_items] data structure, and set the limit of num_items that comprise each cell value to be 1. So if I’m considering adding to a value that already has incorporated the maximum value of items for that matrix, I won’t add it. Then, for the next matrix, I’ll increase the item limit by one, eventually creating a matrix of matrices, with one matrix for each number 1…Q, where Q = the lineup size.
And instead of simply choosing the prior iteration’s value, as one normally would when the other potential prior value already boasts the item limit, I can check against the prior matrix. The values in the prior matrix are limited to one fewer item than the current matrix’s limit, so if the sum of that value and the current player’s value exceeds the prior iteration’s value, I’ll have found extra value that would otherwise have been missed. This aspect of the algorithm — matrix jumping — is integral to the solution.
The final cell in the matrix set (matrixSet[Q-1][n][W]) will contain the maximum possible value. With that value and matrix set in hand, I can employ the standard 0-1 knapsack backtracking technique, modified to hop to prior matrices as necessary, to determine which players comprise the solution.
If you’re code-literate and want to see the algorithm, please check out the source code. The program outputs an Excel file, which contains the optimal matrix set, although you have to install the OpenPyXl package.
After tackling the limit problem, I resolved to take a slightly different approach for the issue of captaincy. I decided to reformulate the matrix set n times, and each time I would designate a different player captain (i.e. multiply his value and and weight by 1.5). I would keep track of the solution that yielded the highest point total and the captain of that matrix set. After iterating through all possible captains, I’ll have found the optimal captain and lineup, given that captain. When backtracking, I reestablish that player as captain in the data set.
Finally, the algorithm is complete.
For each potential captain, create Q matrices, and run a somewhat standard 0-1 knapsack algorithm for each matrix. Because the algorithm utilizes dynamic programming to build up to Q players, it occasionally backtracks to a prior matrix to improve on the value of that iteration/weight cell.
For the techies out there, the algorithm has a time complexity of O(nWQ), where n = number of players, W = weight limit, and Q = quantity limit. When selecting a captain, the time complexity increases to O(n2WQ), as the algorithm cycles through each player one additional time.
Still, there are variations of this problem to solve. Notably, other DraftKings soccer lineups require certain positions, such as Defender or Midfielder, to be filled. I’ll update this post once I’ve completed that variation of the algorithm, but I may not get to it for a bit.
Sign in to leave a comment.