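I have a data frame of 20 players from 4 different teams (5 players per team), and each player has been assigned a salary from a fantasy draft. I would like to create all combinations of 8 players whose combined salary is 10000 or less.
This is what my data frame looks like: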
Team Player K D A LH Points Salary PPS
4 ATN ExoticDeer 6.1 3.3 6.4 306.9 22.209 1622 1.3692
2 ATN Supreme 6.8 5.3 7.1 229.4 21.954 1578 1.3913
1 ATN sasu 3.6 6.4 11.0 95.7 19.357 1244 1.5560
3 ATN eL lisasH 2 2.6 6.1 7.9 29.7 12.037 998 1.2061
5 ATN Nisha 2.7 5.6 7.5 48.2 12.282 955 1.2861
11 CL Swiftending 6.0 5.8 7.8 360.5 22.285 1606 1.3876
13 CL Pajkatt 13.3 7.5 9.3 326.8 37.248 1489 2.5015
15 CL SexyBamboe 6.3 8.5 9.3 168.0 20.660 1256 1.6449
14 CL EGM 2.8 6.0 13.5 78.8 21.988 989 2.2233
12 CL Saksa 2.5 6.5 10.5 59.8 15.898 967 1.6441
51 DBEARS Ace 7.0 3.4 6.9 195.6 23.596 1578 1.4953
31 DBEARS HesteJoe 5.4 5.4 6.1 176.7 16.927 1512 1.1195
61 DBEARS Miggel 2.8 6.8 11.0 141.8 17.818 1212 1.4701
21 DBEARS Noia 3.0 6.0 8.0 36.1 13.161 970 1.3568
41 DBEARS Ryze 2.7 4.7 6.7 74.6 12.166 937 1.2984
8 GB Keyser Soze 6.0 5.0 5.6 316.0 19.120 1602 1.1935
9 GB Madara 5.4 5.3 6.6 334.5 19.405 1577 1.2305
10 GB SkyLark 1.8 5.3 7.0 71.8 10.218 1266 0.8071
7 GB MNT 2.3 5.9 6.1 85.6 9.316 1007 0.9251
6 GB SKANKS224 1.4 7.6 7.4 52.5 7.565 954 0.7930
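I followed the general concept described in this post: "I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less",
and adjusted the code to suit my needs. Here is what I have so far: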
## make a list of all combinations of 8 of Player, Points and Salary
xx <- with(FantasyPlayers, lapply(list(as.character(Player), Points, Salary), combn, 8))
## convert the names to a string,
## find the column sums of the others,
## set the names
yy <- setNames(
    lapply(xx, function(x) {
        if(typeof(x) == "character") apply(x, 2, toString) else colSums(x)
    }),
    names(FantasyPlayers)[c(2, 7, 8)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)
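With the code above I can generate every possible 8-player lineup and then subset by various criteria (total salary and points), but I am struggling when it comes to excluding lineups that have more than 3 players from the same team.
I imagine those lineups need to be excluded from newdf, but I really don't know where to start.
Here is the dput output: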
structure(list(Team = c("ATN", "ATN", "ATN", "ATN", "ATN", "CL",
"CL", "CL", "CL", "CL", "DBEARS", "DBEARS", "DBEARS", "DBEARS",
"DBEARS", "GB", "GB", "GB", "GB", "GB"), Player = structure(c(2L,
5L, 4L, 1L, 3L, 15L, 12L, 14L, 11L, 13L, 16L, 18L, 19L, 20L,
21L, 6L, 7L, 10L, 8L, 9L), .Label = c("eL lisasH 2", "ExoticDeer",
"Nisha", "sasu", "Supreme", "Keyser Soze", "Madara", "MNT", "SKANKS224",
"SkyLark", "EGM", "Pajkatt", "Saksa", "SexyBamboe", "Swiftending",
"Ace", "DruidzOzoneShoc", "HesteJoe", "Miggel", "Noia", "Ryze"
), class = "factor"), K = c(6.1, 6.8, 3.6, 2.6, 2.7, 6, 13.3,
6.3, 2.8, 2.5, 7, 5.4, 2.8, 3, 2.7, 6, 5.4, 1.8, 2.3, 1.4), D = c(3.3,
5.3, 6.4, 6.1, 5.6, 5.8, 7.5, 8.5, 6, 6.5, 3.4, 5.4, 6.8, 6,
4.7, 5, 5.3, 5.3, 5.9, 7.6), A = c(6.4, 7.1, 11, 7.9, 7.5, 7.8,
9.3, 9.3, 13.5, 10.5, 6.9, 6.1, 11, 8, 6.7, 5.6, 6.6, 7, 6.1,
7.4), LH = c(306.9, 229.4, 95.7, 29.7, 48.2, 360.5, 326.8, 168,
78.8, 59.8, 195.6, 176.7, 141.8, 36.1, 74.6, 316, 334.5, 71.8,
85.6, 52.5), Points = c(22.209, 21.954, 19.357, 12.037, 12.282,
22.285, 37.248, 20.66, 21.988, 15.898, 23.596, 16.927, 17.818,
13.161, 12.166, 19.12, 19.405, 10.218, 9.316, 7.565), Salary = c(1622,
1578, 1244, 998, 955, 1606, 1489, 1256, 989, 967, 1578, 1512,
1212, 970, 937, 1602, 1577, 1266, 1007, 954), PPS = c(1.3692,
1.3913, 1.556, 1.2061, 1.2861, 1.3876, 2.5015, 1.6449, 2.2233,
1.6441, 1.4953, 1.1195, 1.4701, 1.3568, 1.2984, 1.1935, 1.2305,
0.8071, 0.9251, 0.793)), .Names = c("Team", "Player", "K", "D",
"A", "LH", "Points", "Salary", "PPS"), class = "data.frame", row.names = c("4",
"2", "1", "3", "5", "11", "13", "15", "14", "12", "51", "31",
"61", "21", "41", "8", "9", "10", "7", "6"))
I think it is best to build this in long form:
Building the teams
library(data.table)
setDT(FantasyPlayers)
xx <- combn(as.character(FantasyPlayers$Player), 8)
# melt() on a matrix uses the array method from the reshape2 package
mxx <- setDT(reshape2::melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))
head(mxx,10)
# jersey_no team_no Player
# 1: 1 1 ExoticDeer
# 2: 2 1 Supreme
# 3: 3 1 sasu
# 4: 4 1 eL lisasH 2
# 5: 5 1 Nisha
# 6: 6 1 Swiftending
# 7: 7 1 Pajkatt
# 8: 8 1 SexyBamboe
# 9: 1 2 ExoticDeer
# 10: 2 2 Supreme
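Each set of 8 players shares a team_no, and the players within it are indexed by their jersey_no. See ?melt.array (in the reshape2 package) for how this works. setDT simply converts the resulting data.frame to a data.table so that merging is easier.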
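To make the long layout concrete, here is a minimal base-R sketch of the same reshaping on a toy case. The four player names and the lineup size of 2 are made up for illustration, and row() and col() stand in for what melt does to a matrix; it is not the code used in this answer.

```r
# Toy roster: 4 hypothetical players, lineups of size 2
players <- c("a", "b", "c", "d")
xx_toy <- combn(players, 2)   # 2 x 6 character matrix, one lineup per column

# Long form: one row per (slot, lineup) pair, read off in column-major order
long <- data.frame(
  jersey_no = as.vector(row(xx_toy)),   # position within the lineup
  team_no   = as.vector(col(xx_toy)),   # which lineup the row belongs to
  Player    = as.vector(xx_toy),
  stringsAsFactors = FALSE
)
head(long, 4)
```

Each lineup occupies consecutive rows with the same team_no, which is what makes grouping by team_no possible later on.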
Merging to recover the Player attributes
FantasyTeams <- FantasyPlayers[mxx, on="Player"]
# Team Player K D A LH Points Salary PPS jersey_no team_no
# 1: ATN ExoticDeer 6.1 3.3 6.4 306.9 22.209 1622 1.3692 1 1
# 2: ATN Supreme 6.8 5.3 7.1 229.4 21.954 1578 1.3913 2 1
# 3: ATN sasu 3.6 6.4 11.0 95.7 19.357 1244 1.5560 3 1
# 4: ATN eL lisasH 2 2.6 6.1 7.9 29.7 12.037 998 1.2061 4 1
# 5: ATN Nisha 2.7 5.6 7.5 48.2 12.282 955 1.2861 5 1
# ---
# 1007756: GB Keyser Soze 6.0 5.0 5.6 316.0 19.120 1602 1.1935 4 125970
# 1007757: GB Madara 5.4 5.3 6.6 334.5 19.405 1577 1.2305 5 125970
# 1007758: GB SkyLark 1.8 5.3 7.0 71.8 10.218 1266 0.8071 6 125970
# 1007759: GB MNT 2.3 5.9 6.1 85.6 9.316 1007 0.9251 7 125970
# 1007760: GB SKANKS224 1.4 7.6 7.4 52.5 7.565 954 0.7930 8 125970
By default, only the first and last few rows of a data.table are printed. To inspect the whole thing, try ?View or look at the arguments of ?print.data.table.
Filtering to the set of teams with selected features
To filter to those team_no values that have no more than three players from the same Team...
my_teams <- FantasyTeams[, max(table(Team)) <= 3, by=team_no][V1==TRUE]$team_no
V1 is the default name given to the constructed variable max(table(Team)) <= 3.
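As a small illustration of the test applied within each team_no group, here is a sketch on two hypothetical lineups (the team labels are chosen only for the example):

```r
# Hypothetical lineup: team labels of its 8 players
lineup_teams <- c("ATN", "ATN", "ATN", "ATN", "CL", "CL", "DBEARS", "GB")

counts <- table(lineup_teams)   # number of players per team
bad_ok <- max(counts) <= 3      # FALSE: four ATN players, so this lineup is dropped

# A second hypothetical lineup with at most three players per team
ok_teams <- c("ATN", "ATN", "ATN", "CL", "CL", "CL", "DBEARS", "GB")
good_ok <- max(table(ok_teams)) <= 3   # TRUE: this lineup is kept
```

table() counts the players per team, and taking the maximum means only the most represented team has to be checked.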
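# Note: < keeps lineups strictly under 10000; use <= to also keep lineups
# costing exactly 10000, as the question asks
my_new_teams <-
    FantasyTeams[team_no %in% my_teams, sum(Salary) < 10000, by=team_no][V1==TRUE]$team_no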
To save a few keystrokes and microseconds, replace V1==TRUE with (V1). That is the idiomatic way.
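A minimal sketch of the two spellings side by side, assuming data.table is installed; the flags table is hypothetical and only mimics the shape of a grouped logical result:

```r
library(data.table)

# Hypothetical flags, shaped like the result of DT[, <logical test>, by=team_no]
flags <- data.table(team_no = 1:4, V1 = c(TRUE, FALSE, TRUE, FALSE))

a <- flags[V1 == TRUE]$team_no   # explicit comparison
b <- flags[(V1)]$team_no         # parentheses make data.table treat V1 as a column

identical(a, b)
```

The parentheses matter: a bare flags[V1] would be looked up as a variable in the calling scope rather than as a column.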
Recovering rosters from a set of teams
To get the roster associated with each team, join/merge with mxx:
mxx[.(team_no = my_new_teams), on="team_no"]
If you want the players listed on a single line, as in the OP:
mxx[.(team_no = my_new_teams), .(roster = toString(Player)), on="team_no", by=.EACHI]
If you want summary statistics for each team, you need to join with FantasyTeams instead:
FantasyTeams[.(team_no = my_new_teams), .(
    roster = toString(Player),
    tot_salary = sum(Salary),
    tot_points = sum(Points)
), on="team_no", by=.EACHI]
# team_no roster tot_salary tot_points
# 1: 3716 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, Ryze 9913 149.018
# 2: 3720 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, MNT 9983 146.168
# 3: 3721 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, SKANKS224 9930 144.417
# 4: 3725 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, MNT 9950 145.173
# 5: 3726 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, SKANKS224 9897 143.422
# ---
# 40202: 125663 EGM, Saksa, Miggel, Noia, Ryze, Keyser Soze, MNT, SKANKS224 8638 117.032
# 40203: 125664 EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, MNT 8925 119.970
# 40204: 125665 EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, SKANKS224 8872 118.219
# 40205: 125666 EGM, Saksa, Miggel, Noia, Ryze, Madara, MNT, SKANKS224 8613 117.317
# 40206: 125667 EGM, Saksa, Miggel, Noia, Ryze, SkyLark, MNT, SKANKS224 8302 108.130
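Understanding what by=.EACHI does takes a little background. The merge syntax here is DT[i, j, on=..., by=.EACHI].
If j and by are omitted, it simply does the merge, as in the construction of FantasyTeams.
If by is omitted but j is included, j is computed after the merge.
With by=.EACHI, j is computed separately for each value in i.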
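The three cases can be sketched on a toy table. The DT and keep objects below are hypothetical stand-ins for FantasyTeams and my_new_teams, and the sketch assumes data.table is installed:

```r
library(data.table)

# Hypothetical long table: three lineups of two players each
DT <- data.table(team_no = c(1, 1, 2, 2, 3, 3),
                 Salary  = c(10, 20, 30, 40, 50, 60))
keep <- c(1, 3)   # lineups to look up

# No j, no by: a plain join returning the matching rows
joined <- DT[.(team_no = keep), on = "team_no"]

# j without by: j is evaluated once, over all matched rows together
overall <- DT[.(team_no = keep), sum(Salary), on = "team_no"]

# j with by=.EACHI: j is evaluated once per value in i
per_i <- DT[.(team_no = keep), .(tot_salary = sum(Salary)),
            on = "team_no", by = .EACHI]
```

Here overall is a single total across both lineups, while per_i has one row per lineup, which is the shape wanted for per-team summaries.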
Here is one approach:
splt.names <- strsplit(as.character(newdf$Player), ", ")
indices <- lapply(splt.names, function(x) match(x, FantasyPlayers$Player))
exclude <- lapply(indices, function(x) any(table(FantasyPlayers$Team[x]) > 3))
newdf2 <- newdf[!unlist(exclude), ]
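First, split the Player column on the commas. Then match the player names against the Player column of FantasyPlayers. With those indices we can do the main work, which is any(table(FantasyPlayers$Team[x])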