Asked by: 小点点

Find all combinations of 3 values in one column (player_name), grouped by other columns (team_name, match_id), and count the instances of each combination


I've been teaching myself some R by playing around with different sports statistics, but I've got stuck.

match_id    player_name player_team points
Match1  Player 1    Team 1  20
Match1  Player 2    Team 1  23
Match1  Player 3    Team 1  24
Match1  Player 4    Team 2  26
Match1  Player 5    Team 2  21
Match1  Player 6    Team 2  22
Match1  Player 7    Team 2  43
Match1  Player 8    Team 2  38
Match2  Player 9    Team 3  24
Match2  Player 10   Team 3  29
Match2  Player 11   Team 3  23
Match2  Player 12   Team 3  22
Match2  Player 13   Team 4  20
Match2  Player 14   Team 4  32
Match3  Player 15   Team 5  24
Match3  Player 16   Team 5  27
Match3  Player 17   Team 5  23
Match3  Player 18   Team 5  20
Match3  Player 19   Team 5  23

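The data covers a whole season, so teams and players repeat over time. Using data like the above, I'm trying to find every combination of 3 different players from the same team who each scored 20 or more points in a match (the points have already been filtered to include only 20+). Then I want to count how many matches each combination appears in, so I can tell which groups of 3 players from the same team frequently score 20+ when they play together.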

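Because some players on different teams share the same name, I used mutate() to combine player_team with player_name, and player_team with match_id. I did this because some of my attempts ended up combining players from different teams.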
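A minimal base R illustration of the team-prefixing idea described above. The data frame `df` and the player "Smith" are hypothetical toy values, not the asker's data: two different players share a name, and prefixing with the team keeps them apart.

```r
# Hypothetical toy data: two different players named "Smith" on different teams
df <- data.frame(
  match_id    = c("Match1", "Match1", "Match2"),
  player_name = c("Smith", "Smith", "Jones"),
  player_team = c("Team 1", "Team 2", "Team 1")
)

# Prefix both IDs with the team so each key belongs to exactly one team
df$player_key <- paste(df$player_team, df$player_name, sep = "_")
df$match_key  <- paste(df$player_team, df$match_id, sep = "_")

print(unique(df$player_name))  # only 2 distinct raw names
print(unique(df$player_key))   # "Team 1_Smith" "Team 2_Smith" "Team 1_Jones"
```

Prefixing match_id as well means each team's appearance in a match becomes its own group, so players from opposing teams in the same match are never paired.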

The closest I've got is the code below, but it only works for combinations of 2.

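library(dplyr)
library(igraph)

data <- players %>%
  # keep rows where the player reached 20+ (the `points` column in the sample above)
  filter(disposals >= 20) %>%
  select(match_id, player_name, player_team) %>%
  # prefix both IDs with the team so same-named players on different teams stay distinct
  mutate(
    match_id = paste(player_team, match_id, sep = "_"),
    player_name = paste(player_team, player_name, sep = "_")
  ) %>%
  select(match_id, player_name)

# table() gives a match x player incidence matrix; crossprod() turns it into a
# player x player matrix counting how many matches each pair shared
dataout <- get.data.frame(
  graph_from_adjacency_matrix(
    crossprod(table(data)),
    mode = "directed",
    weighted = TRUE,
    diag = FALSE
  )
)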

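This gives me a data frame of player pairs (from, to, weight). The weights are based on occurrences across the whole dataset, not just the example above; so far each team has played 3 matches.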

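Note that the combinations are not repeated in every possible order (i.e. Team 1_Player 1 with Team 1_Player 2 is recognised as the same pair as Team 1_Player 2 with Team 1_Player 1).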
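To see why the crossprod(table(data)) approach stops at pairs: table() builds a match × player incidence matrix (normally 0/1), and crossprod() multiplies it by its own transpose, so cell [i, j] counts the matches in which players i and j both appear. A matrix product only ever relates two columns at a time, which is why triples need an explicit enumeration such as combn(). A small sketch on hypothetical toy data (the names toy, inc and pairs are illustrative only):

```r
# Hypothetical toy data: team-prefixed matches and players with 20+ points
toy <- data.frame(
  match_id    = c("T1_M1", "T1_M1", "T1_M1", "T1_M2", "T1_M2"),
  player_name = c("T1_A",  "T1_B",  "T1_C",  "T1_A",  "T1_B")
)

# Incidence matrix: rows = matches, columns = players
inc <- table(toy)

# crossprod(inc) is t(inc) %*% inc: off-diagonal cells count shared matches,
# the diagonal counts each player's own matches
pairs <- crossprod(inc)
print(pairs)
```

Here A and B appear together in both matches, while C shares only T1_M1 with each of them.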

Is there another approach that would let me include three (or more) players rather than just two?


1 Answer

Anonymous user

You can use combn() with m = 3 to get all possible triples:

library(tidyverse)

data <- tribble(
  ~match_id, ~player_name, ~team_name, ~points,
   "Match1",           1L,         1L,     20L,
   "Match1",           2L,         1L,     23L,
   "Match1",           3L,         1L,     24L,
   "Match1",           4L,         2L,     26L,
   "Match1",           5L,         2L,     21L,
   "Match1",           6L,         2L,     22L,
   "Match1",           7L,         2L,     43L,
   "Match1",           8L,         2L,     38L,
   "Match2",           9L,         3L,     24L,
   "Match2",          10L,         3L,     29L,
   "Match2",          11L,         3L,     23L,
   "Match2",          12L,         3L,     22L,
   "Match2",          13L,         4L,     20L,
   "Match2",          14L,         4L,     32L,
   "Match3",          15L,         5L,     24L,
   "Match3",          16L,         5L,     27L,
   "Match3",          17L,         5L,     23L,
   "Match3",          18L,         5L,     20L,
   "Match3",          19L,         5L,     23L
  )

combinations_data <-
  data %>%
  filter(points >= 20) %>%
  # one row per team per match; that group's players go into the list-column `data`
  nest(data = -c(team_name, match_id)) %>%
  mutate(
    # all 3-player combinations per group; possibly() returns NA instead of an
    # error when a group has fewer than 3 players (e.g. team 4 in Match2)
    combinations = data %>% map(possibly(~ {
      .x$player_name %>% unique() %>% combn(3)
    }, NA))
  )

combinations_data %>%
  filter(match_id == "Match1" & team_name == 2) %>%
  pull(combinations) %>%
  first()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,]    4    4    4    4    4    4    5    5    5     6
#> [2,]    5    5    5    6    6    7    6    6    7     7
#> [3,]    6    7    8    7    8    8    7    8    8     8

Created on 2022-04-08 by the reprex package (v2.0.0)

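In the first match, team 2 has 10 unique combinations of three players who all scored 20 or more points (team 2 had five such players, and choose(5, 3) = 10).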
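The answer lists the triples for each team and match but stops short of the second half of the question: counting how many matches each triple appears in. The following is a minimal base R sketch of that step, swapping the tidyverse pipeline for split(), combn() and table(). It assumes the rows are already filtered to 20+ and have columns match_id, player_team and player_name; the function name count_groups() and its parameter k are hypothetical. Each combination is stored as a sorted, comma-joined key so that player order never matters.

```r
# Sample data from the question (already 20+), with team-prefixed player names
data <- data.frame(
  match_id    = rep(c("Match1", "Match2", "Match3"), times = c(8, 6, 5)),
  player_team = rep(c("Team 1", "Team 2", "Team 3", "Team 4", "Team 5"),
                    times = c(3, 5, 4, 2, 5)),
  player_num  = 1:19
)
data$player_name <- paste(data$player_team,
                          paste("Player", data$player_num), sep = "_")

# Hypothetical helper: count how many team-matches each k-player group appears in
count_groups <- function(df, k = 3) {
  # one element per team-match pair, holding that group's player names
  groups <- split(df$player_name, list(df$player_team, df$match_id), drop = TRUE)
  keys <- unlist(lapply(groups, function(p) {
    p <- sort(unique(p))
    # groups smaller than k contribute no combinations
    if (length(p) < k) return(character(0))
    # each k-combination becomes one order-independent key string
    combn(p, k, FUN = paste, collapse = ", ")
  }), use.names = FALSE)
  if (length(keys) == 0) {
    return(data.frame(combination = character(0), matches = integer(0)))
  }
  counts <- table(keys)
  out <- data.frame(combination = names(counts), matches = as.integer(counts))
  # most frequent groups first
  out <- out[order(-out$matches, out$combination), ]
  rownames(out) <- NULL
  out
}

head(count_groups(data, k = 3))
```

On the sample every triple appears in exactly one match; on a full season the same key recurs whenever the same three teammates all reach 20+ again. Passing k = 2 or k = 4 reuses the same logic for pairs or larger groups.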