first learning of human drivers, which then 'mutate` to CAVs, are trained to optimize routing policies with the implemented algorithm. When the training is finished, it uses raw results to compute a ...