This work examines the viability of reinforcement learning for trading on the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-Action-Reward-State-Action (SARSA) are implemented along with the off-policy technique of Q-learning. The models are trained and tested on stock market data spanning 2000 to 2023. The analysis presents results from training and testing the models over two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of the bias-variance tradeoff and the generalization capabilities of simpler policies. However, the performance of Q-learning may vary depending on the stability of future market conditions. Suggested future work includes experiments with updated Q-learning policies during testing, trading diverse individual stocks, and exploring alternative economic indicators for training the models.
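To make the on-policy versus off-policy distinction concrete, the following is a minimal sketch of the standard tabular SARSA and Q-learning updates. It is not the implementation used in this work: the table layout (`Q[state][action]`), the function names, and the values of `alpha` (learning rate) and `gamma` (discount factor) are illustrative assumptions.

```python
# Illustrative sketch only; names and hyperparameters are assumptions,
# not taken from the paper. Q is a table indexed as Q[state][action].

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the greedy action in the next state."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

The only difference is the bootstrap term: SARSA uses the value of the action the behaviour policy takes next, while Q-learning uses the maximum over next actions regardless of what is actually taken.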

Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio