In November 2022, the artificial intelligence research lab OpenAI released its uber-popular ChatGPT application, which took the world by storm. The technology that drives ChatGPT is a large language model (LLM) called GPT-3 (or, more accurately, GPT-3.5). For a description of what an LLM is, and for our initial take on this technology and how it relates to sport, check out our two-part series (Part 1 & Part 2). A couple of weeks ago, OpenAI released its new LLM, GPT-4. According to OpenAI, the new model can do more things and is more accurate than its predecessor. But what really blew people's hair back was the introduction of the image understanding capability, especially when the Co-Founder of OpenAI uploaded a hand-drawn design for a joke website [LINK]. Following suit, a few days later, Google released its chatbot "Bard" in research mode. There are a lot of good overview articles that give you a sense of what both of these recent releases can do, but in this article we discuss the implications the GPT-4 model has for sport. The thoughts shared here summarise what I presented in the last month at the MIT Sloan conference (summary here) and our Opta Forum.
Has GPT-4 Solved the “Hallucination of Facts” Problem in Sport?
In our previous articles, we highlighted the two key problems when using ChatGPT in sport: 1) it hallucinates facts, and 2) its training data has a cutoff of September 2021. Let's discuss the first issue. As already mentioned, GPT-4 is supposed to be more accurate, so let's revisit the example we used in the previous article. As a reminder, we asked how many tries Cheslin Kolbe scored in the 2019 Rugby World Cup. The answer it gave was 2 tries (one vs New Zealand and the other against Japan), which was wrong: he scored 3 tries – 2 vs Italy in the group stages and 1 vs England in the final.
As you can see from the above, when the chatbot is given the same question, it is a lot more constrained in its response. The response is correct, as he did score 3 tries in the Rugby World Cup, but it lacks the level of detail provided in the previous answer. When we asked for more detail, this is what we got:
From the results, it got part of the question right (2 tries vs Italy), but it got the other part wrong (he didn't score against Canada; he scored a try in the final against England). So with respect to GPT-4 solving the hallucination problem: it is still an issue and is likely to remain one for a long time due to the autoregressive nature of these artificial intelligence models (i.e., the model predicts the next word in a sequence given the previous words and is not tied to any knowledge of reality).
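To make the autoregressive point concrete, here is a deliberately tiny sketch of next-word sampling. The "model" is just a made-up table of next-token probabilities (all names and numbers are illustrative, not from any real system): each continuation is equally fluent, so the sampler can emit a false statistic just as happily as a true one, because nothing ties it to a factual record.

```python
import random

# Toy "language model": a table of next-token probabilities learned purely
# from text co-occurrence, with no link to any factual database.
bigram_probs = {
    "Kolbe": [("scored", 0.9), ("played", 0.1)],
    "scored": [("2", 0.5), ("3", 0.5)],  # both continuations are fluent; only one is true
    "2": [("tries", 1.0)],
    "3": [("tries", 1.0)],
    "tries": [("<end>", 1.0)],
}

def generate(token, rng):
    """Autoregressively sample the next token until <end>."""
    out = [token]
    while token in bigram_probs:
        candidates, weights = zip(*bigram_probs[token])
        token = rng.choices(candidates, weights=weights)[0]
        if token == "<end>":
            break
        out.append(token)
    return " ".join(out)

print(generate("Kolbe", random.Random(0)))  # fluent output, possibly factually wrong
```

The fix discussed in this article is not a better sampler but grounding: replacing the guessed continuation with a lookup against a trusted data source.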
In a presentation last week at NYU, one of the most prominent figures in AI, Yann LeCun, highlighted this issue and said that the current approach to LLMs is doomed because it will never solve the hallucination problem: these models need to be tied to reality in some way (see slide deck here). This aligns with what we said in the previous articles about grounding outputs in the real world: it is vital that the source of truth in sport comes from trusted sports data providers like Stats Perform. Such is the concern about this problem, and the potential for harm and misinformation, that over 1,000 tech leaders signed an open letter calling for a pause on development of these LLMs, which they say present "profound risks to society and humanity" [LINK]. More on that at the end.
What About Having an Up-to-Date Dataset? Has Anything Changed on That Front?
Yes and no. While the hallucination problem will persist, as mentioned, OpenAI last week released a slew of plugins for ChatGPT that enable the chatbot to interact with third-party APIs [LINK]. With these plugins, it is possible to incorporate ChatGPT functionality into an existing code stack, enabling developers to retrieve real-time information. Watch this space.
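As a rough sketch of the plugin pattern (function names, the JSON shape, and the feed data below are all hypothetical, not OpenAI's or Stats Perform's actual interfaces): the model emits a structured tool call instead of guessing an answer, application code executes it against a trusted data API, and the result flows back so the final response is grounded in real, current data.

```python
import json

def fetch_live_score(match_id):
    """Stand-in for a call to a trusted sports data API (hypothetical feed)."""
    live_feed = {"m123": {"home": "Bielefeld", "away": "Nurnberg", "score": "2-1"}}
    return live_feed[match_id]

def handle_model_output(model_message):
    """If the model requested a tool, run it and return the grounded result."""
    msg = json.loads(model_message)
    if msg.get("tool") == "fetch_live_score":
        return fetch_live_score(msg["arguments"]["match_id"])
    return msg.get("content")

# Instead of hallucinating the score, the model emits a structured tool call:
model_message = '{"tool": "fetch_live_score", "arguments": {"match_id": "m123"}}'
print(handle_model_output(model_message))
```

The design choice worth noting: the LLM never produces the fact itself; it only decides which query to run, and the fact comes from the data provider.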
Additionally, this raises the question of how artificial intelligence technology such as ChatGPT can be used in a live setting. Generally, most of the questions people have are based on a static knowledge base that doesn't change. Sport is different, especially during a live game. Obviously, you could ask a simple question, such as "Who is winning?" or "Who scored?", but those answers are already available via "pre-ChatGPT" chatbots. Deeper live querying is difficult because, in the time it takes to type a query about a specific element of the game, something else may have already occurred, which diminishes the value or interestingness of that query. This is why automatic querying, or "highlight detection", is required, triggered by an interesting event. We have this capability in our PressBoxLive platform: when something interesting such as a goal occurs, we can automatically generate insights on it immediately. The beauty of this is that it scales. For example, in a recent game in the second division of the Bundesliga (2. Bundesliga) between Arminia Bielefeld and Nurnberg, when a goal was scored, we could generate an insight for that game just as we would for the top division, which underscores the value of AI in sports: the ability to scale (and to do it live). This doesn't have to be limited to text insights either; we can generate automatic overlays on video, which add colour to performance analysis, as we have been doing recently in tennis.
The Ability to Input Images and Drawings Was Cool – Can We Put a Sports Image or Drawing into ChatGPT and Get an Output?
As mentioned in the previous article, at Stats Perform we pioneered the interactive sports analytics domain, where you could draw a play and retrieve similar plays, run analytics on the play [LINK], or even predict where players should have been in a given situation using our Ghosting work [LINK]. In terms of the GPT-4 demo, however, this works slightly differently. The text-to-image or image-to-text transformer network learns from an enormous number of text-image pairs (i.e., for each image, there is a textual description like a caption). From this training set, the transformer learns the correlations between textual descriptions and parts of the image. Due to the emergent behaviour of these very large neural networks, it can do some reasoning about certain elements of the image (e.g., why a certain image is funny).
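A stripped-down way to see what "learning from text-image pairs" buys you: once captions and images are embedded into a shared vector space, matching an image to its description reduces to a similarity score. The embeddings below are made up by hand purely for illustration; in a real system they come from encoders trained on huge paired datasets.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings (in practice produced by trained text/image encoders):
image_embedding = [0.9, 0.1, 0.3]
captions = {
    "a winger diving over the try line": [0.8, 0.2, 0.35],
    "a quiet mountain lake at dawn": [0.1, 0.9, 0.0],
}

# Retrieval = pick the caption whose embedding is closest to the image's.
best = max(captions, key=lambda c: cosine(image_embedding, captions[c]))
print(best)
```

The same shared-space trick runs in both directions, which is why one model can caption images and retrieve images from text.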
For sport, we are still expanding the vocabulary of events, which can be thought of as the caption of a play. It is not enough to just have the event stream; using our metrics and machine learning models, we can also measure the quality of an action (such as a shot with xG, or a pass with our possession value framework in soccer). Expanding the vocabulary is one thing, but scaling up the amount of data is another thing that needs to be done.
This is why the Opta Vision project is so important, as it does both things: a) expands the event vocabulary in soccer and b) scales the number of games that we have this rich vocabulary for.
Additionally, having paired the expanded event stream (which can be seen as the captioning) with the tracking data, we can extend our capability to reason about each play through this paired dataset (i.e., expanded vocabulary plus tracking data). We are currently at a tipping point, which will lead to the next level of sports analytics.
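Structurally, this pairing is a join between two streams: the enriched events (the "captions") and the tracking frames. A minimal sketch, with entirely invented field names, timestamps and positions, might align each event with its nearest tracking frame like so:

```python
# Illustrative sketch of pairing an enriched event stream with tracking data
# by timestamp. All field names, values and coordinates are made up.

events = [
    {"t": 1843.2, "type": "shot", "player": "Striker A", "xg": 0.31},
    {"t": 1990.7, "type": "pass", "player": "Midfielder B", "possession_value": 0.05},
]

tracking = {  # timestamp -> simplified player positions for that frame
    1843.2: {"Striker A": (88.0, 34.0), "Keeper": (104.5, 33.8)},
    1990.7: {"Midfielder B": (55.0, 40.0)},
}

def pair_events_with_tracking(events, tracking):
    """Attach the tracking frame nearest each event's timestamp."""
    paired = []
    for ev in events:
        frame_t = min(tracking, key=lambda t: abs(t - ev["t"]))
        paired.append({**ev, "positions": tracking[frame_t]})
    return paired

for row in pair_events_with_tracking(events, tracking):
    print(row["type"], row["positions"])
```

The resulting records carry both the "caption" (event type plus quality metrics like xG) and the spatial context, which is exactly the kind of paired dataset the text-image analogy calls for.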
LLMs: Trust, Disruption and Impact on Society
As mentioned above, due to the sheer impressiveness and rapid improvement of LLM technology, many prominent people worldwide are becoming extremely wary of the impact that such technology will have on society and the potential for harm – and, as such, have called for a pause on this technology. This is an extremely important topic, but with regard to sport, here is my personal view of the issue:
- As an assistive tool, LLM-based AI chatbots are a game-changer in terms of learning aid and efficiency. Like all technology, you have to know what it can and can't do. In my view, these AI tools are the ultimate assistive tool and can help novices reach a higher level of efficiency, as highlighted in a recent study by researchers at MIT [LINK]. Another study showed that developers who used GitHub Copilot completed tasks 55.8% quicker than the control group [LINK]. Where the knowledge base is static, reliable and up-to-date, being able to question and drill down on specific bits of knowledge is amazing (but the key is being able to ask the right question and to understand whether the answer relates to the problem you want to solve).
- However, there have to be guardrails in place. If the data is trusted and reliable (like the data we have at Stats Perform), there is little harm in utilising such data for these purposes – but we have to ensure that the facts are preserved (and not hallucinated). In areas outside of sport, where a mix of fact and fiction exists in the ether, this is more problematic, as misinformation can be propagated, so there need to be checks in place to guard against such things happening. Additionally, protecting private and sensitive information is important, as once this information is within the LLMs, it is almost impossible to stop it from being propagated.
- In sport, there are these natural guardrails in place. For example, at Stats Perform, we are the keeper of the public record of match data, but we do not have private information such as players’ medical, psychological, training and contract information (and would never expect to). This natural guardrail gives protection. It also gives the opportunity for players/clubs/clients to use our match-level analysis/models as one of the inputs, which they can then merge with the private data they have.
- Also, humans need to be the ultimate decision-makers (and they need to know when the technology errs and when it can be trusted). Think of a pilot on a plane. Over the last 100 years of commercial air travel, the technology on the plane has improved dramatically, which has improved decision-making, safety and efficiency – yet the number of pilots in the cockpit has remained the same. Essentially, what AI technology is doing in our area of sport is creating assistive tools that help domain experts make the best decisions, as well as be as efficient as possible.
- Also, the world does not exist solely in natural language or sports data. There are many things we still cannot digitise (and probably won't, due to the public vs private data distinction mentioned above), such as whether a player had a good night's sleep the night before, whether he/she fought with their significant other, whether the kids are sick or upset, or the interactions between player personalities. The decision-makers can absorb that themselves through the many sensory inputs they have and, as such, will have the most relevant information at hand to make the best decision. Our job is to give them the best inputs we can from the data available to us.
The artificial intelligence space changes daily, and we will do our best to keep everyone updated on progress in this field and how it relates to us. In the next article, we will do a deeper dive into the Opta Vision project and how it parallels what is happening in the autonomous vehicle domain. Stay tuned.
Dr. Patrick Lucey is the Chief Scientist at sports data giant Stats Perform, leading the AI team with the goal of maximising the value of the company's deep treasure troves of sports data. Patrick has studied and worked in the AI field for the past 20 years, holding research positions at Disney Research and the Robotics Institute at Carnegie Mellon University, as well as spending time at IBM's T.J. Watson Research Center while pursuing his Ph.D. Patrick hails from Australia, where he received his BEng(EE) from the University of Southern Queensland and his doctorate from Queensland University of Technology. He has authored more than 100 peer-reviewed papers and has been a co-author on papers in the MIT Sloan Best Research Paper Track, winning best paper in 2016 and runner-up in 2017 and 2018.