Reddit is one of the most widely used social media platforms, a place where users share their opinions.
Reddit is now enforcing the Robots Exclusion Protocol (via its robots.txt file) to shield its content from automated web scrapers. The company will also rate-limit and block unidentified bots and crawlers.
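For context, robots.txt is a plain-text file served from a site's root that tells crawlers which paths they may fetch; following it is voluntary. Below is a minimal sketch of how a well-behaved crawler would consult Reddit's robots.txt before fetching a page, using Python's standard urllib.robotparser. The user-agent "ExampleBot/1.0" is a hypothetical placeholder, and in practice Reddit may refuse requests from generic clients, so treat this as an illustration rather than a working scraper.

```python
from urllib.robotparser import RobotFileParser

# A blanket-disallow robots.txt, the kind of rule a site can use to
# turn away unapproved crawlers, looks like this:
#
#   User-agent: *
#   Disallow: /
#
# A compliant crawler checks the file before fetching any page.
# "ExampleBot/1.0" is a hypothetical user-agent, not a real crawler.
robots = RobotFileParser()
robots.set_url("https://www.reddit.com/robots.txt")
robots.read()

url = "https://www.reddit.com/r/programming/"
if robots.can_fetch("ExampleBot/1.0", url):
    print(f"Allowed to crawl {url}")
else:
    # Compliance is voluntary; a rogue bot could fetch the page anyway,
    # which is why Reddit pairs robots.txt with rate limiting and blocking.
    print(f"robots.txt disallows crawling {url}")
```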
This move is aimed at preventing AI companies from using Reddit's content for free to train their models. In simple terms, Reddit doesn't want companies using its content without permission.
The company told TechCrunch that “bots and crawlers will be rate-limited or blocked if they don’t abide by Reddit’s Public Content Policy and don’t have an agreement with the platform.”
The company also says that this policy change shouldn't affect the majority of its users or good-faith actors, such as researchers and organizations like the Internet Archive. The update is specifically aimed at AI companies, blocking them from using Reddit content to train their LLMs. However, since robots.txt is a voluntary convention rather than a technical barrier, it's entirely possible that an AI crawler could ignore the file and scrape the content anyway.
The policy changes won't affect companies that have an agreement with Reddit. For context, Reddit has a $60 million deal with Google that allows it to train its AI models on the platform's content.
With this change, Reddit is effectively saying that its content can be used, just not for free. Any AI company that wants to train its models on Reddit content would have to pay, as Google does.
Here is what Reddit said in its blog post:
“Anyone accessing Reddit content must abide by our policies, including those in place to protect Redditors. We are selective about who we work with and trust with large-scale access to Reddit content.”