Tumblr's Data Deal Raises Concerns

404 Media recently reported that Automattic, the parent company of WordPress.com and Tumblr, is entering into a deal to provide user data from its platforms to help train AI models developed by OpenAI and Midjourney. The report has sparked concerns about user privacy and about how that content will be used.

Transparency and Opt-Outs

Following the 404 Media article, a representative for Automattic directed inquiries to a public blog post addressing the issue. The post states that Automattic’s sites currently block AI crawlers and that the company plans to let users opt out of sharing their data with AI companies in the future. The company asserts that its partnerships with AI firms will prioritize user concerns such as attribution, opt-outs, and control over data usage.
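For readers wondering what blocking AI crawlers can look like in practice, one common mechanism is a robots.txt file that disallows specific crawler user agents. The sketch below is illustrative only: the blog post does not say how Automattic implements its blocking, and the rules, the example URL, and the `blocked_agents` helper are assumptions made for this example. GPTBot (OpenAI) and CCBot (Common Crawl) are real crawler names, used here simply as sample entries. Note also that robots.txt is a request that well-behaved crawlers honor, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, NOT Automattic's actual configuration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/post/1"):
    """Return the user agents that the given rules forbid from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(ROBOTS_TXT, ["GPTBot", "CCBot", "Googlebot"]))
```

Under these sample rules, the two AI-related crawlers are refused while a general search crawler falls through to the catch-all entry and is allowed.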

Data Compilation Challenges

404 Media’s report revealed internal communications among Automattic employees detailing the process of compiling posts from 2014 to 2023 for AI training. According to those communications, the compiled data contained errors: it included content from deleted or suspended blogs, private posts on public blogs, and private answers from the “Ask” function. Of particular concern was the inclusion of NSFW or “mature” content, contrary to the platform’s policies. While Tumblr revised its guidelines regarding nudity in 2022, the inadvertent inclusion of this content raises questions about data integrity and privacy safeguards.
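The categories named in the report amount to a set of exclusion rules that an export would need to apply before any data left the platform. The sketch below shows one way such a filter could be expressed. It is a hypothetical illustration, not a description of Automattic’s actual pipeline: the `Post` fields, the status values, and both function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # All fields are hypothetical; the real data model is not public.
    post_id: int
    blog_status: str = "active"      # "active", "deleted", or "suspended"
    is_private: bool = False         # private post on a public blog
    is_private_answer: bool = False  # private reply sent via the "Ask" function
    is_mature: bool = False          # flagged NSFW or "mature"


def exclusion_reasons(post):
    """List every reason a post should be kept out of an export."""
    reasons = []
    if post.blog_status in ("deleted", "suspended"):
        reasons.append("blog " + post.blog_status)
    if post.is_private:
        reasons.append("private post")
    if post.is_private_answer:
        reasons.append("private answer")
    if post.is_mature:
        reasons.append("mature content")
    return reasons


def exportable(posts):
    """Keep only posts with no exclusion reason."""
    return [post for post in posts if not exclusion_reasons(post)]


posts = [
    Post(1),
    Post(2, blog_status="deleted"),
    Post(3, is_private=True),
    Post(4, is_private_answer=True, is_mature=True),
]
print([post.post_id for post in exportable(posts)])
```

Returning the reasons rather than a bare yes or no makes it possible to audit why a given post was excluded, which matters when the failure mode is private content slipping through.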

Implications for AI and Fan Communities

The prospect of AI models trained on user-generated content from platforms like Tumblr raises both possibilities and ethical dilemmas. Tumblr’s vibrant community of fandoms and niche interests could enrich AI-generated content, including fanfiction. However, using personal writing, photography, and art for AI training raises ethical concerns, particularly when it happens without explicit user consent.

Broader Industry Trends

Tumblr’s data-sharing initiative is not unique in the social media landscape. Platforms like Reddit and Facebook have also engaged in data licensing agreements for AI training. While such arrangements offer opportunities for AI development, they also raise ethical concerns about data privacy and user consent.

Conclusion

The intersection of user-generated content and AI training underscores the importance of transparency, consent, and ethical considerations in data-sharing practices. As platforms navigate these challenges, users must remain vigilant about their privacy rights and the responsible use of their data in emerging technologies.