Solving the data dilemma: Balancing AI innovation with ethics
Public data is a vital channel partner tool that must stay “public”
The relationship between websites and AI companies is fraught with conflict. Disney launched a legal battle with AI company Midjourney in 2025, and the BBC threatened AI firms with legal action.
This ongoing friction is likely to change the way we use the internet, making it more difficult to access information as sites lock away content. However, access to public web data is important. It allows companies globally to develop tools for consumers to easily compare prices, for example.
Public web data is also vital for channel partners, who use it to optimize marketing campaigns, verify ads, and track competitors' price fluctuations, among other use cases. As a result, we need to ensure public web data stays just that: public and equally accessible for innovation, so that the open internet can continue.
An outline of web scraping practices
Multiple industries, such as AI, e-commerce, marketing, finance, and cybersecurity, need public web intelligence. They use proxy IPs and public data collection solutions to access it.
With these tools, businesses can compete to offer the lowest prices for consumers. Meanwhile, cybersecurity experts use proxies to collect threat intelligence only accessible from specific locations. Many universities and NGOs use proxies for their research and to track propaganda or disinformation.
For years, e-commerce has used web intelligence to compare the prices of products against competitors. E-commerce companies also track price and inventory changes ahead of the holiday season or a specific promotion. For this reason, closing off public data access would be detrimental to channel partners who depend on intelligence gathered from the web.
The role of public data in Google search
Web scraping is almost as old as the internet itself, with Google being the biggest and best-known scraper. Originally, Google made the internet usable. But, as the internet has evolved, the way it interacts with public data has changed. In the age of AI, Google visits every new website, clicks on each link, gathers all available information, and stores this in its vast data centers.
The data is then processed and indexed so that whenever you need to search for something, you simply enter specific keywords and Google will display the top websites with the desired content.
Google can do this not only because it has previously visited these websites and gathered the content, but also because the owners of those websites are happy with this process. They want Google to list them, as it increases the chances of new visitors clicking on their website.
The rise of anti-scraping measures in the age of AI
In recent years, a new player has entered the market with huge resources, impacting the whole industry and disrupting the data ecosystem. This new player is, of course, AI. According to live polls during OxyCon sessions, 57% of respondents reported that public web-scraped data remains their main source for training AI models.
With the sudden surge of AI development leading to complicated legal battles on both sides over training data, there are no clear rules for compliance. This is evident in the legal battle X took on recently. X did not want its data used for LLM training and, as a result, it started blocking traffic and using its lawyers to deter organizations from gathering web intelligence from its site.
However, since then, it has lost two legal cases in which the judge argued that content on X that is generated by users and accessible without login is, in fact, public data, and therefore does not belong exclusively to X.
These legal battles are occurring because the landscape is unclear and difficult to navigate. Unsurprisingly, Europe is putting regulations in place the fastest; however, this has not provided a solution, and many consider Europe’s recent regulations to be unclear and too strict. Truthfully, no one in the region has a clear, confident understanding of how to comply.
Meanwhile, the US has passed 280-plus pieces of legislation in the past 12 months, while Australia is considering an entirely new approach with a focus on innovation. Although AI is developing fast, it's not apparent what needs to be regulated first. Naturally, by the time a new regulation is in place, AI will have changed so much that it's difficult for the rules to catch up and stay relevant.
The end of commerce as we know it
At this point, it may seem like AI is going to lead websites to block one another to keep hold of their data. However, the huge popularity of AI suggests that this is not the best route, especially as more and more users are opting to use ChatGPT as a form of 'search engine' before making final purchase decisions.
Once someone has used an LLM as a product research or price comparison tool, they are far more likely (four times, by some estimates) to make their purchase, because the research is done. As an enterprise, if you block AI agents from scraping data from your website, your product will not be reflected in the LLM results given to consumers and, unsurprisingly, sales will plummet.
Web intelligence is critical for the whole digital ecosystem, allowing automations and solutions to be built with consumers in mind. However, the ongoing data wars halt innovation and cause some actors to be excluded. In this scenario, it's crucial to find a way to access data without damaging the equal opportunities of others.
We’re living in truly fascinating times where technology that was created 10 years ago feels ancient now. If you’re playing by the rules, open data access helps businesses to create innovative solutions. For this reason, it’s important that public web data stays open and equally available for all to utilize.

Vaidotas is the chief risk officer at Oxylabs, a market-leading web intelligence collection platform.
Having over 10 years of experience in payment and digital risk management, Vaidotas has established himself as an influential force in the web data gathering industry, employing innovative methods to ensure the most ethical and secure SaaS business processes.
Before coming to Oxylabs, Vaidotas spent seven years at Western Union, working as a risk analyst and, later, leading digital risks and digital payments teams.
Currently, Vaidotas is leading a team of 17 professionals that is successfully overseeing risk-vulnerable areas of business operations and countering emerging threats.