It’s 2025, and we’ve reached peak GenAI hype. Financial platforms now market sustainability chatbots and copilots that promise to “democratize insights” or “simplify compliance.” The potential is extraordinary. But there’s an inconvenient truth. No matter how smart your GenAI assistant is, it can’t fix the biggest problem in sustainable finance: bad data.
This isn’t a philosophical objection. It’s a technical and strategic one. Large Language Models (LLMs) like GPT are brilliant at generating coherent, convincing text, and can be exceptionally useful for summarizing large datasets, generating reports, and automating routine tasks, doing in minutes what would take a human analyst days. However, they’re deeply dependent on the quality and structure of the information they’re fed. In the messy, high-stakes world of sustainability data, relying on GenAI alone without addressing the data problem is like asking a world-class architect to build a skyscraper with faulty materials: the facade might be grand, but the structure will be unsound, and it certainly won’t stand the test of time or fulfill the industry’s needs.
Sustainable finance needs more than polished words
The urgency for robust sustainability data is undeniable. Investors need clarity on carbon emissions, social impact, and governance risks. Companies are facing escalating regulatory demands, while stakeholders are becoming increasingly vocal about the need for transparency and accountability.
Yet the data landscape remains deeply flawed. In sustainable finance, much of the information is self-reported, incomplete, unstandardized, or buried in PDFs and footnotes. Our research on carbon reporting shows that only 60% of companies disclose any Scope 3 emissions. Even when disclosed, definitions vary widely: what one company calls “renewable energy use,” another might categorize differently. This lack of standardization makes comparisons difficult and confidence elusive.
Beyond being frustrating for analysts, these problems directly affect investment decisions, compliance reporting, and the credibility of the entire sustainable finance ecosystem. Imagine asking a GenAI model to reconcile the methodologies behind Scope 3 emissions for two multinationals. It might give you a well-phrased answer, but with Scope 3 disclosure quality worldwide still far below acceptable standards — and some regions scoring as low as 2.2 out of 5 — there is no reliable way to trace or validate the response.
This is not how GenAI will deliver real change.
Why even the best AI needs a solid data foundation
It’s easy to be seduced by the idea that LLMs can summarize, analyze, or explain data on demand and at unprecedented scale. And they can, but only once the raw inputs are ready. In sustainable finance, preparing those inputs is complex, technical work: extracting information from multi-column PDFs, scanned reports, and multilingual regulatory filings; verifying credibility and timeliness; and standardizing metrics so they mean the same thing across companies and sectors.
It’s also about capturing nuance. Sustainability data often hides in details: a CEO’s compensation plan linked to emissions targets, or the severity of a human rights controversy buried in a regional news source. Without that structured context, GenAI can produce summaries that are broad and even elegant in phrasing, but disconnected from the realities on the ground.
Take climate commitments, for instance. On the surface, two companies may both claim to “link executive pay to environmental performance.” But in practice, one may tie 20% of executive compensation to supplier decarbonization and emissions reduction, while the other offers only vague, unquantified promises. If the underlying data isn’t codified and verified, AI analysis treats both as equal — misleading decision-makers with elegantly worded equivalence.
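To make the contrast concrete, here is a minimal Python sketch of what codifying such a claim could look like. The record layout, the field names (`claims_link`, `pct_of_pay`, `verified`), and the `is_substantive` rule are illustrative assumptions, not a description of any production schema; the two companies are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PayLink:
    """Hypothetical codified record of a 'pay linked to environment' claim."""
    company: str
    claims_link: bool            # the company states that such a link exists
    pct_of_pay: Optional[float]  # share of executive pay at stake; None if unquantified
    verified: bool               # the figure was checked against the source document


def is_substantive(link: PayLink) -> bool:
    """A claim counts only if it is quantified, non-zero, and verified."""
    return (
        link.claims_link
        and link.verified
        and link.pct_of_pay is not None
        and link.pct_of_pay > 0
    )


# Two hypothetical companies making the same headline claim.
company_a = PayLink("Company A", claims_link=True, pct_of_pay=20.0, verified=True)
company_b = PayLink("Company B", claims_link=True, pct_of_pay=None, verified=False)

for link in (company_a, company_b):
    print(link.company, "substantive:", is_substantive(link))
```

On the headline claim alone the two records are indistinguishable. Only the codified and verified fields separate them, which is the information a downstream model would need in order to avoid treating them as equal.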
From data chaos to clarity
This is why, in our experience, 80% of the value AI delivers in sustainable finance happens before the first LLM prompt is even invoked. Take controversy tracking. AI models continuously scan unstructured content from news sources, regulatory alerts, and reports (often in multiple languages) to detect potential environmental disasters, labor rights violations, or governance scandals. Once identified, incidents must be linked to the correct companies, categorized under the right sustainability metrics, and assigned accurate severity scores.
The same challenge exists for extracting metrics from corporate disclosures. Reports are sprawling documents full of footnotes and complex tables, rarely expressed in standardized terms. Here, specialized AI systems, ranging from advanced optical character recognition (OCR) and layout parsing to fine-tuned models trained on large, labeled datasets, capture and structure the raw details, detecting sustainability-specific KPIs that would otherwise be lost.
In one case, a company referenced its employee childcare benefits obliquely, describing “staff kindergartens,” “backup care,” and “nursery vouchers.” Because the report never used the precise wording “day care services,” human annotators had skipped over these passages. Our extraction tools, however, recognized the semantic equivalence and correctly mapped them to the relevant social metric. That single step reshaped the company’s profile, revealing a stronger employee support program than previously assumed. A detail small enough to skip, yet significant enough to influence investment decisions.
That is why each data point must be tagged with provenance — its source, timestamp, extraction method, and confidence score — so it can be traced and verified. With that in place, the AI foundation becomes reliable. Only then can GenAI be truly transformative: delivering insights that are not only accurate and auditable, but also faster to generate and more contextually intelligent. Without that foundation, GenAI is operating in the dark.
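As a rough illustration of provenance tagging, the sketch below attaches the four attributes named above (source, timestamp, extraction method, confidence score) to a single data point. The class names, the `is_auditable` helper, the 0.8 threshold, and the sample values are all assumptions chosen for the example, not a reference design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """Where a value came from and how much it can be trusted."""
    source: str              # e.g. document name and page
    extracted_at: datetime   # timezone-aware timestamp of the extraction
    extraction_method: str   # e.g. "ocr+layout-parser" or "manual"
    confidence: float        # score in the range [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if self.extracted_at.tzinfo is None:
            raise ValueError("extracted_at must be timezone-aware")


@dataclass(frozen=True)
class DataPoint:
    company: str
    metric: str
    value: float
    provenance: Provenance


def is_auditable(point: DataPoint, min_confidence: float = 0.8) -> bool:
    """Assumed rule: a point is usable only if it is sourced and confident enough."""
    prov = point.provenance
    return bool(prov.source.strip()) and prov.confidence >= min_confidence


# Hypothetical example values.
point = DataPoint(
    company="Example Corp",
    metric="scope_3_emissions_tco2e",
    value=125_000.0,
    provenance=Provenance(
        source="Example Corp Annual Report 2024, p. 87",
        extracted_at=datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc),
        extraction_method="ocr+layout-parser",
        confidence=0.93,
    ),
)
print(is_auditable(point))
```

Keeping provenance on the record itself, rather than in a separate log, means any figure that later appears in a generated summary can be traced back to a page and an extraction run.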
GenAI hype is distracting us from real innovation
The real revolution in sustainable finance will not come from plugging ChatGPT into a dashboard. It will come from solving the dirty, unsexy problem of data infrastructure: building pipelines that can take raw, unstructured, multilingual, and often contradictory inputs, and turn them into reliable, usable, and actionable outputs.
Once that’s in place, AI systems can generate the verified inputs, and GenAI can then amplify them, creating reports, insights, and analyses at speed and scale, dramatically improving efficiency for investors, regulators, and companies alike. The path forward is judicious AI: not just large models, but smart ones, purpose-built for sustainability, balancing accuracy, efficiency, and cost-effectiveness.
In sustainable finance, flashy tech won’t move the needle. Trustworthy data will.




