Can textual analysis help auditors identify fraud?

Preliminary findings from research by Patrick Fan and Greg Jenkins, associate professors of accounting and information systems in the Pamplin College of Business, suggest that textual analysis, a technique used extensively in the social sciences to scrutinize written and oral communication, can identify language patterns in management communications that are inconsistent either with the company’s financial performance or with the communications of other companies in the same industry. Such inconsistencies may indicate fraud.

“The results of our initial analysis suggest that our model has substantial predictive power,” says Jenkins. “When fraud is committed in companies, there appear to be patterns in corporate communications that imply wrongdoing.”

The professors hope to develop their methodology, based on knowledge from auditing and information systems, into a more precise new tool to help auditors and regulators detect fraud. They have received a grant of about $196,000 from PricewaterhouseCoopers for their two-year project, expected to be completed in 2009.

Fraud detection, Jenkins says, is a top priority for the auditing profession, because of its “dramatic, negative effect on the public’s confidence in capital markets” and its “staggering” costs. “Glass Lewis & Co. recently estimated that high-profile frauds resulted in the loss of almost $900 billion in market capitalization from 1997 to 2004.”

Jenkins, a former auditor at Ernst & Young, currently serves on a research task force that is providing guidance to the Public Company Accounting Oversight Board on matters related to audit firms’ quality control. “Investors and regulators would obviously prefer that auditors find all frauds,” he says, “but the current standards don’t require them to do that, and, frankly, auditors don’t have precise enough testing procedures to identify all frauds. An annual report, for example, contains tremendous amounts of information. The numbers of transactions represented in it are such that it is difficult to audit enough transactions to catch all frauds.”

Firms, he adds, are working to develop more sophisticated fraud-detection techniques. “Extensive testing is very expensive. Much of the audit still requires human effort.” He and Fan hope to aid that human effort with a computerized tool to handle the tedious, time-consuming, and, often, impossible task of manually examining all available text documents from the firm being audited.

Fan, a specialist in data and text mining and business intelligence research, says their model uses text-mining techniques to automatically identify word patterns that might be highly associated with financial fraud. “Once we build a very good mining tool, we can use it to screen all firms within an industry.” By recognizing language patterns or trends that are inconsistent with either the company’s financial performance or communications issued by other companies in the same industry, the software would guide auditors to particular areas — revenue recognition policies and practices, or disclosures of liabilities, for example — that may need further examination.

Explaining the need to compare the company with others in the same industry, Jenkins says that in many instances of fraud, inconsistencies between a company’s communications and its financial performance may be difficult to discern. In such cases, benchmarking a company’s communication patterns against those of other companies in the same industry may help reveal unusual or unexpected differences. “Companies in the same industry — with similar products, business lines, competitive regions, and, sometimes, customer bases — tend to describe their transactions in very similar terms.”
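The article does not describe how the professors’ model represents or compares text. As a loose illustration of the peer-benchmarking idea only, the Python sketch below assumes a simple bag-of-words representation, which is not necessarily the researchers’ approach: it counts the words in one company’s text, pools the word counts of its industry peers, and scores how closely the two match with cosine similarity. The function names (`term_frequencies`, `cosine_similarity`, `peer_similarity`) and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lowercase the text and count word tokens (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def peer_similarity(target_text: str, peer_texts: list[str]) -> float:
    """Compare one company's text with the pooled word counts of its industry peers."""
    pooled: Counter = Counter()
    for text in peer_texts:
        pooled.update(term_frequencies(text))
    return cosine_similarity(term_frequencies(target_text), pooled)


if __name__ == "__main__":
    # Hypothetical peer statements from the same industry.
    peers = [
        "Revenue grew modestly as demand softened in our core markets.",
        "Demand softened; revenue grew modestly in core markets.",
    ]
    typical = "Revenue grew modestly while demand softened in core markets."
    outlier = "Unlimited opportunities guarantee explosive, unprecedented growth."
    print(round(peer_similarity(typical, peers), 3))
    print(round(peer_similarity(outlier, peers), 3))
```

In this toy setup a low score only says that a company’s vocabulary departs from the peer norm; it is a pointer to where an auditor might look more closely, not evidence of fraud by itself. A practical tool would need much more careful preprocessing (stop words, stemming, weighting such as TF-IDF).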

A company’s financial performance may be similar to that of its competitors, he says, “yet the language it is using to describe its prospects seems overly optimistic or overly specific or vague relative to others in the industry.” Enron’s annual reports from the late 1990s, for example, exuded an “unlimited optimism” at a time when other companies were starting to struggle, he says. “The way Enron described its prospects was inconsistent with how its competitors were describing their prospects. Moreover, its descriptions of related-party transactions were incomplete and overly vague.”

Developing the benchmark data itself, Jenkins says, is a tremendous challenge. He and Fan have compiled a list of known fraud cases (companies sanctioned by the Securities and Exchange Commission for committing fraud) and are finishing the identification of a second, comparison set of companies: firms in the same industries “whose financial statements have stood the test of time.” The professors will use their methodology to compare large volumes of corporate communications from these two groups, such as annual reports, letters to shareholders, and transcripts of analyst conference calls. The companies represent a variety of industries: technology, retail, energy, and consumer products.

“We’re tracking tens of thousands of words from multiple companies and multiple periods,” Jenkins says. “We’re using computing power to go through and look at language to identify patterns — words and frequency of usage — that would be very difficult for a human reader to discern. Our findings so far show that there are systematic differences in textual communications between the two groups of companies.”
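To make “words and frequency of usage” a little more concrete, here is another hypothetical Python sketch, again not the researchers’ actual method: it pools the word counts of a fraud group and a control group of documents and ranks words by the smoothed log ratio of their relative frequencies, so that words used disproportionately by the fraud group rise to the top. The function names and the add-one smoothing (`alpha=1.0`, assumed here to avoid dividing by zero for words that appear in only one group) are illustrative choices.

```python
import math
import re
from collections import Counter


def group_counts(docs: list[str]) -> Counter:
    """Pool lowercase word counts across every document in one group."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts


def distinctive_words(fraud_docs: list[str], control_docs: list[str],
                      top_n: int = 5, alpha: float = 1.0) -> list[tuple[str, float]]:
    """Rank words by the smoothed log ratio of their relative frequency in the
    fraud group versus the control group (positive = more typical of the fraud group)."""
    fraud = group_counts(fraud_docs)
    control = group_counts(control_docs)
    vocab = set(fraud) | set(control)
    # Add-alpha smoothing: every word receives alpha pseudo-counts in each group.
    fraud_total = sum(fraud.values()) + alpha * len(vocab)
    control_total = sum(control.values()) + alpha * len(vocab)
    scores = {}
    for word in vocab:
        p_fraud = (fraud[word] + alpha) / fraud_total
        p_control = (control[word] + alpha) / control_total
        scores[word] = math.log(p_fraud / p_control)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    # Hypothetical snippets standing in for the two groups of company communications.
    fraud_docs = ["unlimited growth unlimited opportunity", "unlimited upside"]
    control_docs = ["steady growth", "modest growth amid challenges"]
    for word, score in distinctive_words(fraud_docs, control_docs, top_n=3):
        print(word, round(score, 3))
```

With real corpora of tens of thousands of words, scores like these would be only one possible input to a statistical model; on tiny samples they are dominated by noise.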

He and Fan say they envisage their software serving as a decision-support tool that would improve the efficiency of the auditing process, help auditors gain additional sources of evidence, and, ultimately, enable detection of financial fraud.

The grant from PricewaterhouseCoopers, the professors say, will allow their research to be completed more rapidly. The firm launched its “PwC INQuires” program of funding for applied research last spring to assist faculty and doctoral students “seeking to increase the knowledge base that contributes to the practice of auditing and tax.” In its inaugural year, the program awarded more than $580,000 to 37 researchers for 13 projects. The grant awarded to Jenkins and Fan represents roughly a third of this total.
