
Scientists devise means to test for phony technical papers

Apr 24, Technology


Authors of bogus technical articles beware. A team of researchers at the Indiana University School of Informatics has designed a tool that distinguishes between real and fake papers. It's called the Inauthentic Paper Detector -- one of the first of its kind anywhere -- and it uses compression to determine whether technical texts are generated by man or machine.

"This is a potential problem since no existing systems, the Web for example, can or do discriminate between content that is meaningful or bogus," says assistant professor Mehmet Dalkilic, a data mining expert. "We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."

Joining Dalkilic on the IPD project are Assistant Professor Predrag Radivojac, informatics doctoral student James Costello, and Wyatt T. Clark, who will graduate in May with a bachelor's degree in informatics.

The IPD system is based on a combination of compression algorithms, which reduce the amount of data in order to save space and speed up transmission.
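As a rough illustration of why compression can serve as a signal here, the sketch below measures how well a general-purpose compressor shrinks a text. It is not the IPD itself: the article does not say which compressors or features the team used, so the use of Python's zlib, the function name compression_ratio, and both toy inputs are assumptions made only for this example.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed size / original size; lower means more repetition."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    # Level 9 asks zlib for its strongest (slowest) compression setting.
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical toy inputs, not data from the study.
# A text that repeats the same word strings many times.
repetitive = "the detector compresses the text and the detector scores the text " * 40

# A text built from random letter strings, with almost no repeated patterns.
rng = random.Random(0)
random_words = [
    "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(rng.randint(3, 9)))
    for _ in range(2000)
]
patternless = " ".join(random_words)

print(f"repetitive text:  {compression_ratio(repetitive):.2f}")
print(f"patternless text: {compression_ratio(patternless):.2f}")
```

The repetitive text compresses far better than the patternless one. A detector along these lines would presumably compare such measurements across many documents rather than rely on a single number.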

To begin their study, the team defined the two kinds of text they would analyze. An "authentic text" (or document) is a collection of several hundreds or thousands of syntactically correct sentences that are wholly meaningful. An "inauthentic text" (or document) is a collection of several hundreds or thousands of syntactically correct sentences that, taken all together, have no meaning.

The researchers' work is documented in the very authentic paper, "Using Compression to Identify Classes of Inauthentic Texts," which they presented at the Society for Industrial and Applied Mathematics Conference on Data Mining in Bethesda, Md., this weekend.

The informatics study was largely inspired by a prank pulled by three Massachusetts Institute of Technology students, who in 2004 developed a computer program that churned out randomly generated fake computer science language, essentially a four-page compilation of gibberish. They submitted it as a research paper to an international conference on computer science and informatics – and it was accepted without review.

Radivojac, whose research expertise is machine learning, says the IPD easily detected numerous inauthentic technical papers tested, including the MIT students' spurious submission.

"We hypothesized we could build a reliable and fast model that recognizes fake papers automatically," says Radivojac. "We combined these with machine-learning methods to build a predictor of these kinds of papers."

In general, identifying meaning in a technical document is difficult, Dalkilic says. "We don't claim we have found a way to distinguish between meaning and nonsense, but we do emphasize that there are many nontrivial classes of inauthentic documents that can be easily distinguished based on compression algorithms."

Source: Indiana University School of Informatics
