Good Data In, Good Data Out

IHEs put in systems to capture quality data, and keep those records clean.

When database records indicated that 200 students had signed up to play on the 2004 football team at DePauw University (Ind.), Administrative Upgrade Project Director Daniel Pfeifer realized there was either something seriously wrong with how data was being handled, or the university would be ordering a lot more uniforms.

The difficulty, it turned out, was with a spiffy new database for the Athletic department that was designed to simplify sports registrations and gym facilities access. Instead, it ended up duplicating student records in the larger system at a rate of 10 to 15 per day.

By the time Pfeifer realized what was happening, hundreds of duplicate records were in the university's general database, with some students sporting three or four records.

Although the records themselves could be untangled manually, DePauw also had data coming in about potential candidates from numerous other sources, such as high school visits, PSAT and SAT scores, and a university marketing company. Having a system that allowed duplicate records would become a nightmare, and quickly.

"It's taken us about two years to really figure out how to stop duplicates, and make sure we're inputting quality data in the first place," says Pfeifer. "Now that we have it down, it's a huge help in knowing our numbers are right."

DePauw is far from alone in its quest to manage multiple databases, a task that's challenging for an institution of higher education of any size. Data about prospective students mingles with alumni records, parent information, fundraising databases, and even financial data. Keeping it all straight takes effort and communication, but as many IHEs have discovered, the task is crucial for keeping costs in check and boosting data accuracy.

As nearly anyone associated with a college or university knows, databases abound in an academic environment. Not only are student records rife with grade reports and tuition statements, but schools have data on everything from how many desks are in a certain classroom to which donors have contributed in the past year.

The result for some institutions is an abundance of databases that contain similar information. The Bursar office might have student records online, but so does the Admissions office, as well as individual undergraduate or graduate program offices, and even Food Services.

As these databases are created and expanded, eliminating duplicates can be costly work, sometimes requiring days of manual deletion. If the databases are used to send out catalogs or other mail, bad addresses increase the university's costs.

"It's a huge issue for any institution," says Nancy Krogh, registrar at the University of Idaho. "Having multiple databases has become a bigger and bigger problem over the past few years, as more information is being input and used to make decisions on both the departmental and university-wide level."

Although IHEs may have data pulled from a number of sources, many administrators have found that centralizing the information is key to managing it.

At DePauw, university staff spent four years integrating data into a central repository, known as Client Information Service, which also serves as a university-wide address book. It holds records for the school's 2,400 current students and their parents, records for about 60,000 prospective students and their parents, plus around 80,000 alumni and donor records.

Prior to unifying the information, the university had a legacy system where prospectives, current students, and alumni were stored in separate systems. But record overlap became a significant problem, says Pfeifer. Not only did some prospective students enroll, subsequently appearing in two databases, but alumni tended to send their children to the school as well, meaning a parent could conceivably land in all three databases.

To manage the data and eliminate duplicates, DePauw's IT department wrote programs that crawl the database every day, searching for matches based on different fields like address or phone number. Also created were relationship algorithms that evaluate connections between records. For example, a student's parents are checked separately for errors but also evaluated against each other, to see if they live at the same address or are still married.
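The field-matching pass described above can be sketched in a few lines of Python. This is a minimal illustration, not DePauw's actual program: the record layout, the function names (`normalize_phone`, `normalize_address`, `find_candidate_duplicates`), and the sample data are all assumptions made for the example.

```python
import re
from collections import defaultdict


def normalize_phone(phone):
    """Keep digits only and compare on the last ten, so formatting differences vanish."""
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:] if len(digits) >= 10 else ""


def normalize_address(address):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", (address or "").lower())
    return " ".join(cleaned.split())


def find_candidate_duplicates(records):
    """Group record ids that share a normalized phone number or address."""
    buckets = defaultdict(set)
    for rec in records:
        phone = normalize_phone(rec.get("phone"))
        address = normalize_address(rec.get("address"))
        if phone:
            buckets[("phone", phone)].add(rec["id"])
        if address:
            buckets[("address", address)].add(rec["id"])
    # Only buckets holding more than one record are worth a second look.
    return {key: sorted(ids) for key, ids in buckets.items() if len(ids) > 1}


# Hypothetical sample data, invented for illustration.
records = [
    {"id": 1, "name": "Pat Lee", "phone": "(765) 555-0101", "address": "12 Elm St."},
    {"id": 2, "name": "Patricia Lee", "phone": "765-555-0101", "address": "12 Elm St"},
    {"id": 3, "name": "Sam Roy", "phone": "765-555-0199", "address": "40 Oak Ave"},
]
candidates = find_candidate_duplicates(records)
```

A shared address or phone number only marks records as candidates for review. A student and a parent living in the same house would land in the same bucket, which is presumably why relationship checks of the kind Pfeifer describes are needed before anything is merged.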

"In a sense, we have to triangulate people, relationships, and their location," says Pfeifer. "Because we started from scratch, we were able to design a system specifically for university communication and the typical relationships that are meaningful in a university context."

Although in-house applications like DePauw's can be developed, there are also many "cleaning solutions" on the market, usually in the form of a service. At Yale University, for example, records are sent out to AlumniFinder, which puts together address and other contact information from multiple sources.

Centralized databases are also easier to lock down. As IHEs like the University of Connecticut, Boston College, Stanford University (Calif.), and Tufts University (Mass.) have found, data stores are attractive targets for hackers who can sell the information on the digital black market. Putting data in an extremely secure environment and letting users pull only selected records can help prevent breaches.

Because a database wouldn't be spread out on multiple servers, it would be easier in many cases for IT staff to create firewalls or limit access, experts say. Even physical access to database servers could be strengthened if the servers are brought together within the same data center.

Centralizing databases into one comprehensive warehouse is crucial, but it is also a challenge to make sure that what's being input is quality data worth saving in the first place. In a classic example of "garbage in, garbage out," duplicates and bad information can be created not just by systems clashing with each other, but by questionable data entry.

Often, users themselves input their own information incorrectly, Pfeifer notes, or hit Submit too many times on a web form, essentially duplicating themselves in the system. "Whenever we allow clients to enter their own information, we get duplicate records," he says. "People put in their ZIP codes wrong, or transpose letters, or sometimes even spell their names wrong if they're typing too fast."
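The entry problems Pfeifer lists (repeated submissions, bad ZIP codes, transposed letters) can each be caught with a simple check at intake. The sketch below is one possible approach using only Python's standard library. The form fields, function names, and the 0.85 similarity threshold are assumptions chosen for illustration, not anything the article's sources describe.

```python
import re
from difflib import SequenceMatcher

# Five digits, optionally followed by a four-digit extension (ZIP+4).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid_zip(zip_code):
    """Check the shape of a U.S. ZIP code; this does not confirm the code exists."""
    return ZIP_PATTERN.fullmatch(zip_code.strip()) is not None


def names_look_alike(a, b, threshold=0.85):
    """Flag names that are nearly identical, such as ones with transposed letters."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def accept_submissions(submissions):
    """Drop exact repeats (extra clicks on Submit) and set aside malformed ZIP codes."""
    seen = set()
    accepted, rejected = [], []
    for form in submissions:
        key = (form["name"].lower().strip(), form["zip"].strip())
        if key in seen:
            continue  # same person, same form, submitted again
        seen.add(key)
        if is_valid_zip(form["zip"]):
            accepted.append(form)
        else:
            rejected.append(form)
    return accepted, rejected


# Hypothetical web-form submissions, invented for illustration.
submissions = [
    {"name": "John Smith", "zip": "46135"},
    {"name": "John Smith", "zip": "46135"},  # second click on Submit
    {"name": "Ann Poe", "zip": "4613"},      # one digit short
]
accepted, rejected = accept_submissions(submissions)
```

Here `names_look_alike("Jonh Smith", "John Smith")` returns true, so a transposed name could be routed to a person for review instead of silently creating a new record. The threshold is a judgment call: set it too low and different people get flagged as the same person.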

Lack of proper data ownership can also present difficulties, says Krogh of Idaho. She has seen departments attempting to create "shadow databases," pulling information from a central database so that records can be manipulated on a desktop and reports generated from it. The problem with such a tactic is that data is often changed on an individual's computer. If the record gets put back in the system, it doesn't match what's in the central storehouse.
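One common safeguard against the write-back problem Krogh describes is a version check, sometimes called optimistic concurrency control: the central store refuses an update that was based on an out-of-date copy. The article does not say that Idaho or any other school uses this technique. The `CentralStore` class and its methods below are hypothetical, a minimal sketch of the idea.

```python
class StaleRecordError(Exception):
    """Raised when a write-back is based on an outdated copy of a record."""


class CentralStore:
    """Toy in-memory stand-in for a central database (hypothetical)."""

    def __init__(self):
        self._records = {}  # record_id -> (version, data)

    def create(self, record_id, data):
        self._records[record_id] = (1, dict(data))

    def checkout(self, record_id):
        """Hand out a copy of the record along with the version it was read at."""
        version, data = self._records[record_id]
        return version, dict(data)

    def write_back(self, record_id, based_on_version, data):
        """Accept the change only if nobody else has updated the record since checkout."""
        current_version, _ = self._records[record_id]
        if based_on_version != current_version:
            raise StaleRecordError(
                f"record {record_id} is at version {current_version}, "
                f"but the edit was based on version {based_on_version}"
            )
        new_version = current_version + 1
        self._records[record_id] = (new_version, dict(data))
        return new_version


store = CentralStore()
store.create("s-100", {"name": "Pat Lee", "zip": "46135"})

# Two offices pull the same record onto their desktops.
v_registrar, copy_registrar = store.checkout("s-100")
v_shadow, copy_shadow = store.checkout("s-100")

# The registrar's correction goes in first and bumps the version.
copy_registrar["zip"] = "46136"
store.write_back("s-100", v_registrar, copy_registrar)

# The shadow copy is now stale, so its write-back is refused.
copy_shadow["name"] = "Patricia Lee"
try:
    store.write_back("s-100", v_shadow, copy_shadow)
    shadow_write_refused = False
except StaleRecordError:
    shadow_write_refused = True
```

The check does not stop anyone from keeping a desktop copy. It only ensures that a stale copy cannot quietly overwrite newer central data, leaving the conflict for a person to resolve.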

"To keep data clean, you have to establish a data owner, almost a custodian of data," Krogh notes. "These are the people that control the data, who can create new fields or enter new data."

Although such a severe clampdown on who gets to change data is likely to take effort, especially at an institution where every department is used to entering its own data, the tactic can lessen the need for intensive de-duplication efforts later.

At Yale, 10 people are in charge of updating contact information, tracking down addresses, and data input, says Angelyn Singer, manager of alumni records. Keeping changes confined to just those staff members keeps Yale's formidable database as clean as it can be. Considering that the staff makes more than 5,000 address changes per month, having that information changed by other departments would create a sizeable problem with knowing what's accurate and what isn't.

"It's an incredible challenge trying to keep up with where people are," says Singer. "It can take a very concerted effort sometimes." The department has even written to relatives of alumni to try and get current information and make sure that the database is correct.

The effort is worthwhile, Singer adds. "Clean data is so important, and that's why you need multiple strategies for making sure that what you put in is accurate," she points out.

In addition to having technological systems set up properly, IHEs must also navigate the much murkier waters of interdepartmental communication if they want databases to be managed expertly.

Many databases sprout from within each university department in order to keep records straight, and often these data storehouses are either simple Microsoft Excel spreadsheets with loads of data stuffed into them, or homegrown applications that address a specific department's needs. Integrating these various databases into a centralized framework will go a long way toward reducing duplication and assuring quality data, but such a switch can also spark frustration and even downright resentment from those who believe they should have more control over the databases they used to manage. Administrators who don't put at least some thought into preventing hurt feelings could have a harder time making database management into a group effort.

"Today's information systems are integrated," says Dwight Fischer, CIO of Plymouth State University (N.H.). "They require teams to work integrated with one database, one set of standards, one means of making changes. The teams that are most successful are those that come to realize if they work together they can deliver better services. On the other hand, teams that continue to focus more on their individual offices rather than seamless services tend to have far more organizational friction."

To help ease user grumbling, training is important, and it is not utilized often enough at schools, believes Tim Cooper, vice president of Sales for Higher Education Technology at Oracle. "Getting everybody to work together to see data as one source represents a paradigm shift," Cooper says. "At many universities, there are too many people who want to control the data." In Cooper's view, IHEs could create more goodwill among users by ramping up education and training efforts, rather than just spending most of their budget money on technology itself. "If users understand their role, and how to get data back out of the system, they won't be frustrated," he says.

While conflict is being smoothed at the user level, IHEs should also be contemplating ways to keep data management policies strong for the future. A systems user group was created at Plymouth State to represent each department with a stake in how the central database is utilized. Included are the Registrar, Bursar, Financial Aid, Admissions, Graduate School, and IT offices. This steering committee creates data standards, and policies about who can extract data and what long-term goals need to be achieved.

"When you implement a new, integrated information system, you agree on standards, on data and business practices, and you make that the context within which everyone works," notes Fischer. "When everyone reads from the same sheet music, the harmony is wonderful."

In addition to the larger, executive-level group, a subgroup can be established that deals with database particulars, says Krogh of the University of Idaho. "Rather than build on the fly, you should have a team working together to think about what fields are needed, or which categories are just one-time use," she says. "If you have a mishmash of fields that people created because they needed them once, you can't see any kind of longitudinal trends or build a comprehensive data store."

Software, whether it's purchased or created in-house, can help to streamline database management, but some IHEs have decided to go with managed services as well. Benefits to outsourcing this type of work can include more frequent de-duplication of records, access to multiple information storehouses, and manual data entry that's done by the service rather than by university staff.

An entire database doesn't have to be taken off campus just to get some of the perks that come with managed services, though. For example, Melissa Data offers multiple options for cleaning up existing databases through the use of postal service databases, de-duplication programs, and telephone records.

IHEs can spend anywhere from $1,000 for a one-time duplicate check to more than $10,000 for regular cleaning, notes Jack Schember, the company's marketing manager. "There are many tiers for these type of services. Universities can take advantage of address correction features without spending much money."

Some also appreciate the consulting expertise that comes with buying managed services, notes Fred Siff, CIO at the University of Cincinnati. When his school blended several legacy systems, Siff turned to dbaDIRECT, so staffers could focus on infrastructure issues rather than spend much of their time on data cleansing and other routine tasks.

"Many times, universities don't have time to aggregate data and think about performance issues," says John Bostick of dbaDIRECT. "There are a number of layers, and they might be more interested in expanding their tactical skills instead of focusing on the whole data warehousing aspect of the task."

Whether service firms are used heavily or only occasionally, the input from an "outside" source can be helpful for an IHE trying to manage all aspects of a major database, Siff notes.

"With something as complex as bringing together systems like SAP and Oracle, it's helpful to have assistance in handling that transition," he says. "You need to have a strong team at the university, but you should also have an expert in more broad-ranging database issues. When everyone can get together for the effort, you'll be much more successful with implementing a database that works for everybody."

Elizabeth Millard, a freelance writer based in Saint Louis Park, Minn., specializes in covering technology.

