Analytics: Making sense of the jargon
- Learn the difference between business intelligence, analytics, and reporting
- A data dictionary is a collection of detailed information that defines a piece of data, its value, and its use within a database
- Data lakes hold both unstructured and structured data
Confused about the difference between a data lake and a data warehouse? Here’s a useful primer on a few common terms.
In the world of higher education technology, analytics is all the rage—and justifiably so. In the past, higher ed decision making was based upon guesswork or vague estimations. Now, with the adoption of powerful technology, institutions have a vast pool of data at their disposal. But the problem is figuring out what to do with all of that data, how to organize it, and—most importantly—how to glean insights from it. That’s where analytics comes in: institutions everywhere are scrambling to find ways to harness all of their campus-wide data and use it intelligently.
Any specialized field generates its own jargon over time, and analytics is no exception. Here, we’ve gathered some of the more commonly used terms to help you get a handle on the language of analytics:
Big data: The term “big data” has been around for nearly 20 years. In its simplest form, big data is all of the information that’s available to an institution. It includes both structured data (information that has a high degree of organization, such as student ID numbers, dates of birth, and so on) and unstructured data (information that isn’t organized in any predefined way, such as emails, social media posts, etc.). Big data is complex and chaotic, and, as the term suggests, there is a lot of it.
Higher education institutions are swimming in data—from student records, research data, and alumni records to emails, tweets, and parking records. The challenge for today’s institutions is to find some way to make sense of this pool of data. Analytics is basically a method for sifting through this big data to find relationships, patterns, or insights—and turn all of that chaos into something useful.
Business intelligence (versus reporting and analytics): These terms are often confused with one another. But the primary differences are in the depth of the information.
A report is designed to answer a specific question (How much did enrollment decline last year?) and presents information in a structured way so that it can be easily digested. Analytics, on the other hand, provides a robust data set that lets the user discover and answer a series of questions (Why did enrollment decline? How did we perform in the five years prior to that? How can we stop it? And what are the projections for next year?). Analytics is multi-dimensional, allowing you to drill down, discover, and gain insights from the information. With analytics, you get a deeper understanding of the relationships between pieces of information.
Business intelligence is, in some ways, a method for combining all of these tools to help make informed decisions to guide the institution. Business intelligence brings together reports, analytics, and other pieces of information to grant a comprehensive view of the current state of affairs or challenges facing an institution. It builds a solid foundation based upon accuracy and depth, so that planning and decisions rest on information and insight rather than guesswork.
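To make the distinction concrete, here is a minimal sketch in Python. The enrollment figures, college names, and function names are all invented for illustration. The first function behaves like a report, answering one fixed question; the second drills into the same data to start explaining the answer.

```python
from collections import defaultdict

# Hypothetical enrollment records: (year, college, headcount).
ENROLLMENT = [
    (2022, "Arts", 1200),
    (2022, "Science", 1500),
    (2023, "Arts", 1000),
    (2023, "Science", 1520),
]


def report_total_change(rows, prior_year, current_year):
    """Report: answer one fixed question, the change in total enrollment."""
    totals = defaultdict(int)
    for year, _college, headcount in rows:
        totals[year] += headcount
    return totals[current_year] - totals[prior_year]


def drill_down_by_college(rows, prior_year, current_year):
    """Analytics: break the same change down by college to explore why."""
    by_college = defaultdict(lambda: defaultdict(int))
    for year, college, headcount in rows:
        by_college[college][year] += headcount
    return {
        college: years[current_year] - years[prior_year]
        for college, years in by_college.items()
    }


total_change = report_total_change(ENROLLMENT, 2022, 2023)
change_by_college = drill_down_by_college(ENROLLMENT, 2022, 2023)
print(total_change)
print(change_by_college)
```

In this made-up data, the report only says that enrollment fell. The drill-down shows that the decline sits entirely in one college, which is the kind of follow-up question analytics is meant to support.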
Dark data: Sure, it’s a rather ominous-sounding term that might conjure images of cloaked villains, but in reality, dark data is simply all of your data that is collected and processed but not analyzed: random notes, pieces of information from various parts of campus, and so on. This data may well have the potential to yield insights, but it’s just sitting there, taking up space. Nothing more, nothing less.
Data dictionary: Just like a linguistic dictionary, a data dictionary is a collection of detailed information that defines a piece of data, its value, and its use within a database. Within higher education, for example, the data dictionary would set forth definitions for what constitutes a credit-hour, a semester, a full-time enrolled student, etc. Establishing these definitions allows databases to share the same language, so to speak, so that information can be analyzed and parsed across databases.
On a human level, the data dictionary helps everyone at an institution understand the meanings of these terms so that the information can be handled and processed correctly. A data dictionary ensures that everyone is on the same page, so that when data is entered in a field or manipulated, it’s handled according to the same definition of that term. In other words, “semester credit hour” will mean the same thing to those in the Department of Biology as it does to those in student affairs—and the corresponding information in all databases should be accurate. It will also determine how these pieces of information can be used, what values can be applied to them, and their relationships to other pieces of data.
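As a rough illustration, a data dictionary entry can be pictured as a small record that holds a term’s definition, its data type, and the values it may take. The sketch below is in Python; the field name, the wording of the definition, and the 0 to 24 range are assumptions made up for the example, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    name: str
    definition: str
    data_type: type
    allowed_range: tuple  # (minimum, maximum), inclusive


# Hypothetical entry; a real dictionary would hold one per data element.
DATA_DICTIONARY = {
    "semester_credit_hours": DictionaryEntry(
        name="semester_credit_hours",
        definition="Credit hours a student is enrolled in for one semester.",
        data_type=int,
        allowed_range=(0, 24),
    ),
}


def validate(field, value):
    """Check a value against the shared definition of its field."""
    entry = DATA_DICTIONARY[field]
    if not isinstance(value, entry.data_type):
        return False
    low, high = entry.allowed_range
    return low <= value <= high


print(validate("semester_credit_hours", 15))
print(validate("semester_credit_hours", 40))
print(validate("semester_credit_hours", "15"))
```

Because every department validates against the same entry, a value that one office would reject is rejected everywhere, which is the "same page" effect described above.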
Data governance: Think of data governance as a system of rules and regulations designed to help an institution decide which data is important, how it will (or will not) be used, and who has access to it. Roughly speaking, data governance has two parts: management and security.
The management piece of the data governance puzzle dictates how an institution controls and stores its data, as well as who can access it. It eliminates uncertainty and surprises by providing an agreed-upon framework for the use of the institution’s information. Security is concerned with the protocols established to protect that data from misuse and abuse, both inside and outside the institution.
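One small part of such a framework, deciding who may see which data, can be sketched as a lookup table. This Python example is only an illustration: the roles, the dataset names, and the deny-by-default rule are assumptions, and a real governance program covers far more than access checks.

```python
# Hypothetical policy: which roles may read which datasets.
ACCESS_POLICY = {
    "registrar": {"student_records", "enrollment"},
    "institutional_research": {"enrollment"},
}


def can_access(role, dataset):
    """Allow access only when the policy explicitly grants it."""
    return dataset in ACCESS_POLICY.get(role, set())


print(can_access("registrar", "student_records"))
print(can_access("institutional_research", "student_records"))
print(can_access("visitor", "enrollment"))
```

Unknown roles get an empty set of permissions, so anything the policy does not mention is refused rather than allowed by accident.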
Data governance can be handled through numerous methods, but it’s best to draft policies based upon input from a range of stakeholders across the institution. It’s also worthwhile to note that data governance is rather like auto insurance: everyone should have it, but not everyone does.
For advice on how to build a successful data governance program, read our Data Governance: How to get started blog.
Data lake versus data warehouse: A data lake and a data warehouse are two methods for storing data. The difference is in how that data is stored.
A data lake holds raw, unstructured information, as well as structured information. Think of it as a container that catches everything that flows through an institution. It’s a vast repository of everything from across the institution and its various departments, schools, divisions, and so on.
A data warehouse also stores information, but in a more structured fashion, such as within predefined tables and fields. The data here has been organized and the uses for it have been defined. Information within a data warehouse has been scrubbed clean and pre-packaged for easy consumption, so to speak.
A data lake will retain all data, versus a data warehouse, which must sometimes eliminate or cull information to make it fit within the parameters of the warehouse. A data lake keeps everything, for all time, whereas a data warehouse may not have adequate space.
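The contrast can be sketched in a few lines of Python. Everything here is hypothetical: the three warehouse fields, the sample records, and the rule that the warehouse drops whatever does not fit. It is a toy model of the idea, not how any particular product works.

```python
# Hypothetical warehouse schema: the fields every stored row must have.
WAREHOUSE_FIELDS = ("student_id", "term", "credits")


def load_into_lake(lake, item):
    """A lake keeps whatever arrives, structured or not."""
    lake.append(item)


def load_into_warehouse(warehouse, item):
    """A warehouse keeps only rows that fit its schema, trimmed to its fields."""
    if not isinstance(item, dict):
        return False
    if any(field not in item for field in WAREHOUSE_FIELDS):
        return False
    warehouse.append({field: item[field] for field in WAREHOUSE_FIELDS})
    return True


incoming = [
    {"student_id": "001", "term": "Fall", "credits": 15, "advisor_note": "prefers email"},
    "Free-text email about parking permits",
    {"student_id": "002", "term": "Fall"},
]

lake, warehouse = [], []
for item in incoming:
    load_into_lake(lake, item)
    load_into_warehouse(warehouse, item)

print(len(lake), len(warehouse))
```

The lake ends up with all three items, including the free-text email and the incomplete record. The warehouse holds only the one complete row, and even that row loses its extra advisor note.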
You can begin to see the advantages of data lakes—having that kind of historical information from across a broad spectrum of the institution can yield valuable insights that might not otherwise be discovered. A data lake, in effect, grants the opportunity for deeper and more meaningful insights. However, some institutions have chosen to utilize a hybrid approach—retaining a data warehouse alongside a data lake. As with much of analytics, there is no one-size-fits-all approach.
Enterprise analytics: This term refers to the ability to analyze data across the institution. In other words, it’s about your methods and processes for collecting, interpreting, and analyzing information from an institutional perspective, rather than a departmental or divisional angle. Enterprise analytics grants an institution a comprehensive view of information and influential factors that affect the entire organization.
ETL versus ELT: Broadly speaking, both of these terms refer to methods by which data is moved around in order to get it ready for analysis and reporting. ETL (extract, transform, load) is the traditional approach, wherein raw data is retrieved from a data pool, moved to a temporary location to be structured and organized, and then loaded to a data warehouse for analysis and reporting. Each stage (the E, the T, and the L) must be completed before the data can be analyzed or reports can be generated.
ELT (extract, load, transform), a relatively new method, changes the order of these steps. After data is extracted, it is loaded directly into a repository, such as a data warehouse, and structured there for analysis. Although this process is still evolving, it has the distinct advantages of affording more flexibility and speed, enabling access to large amounts of data at any time.
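The difference in ordering can be shown with a minimal Python sketch. The sample records and the function names are invented for illustration, and plain lists stand in for the staging area, the warehouse, and the repository.

```python
# Hypothetical raw source records, untidy as they often are.
RAW_RECORDS = [
    {"student_id": " 001 ", "credits": "15"},
    {"student_id": "002", "credits": "12"},
]


def extract():
    """E: pull copies of the raw records from the source."""
    return [dict(record) for record in RAW_RECORDS]


def transform(record):
    """T: clean one record (trim the ID, turn credits into a number)."""
    return {
        "student_id": record["student_id"].strip(),
        "credits": int(record["credits"]),
    }


def run_etl():
    """ETL: transform in a staging step, then load the cleaned rows."""
    staged = [transform(record) for record in extract()]
    warehouse = list(staged)
    return warehouse


def run_elt():
    """ELT: load the raw rows first; nothing is cleaned yet."""
    repository = extract()
    return repository


def query_elt(repository):
    """ELT: transform inside the repository, at the time of analysis."""
    return [transform(record) for record in repository]


etl_rows = run_etl()
elt_raw = run_elt()
print(etl_rows == query_elt(elt_raw))
print(elt_raw[0])
```

Both routes reach the same cleaned rows. With ELT, though, the raw data is already sitting in the repository, so it is available right away and can be reshaped later in ways nobody planned for at load time.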
As with just about any aspect of technology, there are varying opinions and strategies surrounding analytics—and the best methods to approach it. When it comes right down to it, there is no one-size-fits-all solution. Each institution will have its own needs and unique requirements. But making sure everyone speaks the same language—and understands the jargon—is the best way to build a foundation for success.