Here's a Glossary of Useful Database Terms. Lower down the Page, you will find
IT Terms, (and for other Terms, check out
Whatis.com) ...
TERM |
DEFINITION |
3NF (or TNF) |
Third Normal Form can be summarised as 'The fields in the Table depend on the Key, the Whole Key and Nothing but the Key,
so help me, Codd'.
|
Access Path Analysis |
Access Path Analysis is the specific path chosen by the Optimizer to execute the SQL.
In Oracle, the EXPLAIN feature can be used to determine the path chosen by the optimizer.
You can take a Course on APA,
or use a product like BMC's APPTUNE.
|
ACID |
Refers to Transactions which are Atomic, Consistent, Isolated and Durable.
- Atomic means that the entire Transaction succeeds or fails as a complete unit.
- Consistent means that a completed Transaction leaves the affected data in a consistent or correct state.
- Isolated means that each Transaction executes in isolation and does not affect the state of other transactions.
- Durable means that the changes resulting from committed Transactions are persistent.
|
Business Intelligence |
BI is the process of providing critical information to corporate management to help them
make better business decisions. It has become increasingly important with the growth in Web Sites,
that present a new source of information to be integrated with existing sources to enable management
to obtain a comprehensive view of how their enterprises are performing.
The term Business Intelligence was used as early as September, 1996, when a Gartner Group report said:
"By 2000, Information Democracy will emerge in forward-thinking enterprises, with Business Intelligence
information and applications available broadly to employees, consultants, customers, suppliers, and the public.
The key to thriving in a competitive marketplace is staying ahead of the competition.
Making sound business decisions based on accurate and current information takes more than intuition.
Data analysis, reporting, and query tools can help business users wade through a sea of data to synthesize
valuable information from it - today these tools collectively fall into a category called Business Intelligence."
|
ClickStream Analysis |
Analysis of the User Actions when visiting a Web Site. These Action are stored in a Web Server Log, where each Log Record corresponds to a single Page Event.
Click stream Analysis can help Web site owners understand how visitors are using their site and which pages are
getting the most use.
|
Cube |
A Cube is a multi-dimensional way of representing data in a Data Warehouse, and is Used in
OLAP.
|
Customer Life Cycle |
In customer relationship management (CRM), Customer Life Cycle is a
term used to describe the progression of steps a customer goes through when considering, purchasing, using, and maintaining loyalty to a product or service.
The Customer Life Cycle consists of five distinct steps:- reach, acquisition, conversion, retention,and loyalty.
This means getting a potential customer's attention, teaching them what you have to offer, turning
them into a paying customer, and then keeping them as a loyal customer whose satisfaction with the product or service urges other customers to join the cycle.
The customer life cycle is often depicted by an ellipse, representing the fact that customer retention
truly is a cycle and the goal of effective CRM is to get the customer to move through the cycle again and again.
|
Dashboard |
1) A dashboard is a user interface
that,somewhat resembling an automobile's dashboard, organizes and presents
information in a way that is easy to read. However, a computer
dashboard is more likely to be interactive than an automobile
dashboard (unless it is also computer-based). To some extent, most
graphical user interfaces (GUIs) resemble a dashboard. However, some
product developers consciously employ this metaphor (and sometimes
the term) so that the user instantly recognizes the similarity.
2) Dashboard Products
Some products that aim to integrate information from multiple
components into a unified display refer to themselves as dashboards.
For example, a product might obtain information from the local
operating system in a computer, from one or more applications that
may be running, and from one or more remote sites on the Web and
present it as though it all came from the same source. Hewlett
Packard developed the first such product, which began as a tool for
customizing Windows desktops. Called Dashboard, the HP product was
subsequently acquired by Borland and then a company called Starfish.
Microsoft's Digital Dashboard tool incorporates Web-based elements
(such as news, stock quotes, and so on) and corporate elements (such
as e-mail, applications, and so on) into Outlook. Dashboards may be
customized in a multitude of ways and named accordingly, generally,
for example as a general corporate or enterprise dashboard, or more
specifically, as a CIO or CEO dashboard.
Microsoft offer an 'Interative Digital Dashboard'.
|
Database |
A Relational Database is a logical way of looking at data and provides a level of abstraction on top of the physical files where the data is actually stored on a computer.
So you can say things like 'Find me all my Customers' (SELECT * FROM customers in SQL) without worrying how and where the Customers data is stored.
Tim McLellan has produced
an excellent introduction.
It talks about Oracle but the basic principles are very general.
|
DTS |
Microsoft's Data Transformation Services.
|
eb XML |
ebXML (Electronic Business XML) is a project to use the Extensible
Markup Language (XML) to standardize the secure exchange of business data.
More details are available.
|
EII |
EII stands for 'Enterprise, Information and Integration'.
It refers to software tools that retrieve data live from Operational Data Stores and loads it into Data Marts and Data Warehouse.
Some of the leading products are Avaki, Composite Software, Ipedo and MetaMatrix.
|
Encapsulation |
In Object-Oriented terminology, Encapsulation means 'bundling' Methods with the Objects,
so that access to data is only by using the published Methods.
|
Encryption |
Encryption is a technique for 'scrambling' text in a character string.
It is commonly used for passwords and credit card numbers so that they cannot be read as
'clear text'. Oracle provides an
'Obfuscation Toolkit', which is an example of software for Encrypting and Decrypting.
|
Entity |
In general terms, an ENTITY is a 'Thing'.
For example, a Car, a Person, and a Building.
For a Data Model, with a defined scope, then an entity is a 'Thing of Interest' in the scope.
Data Modeling is the first step in the creation of a database.
In a database, an Entity becomes a Table, and this is why an Entity is very important.
For example, if you were creating a Data Model for vehicle rental, then a vehicle would
certainly be on of your 'Things of Interest', along with Customers,Rental Bookings, and so on.
Entities have Relationships to other entities, which are defined in Entity-Relationship Diagrams,
called ERDs.
You can check out hundreds of examples of ERDs showing Entities
elsewhere on this Web Site.
|
ETL |
ETL stands for 'Extract, Transform and Load'.
It refers to software Tools which are typically part of a migration of data into Data Marts and Data Warehouse.
Two of the market leading products are Business Objects
and Cognos.
|
GPRS |
GPRS stands for 'General Packet Radio Services'.
GPRS is a packet-based wireless communication service that is used to transmit Geographic Data.
|
GPS |
GPS stands for 'Global Positioning System'.
A GPS System is based on 24 satellites that orbit the Earth and makes it possible for people with receivers to
pinpoint their geographic location without 10 or 100 metrers.
A GPS System can be installed in a vehicle or hand-held.
|
Inheritance |
In a Data Model, Inheritance shows where there is one Entity, with 'Sub-Entities', which inherit
characteristics from the Super-Entity.
For example, a Customer could be a Super-Entity, with Commercial and Personal Customers as Sub-Entities,
which inherit all the attributes of the Customer, and add some of their own.
An example is shown in this Data Model for a
Financial Information Portal,
which shows a 'Published Source' Entity (on the right), with Data Feeds, Databases, Documents and News Feeds as Sub-Entities
which inherit the common attributes from the Published Source Entity.
|
LDAP |
LDAP stands for Lightweight Directory Access Protocol and is a simple registry
where information is stored about authorized users and their privileges.
Chaning or removing all privileges is done simply by changing an entry in the LDAP directory.
LDAP is a bit confusing because original implementations were presented as alternatives to Web and the
relational database management system.
The tendency is to implement LDAP registries using standard RDBMSes and to use XML to exchange
information about Users with other systems.
|
Legacy Systems |
Legacy Systems are those which has been in operation for many years.
They are usually large mainframe systems with complex file systems.
Therefore, retrieving data from Legacy Systems is not simple.
The most common examples are IBM Mainframes from the 1970's and 80's.
A common requirement these days is to retrieve Legacy Data and integrate it with
other data to present over the Internet.
This is not a simple process and many vendors sell
Data Migration Tools to solve this problem.
|
 
Many-to-Many |
A Many-to-Many Relationship in a Data Model exists where many examples of one Entity can be associated with many of the other.
For example, in a CD Collection, an Artist can appear on
many CDS, and a CD can feature many Artists.
In a Physical Model, or Database design, these Many-to-Many's are resolved with
the addition of an 'Association Table', that identifies, for the CD Collection, the 'Artists on CDs'.
|
MDB |
Multi-Dimensional Database, also See Cube and OLAP.
|
MDX |
A Microsoft term for Multi-Dimensional Expressions.
You can also use
English Query to bypass the complexity of using MDX.
|
Normalization |
This definition is taken from WhatIs.com.
In creating a database, normalization is the process of organizing it into tables in
such a way that the results of using the database are always unambiguous and as intended.
Normalization may have the effect of duplicating data within the database and often results
in the creation of additional tables.
While normalization tends to increase the duplication of data, it does not introduce redundancy, which is unnecessary duplication.
Normalization is typically a refinement process after the initial exercise of identifying the
data objects that should be in the database, identifying their relationships, and
defining the tables required and the columns within each table.
Ted Codd defined degrees of Normalization of relational database tables and include:
First normal form (1NF).
This is the "basic" level of normalization and generally corresponds to the
definition of any database, namely:
It contains two-dimensional tables with rows and columns.
Each column corresponds to a sub-object or an attribute of the object represented by the entire table.
Each row represents a unique instance of that sub-object or attribute and must be different in some way from any other row (that is, no duplicate rows are possible).
All entries in any column must be of the same kind. For example, in the column labeled "Customer," only customer names or numbers are permitted.
Second normal form (2NF).
At this level of normalization, each column in a table that is not a determiner of the contents of another column must itself be a function of the other columns in the table.
For example, in a table with three columns containing customer ID, product sold, and price of the product when sold,
the price would be a function of the customer ID (entitled to a discount) and the specific product.
Third normal form (3NF).
At the second normal form, modifications are still possible because a change to one row in a table may affect data that refers to this information from another table.
For example, using the customer table just cited, removing a row describing a customer purchase (because of a return perhaps) will also remove the fact that the product has a certain price.
In the third normal form, these tables would be divided into two tables so that product pricing would be tracked separately.
Domain/key normal form (DKNF).
A key uniquely identifies each row in a table.
A domain is the set of permissible values for an attribute.
By enforcing key and domain restrictions, the database is assured of being freed from modification anomalies.
DKNF is the normalization level that most designers aim to achieve.
|
 
OLAP |
Online Analytical Processing, used for complex analysis of Data Warehouses
and here's a Web Site dedicated to
OLAP Report Solutions.
|
Page Hit |
In a 'Data Webhouse', A Page Hit is the sending of a single file whether an HTML file, an image,
an audio file, or other file type.
Since a single Web page request can bring with it a number of individual files,
the number of hits from a site is a not a good indication of its actual use
(number of visitors).
It does have meaning for the Web site space provider, however, as an indicator
of traffic flow.
|
Page Impression |
According to the "Basic Advertising Measures," from FAST,
an ad industry group, a Page Impression is
"The count of a delivered basic advertising unit from an ad distribution point."
Impressions are how most Web advertising is sold and the cost is quoted in terms
of the cost per thousand impressions (CPM).
|
Polymorphism |
In Object-Oriented terminology, Polymorphism means that an operation can take on a different shape.
Therefore the same operation function can be interpreted differently according to the type of the specific object.
The simplest example is the '+' sign in programming, which means :-
1) 'add' if it applies to numbers
2) 'concatenate' if it applies to character strings,
3) 'Search for Word 1 AND Word 2' when specifying search criteria for a Search Engine.
|
Predicate |
Here is the definition given by Joe Celko in his book
'SQL for Smarties' in the Section on Normalization ...
"A Predicate is a statement of the form A(X), which means that X has the property A.
For example, "John is from Indiana" is a predicate statement; here, "John" is the subject and "is from Indiana" is the predicate.
A relation is a predicate with two or more subjects.
"John and Bob are brothers" is an example of a relation. The common way of visualizing a set of relational statements is
as a table where the columns are attributes of the relation and each row is a specific relational statement.
When Dr.Codd defined the relational model, he gave
12 rules for the vizualisation of the relation as a table."
That's the end of the quote from Joe's book.
I have always thought of a Predicate as a sentence which defines some useful Fact about a Database.
I have developed a General Approach to Database Design
based on this view.
|
RDF |
RDF stands for Resource Description Framework, and
here's a recommended specification from
the WorldWide Web Consortium.
|
Snowflake Design |
A Snowflake design or Schema is used for Data Warehouses,
which are characterised by the need to support a wide range of enquiries,
many of which are unpredictable.
The design is therefore quite different from databases which support
Transaction-processing systems.
A Snowflake has many levels of Dimensions, for example, Product Category,
Sub-Category, Sub-Sub-Category and so on.
|
Star Design |
A Star design is similar to a Snowflake design,(defined above),
except that it contains only one level of Dimension.
For example, a Product Category which also contains Sub-Categories,
Sub-Sub-Categories and so on.
A Star Design is usally preferred to a Snowflake for improved performance.
|
Surrogate Key |
A Surrogate Key is an integer which has no meaning in itself and is used as
a Primary Key in a Database Table.
Values for Surrogate Keys are usually generated automatically, (using a Sequence in Oracle
or an Identity in SQL Server). This guarantees uniqueness in the values.
They are particularly useful in Databases which support Web Sites, where new Users
can register, and have the appropriate Primary Key values generated automatically.
|
Touch Point |
A Touch Point is a point at which organisations can make contact with their Customers.
Call Centers and Web Sites are creating a range of new Touch Points.
A White Paper is available on
the Data Management Review,
with more detail provided by Acxiom.
|
UDDI |
Universal Description, Discovery, and Integration. |
UUID |
Unique Universal identifier -
A UUID (Universal Unique Identifier) is a 128-bit number used to uniquely identify some object or entity on the Internet.
Depending on the specific mechanisms used, a UUID is either guaranteed to be different or is, at least, extremely likely to
be different from any other UUID generated until 3400 A.D.
The UUID relies upon a combination of components to ensure uniqueness.
A guaranteed UUID contains a reference to the network address of the host that generated the UUID, a timestamp
(a record of the precise time of a transaction), and a randomly generated component. Because the network address identifies
a unique computer, and the timestamp is unique for each UUID generated from a particular host, those two components should
sufficiently ensure uniqueness.
However, the randomly generated element of the UUID is added as a protection against any unforseeable problem.
A UUID is specified as part of the tModel data structure, which represents a service type (a generic representation of a
registered service) in the UDDI (Universal Description, Discovery, and Integration) registry.
This mechanism is used to discover Web services.
UUIDs could be generated to refer to almost anything imaginable. Microsoft and some other software companies refer to GUIDs
(global unique identifiers), a type of UUID used to refer to Component Object Module objects and other software components.
The first UUIDs were created in the Network Computing System (NCS), and subsequently became a component of the Distributed
Computing Environment (DCE) of the Open Software Foundation (OSF).
This description ios taken verbatim from SearchWebServices, where
more details are available.
|
UNICODE |
Unicode is the 'Universal Code'. It takes two bytes for each Character. This description is copied from the Microsoft Page on XML.
The Unicode Consortium decided it would be a good idea to define one universal code page (using 2 bytes instead of one per character) that
covers all the languages of the world so that this mapping problem between different code pages would be gone forever.
So if Unicode solves cross-platform character encoding issues, why hasn't it become the only standard?
The first problem is that switching to Unicode sometimes means doubling the size of all your files-which in a network-bound world is not ideal.
Some people therefore still prefer to use the older, single-byte character sets such as ISO-8859-1 to ISO-8859-15, Shift-JIS, EUC-KR, and so forth.
The second problem is that there are still many systems out there that are not Unicode-based at all, which means that on a network, some of the
byte values that make up the Unicode characters can cause major problems for those older systems.
So Unicode Transformation Formats (UTF) have been defined; they use bit-shifting techniques to encode Unicode characters as byte values that will be
"transparent" (or flow through safely) on those older systems.
The most popular of these character encodings is UTF-8. UTF-8 takes the first 127 characters of the Unicode standard (which happen to be the basic
Latin characters, A-Z, a-z, and 0-9, and a few punctuation characters) and maps those directly to single byte values.
It then applies a bit-shifting technique using the high bit of the bytes to encode the rest of the Unicode characters.
The result of all this is that the little Swedish character å (0xE5) becomes the following 2-byte gibberish Ã¥ (0xC3 0xA5).
So unless you can do bit shifting in your head, data encoded in UTF-8 is not human readable.
|
XML |
XML stands for 'Extensible Markup Language'.
It is a flexible way to create common information formats and share the information with other people.
XML allows you to say 'My data is structured like this ...', and then add the data.
In this way, you can exchange information with other people and they can interpret what you are sending them
when they receive it.
This makes XML potentially tremendously powerful. Unfortunately, the power has not been fully exploited,
partly because of a lack of agreement on standards and common definitions about things like Customers.
There are a number of competing standards, in areas such as Business and Finance, and this contributes
to the lack of progress.
Check out our XML Page for more details, and here's the 'official' XML Organization.
|
|
And here's a Glossary of general IT Terms ...
TERM |
DEFINITION |
100% Java |
Sun allows this term to be applied only to programs that are composed entirely of standard Java with no platform dependence. |
Application Service Provider (ASP) |
Not to be confused with Microsoft's Active Server Pages,
this ASP an Internet service provider which hosts Applications which are available to Customers
on a 'pay-as-you-go' basis. |
Application Service Provider (ASP) |
Not to be confused with Microsoft's Active Server Pages,
this ASP an Internet service provider which hosts Applications which are available to Customers
on a 'pay-as-you-go' basis. |
Byte Code |
The output of the Java compiler is in the form of byte codes that are instructions for a Java Virtual Machine (JVM). Since any platform that implements a JVM can execute byte codes, a compiled Java program is portable. |
Client/Server |
Client/server describes the relationship between two computer programs in which one program, the client, makes a service request from another program, the server, which fulfills the request. Although the client/server idea can be used by programs within a single computer, it is a more important idea in a network. In a network, the client/server model provides a convenient way to interconnect programs that are distributed efficiently across different locations. Computer transactions using the client/server model are very common. For example, to check your bank account from your computer, a client program in your computer forwards your request to a server program at the bank. That program may in turn forward the request to its own client program that sends a request to a database server at another bank computer to retrieve your account balance. The balance is returned back to the bank data client, which in turn serves it back to the client in your personal computer, which displays the information for you.
The client/server model has become one of the central ideas of network computing. Most business applications being written today use the client/server model. So does the Internet's main program, TCP/IP. In marketing, the term has been used to distinguish distributed computing by smaller dispersed computers from the "monolithic" centralized computing of mainframe computers. But this distinction has largely disappeared as mainframes and their applications have also turned to the client/server model and become part of network computing.
In the usual client/server model, one server, sometimes called a daemon, is activated and awaits client requests. Typically, multiple client programs share the services of a common server program. Both client programs and server programs are often part of a larger program or application. Relative to the Internet, your Web browser is a client program that requests services (the sending of Web pages or files) from a Web server (which technically is called a Hypertext Transport Protocol or Hypertext Transfer Protocol server) in another computer somewhere on the Internet. Similarly, your computer with TCP/IP installed allows you to make client requests for files from File Transfer Protocol (File Transfer Protocol) servers in other computers on the Internet.
Other program relationship models included master/slave, with one program being in charge of all other programs, and peer-to-peer, with either of two programs able to initiate a transaction.
|
Common Gateway Interface (CGI) |
The conventions for sending data from web browsers for processing by auxiliary programs. A typical use would be submitting a search request to a database. |
Common Object Request Broker Architecture (CORBA) |
An industry standard from the Object Management Group, this standard provides for communication between distributed objects that may be running on different platforms within a network. |
Database Management System (DBMS) |
A generic term for databas products, such as Oracle, DB2, etc.. |
Enterprise JavaBean (EJB) |
The Enterprise JavaBean specification creates a component architecture for creating portable server applications.
Enterprise JavaBeans constitute the core of J2EE applications, which execute on a J2EE platform. |
Java 2 Enterprise Edition (J2EE) |
Sun's highest level of packaging of Java libraries and specifications for application server functions.
A server meeting the J2EE specifications can support multi-tier applications that work with web browsers and network applications on the client side and a variety of databases on the enterprise side.
A J2EE server has many functions that make multi-tier applications simpler for the developer
|
JavaSpaces |
In Sun's Jini system for spontaneous networking, the JavaSpaces API provides a way for applications to read, write, and modify objects in a common storage system. |
Java Database Connectivity (JDBC) |
The JDBC API provides access to a wide range of databases and is now widely supported by database vendors. A J2EE platform must provide JDBC version 2 for access to databases. |
Jini |
Sun's next generation networking approach designed to support distributed computing using Java. |
Java Naming and Directory Interface (JNDI) |
The API that supports naming and directory services for Java programs. This API is used with Enterprise JavaBeans and J2EE systems. |
Java Native Interface (JNI) |
The Java standard for connecting Java programs to platform specific "native" code. |
Java Virtual Machine (JVM) |
This is the core idea of the Java system. A JVM accomplishes all platform specific tasks required in executing a Java program so that 100% Java programs can be truly portable. |
Native (as in native method) |
A native method uses the machine language and operating system required by a specific platform. |
Open Database Connectivity (ODBC) |
Microsoft's database programming interface used to provide a common language for Windows applications to access databases on a network. |
Reflection |
In Java this means the ability of a program to dynamically determine the available functions of an object. |
Remote Method Invocation (RMI) |
The Java programming standard and protocol that allows a program on one machine to execute methods on objects that reside elsewhere in a network. |
Structured Query Language (SQL) |
The very widely used standard for formulating database queries and receiving results. |
TCP/IP |
The protocol used by web browsers and servers that ensures delivery of streams of data. |
Virtual Machine |
The generalized and idealized CPU that Java byte codes are designed for. Virtual because actual implementation of the functions on real systems must be performed by a program specific to particular hardware, the Java Virtual Machine or JVM. |
|