Tuesday, 5 July 2011

UNIQUE Column with multiple NULL values

As you know, when you create a UNIQUE constraint on a nullable column, SQL Server allows only one NULL value, because it treats NULL like any other value when enforcing uniqueness. However, there are situations where we need to store more than one NULL in the column while still maintaining uniqueness for the non-NULL values.

In this article, I am going to illustrate how to maintain uniqueness on a column and also allow multiple NULL values.

Let us create a database, Employee, with a table that holds employee details, including social security numbers, as shown below.

USE [master]
GO
IF  EXISTS (SELECT name FROM sys.databases WHERE name = N'Employee')
DROP DATABASE [Employee]
go
Create Database [Employee]
go
use [Employee]
go


Create Table Emp
([Employee id] int not NULL constraint Emp_pk primary key clustered,
[First Name] varchar(100) NULL,
[Last Name] varchar(100)  NULL,
[Nick Name] varchar(100) NULL,
[Social Security Number] int NULL)
go


We know that a social security number is unique, so let us add a UNIQUE constraint to the [Social Security Number] column.

Alter table Emp Add constraint ssn_unique UNIQUE 
 ([Social Security Number] )


Now let us try to insert some data as shown below.

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(1,'Robert','Bates','Bob',111213422)


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(2,'Robert','Bates','Bob',121213422)


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(3,'Robert','William','Rob',131213422)


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(4,'Sonia','Keira','Sony',131413426)


As we know, a social security number is unique. However, for people entering the country on a work visa, it may take some time to obtain a social security number. Until then, the column value would be NULL.

Now let us try to insert one row without Social Security Number as shown below.

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(5,'Mellisa','Brown','Mel',NULL)

Result
(1 row(s) affected)


Now let us try to insert another row without Social Security Number as shown below.

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(6,'Sibey','Chikhs','Ciby',NULL)

Result
Msg 2627, Level 14, State 1, Line 1
Violation of UNIQUE KEY constraint 'ssn_unique'. 
 Cannot insert duplicate key in object 'dbo.Emp'.
The statement has been terminated.

This is the normal behaviour of a UNIQUE constraint on a nullable column: it allows only one row with a NULL value. However, that is not the behaviour we want for this column. We want the column to accept unique values and also accept multiple NULL values.

This can be achieved by using a computed column and adding the constraint to the computed column instead of to the actual Social Security Number column.

Now let us recreate the database from scratch, only this time we will add a computed column as shown below.

USE [master]
GO
IF  EXISTS (SELECT name FROM sys.databases WHERE name = N'Employee')
DROP DATABASE [Employee]
go
Create Database [Employee]
go
use [Employee]
go


Create Table Emp
([Employee id] int not NULL constraint Emp_pk primary key clustered,
[First Name] varchar(100) NULL,
[Last Name] varchar(100)  NULL,
[Nick Name] varchar(100) NULL,
[Social Security Number] int NULL)
go


Alter table Emp Add MySSN as case when [Social Security Number]
 is NULL then [Employee id] else [Social Security Number] end
go


Now let us add the UNIQUE constraint to the computed column as shown below. When [Social Security Number] is NULL, the computed column takes the value of [Employee id], which is guaranteed to be unique by the primary key, so two rows with NULL social security numbers no longer produce duplicate values.

Alter table Emp Add constraint ssn_unique UNIQUE ([MySSN] )
go


Now let us try to insert some data as shown below.

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(1,'Robert','Bates','Bob',111213422)


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(2,'Robert','Bates','Bob',121213422)


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(3,'Robert','William','Rob',131213422)


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(4,'Sonia','Keira','Sony',131413426)


Let’s try to insert one row without Social Security Number as shown below.

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(5,'Mellisa','Brown','Mel',NULL)

Result
(1 row(s) affected)

Try to insert another row without Social Security Number:

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(6,'Sibey','Chikhs','Ciby',NULL)

Result
(1 row(s) affected)
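
At this point the table holds two rows with a NULL social security number, yet the UNIQUE constraint is satisfied because each of those rows carries its own [Employee id] in the computed column. A quick verification query makes this visible:

Select [Employee id],[Social Security Number],[MySSN]
from Emp
order by [Employee id]
go

Rows 5 and 6 show NULL in [Social Security Number] but 5 and 6 in [MySSN], so the values in the constrained column remain distinct.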


If the Social Security Number column is a varchar column, then we can create the computed column and the constraint as shown below in order to achieve the same goal of uniqueness with multiple NULL values.

USE [master]
GO
IF  EXISTS (SELECT name FROM sys.databases WHERE name = N'Employee')
DROP DATABASE [Employee]
go
Create Database [Employee]
go
use [Employee]
go


Create Table Emp
([Employee id] int not NULL constraint Emp_pk primary key clustered,
[First Name] varchar(100) NULL,
[Last Name] varchar(100)  NULL,
[Nick Name] varchar(100) NULL,
[Social Security Number] varchar(12) NULL)
go


Let’s create the computed column for Social Security Number as shown below.

Alter table Emp Add MySSN as case when [Social Security Number]
 is NULL then convert(varchar(12),[Employee id]) else [Social Security Number] end
go


Now let’s add the UNIQUE constraint to the computed column:

Alter table Emp Add constraint ssn_unique UNIQUE ([MySSN] )
go


Let’s try to insert some data as shown below.

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(1,'Robert','Bates','Bob','111-21-3422')


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(2,'Robert','Bates','Bob','121-21-3422')


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(3,'Robert','William','Rob','131-21-3422')


Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(4,'Sonia','Keira','Sony','131-41-3426')


Now try to insert one row without Social Security Number:

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(5,'Mellisa','Brown','Mel',NULL)

Result
(1 row(s) affected)

Now let’s try to insert another row without Social Security Number as shown below.

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(6,'Sibey','Chikhs','Ciby',NULL)

Result
(1 row(s) affected)
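
Multiple NULL values are now accepted, while uniqueness is still enforced for real social security numbers. For example, an insert that repeats an existing number (the values below are only illustrative) should fail:

Insert into Emp ([Employee id],[First Name],[Last Name],
[Nick Name],[Social Security Number])
values(7,'James','Bates','Jim','111-21-3422')

This statement fails with the same Msg 2627 "Violation of UNIQUE KEY constraint 'ssn_unique'" error we saw earlier, confirming that duplicate social security numbers are still rejected.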

Conclusion
We have created a table column that enforces uniqueness while still allowing multiple NULL values, by combining the primary key values, a computed column, and a UNIQUE constraint on that computed column.
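
As a side note, if you are on SQL Server 2008 or later, the same behaviour can also be achieved without a computed column by using a filtered unique index that ignores NULL values. A minimal sketch (the index name is arbitrary):

Create Unique Nonclustered Index ux_Emp_SSN
on Emp ([Social Security Number])
where [Social Security Number] is not NULL
go

With this index in place, every non-NULL social security number must be unique, while any number of rows may leave the column NULL.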

Sunday, 29 May 2011

Which DBMS / database is Google using? It's Google Bigtable

Google Bigtable: A Distributed Storage System for Structured Data


Google Bigtable
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.


Some features:-
* Fast and extremely large-scale DBMS.
* A sparse, distributed multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases.
* Designed to scale into the petabyte range
* It works across hundreds or thousands of machines
* It is easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration
* Each table has multiple dimensions (one of which is a field for time, allowing versioning)
* Tables are optimized for GFS (Google File System) by being split into multiple tablets - segments of the table, split along a row boundary chosen so that each tablet is about 200 megabytes in size.

Architecture:-
BigTable is not a relational database. It does not support joins, nor does it support rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a timestamp. There can be multiple versions of a cell with different timestamps. The timestamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time."

In order to manage the huge tables, Bigtable splits tables at row boundaries and saves them as tablets. A tablet is around 200 MB, and each machine stores about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing: if one tablet is receiving many queries, the machine can shed other tablets or move the busy tablet to another machine that is not so busy. Also, if a machine goes down, its tablets may be spread across many other servers so that the performance impact on any given machine is minimal.

Tables are stored as immutable SSTables and a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.

The locations of Bigtable tablets are stored in cells. The lookup of any particular tablet is handled by a three-tiered system. The clients get a pointer to the META0 table, of which there is only one. The META0 table keeps track of many META1 tablets that contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.

Implementation:-
BigTable is built on the Google File System (GFS), which is used as a backing store for log and data files. GFS provides reliable storage for SSTables, a Google-proprietary file format used to persist table data. Another service that BigTable makes heavy use of is Chubby, a highly available, reliable distributed lock service. Chubby allows clients to take a lock, possibly associating it with some metadata, which they can renew by sending keep-alive messages back to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.
There are three primary server types of interest in the Bigtable system:
* Master servers: assign tablets to tablet servers, keep track of where tablets are located, and redistribute tasks as needed.
* Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100 MB - 200 MB). If a tablet server fails, then 100 tablet servers each pick up one new tablet and the system recovers.
* Lock servers: instances of the Chubby distributed lock service. Many actions within BigTable require the acquisition of locks, including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access control checking.

API:-
Typical operations in BigTable are the creation and deletion of tables and column families, writing data, and deleting columns from a row. BigTable provides these functions to application developers through an API. Transactions are supported at the row level, but not across several row keys.