Mastering SQL for Data Science: A Comprehensive Guide
Ø Data:
The piece of any information is called data. E.g. Name of
any person or books or place. The data is only data until it is organized in a
meaningful way at which point it will become information.
Ø Information:
Collection of different types of data is called
information. E.g. Details of ranker in the class.
Ø Table:
Table is a collection of different number of rows and
columns.
Ø Database:
A database can be defined as a collection of coherent and
meaningful data.
Ø DBMS: (Database Management System)
A
Database Management System is essentially a collection of interrelated data and
a set of programs to access this data.
Ø RDBMS: (Relational Database Management System)
RDBMS
is acronym for Relational Database Management System. Dr. Edger F. Codd first
introduces the Relational Database Model in 1970. It allows data to be
represented in a simple row-column format. Each data field is considered as a
column and each record is considered as a row of a table. RDBMS stores data in
to the form of related tables. RDBMS are powerful because they require few
assumptions about how data is related or how it will extracted from the
database.
Ø DBMS V/S RDBMS:
DBMS:
Ø In DBMS relationship between two tables or files are maintained programmatically.
Ø DBMS does not support Client/Server Architecture.
Ø It is also not supported to Distributed Databases.
Ø In DBMS there is not any type of security of data.
Ø Each table is given an extension in DBMS is .dbf
Ø DBMS may satisfy less than 7 to 8 rules of Dr. E. F. Codd.
RDBMS:
ü In
RDBMS relationship between two tables or files can be specified at the time of
table creation.
ü Most of the RDBMS supports Client/Server Architecture.
ü Most of the RDBMS supports Distributed Database.
ü In RDBMS there are multiple levels of security.
1. Logging
at O/S level
2. Command
Level (RDBMS level)
3. Object Level
ü Many tables are grouped in one database in RDBMS.
ü RDBMS usually satisfy more than 7 to 8 rules of Dr. E. F. Codd.
Oracle’s
Query language has structure, just as English or any other language has
structure. This language allows end users to manipulate information of table.
To use SQL you need not to require any of the programming experience. SQL is
database language used for storing and retrieving data from the database.
Rules of SQL:
1) SQL starts with a verb. (SQL action words). E.g. select
2) Each verb is followed by different number of clauses. E.g. from,where..
3) A space separates clauses. E.g. Drop table stud;
4) A comma(,) separates parameters without a clause.
5) A semicolon (;) is used to end a statement.
6) Statements may be split across lines but keywords may not.
7) Reserved words cannot be used as an identifiers unless enclosed with double quotes.
8) Identifiers can contain upto 30 characters and must start with an alphabetic character.
9) Character and Date literals must be enclosed with a single quotes.
10) Comment of single line may be represented with – symbol and multiline comment represented with /* and */.
Ø Introduction of SQL* plus:
SQL* plus is an oracle tool
(specific program) which accepts SQL commands and PL/SQL blocks and also
executes them. SQL* plus enables manipulations of SQL commands and PL/SQL
blocks. It also performs additional tasks such as calculations store and print
query results in the form of reports, list column definitions of any table,
access and copy data between SQL databases and send messages to and accept
responses from the user.
Ø SQL Commands (components):
To communicate with oracle, SQL supports the following categories of languages or commands.
1) DDL (Data Definition Language):
It is
a set of SQL commands used to create, modify and delete database structures but
not data. These commands are normally not used by a general user, who should be
accessing the database via an application. This language supports the following
commands;
Create,
alter, desc, drop, truncate.
It is
the area of SQL that allows changing data within the database. This language
supports the following commands;
Insert,
Update, Delete.
It is
the component of SQL statement that control access to data and to the database.
The database occasionally, DCL statements are grouped with DML statements. This
language supports the following commands;
Grant,
Revoke.
4) TCL (Transaction Control Language):
It is
the language of SQL transaction that allows to save any work, to identify any
point or to restore the database. This language supports the following
commands;
Commit,
Rollback, Savepoint.
It is
the components of SQL statement that allows getting data from the database and
commanding ordering upon it. It includes the select statement; this command is
the heart of the SQL. When a select is fired against a table or tables the
result is also compiled into a further temporary table, which is displayed or
perhaps received by the programs. i.e. front-end. This language supports
following command.
Select
When we
create a table in oracle, a few items should be important. We have to specify a
field or column type at the table creation time.
1) Char(size):
This data type is used to store character strings value of
fixed length. The size in brackets determines the number of characters the cell
can hold. The maximum number of characters this data type can hold is 255 characters.
Default and minimum size is 1 byte.
2) Varchar(size)/Varchar2(size):
This data type is used to store variable length
alphanumeric data. It is a more flexible form of the character data type. The
storage capacity of this data type is 4000 characters.
3) Date:
This data type is used to represent date and time. The
standard format is DD-MON-YY as in 02-JUL-09. By default, the time in a date
field is 12:00:00 am, if no time portion is specified.
4) Number(P,S):
The number data type is used to store numeric values of
virtually. This data type can hold up to 38 digits of precision. in particular
scientific notations the precision (p), determines the maximum length of data,
whereas the scale(s), determines the number of places to the right of the decimal.
5) Long:
This
data types is used to store variable length character string containing upto 2
GB. Only one long value can be defined per table. Long values cannot be used in
subqueries, functions and also expressions where clause or indexes and the
normal character functions.
6) Raw/Long Raw:
This
data type is used to store binary data such as digitized pictures or images.
Raw data types can have a maximum length of 255 bytes. Long Raw data type can
contain up to 2 GB. Values stored in columns having long raw data types cannot
be indexed.
0 Comments