|
6.1.1 Are
there alternatives to row at a time processing?
6.1.2 When
should I execute an sp_recompile?
6.1.3 What are
the different types of locks and what do they mean?
6.1.4 What's
the purpose of using holdlock?
6.1.5 What's the difference between an update in place
versus a deferred update? - see Q1.5.9
6.1.6 How do I
find the oldest open transaction?
6.1.7 How do I
check if log truncation is blocked?
6.1.8 The timestamp
datatype
6.1.9 Stored
Procedure Recompilation and Reresolution
6.1.10 How do I
manipulate binary columns?
6.1.11 How do I remove duplicate rows from a table?
Someone asked how they could speed up their processing. They were batch
updating/inserting gobs of information. Their algorithm was something as follows:
... In another case I do:
If exists (select record) then
update record
else
insert record
I'm not sure which way is faster or if it makes a difference. I am doing this for as
many as 4000 records at a time (calling a stored procedure 4000 times!). I am interesting
in knowing any way to improve this. The parameter translation alone on the procedure calls
takes 40 seconds for 4000 records. I am using exec in DB-Lib.
Would RPC or CT-Lib be better/faster?
A netter responded stating that it was faster to ditch their algorithm and to apply a
set based strategy:
The way to take your approach is to convert the row at a time processing (which is more
traditional type of thinking) into a batch at a time (which is more relational type of
thinking). Now I'm not trying to insult you to say that you suck or anything like that, we
just need to dial you in to think in relational terms.
The idea is to do batches (or bundles) of rows rather than processing a single one at a
time.
So let's take your example (since you didn't give exact values [probably out of
kindness to save my eyeballs] I'll use your generic example to extend what I'm talking
about):
Before:
if exists (select record) then
update record
else
insert record
New way:
- Load all your rows into a table named new_stuff in a separate work
database (call it work_db) and load it using bcp -- no third GL needed.
- truncate new_stuff and drop all indexes
- sort your data using UNIX sort and sort it by the clustered columns
- load it using bcp
- create clustered index using with sorted_data and any ancillary non-clustered
index.
- Assuming that your target table is called old_stuff
- Do the update in a single batch:
begin tran
/* delete any rows in old_stuff which would normally
** would have been updated... we'll insert 'em instead!
** Essentially, treat the update as a delete/insert.
*/
delete old_stuff
from old_stuff,
new_stuff
where old_stuff.key = new_stuff.key
/* insert entire new table: this adds any rows
** that would have been updated before and
** inserts the new rows
*/
insert old_stuff
select * from new_stuff
commit tran
You can do all this without writing 3-GL, using bcp and a shell script.
A word of caution:
Since these inserts/updates are batched orientated you may blow your log if you
attempt to do too many at a time. In order to avoid this use the set rowcount
directive to create bite-size chunks.
Back to top
An sp_recompile should be issued any time a new index is added or an update
statistics. Dropping an index will cause an automatic recompile of all objects that are
dependent on the table.
The sp_recompile command simply increments the schemacnt counter for the
given table. All dependent object counter's are checked against this counter and if they
are different the SQL Server recompiles the object.
Back to top
First off, just to get it out of the way, Sybase does now support row
level locking! (See Q6.1.11 for a
description of the new features.) OK, that said and sone, if you think
you need row level locking, you probably aren't thinking set based -- see
Q6.1.1 for set processing.
The SQL Server uses locking in order to ensure that sanity of your queries. Without
locking there is no way to ensure the integrity of your operation. Imagine a transaction
that debited one account and credited another. If the transaction didn't lock out
readers/writers then someone can potentially see erroneous data.
Essentially, the SQL Server attempts to use the least intrusive lock possible, page
lock, to satisfy a request. If it reaches around 200 page locks, then it escalates the
lock to a table lock and releases all page locks thus performing the task more
efficiently.
There are three types of locks:
- page locks
- table locks
- demand locks
Page Locks
There are three types of page locks:
shared
These locks are requested and used by readers of information. More than one connection
can hold a shared lock on a data page.
This allows for multiple readers.
exclusive
The SQL Server uses exclusive locks when data is to be modified. Only one
connection may have an exclusive lock on a given data page. If a table is large enough and
the data is spread sufficiently, more than one connection may update different data pages
of a given table simultaneously.
update
A update lock is placed during a delete or an update while the SQL Server
is hunting for the pages to be altered. While an update lock is in place, there can be
shared locks thus allowing for higher throughput.
The update lock(s) are promoted to exclusive locks once the SQL Server is ready to
perform the delete/update.
Table Locks
There are three types of table locks:
intent
Intent locks indicate the intention to acquire a shared or exclusive lock on a data
page. Intent locks are used to prevent other transactions from acquiring shared or
exclusive locks on the given page.
shared
This is similar to a page level shared lock but it affects the entire table. This lock
is typically applied during the creation of a non-clustered index.
exclusive
This is similar to a page level exclusive lock but it affects the entire table. If an update
or delete affects the entire table, an exclusive table lock is generated. Also,
during the creation of a clustered index an exclusive lock is generated.
Demand Locks
A demand lock prevents further shared locks from being set. The SQL Server sets a
demand lock to indicate that a transaction is next to lock a table or a page.
This avoids indefinite postponement if there was a flurry of readers when a writer
wished to make a change.
Back to top
All select/readtext statements acquire shared locks (see Q6.1.3) to retrieve their information. After the
information is retrieved, the shared lock(s) is/are released.
The holdlock option is used within transactions so that after the select/readtext
statement the locks are held until the end of the transaction:
- commit transaction
- rollback transaction
If the holdlock is not used within a transaction, the shared locks are released.
Example
Assume we have the following two transactions and that each where-clause qualifies a
single row:
tx #1
begin transaction
/* acquire a shared lock and hold it until we commit */
1: select col_1 from table_a holdlock where id=1
2: update table_b set col_3 = 'fiz' where id=12
commit transaction
tx #2
begin transaction
1: update table_a set col_2 = 'a' where id=1
2: update table_c set col_3 = 'teo' where id=45
commit transaction
If tx#1, line 1 executes prior to tx#2, line 1, tx#2 waits to acquire its exclusive
lock until tx#1 releases the shared level lock on the object. This will not be done until
the commit transaction, thus slowing user throughput.
On the other hand, if tx#1 had not used the holdlock attribute, tx#2 would not have had
to wait until tx#1 committed its transaction. This is because shared level locks are
released immediately (even within transactions) when the holdlock attribute is not used.
Note that the holdlock attribute does not stop another transaction from acquiring a
shared level lock on the object (i.e. another reader). It only stops an exclusive level
lock (i.e. a writer) from being acquired.
Back to top
select h.spid, u.name, p.cmd, h.name, h.starttime,
p.hostname, p.hostprocess, p.program_name
from master..syslogshold h,
master..sysprocesses p,
master..sysusers u
where h.spid = p.spid
and p.suid = u.suid
and h.spid != 0 /* not replication truncation point */
Back to top
System 11 and beyond:
select h.spid, convert(varchar(20), h.name), h.starttime
from master..syslogshold h,
sysindexes i
where h.dbid = db_id()
and h.spid != 0
and i.id = 8 /* syslogs */
and h.page in (i.first, i.first+1) /* first page of log = page of oldest xact */
Back to top
The timestamp datatype is user-defined datatype supplied by Sybase, defined as:
varbinary(8) NULL
It has a special use when used to define a table column. A table may have at most one
column of type timestamp, and whenever a row containing a timestamp column is inserted or
updated the value in the timestamp column is automatically updated. This much is covered
in the documentation.
What isn't covered is what the values placed in timestamp columns actually represent.
It is a common misconception that timestamp values bear some relation to calendar date
and/or clock time. They don't - the datatype is badly-named. SQL Server keeps a counter
that is incremented for every write operation - you can see its current value via the
global variable @@DBTS (though don't try and use this value to predict what
will get inserted into a timestamp column as every connection shares the same counter.)
The value is maintained between server startups and increases monotonically over time
(though again you cannot rely on it this behaviour). Eventually the value will wrap,
potentially causing huge problems, though you will be warned before it does - see Sybase
Technical News Volume 5, Number 1 (see Q10.3.1). You cannot convert this
value to a datetime value - it is simply an 8-byte integer.
Note that the global timestamp value is used for recovery purposes in the event of an
RDMBS crash. As transactions are committed to the log each transaction gets a unique
timestamp value. The checkpoint process places a marker in the log with its unique
timestamp value. If the RDBMS crashes, recovery is the process of looking for transactions
that need to be rolled forward and/or backward from the checkpoint event. If a transaction
spans across the checkpoint event and it never competed it too needs to be rolled back.
Essentially, this describes the write-ahead log protocol described by C.J.
Date in An Introduction to Database Systems.
So what is it for? It was created in order to support the browse-mode functions of
DB-Library (and for recovery as mentioned above). This enables an application to easily
support optimistic locking (See Q1.5.4) by guaranteeing
a watch column in a row will change value if any other column in that row is
updated. The browse functions checked that the timestamp value was still the same as when
the column was read before attempting an update. This behaviour is easy to replicate
without necessarily using the actual client browse-mode functions - just read the
timestamp value along with other data retrieved to the client, and compare the stored
value with the current value prior to an update.
Back to top
When a stored procedure is created, the text is placed in syscomments and a parse tree
is placed in sysprocedures. At this stage there is no compiled query plan.
A compiled query plan for the procedure only ever exists in memory (that is, in
the procedure cache) and is created under the following conditions:
- A procedure is executed for the first time.
- A procedure is executed by a second or subsequent user when the first plan in cache is
still in use.
- The procedure cache is flushed by server restart or cache LRU flush procedure.
- The procedure is executed or created using the with recompile option.
If the objects the procedure refers to change in some way - indexes dropped, table
definition changed, etc - the procedure will be reresolved - which updates
sysprocedures with a modified tree. Before 10.x the tree grows and in extreme cases the
procedure can become too big to execute. This problem disappears in Sybase System 11. This
reresolution will always occur if the stored procedure uses temporary
tables (tables that start with "#").
There is apparently no way of telling if a procedure has been reresolved.
Traceflag 299 offers some relief, see Q1.3.3 for
more information regarding traceflags.
The Official Explanation -- Reresolution and Recompilation Explained
When stored procedures are created, an entry is made in sysprocedures that contains the
query tree for that procedure. This query tree is the resolution of the procedure and the
applicable objects referenced by it. The syscomments table will contain the actual
procedure text. No query plan is kept on disk. Upon first execution, the query tree is
used to create (compile) a query plan (execution plan) which is stored in the procedure
cache, a server memory structure. Additional query plans will be created in cache upon
subsequent executions of the procedure whenever all existing cached plans are in use. If a
cached plan is available, it will be used.
Recompilation is the process of using the existing query tree from sysprocedures to
create (compile) a new plan in cache. Recompilation can be triggered by any one of the
following:
- First execution of a stored procedure,
- Subsequent executions of the procedure when all existing cached query plans are in use,
- If the procedure is created with the recompile option, CREATE PROCEDURE sproc WITH
RECOMPILE
- If execution is performed with the recompile option, EXECUTE sproc WITH RECOMPILE
Re-resolution is the process of updating the query tree in sysprocedures AND
recompiling the query plan in cache. Re-resolution only updates the query tree by adding
the new tree onto the existing sysprocedures entry. This process causes the procedure to
grow in size which will eventually cause an execution error (Msg 703 - Memory request
failed because more than 64 pages are required to run the query in its present form. The
query should be broken up into shorter queries if possible). Execution of a procedure that
has been flagged for re-resolution will cause the re-resolution to occur. To reduce the
size of a procedure, it must be dropped which will remove the entries from sysprocedures
and syscomments. Then recreate the procedure.
Re-resolution can be triggered by various activities most of which are controlled by
SQL Server, not the procedure owner. One option is available for the procedure owner to
force re-resolution. The system procedure, sp_recompile, updates the schema count in
sysobjects for the table referenced. A DBA usually will execute this procedure after
creating new distribution pages by use of update statistics. The next execution of
procedures that reference the table flagged by sp_recompile will have a new query tree and
query plan created. Automatic re-resolution is done by SQL Server in the following
scenarios:
- Following a LOAD DATABASE on the database containing the procedure,
- After a table used by the procedure is dropped and recreated,
- Following a LOAD DATABASE of a database where a referenced table resides,
- After a database containing a referenced table is dropped and recreated,
- Whenever a rule or default is bound or unbound to a referenced table.
Forcing automatic compression of procedures in System 10 is done with trace flag 241.
System 11 should be doing automatic compression, though this is not certain.
When are stored procedures compiled?
Stored procedures are in a database as rows in sysprocedures, in the form of parse
trees. They are later compiled into execution plans.
A stored procedures is compiled:
- with the first EXECute, when the parse tree is read into cache
- with every EXECute, if CREATE PROCEDURE included WITH RECOMPILE
- with each EXECute specifying WITH RECOMPILE
- if the plans in cache for the procedure are all in use by other processes
- after a LOAD DATABASE, when all procedures in the database are recompiled
- if a table referenced by the procedure can not be opened (using object id), when
recompilation is done using the table's name
- after a schema change in any referenced table, including:
- CREATE INDEX or DROP INDEX to add/delete an index
- ALTER TABLE to add a new column
- sp_bindefault or sp_unbindefault to add/delete a default
- sp_bindrule or sp_unbindrule to add/delete a rule
- after EXECute sp_recompile on a referenced table, which increments sysobjects.schema and
thus forces re-compilation
What causes re-resolution of a stored procedure?
When a stored procedure references an object that is modified after the creation of the
stored procedure, the stored procedure must be re-resolved. Re-resolution is the process
of verifying the location of referenced objects, including the object id number.
Re-resolution will occur under the following circumstances:
- One of the tables used by the stored procedure is dropped and re-created.
- A rule or default is bound to one of the tables (or unbound).
- The user runs sp_recompile on one of the tables.
- The database the stored procedure belongs to is re-loaded.
- The database that one of the stored procedure's tables is located in is re-loaded.
- The database that one of the stored procedure's tables is located in is dropped and
re-created.
What will cause the size of a stored procedure to grow?
Any of the following will result in a stored procedure to grow when it is recompiled:
- One of the tables used in the procedure is dropped and re-created.
- A new rule or default is bound to one of the tables or the user runs sp_recompile on one
of the tables.
- The database containing the stored procedure is re-loaded.
Other things causing a stored procedure to be re-compiled will not cause it to grow.
For example, dropping an index on one of the tables used in the procedure or doing EXEC
WITH RECOMPILE.
The difference is between simple recompilation and re-resolution. Re-resolution happens
when one of the tables changes in such a way that the query trees stored in sysprocedures
may be invalid. The datatypes, column offsets, object ids or other parts of the tree may
change. In this case, the server must re-allocate some of the query tree nodes. The old
nodes are not de-allocated (there is no way to do this within a single procedure header),
so the procedure grows. In time, trying to execute the stored procedure will result in a
703 error about exceeding the 64 page limit for a query.
Back to top
The question was posed - How do we manipulate varbinary columns, given that some
portion - like the 5th and 6th bit of the 3rd byte - of a (var)binary column, needs to be
updated? Here is one approach, provided by Bret Halford (bret@sybase.com),
using stored procedures to set or clear certain bits of a certain byte of a field of a row
with a given id:
drop table demo_table
drop procedure clear_bits
drop procedure set_bits
go
create table demo_table (id numeric(18,0) identity, binary_col
binary(20))
go
insert demo_table values (0xffffffffffffffffffffffffffffffffffffffff)
insert demo_table values (0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)
insert demo_table values (0x0000000000000000000000000000000000000000)
go
create procedure clear_bits (
@id numeric(18,0), -- primary key of row to be changed
@bytenum tinyint, -- specifies which byte of binary_col to change
@mask binary(1) -- bits to be cleared are zeroed,
-- bits left alone are turned on
-- so 0xff = clear all, 0xfb = clear bit 3
)
as
update demo_table set binary_col =
substring(binary_col,1,@bytenum-1)+
convert(binary(1),
convert(tinyint,substring(binary_col,@bytenum,1)) &
convert(tinyint,@mask)
)+
substring(binary_col,@bytenum+1,20)
from demo_table
where id = @id
go
create procedure set_bits (
@id numeric(18,0), -- primary key of row to be changed
@bytenum tinyint, -- specifies which byte of binary_col to change
@mask binary(1)) -- bits to be set are turned on
-- bits left alone are zeroed
-- so 0xff = set all, 0xfb = set all but 3
)
as
update demo_table set binary_col =
substring(binary_col,1,@bytenum-1)+
convert(binary(1),
convert(tinyint,substring(binary_col,@bytenum, 1)) |
convert(tinyint,@mask)
)+
substring(binary_col,@bytenum+1,20)
from demo_table
where id = @id
go
select * from demo_table
-- clear bits 2,4,6,8 of byte 1 of row 1
exec clear_bits 1,1,0xAA
-- set bits 1-8 of byte 20 of row 3
exec set_bits 3,20,0xff
-- clear bits 1-8 of byte 4 of row 2
exec clear_bits 2,4,0xff
-- clear bit 3 of byte 5 of row 2
exec clear_bits 2,5,0x08
exec clear_bits 2,6,0x0f
exec set_bits 2,10,0xff
go
select * from demo_table
go
Back to top
There are a number of different ways to achieve this, depending on what
you are trying to achieve. Usually, you are trying to remove duplication
of a certain key due to changes in business rules or recognition of a
business rule that was not applied when the database was originally
built.
Probably the quickest method is to build a copy of the original table:
select *
into temp_table
from base_table
where 1=0
Create a unique index on the columns that covers the duplicating rows
with the ignore_dup_key attribute. This may be more columns
that the key for the table.
create unique index temp_idx
on temp_table(col1, col2, ..., colN)
with ignore_dup_key
Now, insert base_table into temp_table.
insert temp_table
select * from base_table
You probably want to ensure you have a very good backup of the
base_table at this point, coz your going to clear it out! You will also
want to check to ensure that the temp_table includes the rows you need.
You also need to ensure that there are no triggers on the base table
(remember to keep a copy!) or RI constraints. You probably do not want any
of these to fire, or if they do, you are aware of the implications.
Now you have a couple of choices. You can simply drop the original
table and rename the temp table to the same name as the base table.
Alternatively, truncate the table and insert from the temp_table into the
original table. You would need to do this last if you did need the RI to
fire on the table etc. I suspect that in most cases dropping and renaming
will be the best option.
If you want to simply see the duplicates in a table, the following query
will help:
select key1, key2, ...
from base_table
group by key1, key2, key3, key4, ...
having count(*) > 1
Sybase will actually allow a "select *", but it is not guaranteed to
work.
Back to top
|