What are the differences between a clustered and a non-clustered index?

You can only have one clustered index per table. But there are plenty of other differences...
A clustered index actually describes the order in which records are physically stored on the disk, hence the reason you can only have one. A Non-Clustered Index defines a logical order that does not match the physical order on disk.
Clustered basically means that the data is in that physical order in the table. This is why you can have only one per table. Unclustered means it's "only" a logical order.
@biri what is "logical" order? A non-clustered index stores its index keys in physical order, along with a pointer into the table, namely the clustered index key.
@Stephanie Page: logical from the table point of view. Of course non-clustered indexes are ordered physically in the index itself.

Martynnw

Clustered Index

Only one per table

Faster to read than non clustered as data is physically stored in index order

Non Clustered Index

Can be used many times per table

Quicker for insert and update operations than a clustered index

Both types of index will improve performance when selecting data by fields that use the index, but will slow down update and insert operations.

Because of the slower insert and update, clustered indexes should be set on a field that is normally incremental, i.e. Id or Timestamp.

SQL Server will normally only use an index if its selectivity is above 95%.


There are also storage considerations. When inserting rows into a table with no clustered index, the rows are stored back to back on the page. Updating a row may move it to the end of the table, leaving empty space behind and fragmenting the table and its indexes.
you don't have to care what is x. All you need to know is that for an app with millions of users, x will be significant
It's purely dogma. It's not "faster to read because the data is stored in order". It's faster to read because you avoid an index read AND THEN the table read. It's faster to range scan (if that's meaningful) because the data is stored in order. i.e. the clustering factor is perfect.
Also the idea that 95% of the records need to be unique is a fallacy. Say you have a table with 1,000,000 rows and you index a column with 500,000 keys. 0% are unique but each key returns 2 out of a million rows. This index is absolutely useful regardless that 0% of the records are unique.
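The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the comment, assuming every key appears exactly twice; the variable names are made up for illustration.

```python
# Numbers taken from the comment above; the even two-rows-per-key
# distribution is an assumption made for the sake of the arithmetic.
total_rows = 1_000_000
distinct_keys = 500_000

rows_per_key = total_rows / distinct_keys          # rows returned by one key lookup
fraction_per_lookup = rows_per_key / total_rows    # share of the table touched

print(f"rows per key: {rows_per_key:.0f}")
print(f"share of table per lookup: {fraction_per_lookup:.4%}")
```

No key is unique, yet each lookup narrows a million rows down to two, which is why uniqueness alone is a poor test of whether an index is useful.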
"data is physically stored in index order" what do you mean by that? At one level it is trivially true because the data pages and the index leaf pages are one and the same - so obviously the ordering of one describes the ordering of the other. However this is not necessarily in any particular order such as order of the index key stackoverflow.com/questions/1251636/…
rslite

Clustered indexes physically order the data on the disk. This means no extra data is needed for the index, but there can be only one clustered index (obviously). Accessing data using a clustered index is fastest.

All other indexes must be non-clustered. A non-clustered index keeps a duplicate of the data from the indexed columns, in order, together with pointers to the actual data rows (pointers to the clustered index if there is one). This means that accessing data through a non-clustered index has to go through an extra layer of indirection. However if you select only the data that's available in the indexed columns you can get the data back directly from the duplicated index data (that's why it's a good idea to SELECT only the columns that you need and not use *).
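A minimal Python sketch of the indirection described above, not how any real engine stores data: sorted lists stand in for B-trees, and the table contents, column names, and helper functions are all hypothetical. It shows a lookup through a non-clustered index needing a second seek into the clustered index, and a covered query that is answered from the index entry alone.

```python
from bisect import bisect_left

# Clustered table: full rows kept sorted by the clustering key (id).
rows = [
    (1, "alice", "NL"),
    (2, "carol", "US"),
    (3, "bob", "DE"),
    (4, "dave", "US"),
]
clustered_keys = [r[0] for r in rows]

# Non-clustered index on name: sorted (name, id) pairs.
# The id acts as the "pointer" back into the clustered index.
name_index = sorted((name, id_) for id_, name, _ in rows)
index_names = [n for n, _ in name_index]


def seek_clustered(id_):
    """One seek: the leaf of the clustered index is the row itself."""
    pos = bisect_left(clustered_keys, id_)
    if pos < len(rows) and rows[pos][0] == id_:
        return rows[pos]
    return None


def select_star_by_name(name):
    """Two seeks: the name index yields an id, then the clustered index yields the row."""
    pos = bisect_left(index_names, name)
    if pos == len(name_index) or name_index[pos][0] != name:
        return None
    return seek_clustered(name_index[pos][1])


def select_id_by_name(name):
    """Covered query: name and id both live in the index entry, so no second seek."""
    pos = bisect_left(index_names, name)
    if pos == len(name_index) or name_index[pos][0] != name:
        return None
    return name_index[pos][1]
```

Calling `select_star_by_name("bob")` walks both structures, while `select_id_by_name("bob")` stops at the index, which is the effect of selecting only indexed columns.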


'However if you select only the data that's available in the indexed columns you can get the data back directly from the duplicated index data' - yes that is the important exception to the prefer clustered index heuristic. I guess in this case you essentially have a clustered index, but less data in the table you are querying so potentially it can be read faster off disk.
Santiago Cepas

Clustered indexes are stored physically on the table. This means they are the fastest and you can only have one clustered index per table.

Non-clustered indexes are stored separately, and you can have as many as you want.

The best option is to set your clustered index on the most used unique column, usually the PK. You should always have a well selected clustered index in your tables, unless a very compelling reason--can't think of a single one, but hey, it may be out there--for not doing so comes up.


can you elaborate more on "we should always have a clustered index in our tables" ? without elaboration that statement is simply wrong because of the word always
You're right Pacerier, one shouldn't use absolute statements lightly. Though I don't know of a single case when you shouldn't have a well selected clustered index, such case might exist so I've changed my answer to a more generic version.
Community

Clustered Index

There can be only one clustered index for a table. Usually made on the primary key. The leaf nodes of a clustered index contain the data pages.

Non-Clustered Index

There can be at most 249 non-clustered indexes per table in SQL Server 2005 and earlier; later versions support up to 999. A non-clustered index can be created on any key. The leaf nodes of a non-clustered index do not contain the data pages; instead, they contain index rows.


Lasitha Yapa

Clustered Index

Only one clustered index can be there in a table

Sort the records and store them physically according to the order

Data retrieval is faster than non-clustered indexes

Do not need extra space to store logical structure

Non Clustered Index

There can be any number of non-clustered indexes in a table

Do not affect the physical order. Create a logical order for data rows and use pointers to physical data files

Data insertion/update is faster than clustered index

Use extra space to store logical structure

Apart from these differences, note that when a table has no clustered index, its data rows are stored unordered in a structure called a heap.


Pritam Banerjee

Pros:

Clustered indexes work great for ranges (e.g. select * from my_table where my_key between @min and @max)

In some conditions, the DBMS will not have to do any sorting work if you use an ORDER BY clause.

Cons:

Clustered indexes can slow down inserts, because the physical layout of the records has to be modified as records are put in if the new keys are not in sequential order.
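The range-scan advantage and the out-of-order insert cost can both be sketched in Python. This is a toy model under loud assumptions: a sorted list stands in for rows stored in clustered-key order, and "entries after the insert position" is only a rough proxy for the rows that have to make room. Real engines work with B-tree pages and page splits, not array shifts, and the function names here are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right

# Keys kept in sorted order, standing in for rows stored in clustered-key order.
keys = list(range(0, 1000, 10))  # 0, 10, ..., 990


def range_scan(sorted_keys, lo, hi):
    """BETWEEN lo AND hi: locate the start once, then read one contiguous slice."""
    start = bisect_left(sorted_keys, lo)
    end = bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


def insert_cost(sorted_keys, new_key):
    """Insert new_key in order and return how many existing entries sit after it."""
    pos = bisect_right(sorted_keys, new_key)
    shifted = len(sorted_keys) - pos
    sorted_keys.insert(pos, new_key)
    return shifted


# Ever-increasing keys (like an identity column) always land at the end.
sequential = list(range(100))
seq_cost = sum(insert_cost(sequential, k) for k in range(100, 200))

# Keys arriving in random order land in the middle and displace later entries.
rng = random.Random(0)
scattered = list(range(0, 1000, 10))
rand_cost = sum(insert_cost(scattered, rng.randrange(1000)) for _ in range(100))

print(range_scan(keys, 100, 150))
print("sequential insert cost:", seq_cost)
print("random insert cost:", rand_cost)
```

In this model the sequential inserts displace nothing, while the random ones displace existing entries on almost every insert.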


Pritam Banerjee

Clustered basically means that the data is in that physical order in the table. This is why you can have only one per table.

Unclustered means it's "only" a logical order.


Josh

A clustered index actually describes the order in which records are physically stored on the disk, hence the reason you can only have one.

A Non-Clustered Index defines a logical order that does not match the physical order on disk.


supercat

An indexed database has two parts: a set of physical records, which are arranged in some arbitrary order, and a set of indexes which identify the sequence in which records should be read to yield a result sorted by some criterion. If there is no correlation between the physical arrangement and the index, then reading out all the records in order may require making lots of independent single-record read operations. Because a database may be able to read dozens of consecutive records in less time than it would take to read two non-consecutive records, performance may be improved if records which are consecutive in the index are also stored consecutively on disk. Specifying that an index is clustered will cause the database to make some effort (different databases differ as to how much) to arrange things so that groups of records which are consecutive in the index will be consecutive on disk.

For example, if one were to start with an empty non-clustered database and add 10,000 records in random sequence, the records would likely be added at the end in the order they were added. Reading out the database in order by the index would require 10,000 one-record reads. If one were to use a clustered database, however, the system might check when adding each record whether the previous record was stored by itself; if it found that to be the case, it might write that record with the new one at the end of the database. It could then look at the physical record before the slots where the moved records used to reside and see if the record that followed that was stored by itself. If it found that to be the case, it could move that record to that spot. Using this sort of approach would cause many records to be grouped together in pairs, thus potentially nearly doubling sequential read speed.

In reality, clustered databases use more sophisticated algorithms than this. A key thing to note, though, is that there is a tradeoff between the time required to update the database and the time required to read it sequentially. Maintaining a clustered database will significantly increase the amount of work required to add, remove, or update records in any way that would affect the sorting sequence. If the database will be read sequentially much more often than it will be updated, clustering can be a big win. If it will be updated often but seldom read out in sequence, clustering can be a big performance drain, especially if the sequence in which items are added to the database is independent of their sort order with regard to the clustered index.
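The read-side half of that tradeoff can be illustrated with a small Python simulation. It is a sketch under stated assumptions, not a model of any particular database: rows are packed 50 to a page (an arbitrary figure), only the most recently fetched page is cached, and the 10,000 records mirror the example above.

```python
import random

ROWS_PER_PAGE = 50  # assumed page capacity, chosen only for illustration
keys = list(range(10_000))


def page_reads(physical_order):
    """Count page fetches when reading every row in key order,
    assuming only the most recently fetched page stays cached."""
    page_of = {key: pos // ROWS_PER_PAGE for pos, key in enumerate(physical_order)}
    reads, current = 0, None
    for key in sorted(physical_order):
        if page_of[key] != current:
            current = page_of[key]
            reads += 1
    return reads


# Clustered layout: physical order matches key order.
clustered = sorted(keys)

# Unclustered layout: rows sit wherever they happened to be added.
heap = keys[:]
random.Random(42).shuffle(heap)

clustered_reads = page_reads(clustered)
heap_reads = page_reads(heap)

print("page reads, clustered layout:", clustered_reads)
print("page reads, random layout:", heap_reads)
```

With the clustered layout each page is fetched exactly once; with the shuffled layout nearly every row triggers a new page fetch, which is the sequential-read penalty described above.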


Ed Guiness

A clustered index is essentially a sorted copy of the data in the indexed columns.

The main advantage of a clustered index is that when your query (seek) locates the data in the index then no additional IO is needed to retrieve that data.

The overhead of maintaining a clustered index, especially in a frequently updated table, can lead to poor performance and for that reason it may be preferable to create a non-clustered index.


Nandkishor Nangre

You might have gone through theory part from the above posts:

-The clustered index, as we can see, points directly to the record, so a search takes less time. It also needs no extra memory/space to store the index.

-A non-clustered index points indirectly: it leads to the clustered index, which then locates the actual record, so access takes somewhat longer. It also needs its own memory/space to store the index.

https://i.stack.imgur.com/kFSWR.png


Deepak Mishra

// Copied from MSDN, the second point of non-clustered index is not clearly mentioned in the other answers.

Clustered

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.

The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

Nonclustered

Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.

The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
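The two kinds of row locator can be sketched in Python as follows. This is purely illustrative: dictionaries and nested lists stand in for index and page structures, and every name and value is hypothetical rather than taken from SQL Server.

```python
# Heap: rows sit in pages in no particular order; a row is addressed by (page, slot).
heap_pages = [
    [(30, "carol"), (10, "alice")],  # page 0
    [(20, "bob")],                   # page 1
]
# Nonclustered index on name over the heap: the row locator is a (page, slot) pair.
heap_name_index = {"alice": (0, 1), "bob": (1, 0), "carol": (0, 0)}


def fetch_from_heap(name):
    """Follow the locator straight to the row's physical position."""
    page, slot = heap_name_index[name]
    return heap_pages[page][slot]


# Clustered table: rows are addressed by the clustered index key (id).
clustered_rows = {10: (10, "alice"), 20: (20, "bob"), 30: (30, "carol")}
# Nonclustered index on name over the clustered table: the row locator is the key.
clustered_name_index = {"alice": 10, "bob": 20, "carol": 30}


def fetch_from_clustered(name):
    """Follow the locator by looking the key up in the clustered index."""
    key = clustered_name_index[name]
    return clustered_rows[key]


print(fetch_from_heap("bob"))
print(fetch_from_clustered("bob"))
```

Both lookups return the same row; the difference is whether the index entry stores a physical address or a key that must itself be looked up.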


Techie Boy

Clustered Indexes

Clustered Indexes are faster for retrieval and slower for insertion and update.

A table can have only one clustered index.

Don't require extra space to store logical structure.

Determine the order in which data is stored on the disk.

Non-Clustered Indexes

Non-clustered indexes are slower in retrieving data and faster in insertion and update.

A table can have multiple non-clustered indexes.

Require extra space to store logical structure.

Have no effect on the order in which data is stored on the disk.