
Are there any disadvantages to always using nvarchar(MAX)?

In SQL Server 2005, are there any disadvantages to making all character fields nvarchar(MAX) rather than specifying a length explicitly, e.g. nvarchar(255)? (Apart from the obvious one that you aren't able to limit the field length at the database level)

I just don't understand why you'd want to let someone enter a name of 8,000+ characters.
The same logic can be applied to programming languages. Why not go back to the old VB6 Variant type for all our data? I don't think having checks and balances in more than one place is necessarily bad.
Your update should be its own answer to this question.
Answer-in-question moved to a proper answer, seeing as the original author hasn't done it: stackoverflow.com/a/35177895/10245 I figure 7 years is enough time :-)

Community

Same question was asked on MSDN Forums:

Varchar(max) vs Varchar(255)

From the original post (much more information there):

When you store data to a VARCHAR(N) column, the values are physically stored in the same way. But when you store it to a VARCHAR(MAX) column, behind the scenes the data is handled as a TEXT value. So there is some additional processing needed when dealing with a VARCHAR(MAX) value (but only if the size exceeds 8,000). VARCHAR(MAX) or NVARCHAR(MAX) is considered a 'large value type'. Large value types are usually stored 'out of row'. It means that the data row will have a pointer to another location where the 'large value' is stored...
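The in-row versus out-of-row behaviour described above can also be observed and influenced directly. Below is a minimal sketch, assuming a hypothetical table dbo.Demo: the documented sp_tableoption setting 'large value types out of row' forces MAX values out of row even when they would fit in the row, and sys.allocation_units shows which allocation units (IN_ROW_DATA vs LOB_DATA) actually hold pages for the table.

-- Hypothetical table for illustration only.
CREATE TABLE dbo.Demo (Id INT IDENTITY PRIMARY KEY, Notes NVARCHAR(MAX));

-- Force MAX values out of row (by default they stay in-row while they fit).
EXEC sp_tableoption 'dbo.Demo', 'large value types out of row', 1;

-- Inspect where the data lives: IN_ROW_DATA vs LOB_DATA allocation units.
SELECT au.type_desc, au.total_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
  ON au.container_id IN (p.hobt_id, p.partition_id)
WHERE p.object_id = OBJECT_ID('dbo.Demo');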


So should the question be, is there a difference between using N/VARCHAR(MAX) and N/TEXT?
If I recall correctly, aren't they only stored out of row if the size exceeds 8k?
I read the answer as "no, there is no disadvantage to using N/VARCHAR(MAX)" because there is additional processing "only if the size exceeds 8000". Thus, you incur the cost only when necessary, and your database is less restrictive. Am I reading this wrong? Seems like you would almost always want N/VARCHAR(MAX) rather than N/VARCHAR(1-8000)...
Dead link above - the working link for the question on MSDN is social.msdn.microsoft.com/Forums/en-US/sqlgetstarted/thread/…
Unfortunately this answer has a number of problems. It makes the 8k boundary seem like a magic number, which is not true; the value gets pushed out of row based on many more factors, including sp_tableoption: msdn.microsoft.com/en-us/library/ms173530.aspx. VARCHAR(255) types can also be pushed out of row, so the 'overhead' mentioned may be exactly the same for MAX and 255. It compares MAX types with TEXT types, when they are about as distinct as it gets (completely different API to manipulate, different storage, etc.). It fails to mention the actual differences: no indexing and no online operations on MAX types.
Luke Girvin

It's a fair question and he did state apart from the obvious…

Disadvantages could include:

Performance implications: the query optimizer uses the field size to determine the most efficient execution plan.

"1. The space allocation in extents and pages of the database is flexible. Thus, when adding information to the field using UPDATE, your database would have to create a pointer if the new data is longer than the previously inserted data. The database files would become fragmented = lower performance in almost everything, from indexes to delete, update and inserts." http://sqlblogcasts.com/blogs/simons/archive/2006/02/28/Why-use-anything-but-varchar_2800_max_2900_.aspx

Integration implications: hard for other systems to know how to integrate with your database.

Unpredictable growth of data.

Possible security issues, e.g. you could crash a system by taking up all the disk space.

There is a good article here: http://searchsqlserver.techtarget.com/tip/1,289483,sid87_gci1098157,00.html


+1 for the integration and security implications. These are original angles to consider when most other answers talk about performance. Related to the integration implications: any tools (such as report writers or form designers) that use metadata to provide reasonable default control sizes would require much more work if all columns were varchar(max).
Integration via the database is the most ridiculous thing I know of. If it is just a one-off import, you can check the data beforehand with the LEN function.

Based on the link provided in the accepted answer it appears that:

1. 100 characters stored in an nvarchar(MAX) field will be stored no differently from 100 characters in an nvarchar(100) field - the data will be stored inline and you will not have the overhead of reading and writing data 'out of row'. So no worries there.

2. If the size is greater than 4,000 characters, the data will be stored 'out of row' automatically, which is what you would want. So no worries there either.

However...

3. You cannot create an index on an nvarchar(MAX) column. You can use full-text indexing, but you cannot create an index on the column to improve query performance. For me, this seals the deal... it is a definite disadvantage to always use nvarchar(MAX).
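To make point 3 concrete, here is a minimal sketch (the table and column names are hypothetical): creating a regular index on the MAX column fails, while a bounded column holding the same data can be indexed.

CREATE TABLE dbo.Customers (
    Id        INT IDENTITY PRIMARY KEY,
    NameMax   NVARCHAR(MAX),
    NameShort NVARCHAR(200)
);

-- Fails: nvarchar(max) columns cannot be used as index key columns.
CREATE INDEX IX_Customers_NameMax ON dbo.Customers (NameMax);

-- Works: a bounded column can be indexed (subject to the index key size limit).
CREATE INDEX IX_Customers_NameShort ON dbo.Customers (NameShort);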

Conclusion:

If you want a kind of "universal string length" throughout your whole database, which can be indexed and which will not waste space and access time, then you could use nvarchar(4000).


fyi this was an edit added to the original question that should have been posted as an answer
Thanks, for me this is the final answer. I asked myself the same - why not use nvarchar(max) all the time - like string in C#? - but point 3) (the index issue) is giving the answer.
Added an edit. As a kind of "universal string length", you could always use nvarchar(4000)
@SQLGeorge See this excellent answer by Martin Smith on the impact to query performance of declaring columns wider than they ever will be
@billinkc Thanks, that's a great article. OK, so the size indeed affects performance. I will edit the answer again.
Bill Karwin

Sometimes you want the data type to enforce some sense on the data in it.

Say for example you have a column that really shouldn't be longer than, say, 20 characters. If you define that column as VARCHAR(MAX), some rogue application could insert a long string into it and you'd never know, or have any way of preventing it.

The next time your application uses that string, under the assumption that the length of the string is modest and reasonable for the domain it represents, you will experience an unpredictable and confusing result.
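A minimal sketch of that point (the table, column and length here are assumptions, not from the answer): with a declared length the engine rejects the rogue value outright instead of storing it silently.

CREATE TABLE dbo.Products (
    Code NVARCHAR(20) NOT NULL   -- domain rule: codes are at most 20 characters
);

-- Fails with a truncation error, so the over-long value never reaches the table;
-- with NVARCHAR(MAX) the same insert would succeed silently.
INSERT INTO dbo.Products (Code)
VALUES (N'this-string-is-far-longer-than-twenty-characters');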


I agree with this, and some of the other comments, but I still maintain that this is the responsibility of the business tier. By the time it reaches the database tier, it should snap a salute and store the value, no matter how ridiculously long it is. I think what's really at play here is that I think about 90% of the time when a developer specifies varchar(255), his intent is not really 255 characters, but some unspecified middling length value. And given the trade off between unreasonably large values in my db and unforeseen exceptions, I'll take the large values.
If they are specifying VARCHAR(255) to indicate some unknown length then that's their fault for not properly researching what they're designing. The solution is for the developer to do their job, not for the database to allow unreasonable values.
Not helpful to the author; he explicitly excluded this consideration from the question you answered.
@ Chris B Behrens: I disagree; the database schema is part of the business logic. The choice of tables, relations, fields and data types is all business logic - and it's worth using the RDBMS to enforce the rules of this business logic. For one reason, it's very rare that there is only one application tier accessing the database; for example you might have data import and extract tools which bypass the main business tier, which means you really need the DB to be enforcing the rules.
If you don't need or indeed want long strings to be stored, then it's better to enforce sense on the data. E.g. if storing a PostCode field, would you let someone enter hundreds or thousands of characters when it should be a max of, say, 10? The max size should be validated at all levels: client, business tier AND database. If using a Model First approach such as with C# and Entity Framework, you can define your max size in the model and have it applied to the database, the business logic, and the client validation (with the likes of jQuery validation). Only use nvarchar(max) if it's really required.
QMaster

I checked some articles and found a useful test script here: http://www.sqlservercentral.com/Forums/Topic1480639-1292-1.aspx I changed it to compare NVARCHAR(10) vs NVARCHAR(4000) vs NVARCHAR(MAX), and I found no speed difference between the specified lengths, but there is one when using MAX. You can test it yourself. Hope this helps.

SET NOCOUNT ON;

--===== Test Variable Assignment 1,000,000 times using NVARCHAR(10)
DECLARE @SomeString NVARCHAR(10),
        @StartTime DATETIME;
 SELECT @StartTime = GETDATE();
 SELECT TOP 1000000
        @SomeString = 'ABC'
   FROM master.sys.all_columns ac1,
        master.sys.all_columns ac2;
 SELECT testTime='10', Duration = DATEDIFF(ms,@StartTime,GETDATE());
GO
--===== Test Variable Assignment 1,000,000 times using NVARCHAR(4000)
DECLARE @SomeString NVARCHAR(4000),
        @StartTime DATETIME;
 SELECT @StartTime = GETDATE();
 SELECT TOP 1000000
        @SomeString = 'ABC'
   FROM master.sys.all_columns ac1,
        master.sys.all_columns ac2;
 SELECT testTime='4000', Duration = DATEDIFF(ms,@StartTime,GETDATE());
GO
--===== Test Variable Assignment 1,000,000 times using NVARCHAR(MAX)
DECLARE @SomeString NVARCHAR(MAX),
        @StartTime DATETIME;
 SELECT @StartTime = GETDATE();
 SELECT TOP 1000000
        @SomeString = 'ABC'
   FROM master.sys.all_columns ac1,
        master.sys.all_columns ac2;
 SELECT testTime='MAX', Duration = DATEDIFF(ms,@StartTime,GETDATE());
GO

That's interesting. On my box it seems the MAX one is 4x slower.
New results on SQL Server 2012: 10 is twice as slow as 4k, and MAX is 5.5 times slower than 4k.
The majority of the time is the implicit cast from varchar to nvarchar(max). Try this one: DECLARE @SomeString NVARCHAR(MAX), @abc NVARCHAR(MAX) = N'ABC', @StartTime DATETIME; SELECT @StartTime = GETDATE(); SELECT TOP 1000000 @SomeString = @abc FROM master.sys.all_columns ac1, master.sys.all_columns ac2; SELECT testTime='MAX', Duration = DATEDIFF(ms,@StartTime,GETDATE());
SQL Server 2014 on SSD: 150, 156, 716 (10, 4000, MAX).
Thanks for adding some real numbers to this discussion. We often forget that building a test case is the quickest way to insight.
Alex

Think of it as just another safety level. You can design your table without foreign key relationships - perfectly valid - and ensure existence of associated entities entirely on the business layer. However, foreign keys are considered good design practice because they add another constraint level in case something messes up on the business layer. Same goes for field size limitation and not using varchar MAX.


Nick Kavadias

A reason NOT to use max or text fields is that you cannot perform online index rebuilds, i.e. REBUILD WITH (ONLINE = ON), even with SQL Server Enterprise Edition.
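A hedged sketch of what that restriction looks like (object names are hypothetical; note that SQL Server 2012 and later relaxed it and allow online rebuilds of indexes containing MAX/LOB columns):

-- On SQL Server 2005-2008 R2, rebuilding an index that contains a (n)varchar(max),
-- varbinary(max), xml, text, ntext or image column online fails; a clustered index
-- always contains every column of the table.
ALTER INDEX PK_Articles ON dbo.Articles
REBUILD WITH (ONLINE = ON);

-- The workaround is an offline rebuild, which blocks access to the table
-- for the duration of the operation.
ALTER INDEX PK_Articles ON dbo.Articles
REBUILD WITH (ONLINE = OFF);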


This same restriction is in place for the TEXT field type, so you should still use VARCHAR(MAX) instead of TEXT.
We couldn't rebuild our clustered index due to this. It cost us a lot of disk space until we could extract the column into its own table (we couldn't afford to lock the table for greater than 7 seconds).
mattruma

The only problem I found was that we develop our applications on SQL Server 2005, and in one instance, we have to support SQL Server 2000. I just learned, the hard way, that SQL Server 2000 doesn't like the MAX option for varchar or nvarchar.


So why not just develop on the lowest common denominator?
RichardOD

Bad idea when you know the field will be in a set range: 5 to 10 characters, for example. I think I'd only use max if I wasn't sure what the length would be. For example, a telephone number would never be more than a certain number of characters.

Can you honestly say you are that uncertain about the approximate length requirements for every field in your table?

I do get your point though- there are some fields I'd certainly consider using varchar(max).

Interestingly the MSDN docs sum it up pretty well:

Use varchar when the sizes of the column data entries vary considerably. Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.

There's an interesting discussion on the issue here.


For things like telephone numbers, I would be much more agreeable to using a char field instead of varchar. As long as you are maintaining a standard in your storage and you don't have to worry about phone numbers from different countries, you should never need a variable field for something like a phone number (10 without any formatting) or zip code (5 or 9-10 if you add the last four digits), etc.
I was referring to telephone numbers that can vary in length. Perhaps I should have put this in the answer. Anything that is fixed length I would use a char field for.
Or perhaps I should have said in my comment nchar or char. :-)
The number of characters in a telephone number is pretty much a business requirement. If you were required to store the international standard code along with the number, it could be more than 10. Or, some part of the world might have more than 10 digits for a phone number. Imagine the case of IPV4 to IPV6 transition. No one would have argued that we needed more than 12 digits in the good old IPV4 days. It may not hold good if IPV6 becomes prevalent. This is again a business rule change over a period of time. As it's said, change is the only constant thing we can expect :)
Be careful about assuming you know how many characters can be in a telephone number field, or what kind of characters they will be. Unless the system uses that data to actually dial out (in which case you have to be strict about formats), then the user might legitimately put unexpectedly long strings there e.g. "0123 456 78910 ask reception for extension 45 and then transfer to James".
Tom H

The job of the database is to store data so that it can be used by the enterprise. Part of making that data useful is ensuring that it is meaningful. Allowing someone to enter an unlimited number of characters for their first name isn't ensuring meaningful data.

Building these constraints into the business layer is a good idea, but that doesn't ensure that the database will remain intact. The only way to guarantee that the data rules are not violated is to enforce them at the lowest level possible in the database.


IMO, data length limitations are purely based on the business rules, which can change over time as the application grows. Changing business rules in the business logic is easier than at the db level. So, I think the db should be flexible enough and shouldn't be tied to business rules such as the max permitted length of a first name, which is very much dependent on the part of the world where your user lives.
Harry Cooper

As was pointed out above, it is primarily a tradeoff between storage and performance. At least in most cases.

However, there is at least one other factor that should be considered when choosing n/varchar(MAX) over n/varchar(n): is the data going to be indexed (such as, say, a last name)? Since the MAX definition is considered a LOB, anything defined as MAX is not available for indexing, and without an index, any lookup involving the data as a predicate in a WHERE clause will be forced into a full table scan, which is the worst performance you can get for data lookups.
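A hedged workaround sketch (table and column names are assumptions, not from the answer): index a bounded, computed prefix of the MAX column so that lookups on the prefix can seek, while the full value stays in the MAX column.

-- The computed, bounded prefix is indexable even though the NVARCHAR(MAX)
-- column itself is not.
ALTER TABLE dbo.Documents
    ADD BodyPrefix AS CAST(LEFT(Body, 450) AS NVARCHAR(450));

CREATE INDEX IX_Documents_BodyPrefix ON dbo.Documents (BodyPrefix);

-- Queries that filter on the prefix can now use the index instead of scanning.
SELECT Id
FROM dbo.Documents
WHERE BodyPrefix = N'an exact value no longer than 450 characters';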


user2864740

As of SQL Server 2019, NVARCHAR(MAX) still does not support SCSU “Unicode compression” — even when stored using In-Row data storage. SCSU was added in SQL Server 2008 and applies to any ROW/PAGE-compressed tables and indices.

As such, NVARCHAR(MAX) can take up to twice as much physical disk space as a NVARCHAR(1..4000) field with the same text content+ — even when not stored in the LOB. The non-SCSU waste depends on data and language represented.

Unicode Compression Implementation:

SQL Server uses an implementation of the Standard Compression Scheme for Unicode (SCSU) algorithm to compress Unicode values that are stored in row or page compressed objects. For these compressed objects, Unicode compression is automatic for nchar(n) and nvarchar(n) columns [and is never used with nvarchar(max)].

On the other hand, PAGE compression (since 2014) still applies to NVARCHAR(MAX) columns if they are written as In-Row data... so the lack of SCSU feels like a “missing optimization”. Unlike SCSU, page compression results can vary dramatically based on shared leading prefixes (i.e. duplicate values).

However, it may still be “faster” to use NVARCHAR(MAX), even with the higher IO costs, with functions like OPENJSON, due to avoiding the implicit conversion. This implicit conversion overhead depends on the relative cost of usage and whether the field is touched before or after filtering. The same conversion issue exists when using 2019's UTF-8 collations in a VARCHAR(MAX) column.

Using NVARCHAR(1-4000) also requires N*2 bytes of the ~8000 byte row quota, while NVARCHAR(MAX) only requires 24 bytes. Overall design and usage need to be considered together to account for specific implementation details.

+In my database / data / schema, by using two columns (coalesced on read) it was possible to reduce disk space usage by ~40% while still supporting overflowing text values. SCSU, while with its flaws, is an amazingly clever and underutilized method of storing Unicode more space-efficiently.
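For reference, a minimal hedged sketch of the setting the quote above describes (the table is hypothetical): ROW or PAGE compression has to be enabled on the object before SCSU Unicode compression kicks in, and even then it only ever applies to the nchar(n)/nvarchar(n) columns, never to the nvarchar(max) column.

-- Hypothetical table with one bounded and one MAX Unicode column.
CREATE TABLE dbo.Messages (
    Id      INT IDENTITY PRIMARY KEY,
    Subject NVARCHAR(400),   -- eligible for SCSU Unicode compression
    Body    NVARCHAR(MAX)    -- never SCSU-compressed, even when stored in-row
);

-- Enabling ROW (or PAGE) compression turns Unicode compression on automatically,
-- but only for the nchar(n)/nvarchar(n) columns.
ALTER TABLE dbo.Messages REBUILD WITH (DATA_COMPRESSION = ROW);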


TheTXI

One problem is that if you are having to work with multiple versions of SQL Server, the MAX will not always work. So if you are working with legacy DB's or any other situation that involves multiple versions, you better be very careful.


I think the unspoken assumption on the part of the OP is that he is dealing entirely with 2005+ instances, and that his apps won't need to work on 2000 (or, ack, lower) versions. I totally agree with you if there is a need to support the older versions, though!
John Rudy: I would imagine that's the case, I just know that I've run into those hurdles myself when I didn't think I was going to.
Actually this is a prevalent problem still with modern stuff due to SQL CE 4 which does not support MAX column types so interoperability is a hassle.
WWC

1) The SQL server will have to utilize more resources (allocated memory and cpu time) when dealing with nvarchar(max) vs nvarchar(n) where n is a number specific to the field.

2) What does this mean in regards to performance?

On SQL Server 2005, I queried 13,000 rows of data from a table with 15 nvarchar(max) columns. I timed the queries repeatedly and then changed the columns to nvarchar(255) or less.

The queries prior to the optimization averaged at 2.0858 seconds. The queries after the change returned in an average of 1.90 seconds. That was about 184 milliseconds of improvement to the basic select * query. That is an 8.8% improvement.

3) My results are in agreement with a few other articles that indicated there was a performance difference. Depending on your database and the query, the percentage of improvement can vary. If you don't have a lot of concurrent users or very many records, the performance difference won't be an issue for you. However, it will grow as the number of records and concurrent users increases.


Cade Roux

I had a UDF which padded strings and returned the output as varchar(max). If this was used directly, instead of being cast back to the appropriate size for the column being adjusted, performance was very poor. I ended up making the UDF return an arbitrary length, with a big note, instead of relying on all the callers of the UDF to re-cast the string to a smaller size.
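A hedged sketch of the two call patterns being compared (the UDF and table names are hypothetical):

-- Slow pattern: the VARCHAR(MAX) return value flows straight into the query.
SELECT dbo.PadString(AccountCode, 10) AS PaddedCode
FROM   dbo.Accounts;

-- Faster pattern: the caller casts the result back down to the column's real size,
-- avoiding MAX-typed intermediates. The alternative taken in the answer was to
-- change the UDF itself to return a fixed, non-MAX length.
SELECT CAST(dbo.PadString(AccountCode, 10) AS VARCHAR(10)) AS PaddedCode
FROM   dbo.Accounts;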


Tony

Legacy system support. If you have a system that is using the data and it is expected to be a certain length, then the database is a good place to enforce that length. This is not ideal, but legacy systems are sometimes not ideal. =P


Matt Spradley

If all of the data in a row (for all the columns) would never reasonably exceed 8,000 characters, then the design at the data layer should enforce this.

The database engine is much more efficient keeping everything out of blob storage. The smaller you can restrict a row the better. The more rows you can cram in a page the better. The database just performs better when it has to access fewer pages.


Kvasi

My tests have shown that there are differences when selecting.

CREATE TABLE t4000 (a NVARCHAR(4000) NULL);

CREATE TABLE tmax (a NVARCHAR(MAX) NULL);

DECLARE @abc4 NVARCHAR(4000) = N'ABC';

INSERT INTO t4000
SELECT TOP 1000000 @abc4
    FROM
    master.sys.all_columns ac1,
    master.sys.all_columns ac2;

DECLARE @abc NVARCHAR(MAX) = N'ABC';

INSERT INTO tmax
SELECT TOP 1000000 @abc
    FROM
    master.sys.all_columns ac1,
    master.sys.all_columns ac2;

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT * FROM dbo.t4000;
SELECT * FROM dbo.tmax;

orip

Interesting link: Why use a VARCHAR when you can use TEXT?

It's about PostgreSQL and MySQL, so the performance analysis is different, but the logic for "explicitness" still holds: Why force yourself to always worry about something that's relevant a small percentage of the time? If you saved an email address to a variable, you'd use a 'string' not a 'string limited to 80 chars'.


That's akin to arguing that you shouldn't have check constraints to ensure that a person's age isn't a negative number.
I see a difference between data correctness and performance optimization.
carlos martini

The main disadvantage I can see is this: say you have the two table definitions below.

Which one gives you the most information about the data needed for the UI?

This

            CREATE TABLE [dbo].[BusData](
                [ID] [int] IDENTITY(1,1) NOT NULL,
                [RecordId] [nvarchar](MAX) NULL,
                [CompanyName] [nvarchar](MAX) NOT NULL,
                [FirstName] [nvarchar](MAX) NOT NULL,
                [LastName] [nvarchar](MAX) NOT NULL,
                [ADDRESS] [nvarchar](MAX) NOT NULL,
                [CITY] [nvarchar](MAX) NOT NULL,
                [County] [nvarchar](MAX) NOT NULL,
                [STATE] [nvarchar](MAX) NOT NULL,
                [ZIP] [nvarchar](MAX) NOT NULL,
                [PHONE] [nvarchar](MAX) NOT NULL,
                [COUNTRY] [nvarchar](MAX) NOT NULL,
                [NPA] [nvarchar](MAX) NULL,
                [NXX] [nvarchar](MAX) NULL,
                [XXXX] [nvarchar](MAX) NULL,
                [CurrentRecord] [nvarchar](MAX) NULL,
                [TotalCount] [nvarchar](MAX) NULL,
                [Status] [int] NOT NULL,
                [ChangeDate] [datetime] NOT NULL
            ) ON [PRIMARY]

Or This?

            CREATE TABLE [dbo].[BusData](
                [ID] [int] IDENTITY(1,1) NOT NULL,
                [RecordId] [nvarchar](50) NULL,
                [CompanyName] [nvarchar](50) NOT NULL,
                [FirstName] [nvarchar](50) NOT NULL,
                [LastName] [nvarchar](50) NOT NULL,
                [ADDRESS] [nvarchar](50) NOT NULL,
                [CITY] [nvarchar](50) NOT NULL,
                [County] [nvarchar](50) NOT NULL,
                [STATE] [nvarchar](2) NOT NULL,
                [ZIP] [nvarchar](16) NOT NULL,
                [PHONE] [nvarchar](18) NOT NULL,
                [COUNTRY] [nvarchar](50) NOT NULL,
                [NPA] [nvarchar](3) NULL,
                [NXX] [nvarchar](3) NULL,
                [XXXX] [nvarchar](4) NULL,
                [CurrentRecord] [nvarchar](50) NULL,
                [TotalCount] [nvarchar](50) NULL,
                [Status] [int] NOT NULL,
                [ChangeDate] [datetime] NOT NULL
            ) ON [PRIMARY]

I would prefer business logic telling me that a Company Name can be a maximum of 50 characters rather than relying on a database table for that info.
I agree with Jeff. I don't think the persistence store is the right place to define your business rules. And in a layered architecture your UI wouldn't even know about the persistence layer.
Unless of course you're using a value that is constrained to a specific size, for instance ISO code for country.
What does that have to do with a table def? You can still have business logic. I think your point makes no sense as it relates to how a table is designed. If you still want to define some kind of limit in your business layer, so be it. Although what would make more sense is using a stored proc in the business layer anyway, not a table def?
It seems unpopular, but I agree with carlos: if the db sets a max size then, in all the layers you build on top of it, you can be confident about what you are likely to have to deal with. This is particularly important if you have multiple systems writing to your database.
tsundoku

One disadvantage is that you will be designing around an unpredictable variable, and you will probably ignore, rather than take advantage of, the internal SQL Server data structures, progressively made up of rows, pages, and extents.

Which makes me think about data structure alignment in C, and that being aware of the alignment is generally considered to be a Good Thing (TM). Similar idea, different context.

MSDN page for Pages and Extents

MSDN page for Row-Overflow Data


Stephen Whipp

Firstly I thought about this, but then thought again. There are performance implications, but equally it serves as a form of documentation to have an idea what size the fields really are. And it does enforce those sizes when the database sits in a larger ecosystem. In my opinion the key is to be permissive, but only within reason.

OK, here are my feelings, simply, on the issue of business- and data-layer logic. It depends: if your DB is a shared resource between systems that share business logic, then of course it seems a natural place to enforce such logic, but it's not the BEST way to do it. The BEST way is to provide an API; this allows the interaction to be tested and keeps business logic where it belongs, it keeps systems decoupled, and it keeps your tiers within a system decoupled. If, however, your database is supposed to serve only one application, then let's get AGILE in our thinking: what's true now? Design for now. If and when such access is needed, provide an API to that data.

Obviously, though, this is just the ideal; if you are working with an existing system, the likelihood is that you will need to do it differently, at least in the short term.


Dan Goldstein

This will cause a performance problem, although it may never cause any actual issues if your database is small. Each record will take up more space on the hard drive and the database will need to read more sectors of the disk if you're searching through a lot of records at once. For example, a small record could fit 50 to a sector and a large record could fit 5. You'd need to read 10 times as much data from the disk using the large record.


-1. A string of length 100 stored in an nvarchar(max) column takes no more disc space than if it were in an nvarchar(100) column.
What you're describing is correct if the size of the stored data is greater, but this question is about whether the data type has performance implications or other considerations.
pappes

It will make screen design harder as you will no longer be able to predict how wide your controls should be.