I'm trying to migrate a MySQL-based app over to Microsoft SQL Server 2005 (not by choice, but that's life).
In the original app, we used almost entirely ANSI-SQL compliant statements, with one significant exception -- we used MySQL's group_concat function fairly frequently.
group_concat, by the way, does this: given a table of, say, employee names and projects...
SELECT empName, projID FROM project_members;
returns:
ANDY | A100
ANDY | B391
ANDY | X010
TOM | A100
TOM | A510
... and here's what you get with group_concat:
SELECT
empName, group_concat(projID SEPARATOR ' / ')
FROM
project_members
GROUP BY
empName;
returns:
ANDY | A100 / B391 / X010
TOM | A100 / A510
So what I'd like to know is: Is it possible to write, say, a user-defined function in SQL Server which emulates the functionality of group_concat?
I have almost no experience using UDFs, stored procedures, or anything like that, just straight-up SQL, so please err on the side of too much explanation :)
No REAL easy way to do this. Lots of ideas out there, though.
SELECT table_name, LEFT(column_names , LEN(column_names )-1) AS column_names
FROM information_schema.columns AS extern
CROSS APPLY
(
SELECT column_name + ','
FROM information_schema.columns AS intern
WHERE extern.table_name = intern.table_name
FOR XML PATH('')
) pre_trimmed (column_names)
GROUP BY table_name, column_names;
Or a version that works correctly if the data might contain characters such as < (which plain FOR XML PATH would return entity-encoded):
WITH extern
AS (SELECT DISTINCT table_name
FROM INFORMATION_SCHEMA.COLUMNS)
SELECT table_name,
LEFT(y.column_names, LEN(y.column_names) - 1) AS column_names
FROM extern
CROSS APPLY (SELECT column_name + ','
FROM INFORMATION_SCHEMA.COLUMNS AS intern
WHERE extern.table_name = intern.table_name
FOR XML PATH(''), TYPE) x (column_names)
CROSS APPLY (SELECT x.column_names.value('.', 'NVARCHAR(MAX)')) y(column_names)
I may be a bit late to the party but this method works for me and is easier than the COALESCE method.
SELECT STUFF(
(SELECT ',' + Column_Name
FROM Table_Name
FOR XML PATH (''))
, 1, 1, '')
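Applied to the table from the question, the same STUFF / FOR XML PATH idea might look like the sketch below. It assumes the project_members table and the ' / ' separator from the question, and it adds the TYPE / .value() step so that characters such as < are not entity-encoded. The third argument to STUFF is 3 because the leading separator to strip is three characters long.

```sql
-- Sketch for SQL Server 2005 and later: one row per employee,
-- with that employee's project IDs joined by ' / '.
SELECT pm.empName,
       STUFF((SELECT ' / ' + p.projID
              FROM project_members AS p
              WHERE p.empName = pm.empName   -- correlate to the outer group
              ORDER BY p.projID              -- controls the order inside the list
              FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'),
             1, 3, '') AS projIDs            -- strip the leading ' / '
FROM project_members AS pm
GROUP BY pm.empName;
```

With the sample rows from the question this should give "A100 / B391 / X010" for ANDY and "A100 / A510" for TOM.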
SQL Server 2017 introduces a new aggregate function, STRING_AGG ( expression, separator ).
Concatenates the values of string expressions and places separator values between them. The separator is not added at the end of string.
The concatenated elements can be ordered by appending WITHIN GROUP (ORDER BY some_expression).
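As a rough illustration, here is how that could look for the table in the question. This is only a sketch: it assumes SQL Server 2017 or later and reuses the project_members table and ' / ' separator from the question.

```sql
-- Sketch for SQL Server 2017 and later: the closest equivalent of
-- MySQL's group_concat(projID SEPARATOR ' / ').
SELECT empName,
       STRING_AGG(projID, ' / ') WITHIN GROUP (ORDER BY projID) AS projIDs
FROM project_members
GROUP BY empName;
```

The WITHIN GROUP clause is optional; without it the order of the concatenated values is not guaranteed.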
For versions 2005-2016 I typically use the XML method in the accepted answer.
This can fail in some circumstances, however. For example, if the data to be concatenated contains CHAR(29) you see the error:
FOR XML could not serialize the data ... because it contains a character (0x001D) which is not allowed in XML.
A more robust method that can deal with all characters would be to use a CLR aggregate. However, applying an ordering to the concatenated elements is more difficult with this approach.
The method of concatenating by assigning to a variable (SELECT @var = @var + col ...) is not guaranteed to produce a complete or correctly ordered result and should be avoided in production code.
Possibly too late to be of benefit now, but is this not the easiest way to do things?
SELECT empName, projIDs = replace
((SELECT Surname AS [data()]
FROM project_members
WHERE empName = a.empName
ORDER BY empName FOR xml path('')), ' ', REQUIRED SEPERATOR)
FROM project_members a
WHERE empName IS NOT NULL
GROUP BY empName
Have a look at the GROUP_CONCAT project on GitHub; I think it does exactly what you are looking for:
This project contains a set of SQLCLR User-defined Aggregate functions (SQLCLR UDAs) that collectively offer similar functionality to the MySQL GROUP_CONCAT function. There are multiple functions to ensure the best performance based on the functionality required...
GROUP_CONCAT(klascode,'(',name,')' ORDER BY klascode ASC SEPARATOR ', ')
To concatenate all the project manager names from projects that have multiple project managers write:
SELECT a.project_id,a.project_name,Stuff((SELECT N'/ ' + first_name + ', '+last_name FROM projects_v
where a.project_id=project_id
FOR
XML PATH(''),TYPE).value('text()[1]','nvarchar(max)'),1,2,N''
) mgr_names
from projects_v a
group by a.project_id,a.project_name
With the below code you have to set PermissionLevel=External on your project properties before you deploy, and change the database to trust external code (be sure to read elsewhere about security risks and alternatives [like certificates]) by running "ALTER DATABASE database_name SET TRUSTWORTHY ON".
using System;
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using Microsoft.SqlServer.Server;
[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined,
MaxByteSize=8000,
IsInvariantToDuplicates=true,
IsInvariantToNulls=true,
IsInvariantToOrder=true,
IsNullIfEmpty=true)]
public struct CommaDelimit : IBinarySerialize
{
[Serializable]
private class StringList : List<string>
{ }
private StringList List;
public void Init()
{
this.List = new StringList();
}
public void Accumulate(SqlString value)
{
if (!value.IsNull)
this.Add(value.Value);
}
private void Add(string value)
{
if (!this.List.Contains(value))
this.List.Add(value);
}
public void Merge(CommaDelimit group)
{
foreach (string s in group.List)
{
this.Add(s);
}
}
void IBinarySerialize.Read(BinaryReader reader)
{
IFormatter formatter = new BinaryFormatter();
this.List = (StringList)formatter.Deserialize(reader.BaseStream);
}
public SqlString Terminate()
{
if (this.List.Count == 0)
return SqlString.Null;
const string Separator = ", ";
this.List.Sort();
return new SqlString(String.Join(Separator, this.List.ToArray()));
}
void IBinarySerialize.Write(BinaryWriter writer)
{
IFormatter formatter = new BinaryFormatter();
formatter.Serialize(writer.BaseStream, this.List);
}
}
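If you deploy by hand rather than from Visual Studio, the compiled assembly has to be registered before the aggregate can be called. A minimal sketch follows; the assembly name CommaDelimitAssembly and the DLL path are hypothetical placeholders, and EXTERNAL_ACCESS is assumed to be the T-SQL counterpart of the PermissionLevel=External project setting mentioned above.

```sql
-- CLR integration is disabled by default; enable it once per instance.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Hypothetical assembly name and file path: adjust to your own build output.
CREATE ASSEMBLY CommaDelimitAssembly
FROM 'C:\clr\CommaDelimit.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO

-- Expose the C# struct as a T-SQL aggregate.
-- The name after the dot must match the struct name in the assembly.
CREATE AGGREGATE dbo.CommaDelimit (@value NVARCHAR(4000))
RETURNS NVARCHAR(4000)
EXTERNAL NAME CommaDelimitAssembly.CommaDelimit;
GO
```

Once registered, dbo.CommaDelimit can be used like any built-in aggregate, including together with GROUP BY.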
I've tested this using a query that looks like:
SELECT
dbo.CommaDelimit(X.value) [delimited]
FROM
(
SELECT 'D' [value]
UNION ALL SELECT 'B' [value]
UNION ALL SELECT 'B' [value] -- intentional duplicate
UNION ALL SELECT 'A' [value]
UNION ALL SELECT 'C' [value]
) X
And yields: A, B, C, D
I tried these, but for my purposes in MS SQL Server 2005 the following, which I found at xaprb, was the most useful:
declare @result varchar(8000);
set @result = '';
select @result = @result + name + ' '
from master.dbo.systypes;
select rtrim(@result);
@Mark, as you mentioned, it was the space character that caused issues for me.
About J Hardiman's answer, how about:
SELECT empName, projIDs=
REPLACE(
REPLACE(
(SELECT REPLACE(projID, ' ', '-somebody-puts-microsoft-out-of-his-misery-please-') AS [data()] FROM project_members WHERE empName=a.empName FOR XML PATH('')),
' ',
' / '),
'-somebody-puts-microsoft-out-of-his-misery-please-',
' ')
FROM project_members a WHERE empName IS NOT NULL GROUP BY empName
By the way, is the use of "Surname" a typo, or am I not understanding a concept here?
Anyway, thanks a lot guys cuz it saved me quite some time :)
2021
@AbdusSalamAzad's answer is the correct one.
SELECT STRING_AGG(my_col, ',') AS my_result FROM my_tbl;
If the result is too big, you may get the error "STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.", which can be fixed by changing the query to this:
SELECT STRING_AGG(convert(varchar(max), my_col), ',') AS my_result FROM my_tbl;
UPDATE 2020: SQL Server 2016+ JSON Serialization and De-serialization Examples
The data provided by the OP inserted into a temporary table called #project_members
drop table if exists #project_members;
create table #project_members(
empName varchar(20) not null,
projID varchar(20) not null);
go
insert #project_members(empName, projID) values
('ANDY', 'A100'),
('ANDY', 'B391'),
('ANDY', 'X010'),
('TOM', 'A100'),
('TOM', 'A510');
How to serialize this data into a single JSON string with a nested array containing projID's
select empName, (select pm_json.projID
from #project_members pm_json
where pm.empName=pm_json.empName
for json path, root('projList')) projJSON
from #project_members pm
group by empName
for json path;
Result
'[
{
"empName": "ANDY",
"projJSON": {
"projList": [
{ "projID": "A100" },
{ "projID": "B391" },
{ "projID": "X010" }
]
}
},
{
"empName": "TOM",
"projJSON": {
"projList": [
{ "projID": "A100" },
{ "projID": "A510" }
]
}
}
]'
How to de-serialize this data from a single JSON string back to its original rows and columns
declare @json nvarchar(max)=N'[{"empName":"ANDY","projJSON":{"projList":[{"projID":"A100"},
{"projID":"B391"},{"projID":"X010"}]}},{"empName":"TOM","projJSON":
{"projList":[{"projID":"A100"},{"projID":"A510"}]}}]';
select oj.empName, noj.projID
from openjson(@json) with (empName varchar(20),
projJSON nvarchar(max) as json) oj
cross apply openjson(oj.projJSON, '$.projList') with (projID varchar(20)) noj;
Results
empName projID
ANDY A100
ANDY B391
ANDY X010
TOM A100
TOM A510
How to persist the unique empName to a table and store the projID's in a nested JSON array
drop table if exists #project_members_with_json;
create table #project_members_with_json(
empName varchar(20) unique not null,
projJSON nvarchar(max) not null);
go
insert #project_members_with_json(empName, projJSON)
select empName, (select pm_json.projID
from #project_members pm_json
where pm.empName=pm_json.empName
for json path, root('projList'))
from #project_members pm
group by empName;
Results
empName projJSON
ANDY {"projList":[{"projID":"A100"},{"projID":"B391"},{"projID":"X010"}]}
TOM {"projList":[{"projID":"A100"},{"projID":"A510"}]}
How to de-serialize from a table with unique empName and nested JSON array column containing projID's
select wj.empName, oj.projID
from
#project_members_with_json wj
cross apply
openjson(wj.projJSON, '$.projList') with (projID varchar(20)) oj;
Results
empName projID
ANDY A100
ANDY B391
ANDY X010
TOM A100
TOM A510
GROUP_CONCAT behaviour. The string that GROUP_CONCAT produces is just a list of values separated by a delimiter. A JSON-formatted string is much more than that.
For SQL Server 2017+, use the STRING_AGG() function:
SELECT STRING_AGG(Genre, ',') AS Result
FROM Genres;
Sample result:
Result
Rock,Jazz,Country,Pop,Blues,Hip Hop,Rap,Punk
For my fellow Googlers out there, here's a very simple plug-and-play solution that worked for me after struggling with the more complex solutions for a while:
SELECT
distinct empName,
NewColumnName=STUFF((SELECT ','+ CONVERT(VARCHAR(10), projID )
FROM returns
WHERE empName=t.empName FOR XML PATH('')) , 1 , 1 , '' )
FROM
returns t
Notice that I had to convert the ID into a VARCHAR in order to concatenate it as a string. If you don't have to do that, here's an even simpler version:
SELECT
distinct empName,
NewColumnName=STUFF((SELECT ','+ projID
FROM returns
WHERE empName=t.empName FOR XML PATH('')) , 1 , 1 , '' )
FROM
returns t
All credit for this goes to here: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/9508abc2-46e7-4186-b57f-7f368374e084/replicating-groupconcat-function-of-mysql-in-sql-server?forum=transactsql