All of my records have a field called "pictures". This field is an array of strings.
I now want the newest 10 records where this array IS NOT empty.
I've googled around, but strangely enough I haven't found much on this. I've read into the $where option, but I was wondering how slow that is to native functions, and if there is a better solution.
And even then, that does not work:
ME.find({$where: 'this.pictures.length > 0'}).sort('-created').limit(10).execFind()
Returns nothing. Leaving this.pictures
without the length bit does work, but then it also returns empty records, of course.
If you also have documents that don't have the key, you can use:
ME.find({ pictures: { $exists: true, $not: {$size: 0} } })
MongoDB doesn't use indexes if $size
is involved, so here is a better solution:
ME.find({ pictures: { $exists: true, $ne: [] } })
If your property can have invalid values (like null
boolean
or others) , then you an add an additional check using $types
as proposed in this answer:
With mongo >= 3.2:
ME.find({ pictures: { $exists: true, $type: 'array', $ne: [] } })
With mongo < 3.2:
ME.find({ pictures: { $exists: true, $type: 4, $ne: [] } })
Since the MongoDB 2.6 release, you can compare with the operator $gt
, but this could lead to unexpected results (you can find a detailed explanation in this answer):
ME.find({ pictures: { $gt: [] } })
After some more looking, especially in the mongodb documents, and puzzling bits together, this was the answer:
ME.find({pictures: {$exists: true, $not: {$size: 0}}})
pictures
field.
This might also work for you:
ME.find({'pictures.0': {$exists: true}});
pictures.2
exists but pictures.1
does not?
$exists
operator is a boolean, not an offset. @tenbatsu should be using true
instead of 1
.
Would there ever be a case where pictures.2 exists but pictures.1 does not?
Yes, that case could be happens.
pictures
is a sub-doc, not an array. e.g. pictures: {'2': 123}
pictures
.
You care about two things when querying - accuracy and performance. With that in mind, I tested a few different approaches in MongoDB v3.0.14.
TL;DR db.doc.find({ nums: { $gt: -Infinity }})
is the fastest and most reliable (at least in the MongoDB version I tested).
EDIT: This no longer works in MongoDB v3.6! See the comments under this post for a potential solution.
Setup
I inserted 1k docs w/o a list field, 1k docs with an empty list, and 5 docs with a non-empty list.
for (var i = 0; i < 1000; i++) { db.doc.insert({}); }
for (var i = 0; i < 1000; i++) { db.doc.insert({ nums: [] }); }
for (var i = 0; i < 5; i++) { db.doc.insert({ nums: [1, 2, 3] }); }
db.doc.createIndex({ nums: 1 });
I recognize this isn't enough of a scale to take performance as seriously as I am in the tests below, but it's enough to present the correctness of various queries and behavior of chosen query plans.
Tests
db.doc.find({'nums': {'$exists': true}})
returns wrong results (for what we're trying to accomplish).
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': {'$exists': true}}).count()
1005
--
db.doc.find({'nums.0': {'$exists': true}})
returns correct results, but it's also slow using a full collection scan (notice COLLSCAN
stage in the explanation).
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': {'$exists': true}}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': {'$exists': true}}).explain()
{
"queryPlanner": {
"plannerVersion": 1,
"namespace": "test.doc",
"indexFilterSet": false,
"parsedQuery": {
"nums.0": {
"$exists": true
}
},
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"nums.0": {
"$exists": true
}
},
"direction": "forward"
},
"rejectedPlans": [ ]
},
"serverInfo": {
"host": "MacBook-Pro",
"port": 27017,
"version": "3.0.14",
"gitVersion": "08352afcca24bfc145240a0fac9d28b978ab77f3"
},
"ok": 1
}
--
db.doc.find({'nums': { $exists: true, $gt: { '$size': 0 }}})
returns wrong results. That's because of an invalid index scan advancing no documents. It will likely be accurate but slow without the index.
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $gt: { '$size': 0 }}}).count()
0
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $gt: { '$size': 0 }}}).explain('executionStats').executionStats.executionStages
{
"stage": "KEEP_MUTATIONS",
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 2,
"advanced": 0,
"needTime": 0,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"inputStage": {
"stage": "FETCH",
"filter": {
"$and": [
{
"nums": {
"$gt": {
"$size": 0
}
}
},
{
"nums": {
"$exists": true
}
}
]
},
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 1,
"advanced": 0,
"needTime": 0,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"docsExamined": 0,
"alreadyHasObj": 0,
"inputStage": {
"stage": "IXSCAN",
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 1,
"advanced": 0,
"needTime": 0,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"keyPattern": {
"nums": 1
},
"indexName": "nums_1",
"isMultiKey": true,
"direction": "forward",
"indexBounds": {
"nums": [
"({ $size: 0.0 }, [])"
]
},
"keysExamined": 0,
"dupsTested": 0,
"dupsDropped": 0,
"seenInvalidated": 0,
"matchTested": 0
}
}
}
--
db.doc.find({'nums': { $exists: true, $not: { '$size': 0 }}})
returns correct results, but the performance is bad. It technically does an index scan, but then it still advances all the docs and then has to filter through them).
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $not: { '$size': 0 }}}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $not: { '$size': 0 }}}).explain('executionStats').executionStats.executionStages
{
"stage": "KEEP_MUTATIONS",
"nReturned": 5,
"executionTimeMillisEstimate": 0,
"works": 2016,
"advanced": 5,
"needTime": 2010,
"needFetch": 0,
"saveState": 15,
"restoreState": 15,
"isEOF": 1,
"invalidates": 0,
"inputStage": {
"stage": "FETCH",
"filter": {
"$and": [
{
"nums": {
"$exists": true
}
},
{
"$not": {
"nums": {
"$size": 0
}
}
}
]
},
"nReturned": 5,
"executionTimeMillisEstimate": 0,
"works": 2016,
"advanced": 5,
"needTime": 2010,
"needFetch": 0,
"saveState": 15,
"restoreState": 15,
"isEOF": 1,
"invalidates": 0,
"docsExamined": 2005,
"alreadyHasObj": 0,
"inputStage": {
"stage": "IXSCAN",
"nReturned": 2005,
"executionTimeMillisEstimate": 0,
"works": 2015,
"advanced": 2005,
"needTime": 10,
"needFetch": 0,
"saveState": 15,
"restoreState": 15,
"isEOF": 1,
"invalidates": 0,
"keyPattern": {
"nums": 1
},
"indexName": "nums_1",
"isMultiKey": true,
"direction": "forward",
"indexBounds": {
"nums": [
"[MinKey, MaxKey]"
]
},
"keysExamined": 2015,
"dupsTested": 2015,
"dupsDropped": 10,
"seenInvalidated": 0,
"matchTested": 0
}
}
}
--
db.doc.find({'nums': { $exists: true, $ne: [] }})
returns correct results and is slightly faster, but the performance is still not ideal. It uses IXSCAN which only advances docs with an existing list field, but then has to filter out the empty lists one by one.
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $ne: [] }}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $exists: true, $ne: [] }}).explain('executionStats').executionStats.executionStages
{
"stage": "KEEP_MUTATIONS",
"nReturned": 5,
"executionTimeMillisEstimate": 0,
"works": 1018,
"advanced": 5,
"needTime": 1011,
"needFetch": 0,
"saveState": 15,
"restoreState": 15,
"isEOF": 1,
"invalidates": 0,
"inputStage": {
"stage": "FETCH",
"filter": {
"$and": [
{
"$not": {
"nums": {
"$eq": [ ]
}
}
},
{
"nums": {
"$exists": true
}
}
]
},
"nReturned": 5,
"executionTimeMillisEstimate": 0,
"works": 1017,
"advanced": 5,
"needTime": 1011,
"needFetch": 0,
"saveState": 15,
"restoreState": 15,
"isEOF": 1,
"invalidates": 0,
"docsExamined": 1005,
"alreadyHasObj": 0,
"inputStage": {
"stage": "IXSCAN",
"nReturned": 1005,
"executionTimeMillisEstimate": 0,
"works": 1016,
"advanced": 1005,
"needTime": 11,
"needFetch": 0,
"saveState": 15,
"restoreState": 15,
"isEOF": 1,
"invalidates": 0,
"keyPattern": {
"nums": 1
},
"indexName": "nums_1",
"isMultiKey": true,
"direction": "forward",
"indexBounds": {
"nums": [
"[MinKey, undefined)",
"(undefined, [])",
"([], MaxKey]"
]
},
"keysExamined": 1016,
"dupsTested": 1015,
"dupsDropped": 10,
"seenInvalidated": 0,
"matchTested": 0
}
}
}
--
db.doc.find({'nums': { $gt: [] }})
IS DANGEROUS BECAUSE DEPENDING ON THE INDEX USED IT MIGHT GIVE UNEXPECTED RESULTS. That's because of an invalid index scan which advances no documents.
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).count()
0
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).hint({ nums: 1 }).count()
0
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).hint({ _id: 1 }).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: [] }}).explain('executionStats').executionStats.executionStages
{
"stage": "KEEP_MUTATIONS",
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 1,
"advanced": 0,
"needTime": 0,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"inputStage": {
"stage": "FETCH",
"filter": {
"nums": {
"$gt": [ ]
}
},
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 1,
"advanced": 0,
"needTime": 0,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"docsExamined": 0,
"alreadyHasObj": 0,
"inputStage": {
"stage": "IXSCAN",
"nReturned": 0,
"executionTimeMillisEstimate": 0,
"works": 1,
"advanced": 0,
"needTime": 0,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"keyPattern": {
"nums": 1
},
"indexName": "nums_1",
"isMultiKey": true,
"direction": "forward",
"indexBounds": {
"nums": [
"([], BinData(0, ))"
]
},
"keysExamined": 0,
"dupsTested": 0,
"dupsDropped": 0,
"seenInvalidated": 0,
"matchTested": 0
}
}
}
--
db.doc.find({'nums.0’: { $gt: -Infinity }})
returns correct results, but has bad performance (uses a full collection scan).
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': { $gt: -Infinity }}).count()
5
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums.0': { $gt: -Infinity }}).explain('executionStats').executionStats.executionStages
{
"stage": "COLLSCAN",
"filter": {
"nums.0": {
"$gt": -Infinity
}
},
"nReturned": 5,
"executionTimeMillisEstimate": 0,
"works": 2007,
"advanced": 5,
"needTime": 2001,
"needFetch": 0,
"saveState": 15,
"restoreState": 15,
"isEOF": 1,
"invalidates": 0,
"direction": "forward",
"docsExamined": 2005
}
--
db.doc.find({'nums': { $gt: -Infinity }})
surprisingly, this works very well! It gives the right results and it's fast, advancing 5 docs from the index scan phase.
MacBook-Pro(mongod-3.0.14) test> db.doc.find({'nums': { $gt: -Infinity }}).explain('executionStats').executionStats.executionStages
{
"stage": "FETCH",
"nReturned": 5,
"executionTimeMillisEstimate": 0,
"works": 16,
"advanced": 5,
"needTime": 10,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"docsExamined": 5,
"alreadyHasObj": 0,
"inputStage": {
"stage": "IXSCAN",
"nReturned": 5,
"executionTimeMillisEstimate": 0,
"works": 15,
"advanced": 5,
"needTime": 10,
"needFetch": 0,
"saveState": 0,
"restoreState": 0,
"isEOF": 1,
"invalidates": 0,
"keyPattern": {
"nums": 1
},
"indexName": "nums_1",
"isMultiKey": true,
"direction": "forward",
"indexBounds": {
"nums": [
"(-inf.0, inf.0]"
]
},
"keysExamined": 15,
"dupsTested": 15,
"dupsDropped": 10,
"seenInvalidated": 0,
"matchTested": 0
}
}
seen_events
String array, which is also indexed. Searching with { $gt: -Infinity }
, I immediately get 0 documents. Using { $exists: true, $ne: [] }
I get the more likely 1,2m docs, with a lot of time being wasted in the FETCH stage: gist.github.com/N-Coder/b9e89a925e895c605d84bfeed648d82c
db.test_collection.find({"seen_events.0": {$exists: true}})
is bad because it uses a collection scan. 2. db.test_collection.find({seen_events: {$exists: true, $ne: []}})
is bad because its IXSCAN matches all documents and then the filtering is performed in the slow FETCH phase. 3. Same goes for db.test_collection.find({seen_events: {$exists: true, $not: {$size: 0}}})
. 4. All the other queries return invalid results.
seen_events
contain strings, you can use this: db.test_collection.find({seen_events: {$gt: ''}}).count()
. To confirm it performs well, check out db.test_collection.find({seen_events: {$gt: ''}}).explain(true).executionStats
. You can probably enforce that seen events are strings via schema validation: docs.mongodb.com/manual/core/schema-validation
Starting with the 2.6 release, another way to do this is to compare the field to an empty array:
ME.find({pictures: {$gt: []}})
Testing it out in the shell:
> db.ME.insert([
{pictures: [1,2,3]},
{pictures: []},
{pictures: ['']},
{pictures: [0]},
{pictures: 1},
{foobar: 1}
])
> db.ME.find({pictures: {$gt: []}})
{ "_id": ObjectId("54d4d9ff96340090b6c1c4a7"), "pictures": [ 1, 2, 3 ] }
{ "_id": ObjectId("54d4d9ff96340090b6c1c4a9"), "pictures": [ "" ] }
{ "_id": ObjectId("54d4d9ff96340090b6c1c4aa"), "pictures": [ 0 ] }
So it properly includes the docs where pictures
has at least one array element, and excludes the docs where pictures
is either an empty array, not an array, or missing.
db.ME.createIndex({ pictures: 1 })
and then db.ME.find({pictures: {$gt: []}})
will return zero results, at least in MongoDB v3.0.14
Retrieve all and only the documents where 'pictures' is an array and is not empty
ME.find({pictures: {$type: 'array', $ne: []}})
If using a MongoDb version prior to 3.2, use $type: 4
instead of $type: 'array'
. Notice that this solution doesn't even use $size, so there's no problem with indexes ("Queries cannot use indexes for the $size portion of a query")
Other solutions, including these (accepted answer):
ME.find({ pictures: { $exists: true, $not: {$size: 0} } }); ME.find({ pictures: { $exists: true, $ne: [] } })
are wrong because they return documents even if, for example, 'pictures' is null
, undefined
, 0, etc.
You can use any of the following to achieve this. Both also take care of not returning a result for objects that don't have the requested key in them:
db.video.find({pictures: {$exists: true, $gt: {$size: 0}}})
db.video.find({comments: {$exists: true, $not: {$size: 0}}})
db.find({ pictures: { $elemMatch: { $exists: true } } })
$elemMatch
matches documents that contain an array field with at least one element that matches the specified query.
So you're matching all arrays with at least an element.
Use the $elemMatch
operator: according to the documentation
The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.
$elemMatches
makes sure that the value is an array and that it is not empty. So the query would be something like
ME.find({ pictures: { $elemMatch: {$exists: true }}})
PS A variant of this code is found in MongoDB University's M121 course.
{ $where: "this.pictures.length > 1" }
use the $where and pass the this.field_name.length which return the size of array field and check it by comparing with number. if any array have any value than array size must be at least 1. so all the array field have length more than one, it means it have some data in that array
This also works:
db.getCollection('collectionName').find({'arrayName': {$elemMatch:{}}})
You can also use the helper method Exists over the Mongo operator $exists
ME.find()
.exists('pictures')
.where('pictures').ne([])
.sort('-created')
.limit(10)
.exec(function(err, results){
...
});
ME.find({pictures: {$exists: true}})
Simple as that, this worked for me.
Success story sharing
ME.find({ pictures: { $gt: [] } })
IS DANGEROUS, even in newer MongoDB versions. If you have an index on your list field and that index is utilized during the query, you will get unexpected results. For example:db.doc.find({'nums': { $gt: [] }}).hint({ _id: 1 }).count()
returns the right number, whiledb.doc.find({'nums': { $gt: [] }}).hint({ nums: 1 }).count()
returns0
.