How do you track record relations in NoSQL?

nosql database-relations

I am trying to figure out the equivalent of foreign keys and indexes in NoSQL KVP or Document databases. Since there are no pivotal tables (to add keys marking a relation between two objects) I am really stumped as to how you would be able to retrieve data in a way that would be useful for normal web pages.

Say I have a user, and this user leaves many comments all over the site. The only way I can think of to keep track of that users comments is to

Embed them in the user object (which seems quite useless) Create and maintain a user_id:comments value that contains a list of each comment's key [comment:34, comment:197, etc...] so that that I can fetch them as needed.

However, taking the second example you will soon hit a brick wall when you use it for tracking other things like a key called "active_comments" which might contain 30 million ids in it making it cost a TON to query each page just to know some recent active comments. It also would be very prone to race-conditions as many pages might try to update it at the same time.

How can I track relations like the following in a NoSQL database?

All of a user's comments

All active comments

All posts tagged with [keyword]

All students in a club - or all clubs a student is in

Or am I thinking about this incorrectly?

There is no one way to do it in NoSQL databases, this question is rather a kin to asking how would I track relations in C programs.

Wow, then I guess the hype about NoSQL replacing RDBMS is impossible.

Yes, NoSQL is definitely overhyped. I'm not saying the new technologies aren't useful in the right circumstances, but it's ridiculous to think they'll replace the RDBMS. See en.wikipedia.org/wiki/Hype_cycle

Wouldn't you just have a collection of 'users' and a collection of comments. And then, each comment just as an 'author' property whose value is a reference back to a user id?

Bill Karwin

All the answers for how to store many-to-many associations in the "NoSQL way" reduce to the same thing: storing data redundantly.

In NoSQL, you don't design your database based on the relationships between data entities. You design your database based on the queries you will run against it. Use the same criteria you would use to denormalize a relational database: if it's more important for data to have cohesion (think of values in a comma-separated list instead of a normalized table), then do it that way.

But this inevitably optimizes for one type of query (e.g. comments by any user for a given article) at the expense of other types of queries (comments for any article by a given user). If your application has the need for both types of queries to be equally optimized, you should not denormalize. And likewise, you should not use a NoSQL solution if you need to use the data in a relational way.

There is a risk with denormalization and redundancy that redundant sets of data will get out of sync with one another. This is called an anomaly. When you use a normalized relational database, the RDBMS can prevent anomalies. In a denormalized database or in NoSQL, it becomes your responsibility to write application code to prevent anomalies.

One might think that it'd be great for a NoSQL database to do the hard work of preventing anomalies for you. There is a paradigm that can do this -- the relational paradigm.

"you should not use a NoSQL solution if you need to use the data in a relational way" - So how do others running NoSQL get away with it? How can you possibly know all the ways you will query data when you first design your application? Fox example, I might want recent comments, comments by user, comments by tag, comments for a given post, comments marked as spam, active comments, highest rated comments, etc.

Exactly -- there ain't no such thing as "it just works" as the advocates of NoSQL like to claim. Either you do a bunch of analysis up front for your relational data modeling, or you do a bunch of analysis up front for your top-priority queries, or you do a bunch of costly refactoring throughout the project as you discover which parts of your design didn't get enough analysis up front.

If we store data redundantly, how should we update things? For example, changes his name, and he wrote some comments. His name is already changed in user collection, but how to change all the redundantly stored names in the comments collection?

@M98, Ah, you've found the weakness in this strategy. You have to know about all the places you need to update, and then write code in your application to update all of them when you update any one. Good luck!

The same problem exists for a denormalized relational database.

Basit Anwer

The couchDB approach suggest to emit proper classes of stuff in map phase and summarize it in reduce.. So you could map all comments and emit 1 for the given user and later print out only ones. It would require however lots of disk storage to build persistent views of all trackable data in couchDB. btw they have also this wiki page about relationships: http://wiki.apache.org/couchdb/EntityRelationship.

Riak on the other hand has tool to build relations. It is link. You can input address of a linked (here comment) document to the 'root' document (here user document). It has one trick. If it is distributed it may be modified at one time in many locations. It will cause conflicts and as a result huge vector clock tree :/ ..not so bad, not so good.

Riak has also yet another 'mechanism'. It has 2-layer key name space, so called bucket and key. So, for student example, If we have club A, B and C and student StudentX, StudentY you could maintain following convention:

{ Key = {ClubA, StudentX}, Value = true }, 
{ Key = {ClubB, StudentX}, Value = true }, 
{ Key = {ClubA, StudentY}, Value = true }

and to read relation just list keys in given buckets. Whats wrong with that? It is damn slow. Listing buckets was never priority for riak. It is getting better and better tho. btw. you do not waste memory because this example {true} can be linked to single full profile of StudentX or Y (here conflicts are not possible).

As you see it NoSQL != NoSQL. You need to look at specific implementation and test it for yourself.

Mentioned before Column stores look like good fit for relations.. but it all depends on your A and C and P needs;) If you do not need A and you have less than Peta bytes just leave it, go ahead with MySql or Postgres.

good luck

Riak recently released v1.0, which adds support for secondary indexes when using the LevelDB backend. Very valuable feature.

Tom Clarkson

user:userid:comments is a reasonable approach - think of it as the equivalent of a column index in SQL, with the added requirement that you cannot query on unindexed columns. This is where you need to think about your requirements. A list with 30 million items is not unreasonable because it is slow, but because it is impractical to ever do anything with it. If your real requirement is to display some recent comments you are better off keeping a very short list that gets updated whenever a comment is added - remember that NoSQL has no normalization requirement. Race conditions are an issue with lists in a basic key value store but generally either your platform supports lists properly, you can do something with locks, or you don't actually care about failed updates. Same as for user comments - create an index keyword:posts More of the same - probably a list of clubs as a property of student and an index on that field to get all members of a club

So, basically everything just needs lists? Seems like there should be a more sophisticated approach than just keeping track of strings of ids manually. For one, you can only go so far before they get to big to be useful. Then again, the NoSQL technology's main poster-child projects (MongoDB, CouchDB, Membase, etc) are all new projects so perhaps I just need to give them more time to come up with a better way to track relations.

If you are using NoSQL (AKA non-relational data stores) you need to stop thinking in relational terms. The approach used will differ between platforms, but the basic idea that you have to manage indexes is fairly universal. The relationship examples you have given are modeled in two different ways in NoSQL: 1) Storage - unlike SQL, columns can have multiple/complex values, so the child object is just part of the parent object. 2) Search - Your long lists are actually a requirement for searchability, which means indexing - you could use a simple custom list or a more complete search engine.

Basit Anwer

You have

"user": {
    "userid": "unique value",
    "category": "student",
    "metainfo": "yada yada yada",
    "clubs": ["archery", "kendo"]
}

"comments": {
    "commentid": "unique value",
    "pageid": "unique value",
    "post-time": "ISO Date",
    "userid": "OP id -> THIS IS IMPORTANT"
}

"page": {
    "pageid": "unique value",
    "post-time": "ISO Date",
    "op-id": "user id",
    "tag": ["abc", "zxcv", "qwer"]
}

Well in a relational database the normal thing to do would be in a one-to-many relation is to normalize the data. That is the same thing you would do in a NoSQL database as well. Simply index the fields which you will be fetching the information with.

For example, the important indexes for you are

Comment.UserID

Comment.PageID

Comment.PostTime

Page.Tag[]

If you are using NosDB (A .NET based NoSQL Database with SQL support) your queries will be like

 SELECT * FROM Comments WHERE userid = ‘That user’;

 SELECT * FROM Comments WHERE pageid = ‘That user’;

 SELECT * FROM Comments WHERE post-time > DateTime('2016, 1, 1');

 SELECT * FROM Page WHERE tag = 'kendo'

Check all the supported query types from their SQL cheat sheet or documentation.

Imran Javed

Although, it is best to use RDBMS in such cases instead of NoSQL, yet one possible solution is to maintain additional nodes or collections to manage mapping and indexes. It may have additional cost in form of extra collections/nodes and processing, but it will give an solution easy to maintain and avoid data redundancy.

How do you track record relations in NoSQL?

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Links

Contact US