Data Model

Agreeing on the Cheetah data model is a significant subject in the startup phase of our application. We can't write any useful code without some agreement on the data model we'll be using. On this page we can, together, form an agreement about the data model, meaning how we are going to store user data.

According to the application specifications, Cheetah users have to be able to log in anywhere. This implies that all user data has to be replicated to enough nodes in the network to ensure that when a user logs in anywhere, he will find the most recent version of his data somewhere in the network. Since user data has to be moved around as a whole, it makes sense to bundle each user's data into a single package that can be copied over the network. For simplicity, let's call this package a Cheetah User Package (CUP). Each CUP should consist of at least the following (a code sketch follows the list):

  • Timestamp. When a CUP is updated, the new version has to be propagated through the network. When the user logs in again later, the node he logs in at can retrieve his CUP by searching the network for the most recent version using the timestamp. Note that new-version propagation is very important to ensure that a user can log in at any place and still receive the most recent version.
  • UID. This is the unique user identifier key. Could be checked at login time.
  • Password (hashed). Used at login time.
  • JXTA URI. Unique identifier key used within the Cheetah PeerGroup. Can be generated randomly with the JXTA library using the Cheetah PeerGroupID.
  • Personal user information (email etc.)
  • Any user preferences.
  • (Initially empty) dataset, such as nutrition values, etc.
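
As a starting point for discussion, here is a minimal sketch of what a CUP could look like as a Java class. All class and field names are placeholders of my own, not an agreed-upon format:

 import java.io.Serializable;
 import java.util.List;
 import java.util.Map;
 
 // Illustrative sketch of a Cheetah User Package (CUP).
 // Field names are placeholders, not an agreed format.
 public class CheetahUserPackage implements Serializable {
     public long timestamp;                  // last-update time; the highest value is the most recent
     public String uid;                      // unique user identifier, checked at login
     public byte[] passwordHash;             // hashed password, used at login
     public String jxtaUri;                  // unique identifier within the Cheetah PeerGroup
     public Map<String, String> userInfo;    // personal user information (email etc.)
     public Map<String, String> preferences; // any user preferences
     public List<String> dataset;            // initially empty (nutrition values etc.)
 }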

CUPs should be propagated through the network as soon as they're updated. This ensures that a user can log in anywhere later and still use his latest data.

A problem arises at login when a user can't find his most recent CUP in the network. Imagine the following. We have a Cheetah network with 500 nodes, and each node has a 10% chance of being online at any given moment, so only about 50 nodes can be reached. If you log out and propagate your CUP to all available nodes, only those 50 nodes will have your most recent CUP. This means that when you log in elsewhere, at least one of these 50 nodes has to be online for you to pull your most recent CUP. That is, of course, a problem. To combat this, it may be necessary to have one 'supernode' that is always online, storing all CUPs in the network.
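
To make the freshness rule concrete: a node that receives several candidate CUPs for the same UID would simply keep the one with the highest timestamp. A sketch, building on the CUP class above (method names are illustrative):

 import java.util.Collection;
 
 public class CupSelector {
     // Return the most recent CUP among the candidates found in the network,
     // or null if none were found; freshness is decided by the timestamp alone.
     public static CheetahUserPackage freshest(Collection<CheetahUserPackage> candidates) {
         CheetahUserPackage best = null;
         for (CheetahUserPackage cup : candidates) {
             if (best == null || cup.timestamp > best.timestamp) {
                 best = cup;
             }
         }
         return best;
     }
 }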

Issues

Sharing

How do we share between peers if they have different versions of the model? For example, if peer1 runs version 5 of the application and peer2 runs version 31, there may be a problem when they try to share data. [[Migrations]] could help here, but I think it's not enough.
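
One possible guard, assuming each shared model carries the version : int field from the Design section below: only merge a peer's data directly when the versions match, and route anything else through migrations, if they exist at all. The names here are hypothetical:

 public class VersionGuard {
     public static final int LOCAL_MODEL_VERSION = 5; // example value
 
     // A peer's data can only be merged directly when the model versions match.
     // Anything else would first have to pass through a chain of migrations
     // (version n -> n+1, applied repeatedly), if such migrations exist at all.
     public static boolean canMergeDirectly(int peerModelVersion) {
         return peerModelVersion == LOCAL_MODEL_VERSION;
     }
 }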

Some calculations

If we share our data with all available clients by directly transferring it to them, that will probably cost a lot of bandwidth.

If we do this, we can use the following formula for the expected number of nodes that are online and hold our data: x * x * y, where x is the average chance of a node being online and y is the total number of nodes. (The factor x appears twice because a node holds our data only if it was online when we propagated, and it must also be online now.) When the network grows, x will stay more or less the same, but y will grow. That means the expected number of online nodes holding our data grows with the total number of nodes. So let's say we have a network of 500 nodes, our average file is 10 MB, and the average chance of a node being online is 10%: there will be 0.1 * 0.1 * 500 = 5 expected nodes with our data. This also means you generate 50 * 10 MB = 500 MB of traffic each time you update, and every node may end up storing a lot of data (in the worst case 500 * 10 MB = 5 GB).
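
The same arithmetic written out, using the example numbers from above (they are illustrative, not measurements):

 public class ReplicationEstimate {
     public static void main(String[] args) {
         double p = 0.10;       // average chance of a node being online
         int nodes = 500;       // total number of nodes in the network
         double fileMb = 10.0;  // average file size in MB
 
         double expectedOnline = p * p * nodes;   // nodes online now that also hold our data
         double uploadMb = p * nodes * fileMb;    // traffic per update: ~50 nodes * 10 MB
         double worstCaseMb = nodes * fileMb;     // if one node ends up storing every file
 
         System.out.printf("expected replicas online: %.1f%n", expectedOnline);    // 5.0
         System.out.printf("upload per update: %.0f MB%n", uploadMb);              // 500 MB
         System.out.printf("worst-case storage per node: %.0f MB%n", worstCaseMb); // 5000 MB
     }
 }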

There are two problems with this: it assumes that a node will always store whatever data it receives, which means we may use a lot of space on each node's disk, and since all data has to be sent to all nodes, we will also use a lot of bandwidth.

Another solution is not to share our data with everyone, but only with a number of friends. The expected number of nodes having our data available then becomes the number of friends times the average chance of a friend being online: with 20 friends and a 10% online chance, for example, on average 2 friends holding our data are online.

Design

A very early proposal for the model design.

Root
- version : int
- accounts : List<Account>
- insulinTypes : List<InsulinType>

Account
- userInfo : UserInfo
- logbookEntries : List<LogbookEntry>

UserInfo
- firstName : String
- surnamePrefix : String
- surname : String
- emailAddress : String
- password : String
- birthDate : long
- street : String
- houseNumber : String
- zipCode : String
- city : String
- state : String
- country : String

InsulinType
- manufacturer : String
- type : String
- units : scalar
- volume : scalar
- totalVolume : scalar
- cartridgesPerBox : int
- basalPercentage : float
- bolusPercentage : float
- comment : String

LogbookEntry
- time : long
- glucoseLevel : scalar
- insulinDose : scalar
- foodIntake : int
- healthStatus : String
- stressLevel : String
- activities : String
- weight : scalar
- comment : String
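
For concreteness, a sketch of how this proposal might look as Java classes, under the working assumption that "scalar" maps to double; the classes left as stubs follow the field listing above:

 import java.util.List;
 
 class Root {
     int version;             // model version; see the Sharing issue above
     List<Account> accounts;
     List<InsulinType> insulinTypes;
 }
 
 class Account {
     UserInfo userInfo;
     List<LogbookEntry> logbookEntries;
 }
 
 class LogbookEntry {
     long time;
     double glucoseLevel;     // "scalar" mapped to double as an assumption
     double insulinDose;
     int foodIntake;
     String healthStatus;
     String stressLevel;
     String activities;
     double weight;
     String comment;
 }
 
 class UserInfo { /* fields as listed above */ }
 class InsulinType { /* fields as listed above */ }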