Data types in MongoDB

In this article, we will go through the different data types that MongoDB supports. Let’s start with the basic types. We can identify six of them: null, boolean, numeric, string, array and object. If you are familiar with JSON, the representation of data in MongoDB will look familiar: JSON represents data as key/value pairs whose values use those six basic types.

To get up and running with MongoDB, check out this article

Those six types allow you to express a lot of different things, but they have their limits. For example, there is no date type. MongoDB adds support for additional data types to solve these problems, while keeping the JSON syntax. Here are the most common types you will find:

  • null
{"x": null}

Null can be used to represent both a non-existent field and a null value

  • boolean

A boolean type allows you to choose between two values: true and false

{"x": true}
  • number

By default, the MongoDB shell will use 64-bit floating point numbers.

{"x": 3.14}
{"x": 3}

To have more control over the space used by integers you can use NumberInt and NumberLong to store 4-byte and 8-byte signed integers:

{"x": NumberInt("3")}
{"x": NumberLong("3")}
  • string

A fairly common type. A string can represent any sequence of UTF-8 characters:

{"x": "This is a string"}
  • date

MongoDB stores dates as 64-bit integers representing milliseconds since January 1, 1970 (the Unix epoch). This is the same system used by the Date() constructor in JavaScript. Note that the timezone is not stored:

{"x": new Date()}
  • regular expression

You can store regular expressions inside a document, written in JavaScript’s regular expression syntax:

{"x": /hello/i}
  • array

Another common type. Lists of values can be stored as arrays:

{"x": [1, "hello", 34, false]}
  • embedded document

Documents can contain entire documents as values:

{"x": {"hello": "world"}}
  • object id

An object ID is a 12-byte ID for documents:

{"x" : ObjectId()}

=> { "x" : ObjectId("595574f86ecded6657b95d18") }

  • code

MongoDB allows you to store JavaScript code in queries and documents:

{"x": function(){ ... }}

These are the types that you will encounter most of the time in MongoDB.
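
On the number type above: here is a minimal sketch, in plain JavaScript with no MongoDB involved, of why separate integer types can matter. It relies only on the fact that a 64-bit float represents integers exactly up to 2^53 - 1. The variable names are mine, and BigInt is used only to print the 8-byte bound exactly.

```javascript
// Plain JavaScript numbers are 64-bit floats, like the shell's default.
const maxSafe = Number.MAX_SAFE_INTEGER;        // 2^53 - 1 = 9007199254740991

// Past that point, two different integers collapse into the same float.
const collides = maxSafe + 1 === maxSafe + 2;   // true: precision is lost

// Upper bound of a signed 4-byte integer (the range NumberInt covers).
const int32Max = 2 ** 31 - 1;                   // 2147483647

// Upper bound of a signed 8-byte integer (the range NumberLong covers).
const int64Max = 2n ** 63n - 1n;                // 9223372036854775807n

console.log(maxSafe, collides, int32Max, int64Max.toString());
```

So a value that needs the full 8-byte integer range cannot be stored safely as a default number, which is where NumberLong comes in.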

Note on dates: As mentioned before, the MongoDB shell uses the Date constructor from JavaScript. This means that you must use the keyword new. If you don’t, you will get a string representation of the date, not a date object, like so:

> Date()
Fri Jun 30 2017 00:09:47 GMT+0200 (CEST)
> new Date()
ISODate("2017-06-29T22:09:51.582Z")
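
The same difference can be checked in plain JavaScript, outside the MongoDB shell. This is a small sketch with variable names of my own choosing. It uses a fixed date (0 milliseconds after the epoch) so the output does not depend on when you run it.

```javascript
// Called without `new`, Date returns a string describing the current time.
const asString = Date();

// Called with `new`, it returns a Date object. Here: 0 ms since the epoch.
const asDate = new Date(0);

console.log(typeof asString);         // "string"
console.log(asDate instanceof Date);  // true
console.log(asDate.getTime());        // 0 (milliseconds since the epoch)
console.log(asDate.toISOString());    // "1970-01-01T00:00:00.000Z" (always UTC)
```

The last line also illustrates the point about timezones: the stored value is just a number of milliseconds, and it is displayed in UTC.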

Note on arrays: Arrays in MongoDB are very similar to arrays in JavaScript. You do not have to store values of the same type inside an array. Furthermore, MongoDB is perfectly capable of reaching inside an array by index.

Note on embedded documents: An embedded document is a document that is used as the value for a key. This allows us to have deeper levels of data instead of being stuck with a flat key/value pair system. For example:

{
    "store": {
        "name": "SuperStore",
        "numberEmployees": 6,
        "employees": {
            "names": ["Joe", "John", "Sarah"],
            "ages": [23, 56, 32, 45]
        }
    }
}

The possibilities are of course endless and allow you to build your models to suit your needs.
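
To make "reaching inside" arrays and embedded documents a bit more concrete, here is a simplified sketch in plain JavaScript. MongoDB addresses nested values with dot-separated paths such as "store.employees.names.0". The getPath function below is a hypothetical helper of mine, not a MongoDB API, and it only imitates the basic idea. Real query matching has more rules than this.

```javascript
// Hypothetical helper (not part of MongoDB): follow a dot-separated path
// through nested objects and arrays, one key or index at a time.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// The example document from this article.
const doc = {
  store: {
    name: "SuperStore",
    numberEmployees: 6,
    employees: {
      names: ["Joe", "John", "Sarah"],
      ages: [23, 56, 32, 45],
    },
  },
};

console.log(getPath(doc, "store.name"));               // "SuperStore"
console.log(getPath(doc, "store.employees.names.0"));  // "Joe"
console.log(getPath(doc, "store.employees.ages.3"));   // 45
console.log(getPath(doc, "store.manager.name"));       // undefined
```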

 

_id and ObjectIds

You may have noticed by now that every single document you create or retrieve has an _id field with an ObjectId as its value. This is mandatory in MongoDB. Every document must have an _id field, but the value doesn’t have to be an ObjectId. The value has to be unique within the *collection*.

I could give an _id a value ‘Hello World’, but this means that no other _id in the same collection can have ‘Hello World’ as a value. By default, the value will be an ObjectId.
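
Here is a toy model of that rule in plain JavaScript. ToyCollection is a hypothetical in-memory stand-in that I made up for illustration. It is not MongoDB code and it ignores everything except the uniqueness of _id within one collection.

```javascript
// Toy stand-in for a collection (hypothetical, for illustration only).
class ToyCollection {
  constructor() {
    this.docs = new Map(); // documents keyed by their _id
  }

  insert(doc) {
    if (this.docs.has(doc._id)) {
      throw new Error("duplicate _id: " + doc._id);
    }
    this.docs.set(doc._id, doc);
  }
}

const users = new ToyCollection();
const posts = new ToyCollection();

users.insert({ _id: "Hello World", kind: "user" });
posts.insert({ _id: "Hello World", kind: "post" }); // fine: another collection

let rejected = false;
try {
  users.insert({ _id: "Hello World", kind: "another user" });
} catch (err) {
  rejected = true; // same _id in the same collection is refused
}

console.log(rejected); // true
```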

The ObjectId class generates unique values. It is easier and faster than synchronizing auto-incrementing keys across multiple servers. An ObjectId uses 12 bytes of storage, so its string representation has 24 hexadecimal digits: two digits per byte.
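
The relation between the 12 bytes and the 24 hexadecimal digits can be sketched in plain JavaScript. The byte values below are those of one of the ObjectIds shown in this article. The conversion code itself is my own illustration, not how MongoDB implements it.

```javascript
// The 12 bytes of an ObjectId taken from one of this article's examples.
const bytes = Uint8Array.from([
  0x59, 0x57, 0xb0, 0x0b, 0xe0, 0xaa, 0xfa, 0xc5, 0x36, 0x04, 0x7e, 0x69,
]);

// Each byte becomes exactly two hexadecimal digits (padded with a zero).
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

console.log(bytes.length); // 12
console.log(hex.length);   // 24
console.log(hex);          // "5957b00be0aafac536047e69"
```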

To dig deeper into ObjectId, try calling new ObjectId() several times and compare the results:

> new ObjectId()
ObjectId("5957b00be0aafac536047e69")
> new ObjectId()
ObjectId("5957b00ce0aafac536047e6a")
> new ObjectId()
ObjectId("5957b00de0aafac536047e6b")
> new ObjectId()
ObjectId("5957b00ee0aafac536047e6c")

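As you can see, most of the value stays the same in my example: only the last digit of the leading timestamp and the last digit of the trailing counter change. Here is how an ObjectId is generated: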

  • The first four bytes are a timestamp in seconds since the epoch (January 1, 1970).
  • The next three bytes are a unique identifier of the machine on which it was generated, usually a hash of the machine’s hostname. With that, different machines won’t produce colliding ObjectIds.
  • The next two bytes are created from the process identifier (PID) of the ObjectId-generating process. This makes sure that you won’t have ObjectIds colliding on the same machine. (Newer MongoDB versions replace the machine and PID bytes with a single five-byte random value generated once per process.)
  • The last three bytes are an incrementing counter responsible for uniqueness within a second in a single process.
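
The layout above can be sketched as a small parser in plain JavaScript. splitObjectId is a hypothetical helper of mine, not a MongoDB function, and it assumes the four-part layout described in this list. Since the meaning of the middle bytes depends on the MongoDB version, I would only rely on the timestamp and the counter.

```javascript
// Hypothetical helper: cut a 24-digit ObjectId string into the four parts
// described above (4 + 3 + 2 + 3 bytes, two hex digits per byte).
function splitObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hexadecimal digits");
  }
  return {
    timestamp: parseInt(hex.slice(0, 8), 16),  // seconds since the epoch
    machine: hex.slice(8, 14),                 // kept as hex
    pid: hex.slice(14, 18),                    // kept as hex
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

// Two consecutive ObjectIds from the shell session shown earlier.
const first = splitObjectId("5957b00be0aafac536047e69");
const second = splitObjectId("5957b00ce0aafac536047e6a");

console.log(first.timestamp);                                // 1498918923
console.log(new Date(first.timestamp * 1000).toISOString()); // "2017-07-01T14:22:03.000Z"
console.log(first.machine, first.pid, first.counter);        // "e0aafa" "c536" 294505
console.log(second.timestamp - first.timestamp);             // 1 (one second later)
console.log(second.counter - first.counter);                 // 1 (counter incremented)
```

Because the timestamp comes first, ObjectIds created later sort after earlier ones, at least down to the second.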

 

With all that, up to 256^3 (16,777,216) unique ObjectIds can be generated per process in a single second. And no, I didn’t do the math myself. I took that from MongoDB: The Definitive Guide by Shannon Bradshaw and Kristina Chodorow.
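
If you want to check the arithmetic yourself, here is a tiny sketch in plain JavaScript. The variable names are mine. A three-byte counter has 256 possible values per byte, which is the same as 24 bits.

```javascript
const counterBytes = 3;

// 256 possible values per byte, three bytes.
const perSecond = 256 ** counterBytes;        // 16777216

// Same number expressed in bits: 3 bytes = 24 bits.
const sameInBits = 2 ** (8 * counterBytes);   // 16777216

// Largest value the counter can hold before it wraps around.
const largestCounter = 0xffffff;              // 16777215

console.log(perSecond, sameInBits, largestCounter);
```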

Anyway, you do not need to worry about generating those _id values yourself. MongoDB takes care of it for you, so you don’t need to specify an _id field when you insert a new document.

 

That wraps it up for the data types in MongoDB!

Feel free to comment and share.

Have a nice day!
