Why making the right strategic decisions when setting up your IoT database architecture will pay dividends in the future.

Making the right strategic decisions up front about your back-end infrastructure architecture will save a lot of headaches and money down the road. The first step is for the team to ask itself a few questions to decide what types of data it needs to collect to achieve its objectives.

Does that data need to be collected and available in “real time” (where timeliness is essential), or are the observations less time-bound? Some applications rely on immediate data delivery, so should the data be streamed or sampled? Is the data set so large that it must be sampled? Do you need all of the detail contained in the data, or just the metadata? The answers to these questions will lead your team to a decision about which type of cloud to adopt.
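If the answer is to sample, one common technique for keeping a fixed-size, uniformly random sample from a stream whose total length isn’t known in advance is reservoir sampling (Algorithm R). The Python sketch below is a minimal illustration under that assumption, not a method prescribed by this article; the function name is hypothetical and the sensor readings are simulated.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length.

    Memory use stays at k items no matter how long the stream runs.
    (Hypothetical helper for illustration.)
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # uniform index in 0..i (inclusive)
            if j < k:
                reservoir[j] = item     # keep the new item with probability k/(i+1)
    return reservoir


# Simulated stream of one million sensor readings; only 100 are retained.
readings = (20.0 + 0.001 * n for n in range(1_000_000))
sample = reservoir_sample(readings, k=100, rng=random.Random(42))
print(len(sample))  # 100
```

The trade-off is the one the questions above point at: the sample is cheap to store and ship, but every reading that was not retained is gone for good.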

If you’re sampling, one of the public cloud vendors is probably fine: smaller volumes of data with no pressing urgency lend themselves well to applications hosted by one of the “big three” providers. By contrast, if you need to stream data in real time (large data sets, time-bound applications), you should find a local private cloud provider in a true Tier III data center to host the data.

There are several very real reasons why the public cloud may not be ideal for streaming data applications.

Streaming data applications need data to be available quickly and, more importantly, delivered without interruption. Latency (the time it takes data to reach the destination database) is therefore hugely important: if the data source is physically too far from the database, the application won’t run well.
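A back-of-the-envelope calculation shows why physical distance matters. Light in optical fiber travels at roughly 200,000 km/s, so propagation alone adds about 1 ms of round-trip delay for every 100 km of cable. The Python sketch below computes that lower bound; it ignores routing, queuing and processing delays, which only add to the total, and the `route_factor` parameter is a hypothetical knob for cable paths that run longer than the straight-line distance.

```python
# Approximate speed of light in optical fiber (refractive index ~1.5):
# about 200,000 km/s, i.e. roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float, route_factor: float = 1.0) -> float:
    """Lower bound on round-trip propagation delay over fiber, in milliseconds.

    route_factor > 1 models cable routes longer than the straight-line
    distance (an illustrative assumption, not a measured value).
    """
    one_way_ms = distance_km * route_factor / FIBER_KM_PER_MS
    return 2 * one_way_ms


for km in (50, 500, 2000):
    print(f"{km:>5} km: >= {min_round_trip_ms(km):.1f} ms round trip")
```

With these assumptions, a database 2,000 km away costs at least 20 ms per round trip before any network equipment is involved, which a nearby data center avoids.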

Streaming data invariably produces a very large database, which will eventually become too costly and unwieldy to maintain in the public cloud and may exceed the capabilities of the tools available there to extract value from the data.
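A rough calculation shows how quickly a streaming database grows. The Python sketch below simply multiplies sensor count, reading rate and record size; the fleet of 10,000 sensors, the rates and the 100-byte record are hypothetical numbers chosen for illustration, and the result counts raw data only, before indexes, replicas or backups add to it.

```python
SECONDS_PER_DAY = 86_400


def raw_bytes_per_day(sensors: int, readings_per_sec: float,
                      bytes_per_reading: int) -> float:
    """Uncompressed ingest volume per day (excludes indexes, replicas, backups)."""
    return sensors * readings_per_sec * bytes_per_reading * SECONDS_PER_DAY


# Hypothetical fleet: all numbers below are illustrative assumptions.
daily = raw_bytes_per_day(sensors=10_000, readings_per_sec=1.0, bytes_per_reading=100)
sampled = raw_bytes_per_day(sensors=10_000, readings_per_sec=1 / 60, bytes_per_reading=100)

print(f"streamed, per day:  {daily / 1e9:.1f} GB")
print(f"streamed, per year: {daily * 365 / 1e12:.1f} TB")
print(f"sampled once a minute, per day: {sampled / 1e9:.1f} GB")
```

Under these assumptions, streaming yields about 86 GB of raw data a day (roughly 31.5 TB a year), while sampling the same fleet once a minute yields about 1.4 GB a day.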

The database itself should be free and unrestricted (libre) open source software. A proprietary database can become very expensive, and you may not be able to migrate away from it later, potentially losing the rights to your data.

Similarly, collecting data in public cloud X will tie you to cloud X going forward. If the tools within that cloud are changed, discontinued or outgrown, your data may not be portable, and you could lose your hard-won data entirely. Once the project matures, the cost of bandwidth in the public cloud could also make it uneconomical.

Learn more about public and private clouds from our data partner, Stack41.