OpenIO is a young company with a history already behind it: although on the market only since 2015, the company’s founders started developing the core technology back in 2006, as part of a project for a major telco. The code was open-sourced in 2012, then forked, and finally productized and presented to customers in its current form. OpenIO is based in Lille, France, with offices in San Francisco and Tokyo and plans for expansion in the coming months.
OpenIO’s proposition could quickly, and very unfairly, be labeled as YAOSS (Yet Another Object Storage Solution), while in reality it is much more than that. To understand why, let’s start with a very high-level description of the current state of the storage market, the typical use cases for object storage systems, and how they are quickly evolving.
The storage market can be split between systems focused on performance, designed to address the needs of applications that require high IOPS and low latencies for frequently changing and frequently accessed data, and capacity-focused systems, good at handling infrequently changing data with specific retention and retrievability requirements. While the former usually run on proprietary hardware, the latter, which fit the definition of “object storage systems,” derive their intelligence from software running on commodity, and often heterogeneous, hardware.
Typical use cases for traditional Object Storage Systems (OSSs for brevity) include email, multimedia, backup and archiving, private cloud storage and even big data; the market is already well supplied with systems addressing all these requirements in a smart and efficient way, but the truth is that the landscape is rapidly evolving and new needs are arising. Future-proof OSSs should not only be able to store and retrieve kittens’ pictures, but primarily address the challenges of the data-centric era. Data is increasingly generated not by humans but by machines, and needs to be collected, analyzed and stored, very often as close as possible to the data source itself, without having (or without the option) to be shipped to remote data warehouses.
Examples of these new upcoming trends can be summarized as follows:
- Integrated Data Processing
- Industrial IoT
- Machine Learning/AI
Smart cars (think of Teslas or upcoming self-driving vehicles) already generate terabytes of data daily that must be analyzed and processed locally; industrial sensors in plants or offshore platforms collect data that must be cross-checked to make real-time decisions affecting production batches; finally, AI software must look for patterns in data collected just a few milliseconds earlier (or stored for years in huge data lakes) to provide instant answers or long-term forecasts.
It is evident that, in order to serve these new use cases, two intertwined and somewhat conflicting needs emerge: first, data must often be processed exactly where it is collected; second, the platform where data is stored and processed must be able to scale from the size of a Raspberry Pi Zero to that of an enterprise data center.
OpenIO’s mission is exactly to solve the above problems through SDS, an open, lightweight, flexible and integrated solution that – as a bonus – comes with open sourced code for its core components.
Being open means anyone can freely download the core OpenIO code and test it (or even contribute to it) without any functional limitation; only the advanced features are proprietary and licensed separately.
SDS is lightweight because it can run on a system as small as a single ARM CPU core with 512 MB of RAM and scale up to multi-socket x86 systems with multi-petabyte capacity, still running the very same code.
OpenIO demonstrated its ability to run SDS on such small systems by showcasing a “nano-node” PoC device, where a fully functional OpenIO implementation runs on a Linux-powered, fully fledged PCB plugged directly into a consumer-grade HDD.
Although OpenIO is not a hardware vendor, this demo device shows the level of miniaturization achievable, and therefore the integration possibilities with mobile or IoT data-generating sources. OpenIO recently went as far as hinting at an upcoming, yet-to-be-announced partnership with a well-known HDD manufacturer to embed SDS directly inside their disks’ firmware.
SDS is also flexible because it can run on nodes assembled from any commodity hardware sourced directly by the customer, regardless of their performance and capacity profiles. The “magic” is performed by an SDS component called Conscience: an OpenIO cluster is a distributed, masterless system in which nodes communicate with one another through agents, establishing a shared view of the cluster’s status and of each node’s health (by means of “scores”). Whenever a new object enters the cluster, Conscience allocates it to the most appropriate node using this scoring algorithm.
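To make the idea concrete, here is a minimal, hypothetical sketch of score-based placement. The node names, score values and the weighted-random selection are illustrative assumptions, not OpenIO’s actual Conscience implementation, but they capture the principle: healthier nodes attract more new objects without starving the rest.

```python
import random

# Hypothetical node descriptors: each node periodically reports a health
# "score" derived from factors such as free capacity, CPU load and I/O status.
nodes = {
    "node-a": {"score": 92},
    "node-b": {"score": 55},
    "node-c": {"score": 10},   # nearly full or overloaded
}

def pick_node(cluster):
    """Pick a node with probability proportional to its score, so
    high-scoring nodes receive more new objects than low-scoring ones."""
    candidates = [(name, n["score"]) for name, n in cluster.items() if n["score"] > 0]
    total = sum(score for _, score in candidates)
    threshold = random.uniform(0, total)
    running = 0
    for name, score in candidates:
        running += score
        if running >= threshold:
            return name
    return candidates[-1][0]  # numerical safety net

print(pick_node(nodes))
```

A real cluster would of course combine several metrics into the score and refresh it continuously; the sketch only shows how a score can drive placement decisions.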
Finally, integration with applications (in my humble opinion, the real OpenIO differentiator) is achieved through OpenIO’s event-driven data processing framework, called “Grid For Apps” (G4A). I would describe it as a distributed engine that runs data-manipulating code directly on the storage nodes as the direct consequence of a specific event. If you were thinking “serverless,” you were close enough. Serverless is the new flavor of the month and I personally don’t like buzzwords, but in this case it sets the stage for what G4A can do. The evolutionary trend of the computing model is very clear: we are abstracting applications further and further from the underlying layers, first with virtualization, then with containers, and now with “Lambda-like” functions triggered by events and run without any reference to an existing computing system.
With G4A, OpenIO simply took this concept and applied it to its storage clusters: imagine an external application pushing an object to an OpenIO cluster, triggering an event that in turn causes specific processing of that object (happening on the OpenIO node) and finally produces a derived output.
That external application could be a social media app uploading kitten pictures to the cloud: the code running on the OpenIO cluster nodes would identify the image as that of a cute cat and apply a tag, so cat pictures would be automatically indexed and made searchable through the social media app. Another example would be in-line, synchronous transcoding of broadcast feeds into multiple formats for instant streaming to multiple platforms. Your imagination and your coding skills are the limit here.
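The event-driven pattern described above can be sketched in a few lines. Everything below is an illustrative assumption, not OpenIO’s actual G4A API: handlers register for an event type and run when the storage layer emits a matching event, exactly as in the cat-tagging scenario.

```python
# Registry mapping event types to the handler functions subscribed to them.
HANDLERS = {}

def on(event_type):
    """Decorator registering a handler for a given event type."""
    def register(fn):
        HANDLERS.setdefault(event_type, []).append(fn)
        return fn
    return register

@on("object.created")
def tag_cats(event):
    # Placeholder classifier: a real deployment would run an image model
    # on the object's content instead of inspecting its name.
    if "cat" in event["object_name"].lower():
        event.setdefault("tags", []).append("cat")
    return event

def dispatch(event):
    """Run every handler registered for the event's type, in order."""
    for fn in HANDLERS.get(event["type"], []):
        event = fn(event)
    return event

result = dispatch({"type": "object.created", "object_name": "Cat_01.jpg"})
print(result["tags"])
```

The key design point is that the handler runs where the data lands, so the tag (or transcoded rendition) exists the moment the upload completes, with no external processing pipeline to operate.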
Although G4A has been around for some time, OpenIO keeps improving it: the soon-to-be-released Grid For Apps Pro will add job dispatching and scheduling across the cluster, resource management and monitoring, direct code upload through a web UI (Lambda again!) and even support for Functions.
The second half of 2017 will be a very busy time for OpenIO, with more features being added to the core set: the FUSE-based OIO-FS file system connector, better S3 compatibility, and a multi-cloud strategy involving expansion and replication to AWS EC2 and tiering to AWS S3 and Backblaze B2.
My advice, then, if you have a use case that OpenIO could address, is to keep an eye on the company for when all these new capabilities become available. In the meantime, OpenIO’s website (and especially its Community section and GitHub repository) is packed with information, resources and freely downloadable code. Grab a Raspberry Pi (or a few) and build your first OpenIO cluster. What are you waiting for?