One repository to rule them all?

This post is a response to a number of tweets that were going around a week or so ago from the likes of Pie and Lee. The discussion was about what a platform is and what a system is. It resonated with something that had been on my mind for the past few weeks and that I wanted to get down and share with the wider world.

I’ve seen vendors push, a number of times now, for their Content Management product to be the single repository platform for an organisation, able to handle all types of unstructured content. I’ve spoken about this myself with customers and I think there are a lot of advantages to this approach, not least the cost and the simplicity of maintaining a single stack. However, there are times when an organisation will have multiple repositories, and for good reason.

I started to consider which model works best. One comparison I looked at was the approach to databases. Organisations do not tend to keep all their structured data in a single repository; they have multiple repositories, each aligned to specific applications. Admittedly organisations will strive to standardise on a database vendor, but the reality is that they will have many, many databases holding lots of information, sometimes overlapping, sometimes contradictory. In fact this multitude of data has driven the rise of Master Data Management, to help organisations understand which data is the truth, and of Data Warehousing, where organisations can start to aggregate this data, largely for reporting purposes.

In more recent years we have seen this data made available through a Service Oriented Architecture, where the data and the behaviours associated with it are combined to form services.

So what does this all mean for Content Management? Is it any different from structured data management? And what the hell has this got to do with the discussion on what is a platform and what is a system?

To begin with, I cannot see the one repository to rule them all being the way organisations go. In fact the recent announcement of EMC’s partnership with Fatwire led to a similar comment from Stephen Powers of Forrester. Further to this, we will always see content which is not stored within a single repository: it could be on file shares, it could be within emails, it could be within one of the 2 or 3 different content repositories the organisation uses. (Whilst standardisation of technology can be good, there are times when having a low-cost and easy-to-use content management system alongside a more feature-rich, yet more expensive, solution can be the way forward.)

Okay, so if we believe that there will not be the one repository to rule them all, where does this leave us? Well, I believe it leaves us looking towards the more SOA-based view of the world. The repositories expose their content and features, which can then be managed through a single layer: this is the platform. When a user works on a piece of content they simply save it with the information needed to identify it, the behaviours they expect of that content and the security rules of that content. The platform layer will then decide which repository to use to store and manage the content. Alternatively, users will be able to access a specific system directly, in much the way they do now through the SharePoint or Documentum Webtop UI.
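To make the idea a little more concrete, here is a minimal sketch in Python of how such a platform layer might pick a repository. Everything in it is an assumption made for illustration: the class names, the "capabilities" and "relative_cost" fields, and the rule of choosing the cheapest repository that supports every requested behaviour are mine, not something any vendor offers today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """A content repository exposed to the platform (hypothetical model)."""
    name: str
    capabilities: frozenset   # behaviours this repository can provide
    relative_cost: int        # assumed cost ranking, lower is cheaper


@dataclass(frozen=True)
class ContentItem:
    """What the user supplies when saving: identity, behaviours, security."""
    title: str
    behaviours: frozenset     # behaviours the user expects of the content
    allowed_groups: frozenset # security rule: who may read the content


class PlatformLayer:
    """Decides which repository stores an item and remembers where it went."""

    def __init__(self, repositories):
        self.repositories = list(repositories)
        self.locations = {}   # title -> repository name

    def save(self, item):
        # Keep only repositories that support every requested behaviour.
        candidates = [
            repo for repo in self.repositories
            if item.behaviours <= repo.capabilities
        ]
        if not candidates:
            raise LookupError(f"no repository supports {sorted(item.behaviours)}")
        # Assumed policy: the cheapest capable repository wins.
        chosen = min(candidates, key=lambda repo: repo.relative_cost)
        self.locations[item.title] = chosen.name
        return chosen.name


platform = PlatformLayer([
    Repository("team-site", frozenset({"versioning"}), relative_cost=1),
    Repository("records-store",
               frozenset({"versioning", "retention", "audit"}),
               relative_cost=5),
])

notes = ContentItem("meeting notes", frozenset({"versioning"}),
                    frozenset({"staff"}))
contract = ContentItem("customer contract",
                       frozenset({"versioning", "retention"}),
                       frozenset({"legal"}))

print(platform.save(notes))
print(platform.save(contract))
```

The user never names a repository here. They only describe the content, and the platform keeps track of where it ended up, which is the part end users should not have to care about.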

This platform layer will be able to interpret and understand what content is available within an organisation and how that content should be treated. This starts to raise other questions, such as where the BPM capability should reside and where content retention should be performed. Well, this is where I think the platform layer starts to come into its own.

For content retention, read information retention. It strikes me that in the unstructured world we are well versed in the need to manage and dispose of information appropriately; I am not too sure this is the case in the structured world. That is a mistake, because most of the time the reason a piece of information is kept or disposed of comes from structured data. For example, how long customer information, including correspondence, is retained will be based on how long that individual remains a customer, plus whatever period must be added. Similarly, in criminal investigations retention is generally tied to a date related to the case, be it the court date, charge date or even release date. The simplicity of having a rule where all information stored about an entity (I use the term loosely here) must be retained or can be disposed of should not be overlooked.

This leads me on to the question of BPM. Similarly, a lot of the rules which are used and evaluated in the execution of a business process are related to structured information. Many times I have found myself in a position where I need access to some structured information in order to put a rule in a business process. Yes, there are solutions to this, e.g. JDBC access to a database from within the Content Management BPM engine, but they always seem to be very closely tied to one view or another, whereas it is more important to consider the business process at a slightly more abstract layer which can be easily decoupled from the content repository.
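The retention rule described above, where a structured date about an entity drives the disposal of everything held about it, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical, the six-year period is a made-up figure, and treating 29 February as 28 February in non-leap years is just one possible convention.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    """Return the same calendar day a number of years later."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February with a non-leap target year: assume 28 February.
        return start.replace(year=start.year + years, day=28)


def disposal_date(trigger: Optional[date], retention_years: int) -> Optional[date]:
    """Date from which information about an entity may be disposed of.

    `trigger` is the structured date that starts the clock, for example the
    day a customer relationship ended. None means the clock has not started.
    """
    if trigger is None:
        return None
    return add_years(trigger, retention_years)


def can_dispose(trigger: Optional[date], retention_years: int, today: date) -> bool:
    """True once the retention period after the trigger date has elapsed."""
    due = disposal_date(trigger, retention_years)
    return due is not None and today >= due


# Hypothetical customer who left on 31 March 2020, with an assumed 6-year period.
left_on = date(2020, 3, 31)
print(disposal_date(left_on, 6))
print(can_dispose(left_on, 6, today=date(2025, 1, 1)))
```

The same check applies to a document, an email or a database row, because the rule hangs off the entity and its trigger date, not off the repository that happens to hold the information.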

Now this concept may be some way off being achieved, but there are two things which lead me to believe it is where we need to be: Cloud Computing and CMIS.

- Cloud Computing. As we start to see more and more organisations put “some” of their content in the cloud, the need to understand where their distributed content is will become stronger. I think it is ambitious to expect an organisation to put all of its content in the cloud, but the lower-value content will certainly start to find its way to such a solution. The important thing is that end users do not give a hoot where the content is; they want to know about the content.

- CMIS. It remains early days, but I predict we will see take-up of this standard that goes wider than the traditional CMS vendors. The more products out there which expose their content through CMIS, the more likely it is that a vendor will develop a platform which uses this commonality to provide the functionality I describe above.

I’ll post again in the future as I see this being expanded into some of the Web 2.0 features which exist, plus some discussion on visualisation of all this information.