Saturday, March 16, 2013

Big Data: Structured or Unstructured?

I keep coming across a range of experiences in working with large financial institutions on Big Data projects.

First off, the technology stack in Big Data is extremely fragmented, the names are bewildering, and each open source project addresses some rather subtle aspect of the overall problem.

Over the last 3 decades or so there have always been camps of various kinds - Unix vs Windows, Oracle vs Sybase vs DB2 vs ..., Apple vs Microsoft, ... As software designers and architects we have had to pick, mix, and match. But the Big Data stack and the choices it presents are more varied and confusing than anything we have seen so far.

Most people tend to agree that they need to do something with and about Big Data. The motivations vary though. In my experience they include:
  1. Big Data can handle unlimited data at lightning-fast speed a la Google, Facebook, Twitter, etc., so we could use the same to manage all our data
  2. ALL data in one place creates opportunities that go undiscovered when data is fragmented. So, let's build it and then they will come
  3. Insights into data can augment products and services and become additional sources of revenue
  4. ETL/EAI infrastructure is expensive to scale and Big Data technologies - largely open source - offer inexpensive alternatives with nearly unlimited capacity
  5. Big Data offers opportunities to bring intelligence in business applications and processes with direct and tangible benefits in areas ranging from customer interfaces, to risk management, to operational efficiency, and beyond
Leadership in these institutions is somewhat scared of making the wrong choice when selecting a technology stack. I often find camps within organizations taking positions on technologies. Some have moved halfway by bringing all their data onto a Hadoop cluster and then experimenting with technologies for the next step.

Technologies themselves are converging, evolving, and innovating in diverse ways. This makes the task of selecting specific technologies very hard.

Then there is confusion about where to use Big Data. A lot of the hype about Big Data technologies is about their ability to handle unstructured data - free text, documents, audio/video, tweets, posts etc - of limitless variety 'easily'. Many CxOs conclude that Big Data does not apply in the financial services industry because the data there is largely structured.

However, the financial industry's problem is different. The first opportunities to use Big Data here are where data IS structured but there is a huge number of data sources - all structured. Big Metadata is the bigger problem, and it has to be addressed before one can get to Big Data. While Facebook, Twitter, etc. are known for their ultra-large memberships, large financial institutions compare well with them, each having hundreds of millions of relationships. Then there is velocity, which depends on the line of business. Credit card and retail transactions are as fast-paced as tweets and posts. There are also transactions that may be smaller in number (by comparison) but MUST be processed in short periods of time.

Big Data in the financial industry is first about Big Structured Data.

And then the unstructured world kicks in. Huge numbers of documents are the first area of interest. These number in the tens of millions in a typical large financial institution.

Making technology choices is a difficult problem. In my view the strategies to manage technologies need to depart significantly from past 'best' practices. It is practically a given that the stack is fluid and any choice made once may have to be revised or improved upon in 2-3 years' time.

Very smart architectural choices, identification of key invariant drivers, and intelligent application creation are vital ingredients to good strategies for the CxOs today.

These times are intensely 'interesting' with an explosion in innovation in data, devices, capacity, mobility, and intelligence. It is intriguing to project where the world will be on a time horizon of 3 years or so.

Monday, January 30, 2012


This is meant to be a permanently work-in-progress post containing the topic tree, which will be expanded in subsequent posts:

  1. Big Data
  2. User Experience
    1. Common
    2. Touch
    3. Surface Computing
  3. Compliance
    1. USA
      1. Dodd-Frank
      2. Volcker
      3. FATCA
  4. Services
    1. Payment Gateways
    2. Master Data Management

Saturday, January 28, 2012


I started with COSL (Citicorp Overseas Software Limited) in Mumbai in Feb 1989.

I started out building a securities processing system for Citibank in Amsterdam. Since then I have spent many years in Japan, the UK, India, and the US. It has been decades now and the diversity has been increasing. Some trends that I see now include:

  1. Technology has become far more central in modern financial practice in terms of scale, reach, portability, speed, and complexity
  2. Finance has been reinventing itself in drastic ways while muddling through a variety of crises and challenges
  3. Finance has grown in developing countries far more than we knew in the 80s and 90s
  4. There have been many regulatory changes, both liberalizing and tightening oversight
  5. Many organizations have changed beyond recognition due to global mergers and acquisitions
Given the growing complexity, I find the need to keep in touch increasing at a rapid pace. This task is daunting and I need to keep notes (with my failing memory now!).

I thought it would be good to share these notes with anyone who would care to notice.

So this is the broad agenda. Despite plenty of grey hair I must humbly disclaim any notion of expertise. I do have some opinions and ideas, which I will jot down here 'as is'.