In my work with large financial institutions on Big Data projects, I keep coming across a wide range of experiences.
First off, the Big Data technology stack is extremely fragmented, the names are bewildering, and each open-source project addresses some rather subtle aspect of the overall problem.
Over the last three decades or so there have always been camps of various kinds - Unix vs Windows, Oracle vs Sybase vs DB2 vs ..., Apple vs Microsoft, ... As software designers and architects we have had to pick, mix, and match. But the Big Data stack, and the choices it presents, are more varied and confusing than anything else on show so far.
Most people tend to agree that they need to do something with, and about, Big Data. The motivations vary, though. In my experience they include:
- Big Data can handle unlimited data at lightning-fast speed a la Google, Facebook, Twitter etc., so we could use the same to manage all our data
- ALL data in one place creates opportunities not discovered when data is fragmented. So, let's build it and they will come
- Insights into data can augment products and services and become additional sources of revenue
- ETL/EAI infrastructure is expensive to scale and Big Data technologies - largely open source - offer inexpensive alternatives with nearly unlimited capacity
- Big Data offers opportunities to bring intelligence into business applications and processes, with direct and tangible benefits in areas ranging from customer interfaces to risk management, operational efficiency, and beyond
Leadership in these institutions is somewhat scared of making the wrong choice in technology stack selection. I often find camps within organizations taking positions on technologies. Some have moved halfway by bringing all their data onto a Hadoop cluster, and are now experimenting with technologies for the next step.
Technologies themselves are converging, evolving, and innovating in diverse ways. This makes the task of selecting specific technologies very hard.
Then there is confusion about where to use Big Data. A lot of the hype is about its ability to handle unstructured data - free text, documents, audio/video, tweets, posts etc. - of limitless variety, 'easily'. Many CxOs conclude that Big Data does not apply in the financial services industry because the data there is largely structured.
However, the financial industry's problem is different. The first opportunities to use Big Data here are where data IS structured, but comes from a huge number of sources - all structured. Big Metadata is a bigger problem to address before the industry can get to Big Data. And while Facebook and Twitter are believed to serve ultra-large memberships, large financial institutions compare well with them - each having hundreds of millions of relationships. Then there is velocity, which varies by line of business. Credit card and retail transactions are as fast-paced as tweets and posts. There are also transactions that may be smaller in number (by comparison) but MUST be processed in short periods of time.
Big Data in financial industry is first about Big Structured Data.
And then the unstructured world kicks in. Huge volumes of documents are the first area of interest; these number in the tens of millions at a typical large financial institution.
Making technology choices is a difficult problem. In my view, strategies for managing technology need to depart significantly from past 'best' practices. It is practically a given that the stack is fluid and that any choice made now may have to be revised or improved upon within 2-3 years.
Very smart architectural choices, identification of key invariant drivers, and intelligent application creation are vital ingredients of good strategies for CxOs today.
These times are intensely 'interesting', with an explosion of innovation in data, devices, capacity, mobility, and intelligence. It is intriguing to project what the world will look like three years or so from now.