Three Spheres: Science, Design and Engineering

In the world of finance, the Foundery stands out as a pioneering challenger to the traditional financial institution – think suits, three-letter acronyms and legacy software housed in massive, skyline-dominating buildings. Although the Foundery isn’t alone in this endeavour, the digital financial organisation is still in its earliest days and there are many unanswered questions and unsolved challenges that lie ahead. This is the nature of the challenge that the Foundery has accepted: there will be no obvious answers or solutions.

The key to success, however, is to recognise that with uncertainty comes opportunity – the opportunity to break new technological ground and seek new digital pathways that will one day reshape the world of finance.

This blogpost, however, isn’t about those challenges. Rather, it is about the pioneering spirit, embodied by three overlapping spheres of innovation: science, design and engineering.

Science

We understand science as both the body of knowledge and the process by which we try to understand the world. Science is humanity’s attempt to organise the entire universe into testable theories from which we can make predictions about the world.

Here the universe is taken to include the natural world (such as physics and biology), the social world (such as economics and linguistics) and the abstract world (such as mathematics and computer science).

If the goal of science is to formulate testable theories from which we can make predictions, how does it relate to the Foundery’s challenge of transforming the world of banking?

Science is the sphere that embodies the process of discovery. It is curiosity coupled with the discipline to establish truths and meaning in the world in which we live – including the world of digital disruption which the Foundery inhabits.

The pioneering spirit requires not only the curiosity to break new ground, but also a special kind of scientific curiosity to turn this new ground into groundbreaking discoveries.

Design

Design is the conceptual configuration of an idea, process or object. It is understood as the formulation of both the aesthetic and functional specifications of the object, idea or process.

To put it more simply in the words of the late Steve Jobs, arguably one of the most significant pioneers of the 21st century:

“Design is not just what it looks and feels like. Design is how it works.”

Whereas science is concerned with trying to understand the world that humanity occupies, design is concerned with the things – objects, ideas and processes – which humanity adds to the world, and how they look and how they work.

At the Foundery, the pioneering spirit is more than just breaking new ground: it is the creation of accessible pathways, including new solutions and disruptive technologies. Design is the process of creating new solutions – not just planning and configuring what these solutions are, but experimenting with how they look and work.

Thus design is the sphere which embodies experimentation. It is the courage to try something new, unencumbered by the fear of failure. It is the willpower to try over and over again until something great can be achieved.

Engineering

Engineering is the application of science to solve problems in the real world. At one level engineering is the intersection of science and design – combining scientific knowledge with principles from design – but taken as a whole engineering is more than that: it encompasses the design, control and scaling of constructive and systematic solutions to real-world problems.

In the past engineering was typically associated with physical systems such as chemical processes and mechanical engines. In today’s technological age, we also associate engineering with abstract information systems and computer programmes.

Now financial institutions can be viewed as massive, highly complex and highly specialised information systems. So from this perspective, one part of the Foundery’s task is to engineer the processes, interfaces and information networks of the bank of the future.

Engineering is the sphere which embodies problem solving. It is one thing to break new ground and make new discoveries and experiment with new solutions, but something else entirely to translate the pioneering spirit into technologies and systems with the potential to change the world.

Bringing the Spheres Together

On their own, science, design and engineering represent different aspects of the creation process: science is the process of discovery, design is the process of experimentation and refinement, and engineering is the process of problem solving. But this view alone suggests that there is a linear order to the creation process: that each process must take place in phases.

This isn’t my view and certainly isn’t the aim of this blogpost. Rather, my interpretation of science, design and engineering is that they are abstract, multi-dimensional spheres which embody the creative process. They are self-contained concepts which exist in their own right, but with clear points of intersection which link science, design and engineering. Together they are a whole which is greater than the sum of its parts.

Whether it is the blockchain exchange, the novel application of machine learning to existing financial services or even our partnership-based organisational structure, science, design and engineering are very much at the Foundery’s core. These three spheres embody the pioneering spirit which drives our purpose: from the curiosity to explore more, to the courage to try more and the resolve to do more.

by Jonathan Sinai

The Dimensions Of An Effective Data Science Team

Organisations worldwide are increasingly looking to data science teams to provide business insight, understand customer behaviour and drive new product development. The broad field of Artificial Intelligence (AI) including Machine Learning (ML) and Deep Learning (DL) is exploding both in terms of academic research and business implementation. Some of the world’s biggest companies including Google, Facebook, Uber, Airbnb, and Goldman Sachs derive much of their value from data science effectiveness. These companies use data in very creative ways and are able to generate massive amounts of competitive advantage and business insight through the effective use of data.

The Need for Data Science

Have you ever wondered how Google Maps predicts traffic? How does Facebook know your preferences so accurately? Why would Google give a platform as powerful as Gmail away for free? Having data and a great idea is a start – but the likes of Facebook and Google have figured out that a key step in the creation of amazing data products (and the resultant business value generation) is the formation of highly effective, aligned and organisationally supported data science teams.

Effective Data Science Teams

How exactly have these leading data companies of the world established effective data science teams? What skills are required and what technologies have they employed? What processes do they have in place to enable effective data science? What cultures, behaviours and habits have been embraced by their staff and how have they set up their data science teams for success? The focus of this blog is to better understand at a high level what makes up an effective data science team and to discuss some practical steps to consider. This blog also poses several open-ended questions worth thinking about. Later blogs in this series will go into more detail in each of the dimensions discussed below.

Drew Harry, Director of Science at Twitch, wrote an excellent article titled “Highly Effective Data Science Teams”. He states that “Great data science work is built on a hierarchy of basic needs: powerful data infrastructure that is well maintained, protection from ad-hoc distractions, high-quality data, strong team research processes, and access to open-minded decision-makers with high leverage problems to solve” [1].

In my opinion, this definition accurately describes the various dimensions that are necessary for data science teams to be effective. As such, I would like to attempt to decompose this quote further and try to understand it in more detail.

Drew Harry’s Hierarchy of Basic Data Science Needs

Great data science requires powerful data infrastructure

A common pitfall for data science teams is that, through either a lack of resources or a lack of understanding of the role of data scientists, they are forced to do time-intensive data wrangling (sourcing, cleaning and preparing data). Additionally, data scientists are often asked to complete ad-hoc requests and build business intelligence reports. These tasks should ideally be removed from the responsibilities of a data science team so that it can focus on its core capability: using mathematical and statistical skills to solve challenging business problems and find interesting patterns in data, rather than expending effort on housekeeping work. To make this possible, data scientists should ideally be supported by a dedicated team of data engineers. Data engineers typically build robust data infrastructures and architectures, and implement tools that assist with data acquisition, data modelling and ETL (extract, transform, load).

An example of this is Facebook, a world leader in data engineering. Just imagine for a second the technical challenges inherent in providing over one billion people with a personalised homepage full of various posts, photos and videos in near real time. To do this, Facebook runs one of the world’s largest data warehouses, storing over 300 petabytes of data [2], and employs a range of powerful and sophisticated data processing techniques and tools [3]. This data engineering capability enables thousands of Facebook employees to use their data effectively and focus on value-enhancing activities for the company without worrying about the nuts and bolts of how the data got there.

I realise that we are not all blessed with the resources and data talent inherent in Silicon Valley firms such as Facebook. Our data landscapes are often siloed, and our IT support teams, where data engineers traditionally reside, mainly focus on keeping the lights on and putting out fires. But this model has to change – set up your data science teams to have the best chance of success. Co-opt a data engineer onto the data science team. If this is not possible due to resource constraints, then at least provide your data scientists with the tools to easily create ETL code and rapidly spin up bespoke data warehouses, so that they can run experiments quickly. Whatever you do, don’t let them be bogged down in operational data sludge.
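
To make the idea of lightweight ETL tooling a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is invented for illustration: the payments data, the column names and the function names (extract, transform, load, run_pipeline) are assumptions, and an in-memory SQLite database merely stands in for a bespoke data warehouse. Real tooling in your organisation will look quite different.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in practice this would come from a source system.
RAW_CSV = """customer_id,amount,currency
1, 120.50 ,zar
2,,ZAR
1,79.50,ZAR
3,200,usd
"""


def extract(text):
    """Extract: read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, fix types and normalise casing."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records rather than guessing a value
        clean.append(
            (int(row["customer_id"]), float(amount), row["currency"].strip().upper())
        )
    return clean


def load(conn, records):
    """Load: write the cleaned records into a table in the scratch database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(customer_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text):
    """Run extract, transform and load against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn


conn = run_pipeline(RAW_CSV)
totals = conn.execute(
    "SELECT customer_id, SUM(amount) FROM payments "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(totals)  # prints [(1, 200.0), (3, 200.0)]
```

The point of the sketch is the shape rather than the detail: three small, separately testable steps and a disposable database that a data scientist can create and throw away without raising a ticket.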

Great data science requires easily accessible, high-quality data

Data should be trusted and of a high quality. Additionally, there should be enough data available for data scientists to execute experiments. Data should be easily accessible, and the team should have enough processing power to run complex code in reasonable time frames. Data scientists should, within legal boundaries, have easy, autonomous access to data. Data science teams should not be precluded from using data on production systems; mechanisms need to be put in place to allow for this, rather than banning access just because “hey – this is production – don’t you dare touch!”

To support its army of business users and data scientists, eBay, one of the world’s largest auction and shopping sites, has successfully implemented a data analytics sandbox environment separate from the company’s production systems. eBay allows employees who want to analyse and explore data to create large virtual data marts inside its data warehouse. These sandboxes are walled-off areas that offer a safe environment in which data scientists can experiment with internal data from the organisation and ingest other types of external data.
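
The sandbox principle can be illustrated at toy scale. The sketch below is not how eBay implements its sandboxes; it assumes two SQLite databases as stand-ins for a production system and a sandbox, with a made-up listings table, and uses Python's built-in sqlite3 backup facility to take a snapshot. It simply shows that destructive experiments in the sandbox leave the production data untouched.

```python
import sqlite3

# Stand-in for a production database (hypothetical table and data).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, price REAL)")
production.executemany(
    "INSERT INTO listings VALUES (?, ?)", [(1, 10.0), (2, 25.0), (3, 40.0)]
)
production.commit()

# Snapshot production into a separate, walled-off sandbox database.
sandbox = sqlite3.connect(":memory:")
production.backup(sandbox)

# Experiment freely in the sandbox: destructive edits and extra tables
# for external data are fine here because production is a different database.
sandbox.execute("DELETE FROM listings WHERE price < 20")
sandbox.execute("CREATE TABLE external_rates (currency TEXT, rate REAL)")
sandbox.commit()

prod_count = production.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
sandbox_count = sandbox.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
print(prod_count, sandbox_count)  # prints 3 2
```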

I would encourage you to explore the creation of such environments in your own organisations in order to provide your data science teams with easily accessible, high-quality data that does not threaten production systems. It must be noted that to support this kind of environment, your data architecture must allow for the integration of all of the organisation’s (and other external) data – both structured and unstructured. As an example, eBay has an integrated data architecture that comprises an enterprise data warehouse that stores transactional data, a separate Teradata deep storage database which stores semi-structured data, and a Hadoop implementation for unstructured data [4]. Other organisations are creating “data lakes” that allow raw, structured and unstructured data to be stored in vast, low-cost data stores. The point is that the creation of such integrated data environments goes hand in hand with providing your data science team with analytics sandbox environments. As an aside, all the efforts going into your data management and data compliance projects will also greatly assist in this regard.

Great data science requires access to open-minded decision-makers with high leverage problems to solve

DJ Patil stated that “A data-driven organisation acquires, processes, and leverages data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape” [5]. This data-driven culture needs to be led from the top down. As an example, Airbnb promotes a data-driven culture and uses data as a vital input in its decision-making process [6]. The company uses analytics in its everyday operations, conducts experiments to test various hypotheses, and builds statistical models to generate business insights, all to great success.

Data science initiatives should always be supported by top-level organisational decision-makers. These leaders must be able to articulate the value that data science has brought to their business [1]. Wherever possible, co-create analytics solutions with your key business stakeholders. Make them your product owners and give them feedback on insights on a regular basis. This will help keep the business context front of mind and allow them to experience the power and value of data science directly. Organisational decision-makers will also have the deepest understanding of company strategy and performance and can thus direct data science efforts to problems with the highest business impact.

Great data science requires strong team research processes

Data science teams should have strong operational research capabilities and robust internal processes. This will enable the team to execute controlled experiments with high levels of confidence in their results. Effective internal processes can help promote a culture of failing fast, learning quickly and feeding valuable results back into the business experiment/data science loop. Google and Facebook have mastered this in their ability to, amongst other things, aggregate vast quantities of anonymised data, conduct rapid experiments and share these insights internally with their partners, thus generating substantial revenues in the process.

Think of this as applying robust software engineering principles to your data science practice. Ensure that your documentation is up to date and of a high standard. Ensure that there is a process for code review, and that you are able to correctly interpret the results that you are seeing in the data. Test the impact of this analysis with your key stakeholders. As Drew Harry states, “controlled experimentation is the most critical tool in data science’s arsenal and a team that doesn’t make regular use of it is doing something wrong” [1].
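
As a rough illustration of what analysing a controlled experiment can involve, the sketch below runs a two-proportion z-test on the results of a simple A/B test. The function name and the conversion counts are made up for the example, and this particular test is my own choice of illustration: neither Drew Harry's article nor the companies mentioned above prescribe it. It assumes independent users and sample sizes large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (a) and variant (b).

    Returns the z statistic and the two-sided p-value under the null
    hypothesis that both groups share the same underlying conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: best estimate of the shared rate if the null is true.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or all 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 2,000 users per group, 10% vs 13% conversion.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the difference between the groups is unlikely to be noise alone, but it says nothing about whether the difference matters commercially. That judgement is exactly where the conversation with your key stakeholders comes in.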

In Closing

This blog is based on a decomposition of Drew Harry’s definition of what enables great data science teams. It provides a few examples of companies doing this well and some practical steps and open-ended questions to consider.

To summarise: a well-balanced and effective data science team requires a data engineering team to support it from a data infrastructure and architecture perspective. It requires large amounts of data that is accurate and trusted. It requires data to be easily accessible and needs some level of autonomy in accessing that data. Top-level decision-makers need to buy into the value of data science and keep an open mind when analysing the results of data science experiments. These leaders also need to promote a data-driven culture and provide the data science team with challenging and valuable business problems. Data science teams also need to keep their house clean and have adequate internal processes to execute accurate and effective experiments, which will allow them to fail and learn quickly and ultimately become trusted business advisors.

Some Final Questions Worth Considering and Next Steps

In writing this, some intriguing questions come to mind: Surely there is an African context to consider here? What are we doing well on the African continent, and how can we start becoming exporters of effective data science practices and talent? Other questions that come to mind include: To what extent does all of the above need to be in place at once? What is the right mix of data scientists, engineers and analysts? What is the optimal mix of permanent, contractor and crowd-sourced resources (e.g. Kaggle-like initiatives [7])? Academia, consultancies and research houses are beating the drum of how important it is to be data-driven, but to what extent is this always necessary? Are there some problems that shouldn’t be using data as an input? Should we be purchasing external data to augment the internal data that we have, and if so, what data should we be purchasing? One of our competitors recently launched an advertising campaign explicitly stating that their customers are “more than just data”, so does this imply that some sort of “data fatigue” is setting in for our clients?

My next blog will explore in more detail the ideal skillsets required in a data engineering team and how data engineering can be practically implemented in an organisation’s data science strategy. I will also attempt to tackle some of the pertinent open-ended questions mentioned above.

The dimensions discussed in this blog are by no means exhaustive, and there are certainly more questions than answers at this stage. I would love to see your comments on how you may have seen data science being implemented effectively in your organisations or some vexing questions that you would like to discuss.

References

[1] https://medium.com/mit-media-lab/highly-effective-data-science-teams-e90bb13bb709

[2] https://blog.keen.io/architecture-of-giants-data-stacks-at-facebook-netflix-airbnb-and-pinterest-9b7cd881af54

[3] https://www.wired.com/2013/02/facebook-data-team/

[4] http://searchbusinessanalytics.techtarget.com/feature/Data-sandboxes-help-analysts-dig-deep-into-corporate-info

[5] https://books.google.co.za/books?id=wZHe0t4ZgWoC&printsec=frontcover#v=onepage&q&f=false

[6] https://medium.com/airbnb-engineering/data-infrastructure-at-airbnb-8adfb34f169c?s=keen-io

[7] https://www.kaggle.com/

by Nicholas Simigiannis