Epidemium is an open science project dedicated to using Big Data to understand cancer. It involves a wide variety of actors and specialists. How can the power of the crowd, open source and open innovation help scientists from various disciplines generate original solutions and advance our current understanding of cancer? Emerging methods of data analysis can provide a better understanding of complex fields such as medicine or life sciences. But this requires phrasing challenges that are understandable to doctors, patients and scientists alike. The C-K (Concept-Knowledge) method offers a solution.
Open source publications, communication technologies and digital platforms broaden access to knowledge and make science even more open, allowing participants with expert knowledge and special abilities to help solve significant questions.
Access to information, traditionally considered a strategic asset in any scientific research organization, is no longer limited to a few players able to invest in expensive, long R&D programs. It is now accessible to almost anyone willing to learn, experiment and possibly make a scientific contribution. Increasing disclosure in scientific processes is driving the emergence of communities around scientific projects. Leveraging these open communities to solve various problems, and even to make scientific discoveries, is becoming increasingly popular.
As Sauermann and Franzoni (2015) pointed out, a growing amount of scientific research is done in an open manner. Examples can be found across different domains and disciplines.
For instance, the Polymath project, launched in 2009 by Tim Gowers, was conceived as a collaboration among mathematicians to solve important and difficult mathematical problems by coordinating many specialists who communicate with each other to find the best route to a solution. More than 12 challenges have been launched under Polymath. The project showed that many minds can work together to solve difficult mathematical problems.
A joint project between Harvard, TopCoder, the Broad Institute and the Crowd Innovation Lab was launched to organize a series of challenges on developing algorithms for faster DNA sequence alignment and on improving the analysis of gene expression data. These examples show that open initiatives can deal with extremely complex problems.
These projects are often referred to as crowd science or open science. They are characterized by open participation and by sharing data and problem-solving techniques with participants. Promoters of open science often highlight the opportunity to learn, collaborate with others and test new theories.
In recent years, life science and medicine have faced major changes with the emergence of new, massive sources of information such as genomic identity or the patient's global environment. In parallel, new forms of treatment are becoming more available, such as biotherapies that consider diseases like cancer in their global environment, or personalized treatments based on a patient's genomic information. Many areas, such as epidemiology, are undergoing major transformations that require new methods of data analysis. These disciplines now use open collaborative settings to explore new ways of dealing with these massive sources of data.
Epidemium, a collaborative initiative to explore new paths for cancer research, was launched in 2016. An inclusive and community-based open science program, Epidemium is a joint program of a pharmaceutical company, Roche, and an open and community laboratory, La Paillasse. The program uses data challenges, “Challenge4Cancer,” to approach the epidemiology of cancer in an open science framework.
The first Epidemium challenge, launched in 2016, was a success: 678 people participated, creating a broad community of experts with competencies in data analysis, statistics, visualization, data mining, oncology and epidemiology. In total, 15 different projects were developed over 6 months. These projects were evaluated by the scientific and ethics committees for the scientific validity of their results, originality, collaborative aspect, and impact and perspectives for patients, and to verify that the explorations were ethically sound.
In the spirit of knowledge sharing, Challenge4Cancer participants had to document their advances and results on a wiki page accessible to anyone. This transparency allowed continuous discussion during the challenge and helped create a vibrant community.
Despite these achievements, some difficulties were noted regarding the novelty and validity of the results and the identification of promising research directions. One critical point was the identification of research questions and challenges.
The crowdsourcing and open innovation literature shows that problem formulation is one of the key factors in ensuring successful outcomes and attracting the right participants. Problems should be precise enough but not too narrow, note Felin & Zenger (2014). This is particularly relevant for transdisciplinary challenges in open source. As Godemann (2008) points out, crossing various forms of knowledge from different disciplines is highly beneficial for problem solving, but it raises difficulties in knowledge exchange and integration. Given the short duration of the challenge and the variety of participants, the community's appropriation of research areas needed to be better managed. The research questions should be “understandable” to the different Epidemium communities, such as doctors, patients and data scientists, and should incentivize them to work together. They should also be original and ambitious enough to attract highly skilled participants.
Given the importance of designing research directions, in 2017 Epidemium decided to launch a preliminary exploration to better understand the stakes and identify research questions to tackle.
Solving questions with new approaches is exciting, but it is crucial to address the right questions. Which knowledge gap is the right one to analyze? How can one identify gaps in cancer research that are relevant to tackle with data analysis? How can one ensure that the relevant data is collected?
To design research questions, one would normally analyze existing knowledge gaps and try to formulate sufficiently novel questions. In the case of Epidemium, the state of the art is quite broad, if only because it spans disciplines related to both cancer and data analysis. A traditional literature review would have been too costly and time-consuming. Moreover, since the challenge aims to develop entirely new connections between disciplines, knowledge advances had to be presented concisely and simply enough for non-experts to quickly understand what is going on.
In order to explore the possibilities related to data analysis & cancer research in a systematic way, to identify the framework of the current approaches and to generate a set of innovative concepts, a design theory based framework was applied.
This framework was based on a design tool derived from Concept-Knowledge (C-K) theory, a design theory of innovative design reasoning. Design theory was chosen because it allows knowledge expansion beyond purely combinatorial strategies and accounts for dynamic transformations, adaptations, hybridizations, discovery, invention and the renewal of objects. The C-K framework is useful for understanding novelty because it not only separates the state of the art (available knowledge) from the exploration phase (concept development) but also defines how to use existing knowledge to structure the unknown.
C-K design theory is based on two interdependent spaces. The concept space has a tree structure: the tree traces the design path of each idea and shows its relation to other fields. The knowledge space is represented by knowledge bases in which different types of knowledge can be recorded, together with their robustness and maturity.
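To make the two spaces more concrete, here is a minimal Python sketch of how a C-K map could be held in memory. It is an illustration under stated assumptions, not the tool used by Epidemium: the class names `Concept` and `KnowledgeItem`, the methods `partition` and `design_paths`, and the robustness and maturity labels are all hypothetical. Each concept node can be partitioned into alternative sub-concepts, and each root-to-leaf path corresponds to the design path of one idea.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    """An entry in the knowledge space K, tagged with robustness and maturity."""
    statement: str
    robustness: str  # hypothetical scale, e.g. "established" / "weak"
    maturity: str    # hypothetical scale, e.g. "mature" / "exploratory"


@dataclass
class Concept:
    """A node of the concept space C; its children are alternative partitions."""
    label: str
    children: list[Concept] = field(default_factory=list)
    knowledge: list[KnowledgeItem] = field(default_factory=list)

    def partition(self, *labels: str) -> list[Concept]:
        """Expand this concept into alternative sub-concepts and return them."""
        new_nodes = [Concept(label) for label in labels]
        self.children.extend(new_nodes)
        return new_nodes

    def design_paths(self, prefix: tuple[str, ...] = ()):
        """Yield every root-to-leaf path, i.e. the design path of each idea."""
        path = prefix + (self.label,)
        if not self.children:
            yield path
        for child in self.children:
            yield from child.design_paths(path)


# Usage: a starting concept partitioned by who uses the data
# (medical professionals today, non-experts as an alternative).
root = Concept("Cancer data used to better understand cancer")
by_pros, by_non_experts = root.partition(
    "used by medical professionals", "used by non-experts"
)
# Attach a piece of knowledge to the node it supports; labels are illustrative.
by_pros.knowledge.append(
    KnowledgeItem("Cancer screening is mostly performed by medical staff",
                  robustness="established", maturity="mature")
)

for path in root.design_paths():
    print(" > ".join(path))
```

Keeping knowledge items attached to the concept nodes they support is one possible design choice; a fuller tool would more likely keep K as a separate, shared base referenced from several concepts.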
Along with workshops involving doctors, patients and data scientists, the C-K design framework was initially used to establish a common understanding of cancer and cancer treatment as well as the available data and data analysis techniques that can be used. This step was crucial to build a common vocabulary across experts from different domains, contextualize current approaches using the C-K framework and define the limits of current approaches.
Once this understanding was made explicit, alternatives were easier to identify by seeking external knowledge and mapping existing products. To imagine these alternatives, several workshops were organized with specialists in data analysis and cancer, complemented by a literature review and close work with the Epidemium team. In total, 25 experts participated in the workshops. They first shared their common vision of the field (i.e., cancer data is used by medical professionals, who collect this data and use it to better understand cancer; see Figure 2 for an extract of the map).
Alternatives were proposed at each level of the map. For example, non-experts could use the data, actors other than medical professionals could access it, and the data could be used differently. Establishing a common understanding helped the experts identify alternatives. For instance, cancer screening today is mostly performed by medical staff. The alternatives explored self-screening techniques or screening performed by third parties (these screening techniques should be non-invasive). Moreover, screening should occur not only when the first symptoms appear but on a regular basis. People at risk should be identified (through genome analysis, age, sex, exposure to different risk factors) and should benefit from frequent individual screening. In the future, continuous real-time screening should even be considered.
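The screening example can be laid out as a small grid of alternatives. The Python sketch below is a simplification under stated assumptions: it crosses the options named in the text for who screens and when, then attaches the constraints the text mentions. A plain cross-product is only a starting grid; C-K reasoning is explicitly not purely combinatorial, and the function name `screening_concepts` and the exact wording of the labels are hypothetical.

```python
from itertools import product

# Alternatives per attribute, taken from the screening example in the text.
WHO_SCREENS = ["medical staff", "self-screening", "third party"]
WHEN_SCREENED = ["at first symptoms", "on a regular basis", "continuously in real time"]


def screening_concepts():
    """Return candidate screening concepts as (who, when, constraints) tuples."""
    concepts = []
    for who, when in product(WHO_SCREENS, WHEN_SCREENED):
        constraints = []
        if who != "medical staff":
            # From the text: screening outside medical staff should be non-invasive.
            constraints.append("non-invasive technique")
        if when != "at first symptoms":
            # Reading of the text: frequent screening targets people identified as at risk.
            constraints.append("identify people at risk (genome, age, sex, exposure)")
        concepts.append((who, when, tuple(constraints)))
    return concepts


for who, when, constraints in screening_concepts():
    print(f"{who} / {when}: {', '.join(constraints) or 'current practice'}")
```

The first combination (medical staff, at first symptoms) carries no extra constraint and stands for today's practice; the other eight are the candidate alternatives the experts would then assess against existing knowledge.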
What about data? Depending on how the data would be used, different information was relevant: data on the patient's health status, on treatment efficiency and non-efficiency, on the patient's behavior (nutrition, activity, work), on the environment or other external factors that can affect a person; epigenome data; and data related to patient care services, to the country's economy, etc.
The alternatives explored and structured with the C-K framework led to the identification of 45 exploration axes, such as automatically assigning patients to different departments based on type of cancer, socio, treatments; assessing treatment efficiency or failure ex post, including risk and environmental data; anticipating treatment efficiency and side effects according to the patient profile; and, for each organ, understanding which types of cancer can occur.
The first results were presented to the wider Epidemium community (around 100 people) for comments and suggestions. The results were validated by the scientific and ethics committees of the Epidemium community.
This collaborative work helped the community shape a variety of research directions and identify the knowledge needed to go further. The map is available to anyone who wishes to better understand the issues surrounding cancer and its treatment, and to extend the map or complete it with existing projects.
Creating interdependencies between previously unrelated fields or concepts can lead to unexpected ideas. Building an interconnected map of concepts spanning several rather independent fields brought different experts closer together and extended the exploration space, allowing the Epidemium community to build a common understanding of cancer.
Using design driven frameworks like Concept-Knowledge helped to understand and explore various alternatives that Epidemium can follow to build research directions and see how other initiatives are positioned, resulting in a visual benchmark of current research on data analysis initiatives for cancer. This map helped explore and generate potential hypotheses that are accessible to the community.
The proposed map is not exhaustive and is subject to constant change and improvement. Nevertheless, it offers a comprehensive overview of a complex problem and provides a rich set of research directions.
This approach aimed at a systematic exploration of all possible alternatives, trying to avoid the cognitive biases that limit participants' exploration to solutions that are too obvious or already exist. Moreover, working with existing knowledge fostered a better understanding of the current state of the art and helped organize the search for new knowledge. It increased the designers' ability to generate original concepts.
We believe that this approach has a potential for developing more general models. It might be interesting to combine the design driven strategy with visualization, text mining or statistical approaches.
Dorst, K. (2006), “Design problems and design paradoxes,” Design Issues.
Fecher, B., S. Friesike and M. Hebing (2015), “What drives academic data sharing?” PLoS ONE, 10, e0118053.
Felin, T. and T. Zenger (2014), “Closed or open innovation? Problem solving and the governance choice,” Research Policy, 43(5), 914-925.
Godemann, J. (2008), “Knowledge integration: A key challenge for transdisciplinary cooperation,” Environmental Education Research, 14(6), 625-641.
Gowers, T. and M. Nielsen (2009), “Massively collaborative mathematics,” Nature, 461, 879-881.
Hatchuel, A., P. Le Masson, Y. Reich and B. Weil (2011), “A systematic approach of design theories using generativeness and robustness,” Proceedings of the 18th International Conference on Engineering Design (ICED11), 2, 87-97.
Hooge, S., M. Agogué and T. Gillier (2012), “A new methodology for advanced engineering design: Lessons from experimenting CK Theory driven tools,” International Design Conference - Design 2012.
Khoury, M. J., T. K. Lam, J. P. Ioannidis, P. Hartge, M. R. Spitz, J. E. Buring, ... and Z. Herceg (2013), “Transforming epidemiology for 21st century medicine and public health,” Cancer Epidemiology and Prevention Biomarkers, cebp-0146.
Le Masson, P., K. Dorst and E. Subrahmanian (2013), “Design theory: history, state of the art and advancements,” Research in Engineering Design, 24(2), 97-103.
Sauermann, H. and C. Franzoni (2015), “Crowd science user contribution patterns and their implications,” Proceedings of the National Academy of Sciences, 112(3), 679-684.