Bumblekite’s state of applied AI & data science in healthcare & biosciences survey
Are you a data scientist, data engineer, machine learning engineer, analyst, clinician, biostatistician, bioinformatician, researcher, epidemiologist, health analyst, scientist, [insert your job title]... a person who designs, develops or applies data science, statistical and machine learning methods within our global healthcare and biomedical ecosystem?
We want to understand our work lives and data environments to make them more impactful, joyful and productive.
40+ organisations · 1,000+ hospitals · 40+ countries · tens of thousands of people like yourself · YOU
This survey is part of a global, unique and ambitious community-driven effort that aims to generate data insights about the professional lives of YOU and people like yourself: the engines that drive innovation in our ecosystem forward.
As new types of jobs continue to emerge, data science and AI teams are being built across the globe, and new upskilling opportunities arise for the current healthcare workforce, now is a critical moment to map out our professional day-to-day lives and our data infrastructures: how similar or different are we?
We want to enable you, a passionate and talented individual of any age and from any corner of our ecosystem, to achieve your most ambitious career goals. We also strive to equip our work environments with the data to facilitate your best, career-defining work in this rapidly growing and changing area by evolving our data, software and hardware tools and the human practices that accompany them.
Using two data sources, LinkedIn’s Economic Graph and our survey, in our inaugural year we are asking the basic human questions: How many of us are out there? Where do we come from? From a sheet of paper or a fax to an LLM, which technology tools do we enjoy using?
We strongly believe our people-centric approach will ultimately lead to safer, more equitable, more cost-effective and more efficient uses of data and newly emerging technologies in the future.
Join us on this amazing road ahead!
This effort promises to provide the AI community with critical information about the state and needs of contemporary AI practitioners – especially those in healthcare – which will serve as a key piece in the AI workforce readiness puzzle. There is high demand as evidenced by the attention the work garnered at the National Artificial Intelligence Institute’s International Summit for AI in Healthcare.
- Gil Alterovitz, the inaugural Chief AI Officer, U.S. Department of Veterans Affairs
Our inaugural survey was conducted from November 2023 - January 2024.
Our data insights will be published here in spring 2025. See you soon!
the latest
At the session "Biomedicine in the age of AI: human & machine learning intertwined", co-hosted by FOUR during the World Economic Forum in Davos in January 2024, Karin Kimbrough, Chief Economist at LinkedIn, shared the first preliminary insight from our global, collaborative research study. Our session was held as a part of the House of Switzerland programme, curated by the Swiss Federal Department of Foreign Affairs.
Stanford Health Care and Mediclinic International joined our growing group of organisations!
Our plenary session introducing this work, delivered at the VA’s International Summit for AI in Health Care in D.C. in early September 2023, was the most viewed session of the Summit. It stood out among a deeply impressive lineup of speakers, including the VA leadership, Pulitzer and MacArthur award recipients and many others, in front of an audience of over 1,000 attendees.
our partners
The hallmark of our work is its grassroots and peer-to-peer design: from practitioners to practitioners. This work has been co-created with the generous feedback given through hundreds of hours of conversations with your peers in the following organisations and research groups.
Our scientific collaborator is Prof. Papanicolas, head of the Center for Health System Sustainability (CHeSS) at Brown’s School of Public Health. For newcomers to her work, we love recommending her brilliant TEDMED ‘18 talk. Our data partner is LinkedIn.
Health care organizations are turning to data and advanced analytics to solve some of the most pressing value-based healthcare and coverage challenges. At Kaiser Permanente, we have made significant strides in becoming more data driven from the very start. And, we still need to continue to advance and mature. At the heart of these efforts are multi-disciplinary product-agile data and analytical teams consisting of both business and technology experts working in deep collaboration and continuous innovation.
- Vivian Tan, VP, Strategic Information Management & Global Relationships, Kaiser Permanente
Acknowledgements
We would like to recognise and thank the following groups of people and individuals who supported the creation of this work: the executive assistants and supporting staff of the leaders we have been in correspondence with, Miriam Donaldson, Tom Wehmeier (partner, head of insights, Atomico), Camille Ricketts (investor, XYZ), Herko Coomans (international digital health coordinator, The Netherlands Ministry of Health, Welfare and Sport), Dominic Cushnan (director of AI, NHS England), Karim Palant, Pat Walters (chief data officer, Relay Therapeutics), Ralf Molitor (managing director, Helsana HealthInvest), Alex Shipps (digital strategy coordinator, MIT CSAIL) and Danil Mikhailov (executive director, data.org).
Financial support for our work has been provided by the Gordon and Betty Moore Foundation.
tracking our impact
With our data insights, we aim to drive change in decision making, both personal (helping people chart their own career pathways) and organisational (influencing workplace design), making both more data-driven.
Below you can read about a few examples of projects and decisions that your input will catalyse. We chose to highlight them for two reasons:
1. to emphasise that your effort is not done in a vacuum. It will meaningfully contribute to improving the work lives of people like yourself across the globe.
2. to inspire and assist you in joining or designing similar (or completely different!) efforts within your own local environment.
We fiercely believe in the bottom-up, organic creation of initiatives, driven by the practitioners for the practitioners, supporting each other in creating the work we will look back on with immense pride and joy.
Our impact award
To continue our focus on how our data insights are used across the globe, we will soon open applications for our impact award, which will recognise the most creative and thoughtful uses, big and small, of the data insights we produce.
Guy’s and St Thomas’ NHS Foundation Trust’s Clinical Scientific Computing Team: choosing a meaningful job title; what does an AI Fellow do in a hospital?
Haris Shuaib, head of the CSC team, writes:
The clinical scientific computing team at Guy’s and St Thomas’ NHS Foundation Trust, founded in January 2021, is the first dedicated medical AI team created in the UK healthcare system.
Before the team was formally founded, I was working as an MRI physicist in the MR Physics team but was increasingly leading clinical scientific computing activities, largely due to my experience and interest in image analysis and machine learning.
Guy’s & St Thomas’ NHS Foundation Trust is an NHS foundation trust of the English National Health Service and a member of the prestigious Shelford Group. It runs Guy's Hospital in London Bridge, St Thomas' Hospital in Waterloo, Evelina London Children's Hospital, two specialist heart and lung hospitals (Royal Brompton and Harefield), and community services in Lambeth, Southwark and Lewisham. Together, GSTT has over 2.6m patient contacts per year and £2.8 billion in annual turnover.
As of October 2022, our clinical scientific computing team has 17 members, 53% of whom are women; we are very proud of both numbers. The mission of our team is to develop people, platforms and policy for digital health, including the creation and deployment of new software within GSTT and the NHS.
After overseeing 50+ AI development and deployment projects across the London area, I noticed that we were lacking doctors with AI knowledge to co-design and supervise the deployment of novel technologies within their local environments. As part of CSC’s mission of developing people, two years ago I proposed launching a world-first clinical fellowship in artificial intelligence, moving away from research and development towards involving clinicians in the deployment of AI and data-driven technology. After the successful launch of two cohorts, providing fellowships to 27 doctors in total, the programme is now being expanded nationally across the UK.
What is the role of our data insights?
I wish this data resource existed when I started building our team. Here is an example:
Recently we onboarded a new team member, a fantastic clinician who will be joining our team for 50% of her time. In our conversations, we found it challenging to find a job title that would meaningfully describe to her family what she does when she is not working as a doctor and what impact that position has on the healthcare system. What does an “AI Fellow” do in a hospital? How do you explain that to your family when you are a doctor? It is a quintessential question, and one that many engineers have asked themselves in a similar form. I believe our conversation is not unique and that many similar ones are occurring across the globe.
For me as a team leader it’s important to have better data for future conversations like these. The development of cutting-edge AI technologies that touch millions of patients starts with a single conversation like this one and it’s important to get it right.
Roche: establishing a grassroots data science network to foster talent and form impactful connections
Ryan Copping, VP of Computational Catalysts at Genentech, writes:
Roche Advanced Analytics Network (RAAN) is Roche’s internal data science community that currently has over 1,600 members across 40+ Roche countries and a variety of divisions across the Roche Group.
A small group of passionate volunteers formed the RAAN in 2017 (using their free time!) with the intent of it being a grassroots effort to break down silos across the organisation and ultimately:
(1) Connect and empower our talent who have an interest in advanced analytics
(2) Foster knowledge sharing and continue to develop our expertise across the organisation, and
(3) Impact our research, business and patients by creating actionable insights from data.
We originally hoped to get ~50 team members to sign up, but after organising our first few activities, including a “RAAN Day” (attended by our CEO to show his support!) and an internal Kaggle-like data challenge (using real-world data, or RWD, to build a prognostic model predicting the likelihood of 1-year survival in patients with advanced non-small cell lung cancer, or aNSCLC), we quickly had about 700 people sign up within the first year, and it has grown year on year since then!
It was really amazing to see that people were passionate about the subject matter and also really enjoyed getting involved in activities (with the data challenge being particularly successful – people enjoyed the fun competitive nature of it!) so we decided to continue with the same formula.
Each year we have been asking for volunteers to help run RAAN and organise activities (no one has a “job” to do this, it's all volunteers), including our annual RAAN Festival, which spans a few weeks in Q1 each year and usually has ~90 volunteer-led sessions including workshops, discussion clubs, presentations, poster sessions and even external speakers. We also initiated a RAAN intern program through which we have so far hired over 100 interns across the Roche Group to work on some of our Advanced Analytics data and challenges (note: some of whom have since joined the organisation permanently). During COVID we switched to doing more virtual activities, which has been really successful and helped increase inclusivity for those at many different locations.
Overall, from a personal perspective, being in RAAN has been one of my most satisfying career highlights over recent years. It’s been amazing seeing how people who are passionate about a topic are willing to come together across functional boundaries and collaborate – my personal highlight is that folks often mention to me examples of where connections formed through RAAN have helped them from a personal growth and career development perspective.
All this was created by a passionate bunch of volunteers – it has never had a top-down mandate which I think is pretty cool.
I joined Roche in 2003 as an intern in the Biostatistics group in Roche UK and have been here ever since! I have always been in the “data science” space but have been lucky enough to work in different parts of the organisation (including the clinical trials, real-world data, personalised healthcare and now research domains) as well as at four different Roche and Genentech locations (UK, New Jersey, NYC and now San Francisco).
I feel really privileged to have worked with great people, to have had the opportunity to learn & develop and ultimately to work on a really important mission for our patients.
What is the role of our data insights?
RAAN’s goal is to connect and empower Roche’s advanced analytics community through a variety of activities: from events and training, annual Roche-wide data challenges (which have attracted over 500 participants in the past) and hosting dozens of interns, to sharing infrastructure and tooling for advanced analytics work.
Our bet is fundamentally on our internal talent which is why we are continuously seeking ways to further empower and grow them through a variety of mechanisms. They are the core engine fueling our ambition to be the leader in advanced analytics in pharma to bring impact to our patients.
For an individual, one of the key areas is of course career planning: **What is the next career milestone, and how do I reach it?** We are working towards highlighting the career journeys of some of the members of our RAAN community as examples of what is possible.
The data insights produced through this project will serve as a great complement to the stories we are composing and, in years to come, become a data lighthouse for people getting ready for their next most ambitious career jump, hopefully with us in our RAAN community.
behind the craft
If you are thinking: “I have so many questions!”, we have built this section just for you.
If you cannot find an answer to yours below, please send us an email and we will do our best to provide you with a substantive answer.
"Do I belong to this group of people? I only do..." YES!
We have received this question many times, in many shapes and forms.
This is why one of our core goals in the first year is to create a sense of belonging within this deeply heterogeneous group of people working in biomedicine and healthcare.
No matter your job title or which computing tools you use, be it a Jupyter notebook, an Excel sheet or something else, we are working to create a welcoming environment where you feel like you belong. This is one of the reasons why we use the phrase “people who design, develop or apply data science, statistical and machine learning methods within our healthcare and biomedical ecosystem” rather than "a data scientist", and why its placement at the very top of our website is crucial to the success of this project.
We are at an exciting juncture in our ecosystem’s development; a plethora of new jobs and corresponding titles are emerging. One of our main analysis goals in the first year is to describe them quantitatively and showcase how diverse they are. Finally, job titles are also a deeply personal, emotional matter that may touch on the identity of each of us, and their importance within this discourse should not be discounted.
If you are feeling a bit self-conscious, uncomfortable or perhaps “not there yet”... could we convince you otherwise? :)
If after reading through our website there are still some doubts left on whether you should be filling out our survey, please drop us a note, we would love to read more about your work and answer any questions you may have.
Why? Project motivation
This project started in the aftermath of our inaugural machine learning summer school in healthcare and biosciences, held at ETH Zürich in August 2022.
While analysing the wealth of survey data we collected at our event, we noticed a significant information gap between the senior leaders we invited as lecturers and our participants, most of whom were at the beginning of their career journeys.
You can read more about our data analysis as well as our approach towards receiving and acting upon critical feedback on our lessons learned 2023 page.
The information gap we identified is composed of a variety of elements, such as the value of sharpening one’s communication skills and the definition of “machine learning” extending beyond the data modelling step of the pipeline. Below are some examples of participants’ reflections we received in 2022 and 2023:
“In my opinion, communication sessions were pointless and totally useless... Maybe it would be a good idea to throw them away and free some time.”
“I felt like time was allocated poorly such that we spent too much time on what I perceived as less useful things (communication workshops…) when we didn't have enough time for technical tutorials.”
"The communication workshop was a surprise, in the sense that I found it useful to reflect about myself and my communication style especially in how I relate with my supervisors."
“...while it is nice to work with real-world data, it did not teach me much as, the first tutorials we only cleaned data.”
On the other hand, Sebastiano Caprara, head of the digital medicine unit at University Hospital Balgrist and our Bumblekite MLSS 2023 lecturer, writes:
When transitioning from my research role to my current position, the primary objective of our unit became bridging the gap between researchers and IT. Initially, I had limited knowledge of the hospital's complex IT systems, which include a mix of on-premise and cloud solutions across various network zones, and hospital IT processes. The IT department's primary focus is ensuring hospital operations, with research and digitalization as secondary concerns. To familiarise myself with these intricacies, I actively participated in meetings and technical sessions, listening to pain points and challenges. Now, I'm one of the few individuals in my organisation who can effectively communicate between research and IT. This skill proves indispensable when integrating new applications, developing tools, or upgrading infrastructure, as it ensures a mutual understanding of goals and existing processes. I continuously strive to find optimal solutions, though it often involves compromise, and I engage in this work on a near-daily basis.
See the gap?
After reading these reflections, we sought to put ourselves in the shoes of our participants and asked ourselves:
could we point them towards a dataset that describes what the professional day-to-day of a person working in the healthcare and biomedical ecosystem looks like?
This work was born from our unsuccessful quest to find a dataset that addresses this gap. Moreover, in preliminary conversations with leaders across the globe, we were excited to find a significant appetite for these data insights to be generated and disseminated across our ecosystem.
Main goals of our inaugural year
We see this work as a multi-decade project, and we are at its very beginning.
The main goals of our inaugural year are the following:
1. create and nurture an inclusive community with a strong sense of belonging
Gather the global healthcare and biomedical community around this work and create a convening centre for the decades to come.
2. establish the critical importance of this work
Establish the importance of this work by, among others, describing specific challenges, initiatives and projects in different types of organisations across our ecosystem that our data insights will support.
3. produce an original and unique set of data insights
Produce a preliminary set of data insights and prepare for the second year of our ambitious endeavour.
4. create an environment that invites critical feedback
Create communication avenues to invite and receive critical and constructive feedback that will inform the directions in which we will evolve this work in the near future.
While being very proud of the amount of effort, time and craftsmanship that has been invested in this work, we are equally excited about the space to improve and grow it from where it is today.
If you are interested in reading more about our work in this space, we invite you to read our lessons learned 2023 page, where we summarised the feedback we received on our machine learning summer school in healthcare and biosciences that informed the 2022 / 2023 evolution of our flagship community event.
What's next? Project timeline
Sept 2022
We kicked off this work by analysing the data obtained at our inaugural machine learning summer school in healthcare and biosciences, held in August 2022 at ETH Zürich, Switzerland.
Sept 2023
We introduced this project to the public on the 7th of September at the plenary session of the International AI Summit in Health Care in D.C., organised by the U.S. Department of Veterans Affairs’ National AI Institute.
Our session was the most viewed of the Summit, among a deeply impressive lineup of speakers, including the VA leadership, Pulitzer and MacArthur award recipients and many others, in front of an audience of over 1,000 attendees.
Some of our favourite impressions from the Summit were the ambition and enthusiasm expressed by the VA's clinical staff in attendance to be in the driver's seat, co-creating and co-designing these technologies and seeking new educational opportunities in the data science and AI area. We left D.C. feeling inspired.
November 2023
Our website is launched!
Spring 2025
We expect to publish our preliminary set of data insights.
Data analysis questions and themes
In the hundreds of conversations we have had over the months of building this project, we have received orders of magnitude more questions than we could address in the very first year of this work. This speaks to the breadth and depth of the information gap we aim to close, the enthusiasm of our community, the many potential uses of the data insights and the impact we hope they will have.
There are two main concepts we would like to quantify in our inaugural year:
1. Size of the opportunity in front of us
What is the % of the workforce in our sector that “people who design, develop and apply AI and data science in health and biomedicine” currently occupy? How fast are we growing?
2. Interdisciplinarity and inclusiveness of our ecosystem
What educational backgrounds and sectors do people come from? The teams around us are multidisciplinary and so beautifully diverse. How do we quantify this?
Describing these two concepts in a quantitative manner will set the data foundation on which we will continue building in the years to come.
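To give a sense of what quantifying these two concepts might involve, here is a minimal Python sketch. All numbers in it are hypothetical placeholders, not survey or LinkedIn data. For concept 1, it computes a workforce share and a compound annual growth rate; for concept 2, it uses the Shannon diversity index over respondents' educational backgrounds, a standard diversity measure swapped in purely for illustration rather than a metric we have committed to.

```python
import math
from collections import Counter


def workforce_share(practitioners: int, total_workforce: int) -> float:
    """Percentage of a sector's workforce made up of practitioners."""
    if total_workforce <= 0:
        raise ValueError("total_workforce must be positive")
    return 100.0 * practitioners / total_workforce


def annual_growth_rate(start: int, end: int, years: float) -> float:
    """Compound annual growth rate (CAGR) between two headcounts, in percent."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return 100.0 * ((end / start) ** (1.0 / years) - 1.0)


def shannon_diversity(labels: list[str]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over category proportions.

    Illustrative choice of interdisciplinarity measure (an assumption):
    higher values mean backgrounds are spread more evenly across more categories.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical illustrative inputs only.
    print(f"share: {workforce_share(2_500, 100_000):.1f}%")
    print(f"growth: {annual_growth_rate(2_000, 2_500, 2):.1f}% per year")
    backgrounds = ["medicine", "statistics", "computer science", "biology"]
    print(f"diversity: {shannon_diversity(backgrounds):.3f}")
```

A team drawn entirely from one background scores 0, while four equally represented backgrounds score ln(4), so the index rises as a group becomes more interdisciplinary.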
Data privacy-first: "nothing about me without me"
One of our core values is privacy. We invite you to read our thoughts on it in our About section. Our collaborators at Apollo Hospitals have a saying that we adore, that embodies our thinking on this topic perfectly: "Nothing about me without me".
When choosing the data sources from which we will compute the data insights, we were deeply committed and intentional about choosing those where you, as a user, opt into putting your data into the product. Though this limits the number of data partners we can consider, it is also the only choice that aligns with our thinking about privacy, behind which we are proud to stand.
We believe this is also the only sustainable and substantive way to build trust with you and the rest of our community.
Where is the data stored?
The data provided by survey respondents is stored on FOUR’s Google Drive, on servers located in Europe.
Is my survey submission linked to my LinkedIn profile?
No. Our data partnership with LinkedIn is technically separate from the survey data that we are gathering.
Our data partnership with LinkedIn
Casey Weston, senior manager, Public Policy & Economic Graph at LinkedIn writes:
LinkedIn is excited to support this landmark exploration into data science careers within the healthcare space through our Data for Impact program. We hope our insights around AI skills, healthcare career pathways, and gender equity can help inform data science professionals, employers, and policymakers about this critical and quickly growing ecosystem.
LinkedIn has signed an agreement with the U.S. Department of Veterans Affairs to collaboratively support this work by sharing aggregated, anonymized insights on the state of applied AI and data science in healthcare and biosciences.
This work occurs via LinkedIn’s Data for Impact programme, through which LinkedIn collaborates with a number of government and multilateral partners including the World Bank, OECD and the European Bank.
Research methodology
About our survey
Our survey includes demographic questions, questions about your daily work and your organisation type, questions asking for your opinions on what could bring you more joy at your workplace, and questions asking for feedback on the survey itself.
All of this input from you is anonymous.
There are 3 types of questions included in our survey: multiple choice questions, scaled questions and open-ended questions.
Analysis, validity & reliability
Both within our team and with our collaborators, we work to maximise the validity and reliability of our results.
- Triangulation is used to combine quantitative and qualitative responses.
- Our analysis results are complemented by and compared with LinkedIn data.
- The data cleaning and analysis processes are cross-checked by different researchers in our internal team.
- The wording of our survey has been examined and advised by experienced researchers from our partner organisations, so that the options provided are inclusive, considerate, relevant, and accurate.
- To help ensure responses are consistent, we reverse-phrased certain questions, which lets us identify and exclude contradictory, outlier responses.
- We are transparent about data protection and the purpose of the survey, so that you can feel safe sharing your views on your work life.
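The reverse-phrasing check listed above could look roughly like the following minimal Python sketch. It assumes a 5-point Likert scale, a hypothetical pair of items (one positively worded, one reverse-phrased) and a hypothetical disagreement threshold of 2 points; none of these are taken from our actual survey or cleaning pipeline.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert scale


def reverse_score(answer: int) -> int:
    """Re-score a reverse-phrased answer onto the positive direction (1<->5, 2<->4)."""
    if not SCALE_MIN <= answer <= SCALE_MAX:
        raise ValueError(f"answer {answer} outside {SCALE_MIN}..{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - answer


@dataclass
class Response:
    respondent_id: str
    enjoys_work: int  # hypothetical item: "I enjoy my daily work."
    dreads_work: int  # hypothetical reverse-phrased item: "I dread my daily work."


def is_inconsistent(r: Response, threshold: int = 2) -> bool:
    """Flag a response whose two answers contradict each other once the
    reverse-phrased item is re-scored (the threshold is an assumption)."""
    return abs(r.enjoys_work - reverse_score(r.dreads_work)) > threshold


def split_consistent(responses: list[Response]) -> tuple[list[Response], list[Response]]:
    """Separate responses into (kept, flagged) lists."""
    kept = [r for r in responses if not is_inconsistent(r)]
    flagged = [r for r in responses if is_inconsistent(r)]
    return kept, flagged


if __name__ == "__main__":
    sample = [
        Response("a", enjoys_work=5, dreads_work=1),  # 5 vs re-scored 5: consistent
        Response("b", enjoys_work=5, dreads_work=5),  # 5 vs re-scored 1: contradictory
        Response("c", enjoys_work=3, dreads_work=2),  # 3 vs re-scored 4: consistent
    ]
    kept, flagged = split_consistent(sample)
    print("flagged:", [r.respondent_id for r in flagged])
```

Re-scoring the reverse-phrased item first puts both answers on the same direction, so a large gap between them suggests an inattentive or contradictory response rather than a genuine opinion.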
Data analysis example
Here is an example of how the questions will be analysed: after comparing participants’ previous industries (question 6 of “About Your Work”), chosen from a list provided by LinkedIn, we will run correlation tests with questions such as “What are the top reasons that people transition from different industries to the healthcare ecosystem?” (question 7 of “About Your Work”).
Questions 6 and 7 are asked together because we want to shed light not only on where you come from, but also on what you value most in your workplace. We can then find out whether these attractive aspects are significantly correlated (p < 0.05) with factors such as your organisation type, its size, and your most and least favourite responsibilities.
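Because previous industry and reasons for transitioning are both categorical answers, their "correlation" is usually assessed as statistical association. Below is a minimal sketch of one such approach: a Pearson chi-square test of independence whose p-value is estimated by permutation, using only the Python standard library. The toy answers, category labels and number of permutations are hypothetical; in practice a library routine such as SciPy's `scipy.stats.chi2_contingency` would typically be used, and the tests we actually run may differ.

```python
import random
from collections import Counter


def chi_square(xs: list[str], ys: list[str]) -> float:
    """Pearson chi-square statistic for the contingency table of xs vs ys."""
    n = len(xs)
    row, col = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    stat = 0.0
    for x, rx in row.items():
        for y, cy in col.items():
            expected = rx * cy / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def permutation_p_value(xs: list[str], ys: list[str],
                        n_permutations: int = 5_000, seed: int = 0) -> float:
    """Fraction of shuffles of ys whose chi-square is at least as large as the
    observed one. Shuffling breaks any real association with xs."""
    rng = random.Random(seed)
    observed_stat = chi_square(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_square(xs, shuffled) >= observed_stat - 1e-12:
            hits += 1
    # +1 correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy answers: previous industry (Q6) vs top reason (Q7).
    industries = ["tech"] * 10 + ["academia"] * 10
    reasons = ["impact"] * 8 + ["stability"] * 2 + ["impact"] * 2 + ["stability"] * 8
    p = permutation_p_value(industries, reasons)
    print(f"chi2 = {chi_square(industries, reasons):.2f}, p = {p:.3f}, "
          f"significant at 0.05: {p < 0.05}")
```

A result counts as significant at the conventional threshold when p < 0.05; with no association at all (identical answer mixes in both industries), every shuffle is at least as extreme and the estimated p-value is 1.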
We will use aggregated, anonymized insights from LinkedIn to cross-check our findings and explore how those insights can supplement our research. For example: which skills are most frequently listed for different job titles, and does having (or lacking) any of these skills make a significant difference to individuals’ happiness at work?
These results are important because we want to provide valuable information at an individual level for you and especially for people who are thinking of joining the healthcare ecosystem to work with machine learning and data science methods. With our results, we hope you can find out which responsibilities and organisations are most likely to align with what you value most at work, e.g., more intellectual stimulation.
Evolving the HOW, tracking our impact
In addition to the unique and impactful set of data insights we aim to produce, we would also like to evolve how this type of work is done in a number of ways that are meaningful and substantive to us:
A focus on consent and privacy, “nothing about me without me”
Neither the survey form nor the hyperlinks we share with our community to spread the word about this work contain technologies that could give us data on survey completion rates, link click rates, correlations between the two or similar metrics. Unless you explicitly choose to tell us, for example, where you heard about the survey, we do not think it is our right to collect this information without your consent.
Our data insights will be freely available on our website once launched. We do not intend to ask you for your email address in order to access them.
Our bet on the web: no reports in a pdf format
Though we deeply appreciate and respect the historical effort that has gone into creating reading materials of substantial length, we do not believe this is a format our audience engages with, or will engage with, significantly. Our data insights will be available on our website towards the end of 2024.
Co-creation and design vs. consumption: tracking our impact
We believe there is ample opportunity in creating data insights that inspire you and your organisation and give you useful information to keep creating, building and designing, rather than capturing your attention so that you passively, briefly and superficially consume the published content.
This is why, in the near future, we are excited to launch our inaugural impact award, whose goal will be to reward the most insightful, creative and impactful uses of the data insights we aim to share, as well as thoughtful feedback on what was missing from our data insights for them to be usable. No matter how small or large your decision was, we want to hear about it!
Our impact tracking work has been inspired by the pioneering work that The Markup, a nonprofit newsroom focused on data-driven journalism, has done in this area.
Editors’ choice: data without recommendations
While we are focused on our mission, we are flexible on how we collectively get there. Healthcare and biomedical organisations, and we as individuals who aspire to work or already work in them, exist in a wide variety of legal and cultural frameworks. Moreover, healthcare is fiercely local. What works for some of us will simply not work for others. We believe this is one of our strongest features, not a bug. :)
This is why, as we gear up to publish our data insights, they will come without any accompanying recommendations on how to use them. To celebrate the beauty of the heterogeneity of our approaches, we look forward to reading about future uses of the data and highlighting them through our impact award.
This is also one of the reasons why we choose not to engage in creating a public ranking of organisations working in the healthcare and biomedical area.
Into 2025... interested in joining?
We are always looking for thoughtful organisations and leaders within them that are building new tools, programmes and initiatives to better the work lives of those around them, their organisations, ecosystems and countries.
Due to the outstanding enthusiasm of the global community since the very inception of this project, we are looking to grow the number of organisations we work with in the near future, as we prepare to launch the 2nd iteration of our survey and its corresponding set of data insights: join us!
wall of joy
We asked you what brings you joy at work. Here are some of your responses.
“The secret of joy in work is contained in one word - excellence. To know how to do something well is to enjoy it." - Pearl S. Buck
Knowing that by spending time working with hospital staff and understanding how they use the machine learning models I build, I am contributing to a world where more people have access to world class health care and financially-sustainable health systems in their country.
Generate a new hypothesis from data that triggers a new research or program direction. Or transforming the hype around a new technology in useful, impactful solutions.
Doing work that is likely to improve the lives of patients.
I like finding out new things about our world with my research, how we are kept alive everyday by our bodies and environment and what makes us exist. I also like transferring knowledge and to see my team succeed at work and in their personal development.
At some point, I found what I'm doing can really help to improve people's life quality. I can always learn new things through collaborations with partners from different backgrounds (technical, cultural).
Interdisciplinary teams make the dream happen. The initial effort to get everyone on the same page is definitely worth it!