Charting the Course: Signpost AI Research Roadmap

Overview

Increasingly large language models (LLMs), trained on ever-larger quantities of text, have been at the heart of the accelerated progress in Generative Artificial Intelligence (GenAI). The release of ChatGPT in 2022 opened the gates to a plethora of Generative AI tools released at breakneck speed. There are great hopes for this technology, with McKinsey estimating economic benefits of around $6.1 to $7.9 trillion when applied to knowledge worker activities around the world [1].

Similar forecasts abound in different realms: by 2025, 30% of outbound marketing messages from large organizations will be synthetically generated; by 2026, Generative AI will automate 60% of the design effort for new websites and mobile apps, and over 100 million humans will engage robo colleagues (synthetic virtual colleagues) to contribute to enterprise work; and by 2027, nearly 15% of new applications will be automatically generated by AI without a human in the loop, up from 0% today. By reducing the workload of users by at least 20%, Generative AI tools can transform and boost performance across different humanitarian functions [2]. The realities of adoption pains, difficulties in implementing relevant use-cases, and a broader questioning of outsized Generative AI expectations have recently tempered this optimism. For example, Goldman Sachs has noted that for all of the outsized investment in AI over the last two years, there have not yet been substantial tangible benefits [3].

The history of technology teaches us that some technologies take time to mature and find their broader use-cases (e.g. the internet, the personal computer, electricity). That may be the case with Generative AI. As GenAI technologies evolve and become more accessible, their influence is expected to permeate an ever-expanding range of applications, transforming how we work, communicate, and create across the digital landscape.

The Humanitarian Context and Signpost AI 

Generative AI presents an opportunity to give power and impetus to humanitarian efforts. Because tools such as Large Language Models (LLMs) are, at their core, models of language, they hold immense potential to enhance communication and provide vital support to vulnerable populations.

Specifically, it offers a unique opportunity for Signpost to evaluate the technology from a humanitarian perspective. Signpost is the world’s first scalable community-led information program. Since 2015, it has become the largest such service in the aid sector by leveraging cutting-edge technology to reach vulnerable communities wherever they are, empowering them to understand their options, solve problems, make decisions for themselves, and access vital services. One of Signpost’s key services is the provision of crucial and timely information to those in need. 

This is the specific use-case where Signpost AI is attempting frontier development and research: to openly, transparently and ethically develop country-programme-specific chatbots (Signpostchat), termed here collectively as Signpost AI agent technology. There is an increasingly unmet information need, and the development of Signpost AI agent technology aims to scale vital information provision in a safe and ethical manner. Signpost AI's approach to developing this technology is firmly rooted in humanitarian, people-centered and do-no-harm considerations:

  • Ethical considerations are foundational to how we create our products. They are omnipresent in all of our technical, evaluative and quality decision-making processes. This is to ensure our AI portfolio is safe, equitable, human-centered, and does no harm

  • We are dedicated to documenting and sharing all aspects of our AI work through blogs, case studies and research papers. This includes disseminating technical process documentation, AI Impact Assessments, decision-making processes, ethical frameworks, etc. This extensive documentation serves not only as a guide for partners but also as a process philosophy that ensures our AI is open, accountable and trustworthy.

  • We are committed to providing insights grounded in rigorous research, analysis and empirical evidence. This emphasizes our dedication to using sound scientific methods to ensure the effectiveness, competence and credibility of our information products and services

  • We believe that positively utilizing AI solutions in the humanitarian space requires partnerships and collaborations based on inclusion, mutual knowledge sharing and co-production. These collaborations include a range of important stakeholders: communities, humanitarian organizations, academic research institutions and technology partners.

You can read in detail about Signpost AI and its visions and principles here.

Signpost AI Research Roadmap

This Research Roadmap offers an outline of research priorities and sub-priorities related to a praxis-grounded development of the Signpost AI agent technology. Together, the priorities underscore the need for focused research to advance humanitarian understanding of how to develop ethical, effective and impactful Generative AI tools by foregrounding humanitarian principles and approaching the larger question of Generative AI as one to be solved by the humanitarian community. Our fundamental research question for the roadmap is as follows:

“How can effective, inclusive and sustainable Generative AI technology be purposefully developed to build access to critical information for people in crisis in a collaborative, transparent and ethical manner in the humanitarian sector?”

The research roadmap highlights that Generative AI, its development and adoption are not just technological challenges; they are also challenges of accountability, transparency, decision-making, and data governance. 

By articulating our larger and smaller research questions here, this roadmap also seeks to provide a framework for other organizations looking to evaluate whether Generative AI technologies are fit for humanitarian purposes. 

The fundamental question is divided into the following three research priorities, which together comprise 12 sub-priorities, each with its own set of specific research questions (54 questions at the time of writing):

  1. Demonstrating Ethical Efficacy and Impact

  2. Ethical and Responsible AI Thought Leadership

  3. Enabling Partnership, Scale and Sustainability

You can see a bird’s eye view of the research roadmap below:

This map of research questions guides the research for Signpost AI agent technology development. Taking a flexible approach, the research agenda is structured in two ways: comprehensive research documents dedicated to a single research question, and multi-topic documents that answer a range of the research questions presented here. The key is to ensure that, together, the corpus of research published here gives a cohesive, actionable and comprehensive response to our fundamental question, advancing the humanitarian community's understanding of the topic.

This is a living document; as development of the Signpost AI agent technology progresses, research priorities and questions will inevitably be modified, added to or removed to reflect realities on the ground.

The results of this research are being openly published on signpostai.org and will include content of the following types: blogs, novel research, research reports, collaborative research, case studies, and technical and process documentation.

This is just one approach Signpost AI is taking to being transparent about its development, quality evaluation methods, project management, Red Teaming and decision-making processes. It is meant to be an open view of what worked, what did not work, and what lessons can be learned from this experiment in developing a humanitarian-aid-specific AI tool.

Who is this guide for?

This Research Roadmap is a guide for organizations in the humanitarian aid sector looking to trial Generative AI technology in their offerings. Organizations can use the guide to see where productive research partnerships can be forged to produce co-authored think-pieces, case studies and reports. For example, topics such as Ethical and Responsible AI best practices, and primers on technical safety and evaluation methodologies, are crucial knowledge needs in the humanitarian context for all of the aforementioned stakeholders.

By transparently detailing current and future research endeavors, it is an open invitation for dialogue and collaboration with partners in academic/industry research institutions, humanitarian organizations and technology companies.

Objectives of the Research Roadmap

  1. Ensure that the development of the Signpost AI agent technology is informed by the best available technological, ethical, and social research

  2. Produce an evidence-based blueprint for innovating safely, responsibly and effectively on community-based information services in the humanitarian space

  3. Provide Guidance and Lessons Learned on Ethical and Responsible AI 

  4. Support current and future practical implementations of GenAI Technology in the humanitarian sector

  5. Enable inter and cross sector collaboration with humanitarian, technology and policy partners

Main Research Priorities

  1. Demonstrating Ethical Efficacy and Impact

What are the technical processes of software engineering, evaluations, implementation, and impact assessment in the making of the Signpost AI Chatbot? How do you structure ethical and practical decision-making in these processes?

  • What are best practices, processes and decision-making frameworks for designing AI development and engineering systems?

    1.1.1. Non-technical Explainer: How does the Signpost AI Chatbot work? 

    1.1.2. Architecture: What is the chatbot model architecture and infrastructure? 

    1.1.3. LLM Selection: What is the framework for selecting an LLM? What factors were prioritized for decision-making?

    1.1.4. Safeguarding Outputs: What are our Constitutional AI rules, local and system prompts? How do we use them to ensure reliable and safe outputs?

    1.1.5. Data Infrastructure Safety and Privacy: How are data pipelines configured and customized for internal/third-party data ingestion, integration, processing, safety and privacy?

    1.1.6. Integrating User Feedback: What processes and practices are used to integrate user feedback into the iterative development cycle of the AI chatbot? 

    1.1.7. Challenges: What are AI agent technical development implementation challenges in the context of Humanitarian use-case? What are lessons learned?

    1.1.8. Generative AI Limitations: What are known and unknown technical limitations of current Generative AI technologies?

    1.1.9. Alignment: How do you align datasets, content, and models to user-AI interactions in real-world contexts?

    1.1.10. Technical Infrastructure: What technical infrastructure and resources are needed to deploy the Signpost AI agent technology in various settings, including low-bandwidth or offline environments?

    1.1.11. Infrastructure Stability: What are the decision criteria (cost, expertise, etc.) for committing to stable infrastructure?

    1.1.12. Human-AI Collaboration: What are important considerations in deciding Human-AI approaches (human-in-the-loop, no-human-in-the-loop, humans-across-the-loop) when developing AI products?
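To make the safeguarding questions above (1.1.4) concrete, a constitution-style output check can be thought of as a set of named rules applied to each draft response before it reaches a user. The following is a minimal, hypothetical sketch; the rule names, system prompt text and helper function are illustrative assumptions, not Signpost's actual Constitutional AI rules or prompts:

```python
# Minimal sketch of a constitution-style output guardrail.
# All rules, prompt text and names here are hypothetical illustrations,
# not Signpost's actual implementation.

SYSTEM_PROMPT = (
    "You are a humanitarian information assistant. "
    "Answer only from verified service information. "
    "If unsure, direct the user to a human moderator."
)

# Each constitution rule is a (name, predicate) pair over a draft answer.
CONSTITUTION = [
    ("no_medical_advice", lambda text: "dosage" not in text.lower()),
    ("no_legal_guarantees", lambda text: "guaranteed asylum" not in text.lower()),
]

def check_constitution(draft: str) -> list[str]:
    """Return the names of any constitution rules the draft violates."""
    return [name for name, ok in CONSTITUTION if not ok(draft)]

draft = "You are guaranteed asylum if you apply here."
print(check_constitution(draft))  # → ['no_legal_guarantees']
```

In a real pipeline, a violating draft would be regenerated or escalated to a human moderator rather than simply flagged; the sketch only shows the rule-checking step.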

  • What are Signpost AI Quality and Red-Team processes? How are they integrated into the AI development lifecycle to ensure safety, trustworthiness and alignment with user needs?

    1.2.1. Quality: What is a quality Signpost AI chatbot response? Why? 

    1.2.2. Red Teaming Explainer: What is Red Teaming? How does Signpost use red teaming to evaluate and iterate a secure, controlled, high performing AI agent technology?

    1.2.3. Insights and Learnings: What are key insights and learnings from quality assurance and red teaming activities that can be effectively incorporated into the iterative improvement and refinement of Signpost AI agent technology?

    1.2.4. Translating Human Guidance: How do you adapt guidance for humans to guidance for AI agents?

    1.2.5. Humanitarian Work Practices: How do AI agents change the way humanitarian workers operate, collaborate, and make decisions? What are the qualitative benefits or challenges they experience?

  • What are best practices for AI system evaluation in a humanitarian and context-specific use case? What are associated challenges and limitations? How do you measure impact across users, and staff?

    1.3.1. Testing and Evaluation: How do you conduct a GenAI System Evaluation in a humanitarian and context-specific use-case?

    • What are the domain-specific evaluation criteria and human assessment frameworks?  

    • What are associated test sets, controls and metrics? 

    • How do you ensure human evaluation judgements are consistent across evaluators?

    • What are baseline measures?

    • What are acceptable results? 

    • What are recommended practices for creating effective test datasets?

    1.3.2. Cross-Functional Evaluation: How do you conduct cross-functional quality (quality and red team) evaluations to ensure agent responses are trauma-informed, client-centered and safe? 

    1.3.3. Performance Testing: What are performance (response time, scalability, load and stress testing), security and usability testing metrics? What are acceptable thresholds?

    1.3.4. Evaluation Challenges: What were the main challenges faced with test sets, metrics and human assessment? What are lessons learned?

    1.3.5. Mitigating Harms: How do you mitigate potential AI related harms in practice during the development and testing phases?

    1.3.6. Bias and Fairness: How can we continuously evaluate and mitigate biases in agent responses? 

    1.3.7. Benchmark Comparison: How does our AI agent compare to existing communication or information-sharing tools and approaches used in humanitarian responses? What unique value does it offer?

    1.3.8. Measuring AI Agent Impact: How can we measure the agent's tangible impact on crisis-affected populations? What metrics (e.g., lives saved, time saved, improved access to information) can demonstrate its effectiveness? 

    1.3.9. Cultural Sensitivity: How can we ensure the agent responses are culturally sensitive and respectful, avoiding unintentional offense or misunderstanding? 

    1.3.10. Online Evaluations: What are best approaches to developing online evaluations for AI products in the humanitarian context?

    1.3.11. Automated Evaluations: At what point is implementing automated evaluations feasible? 
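One standard way to address the consistency of human evaluation judgements (1.3.1) is to measure inter-rater agreement. As a hedged sketch, not Signpost's actual evaluation method, Cohen's kappa compares observed agreement between two raters against the agreement expected by chance; the "safe"/"unsafe" labels below are hypothetical examples:

```python
# Sketch: inter-rater agreement between two human evaluators using
# Cohen's kappa. Labels and ratings are hypothetical examples.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of items the raters label identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["safe", "safe", "unsafe", "safe", "unsafe", "safe"]
b = ["safe", "unsafe", "unsafe", "safe", "unsafe", "safe"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

A kappa near 1 indicates that rating guidelines are being applied consistently; a low kappa suggests the evaluation rubric needs to be tightened before results can be trusted.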

  • How does project management operationalize technical development needs with ethical principles and values? How is this operationalization reflected in everyday decision-making processes?

    1.4.1. Project Strategies: What are effective project management strategies for cross-functional team performance, timely product delivery and aligning ethical and operational concerns? 

    1.4.2. Staffing Model and Skill Sets: How can staff be structured for this undertaking? What are the necessary skill sets and expertise required?

    1.4.3. Staff Upskilling: What are the potential positive and negative implications of Signpost AI on staff workflows, quality of work and job security?

    1.4.4. Cross Functional Collaboration: What are recommended SOPs for creating effective cross-functional workflow between different project teams?



2. Ethical and Responsible AI Thought Leadership

"How does Signpost AI work to make its AI products adhere to the highest values of ethics, transparency, explainability and responsibility, while balancing them with Generative AI's short- and long-term considerations in the humanitarian aid sector?"

  • What is the Signpost AI philosophy? How do its core mission, values and objectives align with an Ethical and Responsible AI approach to providing crucial information services that are accountable, equitable and "do no harm"?

    2.1.1. Signpost AI: What are the fundamental values of Signpost AI?

    2.1.2. Ethical and Responsible AI: What are key principles for developing Ethical and Responsible AI services in the humanitarian aid sector?

    2.1.3. Trade-offs: What are the trade-offs associated with implementing GenAI for information access in the humanitarian aid sector?

    2.1.4. Transparency and Explainability: What are ways in which Signpost communicates technical, ethical and decision-making transparency and explainability to both users and stakeholders?

    • What are these communication forms (blogs, case-studies, technical documentation, etc) and channels?

    • Depending on the audience, what level of specificity is required?

    2.1.5. Lessons Learned: What are Signpost lessons learned for adopting Generative AI in the humanitarian sector?

    2.1.6. AI Risk: How does Signpost approach risk measurement, tolerance and prioritization?

  • What are the challenges, risks, and ethical dilemmas of using AI to provide services in  the specific use case of the humanitarian aid sector? What challenges, risks and dilemmas are specific to Signpost AI?

    2.2.1. Challenges and Dilemmas: What ethical challenges and dilemmas surround Generative AI technology?

    • How have these topics affected the chatbot's development, testing and piloting? What is required to address them? 

    2.2.2. Risks and Mitigation: What are known limitations and risk factors of Generative AI technology? How can we mitigate them?

    • How do Generative AI risks differ from traditional software risks?

    2.2.3. Positive and Negative Pathways: What are long-term positive and negative implications of deploying Generative AI technology in the humanitarian aid sector for users, staff and society?

  • What are the best practices for fostering inclusivity, community engagement and trust in Generative AI services?

    2.3.1. Community Engagement: How can we involve users, communities and local organizations in the design and governance of the AI agent to ensure it meets their specific needs and respects cultural sensitivities? 

    2.3.2. Inclusive Design: What design principles and features can make the AI agent accessible to diverse populations, including those with disabilities, low literacy levels, or limited internet access?

    2.3.3. Trust: What guidelines can be used for thinking about Generative AI and Trust?

  • What should humanitarians know about salient macro factors that can potentially impact Generative AI technologies?

    2.4.1. The Emerging AI Landscape: What are current regulatory challenges, economic drivers and societal pressures that are impacting Generative AI? How can AI be shaped by them?



3. Enabling Partnership, Scale and Sustainability

How is Signpost AI being built to be scalable and sustainable through research partnerships and collaborations, sustainability planning and the sharing of AI expertise?

  • What and where are the opportunities for research collaboration in Responsible and Ethical AI? What are the profiles of good institutional, industry and sector partners?

    3.1.1. Research Topics: What research topics related to evaluating AI agents and developing responsible, ethical AI could provide promising opportunities for forming productive research collaborations?

    3.1.2. Identifying Partnerships and Collaborations: Who are complementary partners and collaborators to conduct research with and expand agent reach and impact within the humanitarian aid sector? What identification mechanisms are used?

  • What are the financial realities of using LLM-based chatbots in smaller organizations? What are short and long term options for sustainable use and future development?

    3.2.1. Financial Sustainability: What funding models and revenue streams are optimal for short and long-term sustainability of AI Chatbot projects in less funded organizations?

    3.2.2. Cost Benefit Analysis of LLMs: What is the cost benefit analysis of using LLM-based services in the short and the long term?

    • Fine-Tuning: At what financial break-even point does it become feasible to use techniques such as fine-tuning?
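The cost-benefit questions above (3.2.2) can be grounded in simple per-token arithmetic. The sketch below is purely illustrative: the prices, volumes and function names are hypothetical placeholders, and real analysis should substitute a provider's actual per-token rates and observed usage:

```python
# Back-of-the-envelope cost model for an LLM-based chatbot.
# All prices and volumes are hypothetical placeholders, not real
# provider rates or Signpost usage figures.

PRICE_PER_1K_INPUT = 0.0005   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.0015  # USD per 1,000 output tokens (assumed)

def monthly_cost(conversations, turns, in_tokens, out_tokens):
    """Estimated monthly API cost in USD for a given usage profile."""
    total_in = conversations * turns * in_tokens
    total_out = conversations * turns * out_tokens
    return (total_in / 1000) * PRICE_PER_1K_INPUT + \
           (total_out / 1000) * PRICE_PER_1K_OUTPUT

# e.g. 10,000 conversations/month, 5 turns each, with 800 input tokens
# (prompt plus retrieved context) and 300 output tokens per turn:
print(f"${monthly_cost(10_000, 5, 800, 300):,.2f}")  # → $42.50
```

Even a rough model like this makes the fine-tuning break-even question tractable: one can compare the recurring per-query cost above against the up-front cost of fine-tuning plus any change in per-query price.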

  • What opportunities for scale would Generative AI offer Signpost?

    3.3.1. Scaling Information Provision: How can Signpost AI products enhance information provision and access to crucial services for underserved communities? What are potential drawbacks?

    3.3.2. Rapid Localization: What are the efficient and effective methods for adapting AI agent contexts and languages?

  • How can Signpost facilitate knowledge access, and adoption of its AI offerings and expertise in other humanitarian sector organizations?

    3.4.1. Knowledge Sharing and Research: What are ways of sharing Signpost's AI lessons, tools, knowledge and resources to cultivate humanitarian-sector wide collaborations and partnerships?

    3.4.2. AI Scaling in Humanitarian Context: How can a collective humanitarian effort effectively evaluate Generative AI use in the sector?

References

[1] McKinsey. "The Economic Potential of Generative AI: The Next Productivity Frontier." Retrieved August 1, 2024 (https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier#business-value).

[2] Motalebi, Nasim & Verity, Andrej. 2023. "Generative AI for Humanitarians." DHN/ReliefWeb. Retrieved August 9, 2024 (https://reliefweb.int/report/world/generative-ai-humanitarians-september-2023).

[3] Goldman Sachs. "Gen AI: Too Much Spend, Too Little Benefit?"
