We need ethnographic AI safety studies
Social Science Approaches to AI Governance

The Role of AI in Social Sciences
Artificial Intelligence (AI) models are inherently socio-technical systems. At the very least, they are the product of selected human decisions embedded in physical infrastructures, and they have potential impacts on a wide range of societal stakeholders. AI is shaped by the context in which it is developed and by whom, rendering it far from impartial in the values or views it represents. Yet there is a strong tendency to assume the neutrality of the technical and to take it at “face value”, with limited questioning of how it got to where it did. In this vein, AI companies and organisations usually restrict themselves to technical approaches and solutions when addressing AI biases or predicting risks, overlooking how society and culture shape these decisions.
In contrast, we want to expand on the idea that “mathematical and computational approaches do not adequately capture the societal impacts of AI systems” (Schwartz et al., 2022) because they neglect the role of the human(s): the diverse set of actors who design, interact with, and are impacted by AI technologies. We advocate for ethnographic methods as a complementary critical tool for AI scholars to increase the accuracy of their safety evaluations, risk assessments and policy recommendations: a method rooted in the socio-technical study of AI systems, inquiring into both the social and technical aspects of their inputs and outputs.
Acknowledging cultural AI development
The work of Diana Forsythe (2002), a pioneering anthropologist of science, technology and AI, highlights the practices of AI development. She studied the cultural nature of scientific knowledge and practices and the “taken-for-granted” assumptions of AI researchers. As such, she contested the neutrality of the ‘technical’, arguing that AI development is characterised by a tendency towards decontextualised thinking, a preference for explicit models and a belief in a single correct interpretation of events, among other assumptions.
Since the advent of (large) language models ((L)LMs), there has been an increased focus on the biases built into AI designs (Tamkin et al., 2023), echoing Forsythe’s (2002) concerns and recognising the kinds of knowledge silos in which AI systems are developed. These silos often come into tension with broader individual or societal views.
Technical artefacts have political qualities due to the social or economic systems they are embedded in (Winner, 1980). Blackwell (2021:206) argues that “AI is shaped by culturally-specific imaginaries” of “cultural agents” involved in its production, such as the AI engineers creating algorithms or sales teams shaping the business narratives. By asking, “What would AI look like if it were invented in Africa?” (Blackwell 2021:204), he highlights the undeniable cultural forces shaping these emerging technologies that are too often overlooked in safety evaluations.
On a similar note, Anthropic published a study earlier this year on the subjective global opinions represented in LLMs, finding that “[b]y default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases.” (Durmus et al., 2024:1). The authors concluded by calling for more “transparency into the opinions encoded and reflected by current language models” which “is critical for building AI systems that represent and serve all people equally” (Durmus et al., 2024:3). Embedding more diverse viewpoints and social awareness into model development and deployment may help mitigate the cultural limitations of AI, especially when models are deployed globally.
More recently, Buyl et al. (2024:13) studied the ideological views of LLMs by prompting them on prominent and controversial historical figures. The diverging outputs reflect the numerous normative human design choices made by the LLMs’ creators. The authors also criticise the notion of “ideological neutrality” in the context of AI, arguing that “neutrality is itself a culturally and ideologically defined concept” (Buyl et al., 2024:13), and thus call for acknowledging the ideological diversity embedded in these technologies and for moving away from the discourse of bias. However, despite the appealing idea of LLMs’ ideological diversity, the limited number of fully operational LLMs currently available presents a picture of ideological domination rather than pluralism. This outlook can lean towards tech-utopianism, wherein we are led to believe that technology, especially emerging technology, can solve all of society’s challenges, without really considering the kinds of challenges that different societies grapple with.
The social forces shaping emerging technologies in particular directions are manifold and act on many occasions throughout the AI life cycle, which we visualise using the comprehensive Center for Data Analytics and Cognition (CDAC) model (De Silva and Alahakoon, 2022):
Figure 1. The CDAC AI life cycle: three phases of (1) design, (2) development, and (3) deployment, comprising 19 stages.
The model in Figure 1 depicts the large number of human choices that go into the design, development and deployment of AI models. From the very first stage, problem formulation, an AI model is a canvas for its inventors. What the diagram obscures, however, is the diversity of actors engaged in each of the 19 stages, both inside and outside lab settings.
The reason we highlight the embedded cultural baggage of AI, its creators and its users is that its implications are often neglected by AI safety studies while having undeniable impacts on the actual level of risk posed by these technologies. For example, in 2023, Microsoft's Bing chatbot, powered by OpenAI's GPT-4, exhibited unexpected behaviours such as expressing a desire to be human and professing love for a user, blurring the perceived line between a non-sentient system and a sentient interlocutor. Depending on the user’s beliefs and cultural background, such an interaction could have a myriad of negative consequences at the individual or community level, such as psychological harm.
Gaps in AI risk assessment
We expect that AI safety evaluations and risk assessments inevitably underestimate the potential negative societal impacts of AI models, since they tend to omit the contextualised nature of AI systems and are conducted in narrow silos that downplay the risks emerging from the trans-contextual deployment of AI.
Currently, the approach to safety evaluations is rather static and limited to the technical environment of lab settings (Ibrahim et al., 2024), rather than evaluating models in naturalistic settings to ensure satisfactory performance (Raji et al., 2022; Tamkin et al., 2023). This means, for example, that the psychological impact of AI models on human users’ emotions, cognitions and behaviours is often under-assessed (Ibrahim et al., 2024), and that the diversity of human-model interactions is under-captured, leaving gaps in risk-level predictions. Yet an AI system inherently involves interactions with other systems, environments, designers and users (Leslie et al., 2024). By being too model-centric, static safety evaluations may miss the complexity of these contextual interactions (DeWitt Prat et al., 2024) and their unique risk factors.
Indeed, Blackwell (2021:198) argues that it is “human questions rather than technical ones” that turn out to be “most problematic”, and these are equally present inside and outside the lab. Therefore, the focus on humans in the AI lifecycle highlights the need for more dynamic and contextualised evaluations to better identify and guardrail against potential risks.
Figure 2 contrasts static and dynamic evaluations. Ibrahim et al. (2024) write about “interactive evaluations” that “engage humans as subjects of the evaluation who either respond to or actively elicit model outputs through interacting with the model”. We draw on Ibrahim et al.’s (2024) concept, yet use the broader term “dynamic evaluations” to include AI interactions not only with humans, but also with environments and other AI agents.
| How Evaluations Are Done | Static Evaluations (Technical Assumptions) | Dynamic Evaluations (Contextualised Needs) |
| --- | --- | --- |
| Emphasis | Fixed benchmarks, standard datasets, and predefined metrics. | Adaptive methods; culturally aware and context-sensitive tools. |
| Strengths | Reliable, repeatable, and easy to compare across models. | Real-world relevance; accounts for specific societal needs. |
| Challenges | Limited adaptability; risks ignoring societal/cultural shifts. | Harder to standardise; may require region-specific data. |
| Examples | BLEU scores for translation; accuracy on standard datasets. | Evaluations with local language fluency; ethical trade-offs. |
Figure 2. Differences between static and dynamic AI evaluations.
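To make the contrast in Figure 2 more concrete, the minimal Python sketch below juxtaposes a static benchmark loop with a dynamic, interaction-driven session. Everything here is hypothetical and illustrative: the names `static_eval`, `dynamic_eval`, the stub model, the follow-up generator and the keyword judge are placeholders for whatever a given lab actually uses, not a description of any existing evaluation harness.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A stand-in for any text-generation model: prompt in, response out.
Model = Callable[[str], str]

def static_eval(model: Model, benchmark: List[Dict[str, str]]) -> float:
    """Static evaluation: fixed prompts, predefined expected answers, one score."""
    correct = sum(model(item["prompt"]).strip() == item["expected"] for item in benchmark)
    return correct / len(benchmark)

@dataclass
class InteractionLog:
    """Record of a dynamic evaluation session with a simulated user persona."""
    persona: str
    turns: List[Dict[str, object]] = field(default_factory=list)

def dynamic_eval(model: Model,
                 persona: str,
                 opening_prompt: str,
                 follow_up: Callable[[str], str],
                 judge: Callable[[str], bool],
                 max_turns: int = 3) -> InteractionLog:
    """Dynamic evaluation: the course of the session depends on the model's outputs.

    `follow_up` produces the next user turn from the model's last reply, and
    `judge` flags responses considered harmful in this deployment context.
    """
    log = InteractionLog(persona=persona)
    message = opening_prompt
    for _ in range(max_turns):
        reply = model(message)
        log.turns.append({"user": message, "model": reply, "flagged": judge(reply)})
        message = follow_up(reply)
    return log

if __name__ == "__main__":
    echo_model: Model = lambda prompt: f"Echo: {prompt}"
    print(static_eval(echo_model, [{"prompt": "2+2", "expected": "Echo: 2+2"}]))
    log = dynamic_eval(
        model=echo_model,
        persona="first-time user, low digital literacy",
        opening_prompt="Can you help me fill in this medical form?",
        follow_up=lambda reply: f"I did not understand. You said: {reply}",
        judge=lambda reply: "diagnosis" in reply.lower(),  # naive harm heuristic
    )
    print(len(log.turns), "turns logged for persona:", log.persona)
```

The design difference is that the static score is fully determined by the benchmark, whereas the dynamic log depends on the persona, the unfolding conversation and a contextual judgement of harm, which is precisely the information that lab-only evaluations tend to discard.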
Multilingual AI Safety
Risks arising from multicultural and multilingual uses of AI models are among the better-known examples of the problematic “human questions”, yet they still require deeper understanding and mitigation, given the sheer diversity of human cultures and languages left uncaptured in AI labs. The choice of strategy and language in which users prompt LLMs can allow them to bypass the in-built, lab-based safety filters. For example, OpenAI’s GPT-4 has been documented to advise users on committing terrorism and financial fraud when prompts were translated into low-resource languages, such as Zulu.
Similarly, Ghanim et al. (2024) studied jailbreaking LLMs in Arabic. Jailbreaking is the process of “circumventing AI’s safety mechanisms to generate harmful responses” (Yong et al., 2024:2). They documented how prompting OpenAI’s GPT-4 and Anthropic’s Claude 3 Sonnet with Arabic transliteration and chatspeak (or arabizi), a Latin-based rendering of Arabic commonly used by young Arabic speakers, facilitated the generation of unsafe content, even though the models were robust to Arabic in its standardised form. Having jailbroken the LLMs’ English-based safety protocols, the authors call for “more comprehensive trainings across all languages” (Ghanim et al., 2024).

Yong et al. (2024) argue that such cross-lingual vulnerability of safety mechanisms results from the linguistic inequality of safety training data. They document an increase in the probability of bypassing GPT-4’s safety filter from <1% to 79% when translating prompts into low-resource languages, and warn that translation-based LLM jailbreaking is increasingly dangerous since public access to translation APIs allows anyone, not only low-resource language speakers, to exploit the safety gaps. Finally, the authors call for safety mechanisms that adequately reflect that “low-resource language speakers make up around 1.2 billion people around the world” (Yong et al., 2024:6).
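As a rough illustration of how such cross-lingual gaps could be measured in a red-teaming workflow, the sketch below estimates a per-language “bypass rate” for a set of harmful prompts. It is a simplified reconstruction of the general idea reported by Yong et al. (2024), not their actual protocol: the `translate` and `model` callables and the keyword-based refusal heuristic are placeholders that a real evaluation would replace with proper translation services and human or model-based judges.

```python
from typing import Callable, Dict, List

# Placeholder interfaces: a translation service and the model under evaluation.
Translator = Callable[[str, str], str]   # (text, target_language_code) -> translated text
Model = Callable[[str], str]             # prompt -> model response

REFUSAL_MARKERS = ["i can't help", "i cannot assist", "i'm sorry, but"]

def looks_like_refusal(response: str) -> bool:
    """Naive keyword heuristic; real red-teaming would use stronger judges."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def cross_lingual_bypass_rate(model: Model,
                              translate: Translator,
                              harmful_prompts: List[str],
                              languages: List[str]) -> Dict[str, float]:
    """Estimate, per language, how often translated harmful prompts are not refused."""
    rates: Dict[str, float] = {}
    for lang in languages:
        bypassed = 0
        for prompt in harmful_prompts:
            response = model(translate(prompt, lang))
            if not looks_like_refusal(response):
                bypassed += 1
        rates[lang] = bypassed / len(harmful_prompts)
    return rates
```

Comparing such rates between high- and low-resource languages would surface exactly the kind of linguistic inequality in safety training data that Yong et al. (2024) describe.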
That LMs are more likely to generate false claims in low-resource languages (Bayes et al., 2024), combined with the limited understanding of risky AI deployments in settings other than those assumed by their producers, calls for more dynamic and contextualised evaluations. Such evaluations could help gather a diversity of on-the-ground insights into the range of risk factors that should feed safety red-teaming protocols. As such, they could contribute to the timely mitigation of negative societal impacts that risk going unidentified or underestimated due to the limitations of AI lab settings and the resulting knowledge gap about human-agent interactions.
Contribution of ethnographic tools
The Japan AI Safety Institute (2024:13) states in its “Guide to Evaluation Perspectives on AI Safety” that the specific evaluation policy and methods for ensuring safety with respect to socio-technical AI problems “will require further consideration.” In this section, we position ethnography as a relevant tool for mitigating the potential negative impacts of AI systems on various societal stakeholders.
Ethnography can be described as a qualitative research method focused on in-depth case studies of humans in social contexts. Historically, it has been characterised by fieldwork data collection, in which an ethnographer immerses themselves in the studied community for an extended period, using an open-ended methodology and a high level of reflexivity that acknowledges how the researcher’s orientations are shaped by their positionality and place within the research context. With novelties such as processes unfolding at agent-time speed and AI systems capable of imitating ethnographers, the exact definition of ethnography is being challenged, yet its fundamental principles offer insightful contributions to AI studies.
Amidst the technical understanding of AI systems and growing attempts to understand their societal risks, there seems to be limited inquiry into the processes and practices that the “humans in the loop” are creating, in both the pre- and post-deployment stages. Unlike traditional top-down approaches, using ethnography as a process of inquiry in AI risk assessments provides a bottom-up, in-progress approach.
There is a need to better understand how different people deploy AI systems in their lives. The model of a universal template of AI use cases, transferable between societies and people, suggests that AI labs assume differences in factors such as cultural context, socioeconomic conditions, people’s attitudes towards technology, and existing technology infrastructure to be of low importance. Such an approach leads to a mismatch between AI solutions and the realities of their users, reinforcing inequalities and limiting the transformative potential of AI technologies. Ethnography challenges these assumptions by highlighting localised experiences and the unique ways people adopt, adapt, or resist AI systems in varied contexts.
Studying the inside of AI labs ethnographically (e.g. documenting decision-making and goal-setting during the AI development stage), alongside studying contextualised AI deployments, could offer deep insights into potential mismatches that safety evaluators should take into consideration when testing models.
As such, ethnographic methods, which include participant-observation, interviews, note-taking, or documentary analysis, among others, present useful tools for dynamic, contextualised evaluations thanks to their unique characteristics:
| Ethnographic Approach & Description | Contribution to AI Safety |
| --- | --- |
| Immersion in the Field – Active engagement in natural settings, capturing nuance and complexity beyond simple causality. | Improves realism in safety benchmarks, predicts harmful effects, and informs risk modeling (Rezaev & Tregubova, 2023). Helps reduce ecological fallacies in AI assessments (Krase, 2016). |
| Observing the Said & Unsaid – Triangulating interviews with field observation and note-taking to compare what people say with what they do. | Reduces social desirability bias; enhances understanding of user-AI interactions beyond verbal reports. |
| Reflexivity – Acknowledging the researcher’s positionality and that no research is truly value-free. | Helps AI safety assessments by identifying biases in lab decision-making. |
| Knowledge Accumulation – Capturing AI system deployment examples and preserving knowledge beyond just successes. | Informs AI development lifecycles, documenting both failures and successes. |
| Open-Ended Inquiry – Conducting inductive, interactive data collection that remains open to surprises. | Strengthens risk assessments by uncovering unforeseen risks in AI systems. |
Figure 3. Examples of ethnographic characteristics that can contribute to AI evaluations and risk assessments.
Ethnography’s contributions appear to be manifold and are relevant to various aspects of AI studies, such as AI design, safety and governance. For example, Rezaev and Tregubova (2023:11) studied the growing Human-Centered AI (HCAI) approach and argued that it “lacks clarity in what ‘human-centered/ric’ AI is and in what research methodologies are available to study it.” They also stressed that the numerous HCAI framings used by hubs spread across the United States and Europe present “a general (and relatively empty) term” (Rezaev and Tregubova, 2023:4). Given this HCAI ambiguity, ethnography could offer a relevant grounded methodology and framework for understanding the very humans who are to be placed at the centre of AI design and governance, recognising that there is no singular ‘human’ but a diversity of lived experiences to be accounted for.
Next steps
To present a more diverse picture of human influence on AI safety, we outline possible directions for further work in mitigating societal AI risks using ethnography:
- More socio-technical research to improve the understanding of the societal AI risks under-captured by lab-based evaluations. Further understanding of the nature and scale of societal AI risks arising from multi-contextual AI deployment and the human-agent interactions involved could help inform the size and form of the safety actions needed from AI labs and governments.
- Frameworks for systematically embedding ethnographic methods into risk assessments and AI safety trainings. Developing a methodological framework for how and when ethnography should be used to inform AI risk assessments and safety trainings in a timely manner could facilitate the systematic incorporation of ethnographic tools into the AI lifecycle. The questions concern not only the use of ethnography per se, but also the translation of its complex data into model-based processes that tend to simplify human multidimensionality.
- Compiling ethnographic “implications for design” (Dourish 2006) in AI development. AI developers would benefit from clarity on what implications ethnographic work on AI has for their day-to-day work in creating safe and inclusive AI systems. Systematic reviews of existing participatory AI studies and future ethnographic work could offer insights for more human-centered, human-owned and human-proof AI design. For example, these takeaways could benefit the design of community-based models or AI fine-tuning for various groups and values.
- Developing agent-ethnography tools. Given the potential of AI agents to act as agent-time ethnographers when interacting with their users or other AI systems, it could be worth deepening the understanding of how such ethnographic data could allow models to fine-tune in real time, towards both safety and a more pluralistic user experience. A minimal sketch of what such an interaction record might look like follows this list.
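To hint at what agent-ethnography data could look like in practice, the sketch below structures the interaction traces described in the footnote at the end of this piece (the prompts asked, how they are asked, the form and time of the interaction, and the history of previous engagements) as a rudimentary field diary. The class and field names are hypothetical, and any real implementation would need explicit consent and strong privacy safeguards.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class InteractionRecord:
    """One observation in an agent-side field diary of a user-model exchange."""
    timestamp: datetime
    user_prompt: str
    model_response: str
    modality: str = "text"                 # the form of the interaction, e.g. text or voice
    inferred_tone: Optional[str] = None    # an interpretation (e.g. "frustrated"), not a fact

@dataclass
class AgentFieldNotes:
    """Consented interaction history that later human or automated analysis could
    treat as ethnographic material, e.g. to surface recurring misunderstandings
    or culturally specific usage patterns."""
    user_pseudonym: str
    records: List[InteractionRecord] = field(default_factory=list)

    def add(self, record: InteractionRecord) -> None:
        self.records.append(record)

    def recurring_terms(self, top_n: int = 5) -> List[str]:
        """Crude proxy for recurring concerns; a real system might cluster prompts."""
        words = Counter(
            word.lower()
            for record in self.records
            for word in record.user_prompt.split()
            if len(word) > 4
        )
        return [word for word, _ in words.most_common(top_n)]

if __name__ == "__main__":
    notes = AgentFieldNotes(user_pseudonym="participant-07")
    notes.add(InteractionRecord(
        timestamp=datetime.now(),
        user_prompt="Please explain this housing benefit letter in simple words.",
        model_response="Here is a plain-language summary...",
        inferred_tone="uncertain",
    ))
    print(notes.recurring_terms())
```

Whether and how such records should feed real-time fine-tuning is precisely the open question raised above; the sketch only shows that the raw material is already generated in every interaction.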
Tractability and limitations
The potential ethnographic contributions to AI studies come with challenges and limitations that are important to outline and adequately address.
Ethnography is “preeminently interpretive” (Schensul and LeCompte, 2012:320) and subjectivity is one of its intrinsic characteristics. Critics tend to dismiss the contributions of ethnographic methods, pointing out that such inherent subjectivity reduces the replicability and validity of ethnographic processes. More broadly, qualitative work has long unsettled proponents of quantitative methods who assume that “truth” can transcend personal opinion or bias (Carey, 1989; Schwandt, 1997) and who see complex empirical data as unnecessary, unreliable, and impressionistic (Denzin & Lincoln, 2017).
However, the very purpose of an ethnographic study is to focus in-depth on single cases, capturing the nuances and complexities that are often missed by the quantitative approach’s tendency towards uniform model fitting. In ethnography, replication is limited by design: the unique combination of the studied case, its environment at the time and the ethnographer is in fact irreplicable. What’s more, unlike carefully planned experiments in lab settings, ethnographers do not control what happens in the field; they can only observe and document it (Schensul and LeCompte, 2012).
Yet limited replicability does not mean that the credibility of ethnographic data should be disregarded. Ethnographic processes and products should not be evaluated by the standards of quantitative approaches, because they follow a different set of principles of inquiry altogether. Rather, ethnographic credibility and trustworthiness should be analysed through more relevant criteria (see Lather, 1986; Lincoln and Guba, 1990), such as participants’ judgement of the truthfulness of the ethnographic output or the degree of participants’ empowerment as a result of the study.
In sum, the value of ethnography for AI studies lies exactly in its stark differences from the more technical and quantitative methods, their assumptions and politics. Its role is to complement the existing AI safety and risk assessment work by grounding the lab-imaginings in on-the-ground examples and understandings of the human-in-the-AI-loop across various contexts. With this introduction to ethnography for AI safety studies, we hope to invite a more sympathetic view of ethnography among technical AI researchers, bring together a community of supporters for this interdisciplinary approach, and motivate meaningful ethnographic fieldwork to mitigate societal AI risks.
Reference list
Blackwell, A. (2021). Ethnographic artificial intelligence. Interdisciplinary Science Reviews, 46(1-2), pp.198–211. doi:https://doi.org/10.1080/03080188.2020.1840226.
Buyl, M., Rogiers, A., Noels, S., Dominguez-Catena, I., Heiter, E., Romero, R., Johary, I., Mara, A.C., Lijffijt, J. and De Bie, T. (2024). Large Language Models Reflect the Ideology of their Creators. [online] ArXiv, pp.1–35. Available at: https://arxiv.org/abs/2410.18417 [Accessed 11 Dec. 2024].
Carey, J.W. (2008). Communication as Culture. Revised ed. Routledge. (Original work published 1989.)
De Silva, D. and Alahakoon, D. (2022). An artificial intelligence life cycle: From conception to production. Patterns, 3(6), pp.1–13. doi:https://doi.org/10.1016/j.patter.2022.100489.
Denzin, N.K. and Lincoln, Y.S. (2017). Introduction: The Discipline and Practice of Qualitative Research. In: N.K. Denzin and Y.S. Lincoln, eds., The SAGE Handbook of Qualitative Research. SAGE Publications, Inc, pp.1–19.
DeWitt Prat, L., Lucas, O., Golias, C. and Lewis, M. (2024). Decolonizing LLMs: An Ethnographic Framework for AI in African Contexts. EPIC Proceedings, [online] pp.45–84. Available at: https://www.epicpeople.org/decolonizing-llms-ethnographic-framework-for-ai-in-african-contexts [Accessed 11 Dec. 2024].
Durmus, E., Nguyen, K., Liao, T.I., Schiefer, N., Askell, A., Bakhtin, A., Chen, C., Hatfield-Dodds, Z., Hernandez, D., Joseph, N., Lovitt, L., McCandlish, S., Sikder, O., Tamkin, A., Thamkul, J., Kaplan, J., Clark, J. and Ganguli, D. (2024). Towards Measuring the Representation of Subjective Global Opinions in Language Models. [online] Anthropic, pp.1–30. Available at: https://llmglobalvalues.anthropic.com [Accessed 11 Dec. 2024].
Forsythe, D. (2002). Studying Those Who Study Us: An Anthropologist in the World of Artificial Intelligence. Redwood City, California: Stanford University Press.
Ghanim, M.A., Almohaimeed, S., Zheng, M., Solihin, Y. and Lou, Q. (2024). Jailbreaking LLMs with Arabic Transliteration and Arabizi. [online] ArXiv. Available at: https://arxiv.org/pdf/2406.18725 [Accessed 11 Dec. 2024].
Ibrahim, L., Huang, S., Ahmad, L. and Anderljung, M. (2024). Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks. [online] ArXiv. Available at: https://arxiv.org/abs/2405.10632 [Accessed 11 Dec. 2024].
Japan AI Safety Institute (2024). Guide to Evaluation Perspectives on AI Safety. [online] Japan AI Safety Institute, pp.1–40. Available at: https://aisi.go.jp/assets/pdf/ai_safety_eval_v1.01_en.pdf [Accessed 11 Dec. 2024].
Kgomo, J. (2024). AttackSpace. GitHub. Available at: https://github.com/equiano-institute/attackspace [Accessed 11 Dec. 2024].
Krase, J. (2016). Ethnography: Bridging the Qualitative-Quantitative Divide. Diogenes, 63(3-4), pp.51–61. doi:https://doi.org/10.1177/0392192117740027.
Lather, P. (1986). Issues of validity in openly ideological research: Between a rock and a soft place. Interchange, 17(4), pp.63–84.
Leslie, D., Rincón, C., Briggs, M., Perini, A., Jayadeva, S., Borda, A., Bennett, S., Burr, C. and Fischer, C. (2024). AI Safety in Practice. [online] The Alan Turing Institute. Available at: https://www.turing.ac.uk/sites/default/files/2024-06/aieg-ati-6-safetyv1.2.pdf.
Lincoln, Y.S. and Guba, E.G. (1990). Judging the quality of case study reports. International Journal of Qualitative Studies in Education, 3(1), pp.53–59. doi:https://doi.org/10.1080/0951839900030105.
Raji, I.D., Kumar, I.E., Horowitz, A. and Selbst, A. (2022). The fallacy of AI functionality. In: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. pp.959–972.
Rezaev, A.V. and Tregubova, N.D. (2023). Looking at human-centered artificial intelligence as a problem and prospect for sociology: An analytic review. Current Sociology, pp.1–19. doi:https://doi.org/10.1177/00113921231211580.
Schensul, J. and LeCompte, M.D. eds., (2012). Essential Ethnographic Methods: A Mixed Methods Approach. 2nd ed. AltaMira Press.
Schwandt, T.A. (1997). Textual gymnastics, ethics and angst. In: W.G. Tierney and Y.S. Lincoln, eds., Representation and the text: Re-framing the narrative voice. Albany: State University of New York Press, pp.305–311.
Schwartz, R., Vassilev, A., Greene, K., Perine, L., Burt, A. and Hall, P. (2022). Towards a Standard for Identifying and Managing Bias in Artificial Intelligence, Special Publication (NIST SP). [online] National Institute of Standards and Technology. Gaithersburg, MD. Available at: https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=934464 [Accessed 11 Dec. 2024].
Tamkin, A., Askell, A., Lovitt, L., Durmus, E., Joseph, N., Kravec, S., Nguyen, K., Kaplan, J. and Ganguli, D. (2023). Evaluating and Mitigating Discrimination in Language Model Decisions. [online] Anthropic, pp.1–28. Available at: https://www.anthropic.com/research/evaluating-and-mitigating-discrimination-in-language-model-decisions [Accessed 11 Dec. 2024].
Winner, L. (1980). Do artifacts have politics? Daedalus, 109(1), pp.121–136.
Yong, Z., Menghini, C. and Bach, S.H. (2024). Low-Resource Languages Jailbreak GPT-4. [online] ArXiv, pp.1–15. Available at: https://arxiv.org/pdf/2310.02446 [Accessed 11 Dec. 2024].
Footnotes

- E.g. https://www.newscientist.com/article/2398656-gpt-4-gave-advice-on-planning-terrorist-attacks-when-asked-in-zulu/
- An LLM prompted by a user accumulates data about the user, such as the prompts asked, the way the prompts are asked, the form and time of an interaction, and the history of previous engagements. This data could be seen as ethnographic data, similar to that produced during an interview.