Trust and safety

From Wikipedia, the free encyclopedia
User protection on online platforms

Trust and Safety (T&S) refers to the organizational functions, teams, policies, and technologies that online platforms use to protect users from harmful content, abusive behavior, fraud, and security threats. The term originated in e-commerce contexts in the late 1990s,[1] where it described efforts to build trust between buyers and sellers in online marketplaces.[2] As social media platforms grew in the 2000s and 2010s, T&S expanded to address challenges related to user-generated content, including harassment, online child safety, hate speech, misinformation, and violent extremism.[3]

Trust and Safety work combines human review with automated detection systems to enforce platform policies.[4] The field has faced scrutiny over enforcement practices, labor conditions for moderators,[5] and questions about platform accountability, with regulatory frameworks increasingly mandating specific T&S requirements.[6][7]

History

The concept of "Trust and Safety" (T&S) emerged as a critical function in the early days ofe-commerce, driven by the inherent risks of transactions between physically separated and anonymous parties.[1] In a 1999 press release, the online auction siteeBay used the term while introducing their "SafeHarbor trust and safety program," which included easy access toescrow services andcustomer support.[8] Initially, eBay's strategy was built on a "community trust" model as its primary security communication, encouraging users to regulate themselves.[2] "Trust" was used to reference trust among eBay users and between eBay users and eBay itself; "safety" was used to reference keeping platform users safe.[3] With internet platforms growing in scale and complexity, there was a marked increase in the scope of online harms. Social media, app stores, and marketplaces faced unique threats, including impersonation, the spread of malware, and sophisticated scams. The term soon spread throughout the tech industry, expanding from online marketplaces to social media,dating websites, andapp stores.[9] Trust and Safety teams emerged as distinct entities across the tech sector, with increasing specialization in fields such aschild protection,cyberbullying, andharassment prevention.

Regulatory response

While the term evolved within the private sector, judicial rulings began shaping a legislative framework that encouraged investment in online user protection teams. The landmark New York state court decision in Stratton Oakmont, Inc. v. Prodigy Services Co. in 1995 ruled that the early online service provider Prodigy could be liable as a "publisher" for defamatory content posted by a user because the service actively filtered and edited user posts.[10] The ruling created a disincentive for interactive computer services to engage in content moderation, as any attempt to regulate content could result in publisher liability for all user-generated material on their platforms. In response, Congress enacted Section 230 of the Communications Decency Act in 1996, which provided immunity for online platforms hosting third-party content.[11]

Professionalization

During the 2010s, Trust & Safety matured into a distinct professional discipline, with major technology firms investing heavily in expansive teams staffed by professionals from legal, policy, technical, and social science backgrounds.[1] In February 2018, Santa Clara University School of Law hosted the inaugural "Content Moderation & Removal at Scale" conference, where representatives from major technology companies, including Google, Facebook, Reddit, and Pinterest, publicly discussed their content moderation operations for the first time.[12] During this conference, a group of human rights organizations, advocates, and academic experts developed the Santa Clara Principles on Transparency and Accountability in Content Moderation, which outlined standards for meaningful transparency and due process in platform content moderation.[13]

The Trust & Safety Professional Association (TSPA) and Trust & Safety Foundation (TSF) were jointly launched in 2020 as a result of the Santa Clara conference.[14][15][16] TSPA was formed as a 501(c)(6) non-profit, membership-based organization supporting the global community of professionals working in T&S with resources, peer connections, and spaces for exchanging best practices.[14] TSF is a 501(c)(3) and focuses on research.[17] TSF co-hosted the inaugural Trust & Safety Research Conference held at Stanford University.[18][19]

Core functions

While organizational structures vary across companies, T&S teams typically focus on the following core functions:

Account integrity

Account integrity teams work to detect and prevent fraudulent accounts, fake profiles, account takeovers, and coordinated inauthentic behavior.[20] This function combines behavioral analysis, pattern recognition, and authentication systems to identify suspicious account activity, and focuses on ensuring that accounts represent real individuals or legitimate entities and operate within platform guidelines.
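
The following is a minimal, hypothetical sketch of how such behavioral signals might be combined into an account-risk score. The signal names, weights, and thresholds are illustrative assumptions for exposition, not any platform's actual model; real systems typically rely on far richer features and machine-learned scoring.

```python
# Illustrative account-risk scoring from simple behavioral signals.
# All signals, weights, and thresholds here are hypothetical.
from dataclasses import dataclass

@dataclass
class AccountSignals:
    account_age_days: int        # very new accounts are weakly suspicious
    messages_per_hour: float     # unusually high send rates suggest automation
    failed_logins_24h: int       # repeated failures can indicate takeover attempts
    profile_completeness: float  # 0.0-1.0; empty profiles correlate with fake accounts

def account_risk_score(s: AccountSignals) -> float:
    """Return a heuristic risk score in [0, 1]; higher means more suspicious."""
    score = 0.0
    if s.account_age_days < 7:
        score += 0.25
    if s.messages_per_hour > 100:
        score += 0.35
    if s.failed_logins_24h > 5:
        score += 0.25
    score += 0.15 * (1.0 - s.profile_completeness)
    return min(score, 1.0)

if __name__ == "__main__":
    suspicious = AccountSignals(account_age_days=1, messages_per_hour=400,
                                failed_logins_24h=0, profile_completeness=0.1)
    print(account_risk_score(suspicious))  # 0.735 -> likely routed to review
```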

Fraud

Fraud prevention extends beyond fake accounts to include detection of financial scams, phishing attempts, and marketplace fraud, in order to protect users from financial harm and maintain platform trustworthiness for legitimate transactions.[21] This can include manual and automated detection systems that analyze transaction patterns, including velocity (frequency and volume), geographic anomalies, mismatched billing information, and connections to known fraudulent accounts or payment instruments. Machine learning is frequently used to assign risk levels to transactions based on multiple factors, enabling automated blocking of high-risk payments while allowing low-risk transactions to proceed smoothly.[22][23][24]
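
A minimal sketch of the kind of risk tiering described above, using velocity, geographic mismatch, and linkage to known-bad payment instruments. The features, weights, and thresholds are assumptions for illustration; production systems generally replace such hand-tuned rules with trained models.

```python
# Hypothetical transaction risk scoring with tiered decisions.
from typing import Literal

def transaction_risk(tx_count_last_hour: int,
                     ip_country: str,
                     billing_country: str,
                     card_on_known_fraud_list: bool) -> float:
    """Combine simple fraud signals into a score in [0, 1]."""
    score = 0.0
    if tx_count_last_hour > 10:          # velocity: unusual frequency/volume
        score += 0.4
    if ip_country != billing_country:    # geographic anomaly
        score += 0.3
    if card_on_known_fraud_list:         # link to known fraudulent instruments
        score += 0.5
    return min(score, 1.0)

def decision(score: float) -> Literal["allow", "review", "block"]:
    if score >= 0.7:
        return "block"    # high risk: blocked automatically
    if score >= 0.4:
        return "review"   # medium risk: routed to a human analyst
    return "allow"        # low risk: proceeds without friction

print(decision(transaction_risk(12, "US", "BR", False)))  # "block"
```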

Content moderation

Content moderation involves reviewing, classifying, and taking action on content (including user-generated content, advertisements, and company-generated content) that violates a platform's policies, such as explicit nudity, misinformation, graphic violence, and other non-compliant material.[25][26] Platforms use a combination of automated systems and human reviewers to enforce content policies at scale, through review before content goes live, proactive review after content is live, and reactive review of live content in response to reports.[4][27]
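
A simplified sketch of the hybrid pipeline this describes: an automated classifier actions clear-cut cases and escalates uncertain ones to human reviewers. The classifier, labels, and thresholds below are placeholders, not a real moderation model.

```python
# Hypothetical hybrid moderation pipeline: automation for clear cases,
# human review for uncertain ones.
def classify(text: str) -> float:
    """Stand-in for a trained model returning P(content violates policy)."""
    banned_terms = {"spamlink.example"}
    return 0.99 if any(term in text for term in banned_terms) else 0.05

def moderate(text: str) -> str:
    p = classify(text)
    if p >= 0.95:
        return "remove"        # high confidence: actioned automatically
    if p >= 0.50:
        return "human_review"  # uncertain: queued for a human moderator
    return "publish"           # low risk: allowed, may still be reported later

print(moderate("check out spamlink.example"))  # "remove"
print(moderate("hello world"))                 # "publish"
```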

Child safety

Child safety in Trust & Safety contexts typically refers to the prevention, detection, and reporting of online child sexual exploitation and abuse, and represents a critical function in the field.[28][29][30] Platforms deploy specialized detection systems to identify child sexual abuse material (CSAM), including hash-matching technologies that detect known CSAM images and machine learning classifiers designed to discover previously unknown material.[31] Child safety also includes the detection and disruption of grooming and solicitation of minors by adults.[32]
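
A minimal sketch of hash-matching against a database of known material. For brevity it uses an exact cryptographic hash (SHA-256); deployed systems such as PhotoDNA or PDQ use perceptual hashes so that resized or re-encoded copies still match. The hash set below is a placeholder containing the well-known SHA-256 digest of the string "test", not real data.

```python
# Hash-matching sketch: compare an upload's digest against a known-hash set.
import hashlib

KNOWN_HASHES = {
    # SHA-256 of b"test", used here purely as a placeholder entry
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def file_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def matches_known_material(data: bytes) -> bool:
    """True if the upload's hash appears in the shared hash database."""
    return file_hash(data) in KNOWN_HASHES

print(matches_known_material(b"test"))   # True: matches the placeholder entry
print(matches_known_material(b"other"))  # False
```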

Platform manipulation and coordinated behavior

Platform manipulation includes detecting bot networks, spam operations, and coordinated inauthentic behavior designed to artificially amplify content or manipulate public discourse. Detecting coordinated campaigns, including sock puppet attacks, requires understanding evolving adversarial tactics and identifying patterns across multiple accounts.[33][34]
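
One simple coordination signal, sketched below under illustrative assumptions: many distinct accounts posting identical text within a short time window. Real detection combines many such signals (timing, shared infrastructure, follower graphs); the data layout and thresholds here are hypothetical.

```python
# Hypothetical detector for near-simultaneous identical posts across accounts.
from collections import defaultdict

def coordinated_groups(posts, window_seconds=300, min_accounts=5):
    """posts: iterable of (account_id, text, unix_timestamp) tuples."""
    by_text = defaultdict(list)
    for account, text, ts in posts:
        by_text[text].append((ts, account))
    suspicious = []
    for text, items in by_text.items():
        items.sort()                                   # order by timestamp
        timestamps = [ts for ts, _ in items]
        accounts = {acct for _, acct in items}
        if (len(accounts) >= min_accounts
                and timestamps[-1] - timestamps[0] <= window_seconds):
            suspicious.append((text, sorted(accounts)))
    return suspicious

posts = [(f"acct{i}", "Buy now at scam.example!", 1000 + i) for i in range(6)]
print(coordinated_groups(posts))  # flags the six accounts posting within 5 seconds
```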

Regulatory compliance

Regulatory compliance has increasingly become a distinct T&S function as regulatory frameworks have expanded globally. This includes managing copyright takedown requests under laws like the Digital Millennium Copyright Act (DMCA), implementing requirements under the EU's Digital Services Act (DSA), and responding to law enforcement requests.[35] The organization and structure of this function differs by company, with some teams embedded in Trust & Safety and others kept separate in legal or compliance departments.[36]

Approaches to Trust and Safety

Policy development and enforcement

Platform policies are the rules and standards that govern user behavior and content on online platforms. Most platforms distinguish between public-facing documents, such as community guidelines or terms of service, that describe acceptable use, and internal enforcement materials, which provide moderators with detailed instructions for applying those policies.[37]

In the early stages of an online service, platform policies are often written by founders or early engineering and operations teams.[37] As platforms grow, responsibility for policy development typically shifts to dedicated legal, policy, or Trust & Safety departments.[38] Researchers have noted that these policies are shaped by a range of factors,[27] including the platform's stated purpose, regulatory requirements, and business priorities.[37] Policy development can also shift rapidly and involve continuous iteration in response to emerging circumstances.[4] Policy frameworks tend to evolve over time as companies develop new features and respond to user feedback, enforcement data, public controversies, and stakeholder pressure.[39][40][41] Approaches to policy-setting differ widely: on centralized commercial platforms, rules are typically written and enforced by internal staff, whereas some decentralized or community-based platforms distribute policymaking to volunteer moderators or user councils.[42]

Enforcement approaches vary across platforms. Some adopt rules-based systems with standardized responses to specific violations, while others implement context-dependent frameworks that consider user intent, cultural norms, and potential harm.[43][44] Regardless of the model or approach, enforcement efforts generally have two goals: consistent application of rules and the ability to implement them at scale.[45]
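
A sketch of a rules-based enforcement ladder of the kind described above, in which each violation category maps to a standardized action and repeat offenses escalate. The categories, strike counts, and actions are hypothetical, not any platform's published policy.

```python
# Hypothetical graduated enforcement: standardized actions that escalate
# with repeat violations; the most severe categories skip the ladder.
ACTIONS_BY_STRIKE = ["warning", "temporary_suspension", "permanent_ban"]
SEVERITY = {"spam": 1, "harassment": 2, "csam": 3}   # higher = more severe

def enforce(violation: str, prior_strikes: int) -> str:
    if SEVERITY.get(violation, 1) >= 3:
        return "permanent_ban"                        # severe harm: immediate removal
    step = min(prior_strikes, len(ACTIONS_BY_STRIKE) - 1)
    return ACTIONS_BY_STRIKE[step]

print(enforce("spam", prior_strikes=0))        # "warning"
print(enforce("harassment", prior_strikes=2))  # "permanent_ban"
```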

Human review and operations

Trust and Safety operations combine human reviewers with automated systems to evaluate content, accounts, and behavior against platform policies.[5] As digital platforms scaled globally and required broader language support,[46] Business Process Outsourcing (BPO) firms became instrumental, providing large teams of trained moderators usually based in regions such as Southeast Asia, Eastern Europe, and Latin America. This model of commercial content moderation is used by large companies such as Facebook,[47] TikTok,[48] and Google, as well as smaller platforms such as Pinterest, Snapchat, and Bluesky. Some platforms, like Discord and Reddit, rely on a mix of moderators employed by the platform and volunteer moderators.[49] The operating model differs by company, depending on moderation costs and the impact of brand risk.[50]

Studies on moderator labor conditions reveal significant psychological costs,[51] with reviewers experiencing trauma, burnout, and other mental health impacts from sustained exposure to graphic violence, child abuse imagery, and other harmful content.[5][52]

Automation and tooling

Automated detection systems enable platforms to identify potential policy violations at scales exceeding human capacity.[53] These technologies include hash-matching systems such as PhotoDNA, PDQ, and CSAI Match, which identify known illegal content, such as CSAM and terrorist and violent extremist material, through digital fingerprinting;[54] machine learning classifiers that analyze visual, textual, and behavioral patterns;[55] natural language processing tools for analyzing context and meaning;[56] and network analysis systems that detect coordinated behavior patterns.[57] Platforms integrate detection technologies with case management systems that route flagged content into review queues, assign priority levels, track enforcement decisions, and manage user appeals.[44]
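
A minimal sketch of the queue-routing step described above: flagged items enter a priority queue so that the most harmful categories reach reviewers first. The category names and priority ordering are illustrative assumptions.

```python
# Hypothetical review-queue routing with category-based priorities.
import heapq
import itertools

PRIORITY = {"child_safety": 0, "violent_extremism": 1, "harassment": 2, "spam": 3}
_counter = itertools.count()   # tie-breaker so equal priorities pop in FIFO order

review_queue = []

def flag(item_id: str, category: str) -> None:
    heapq.heappush(review_queue, (PRIORITY.get(category, 9), next(_counter), item_id))

def next_case() -> str:
    _, _, item_id = heapq.heappop(review_queue)
    return item_id

flag("post-42", "spam")
flag("img-7", "child_safety")
print(next_case())  # "img-7": the highest-priority category is reviewed first
```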

Technical infrastructure also includes integration with external databases maintained by organizations including the National Center for Missing & Exploited Children (NCMEC), and intelligence-sharing programs like the Technology Coalition's Project Lantern, facilitating information sharing across platforms and with dedicated nonprofit organizations tasked with investigating specific harms.[58] Internal enforcement guidelines are typically confidential, though leaked documents have occasionally provided public insight into implementation practices.[59][60][61]

References

  1. ^abc"The Evolution of Trust & Safety".Trust & Safety Professional Association. Retrieved2025-11-08.
  2. ^abBoyd, Josh (April 1, 2002)."In Community We Trust: Online Security Communication at eBay".Journal of Computer-Mediated Communication.7 (3).
  3. ^abCryst, Elena; Grossman, Shelby; Hancock, Jeff; Stamos, Alex; Thiel, David (2021)."Introducing the Journal of Online Trust and Safety".Journal of Online Trust and Safety.1 (1).doi:10.54501/jots.v1i1.8.ISSN 2770-3142.
  4. ^abcGillespie, Tarleton (2018-06-26).Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press.ISBN 978-0-300-23502-9.
  5. ^abcRoberts, Sarah T. (2019).Behind the Screen: Content Moderation in the Shadows of Social Media. Yale University Press.doi:10.2307/j.ctvhrcz0v.ISBN 978-0-300-23588-3.JSTOR j.ctvhrcz0v.
  6. ^Langvardt, Kyle (2017–2018)."Regulating Online Content Moderation".Georgetown Law Journal.106: 1353.
  7. ^"White Paper: Regulation, the Internet Way | Data-Smart City Solutions".datasmart.hks.harvard.edu. 2015-04-08. Retrieved2025-11-09.
  8. ^"About eBay: Press Releases".pages.ebay.com. Archived fromthe original on 2000-08-15. Retrieved2025-11-08.
  9. ^Keats Citron, Danielle; Waldman, Ari Ezra (August 23, 2025)."The Evolution of Trust and Safety".Emory Law Journal. Forthcoming.SSRN 5401604.
  10. ^"Stratton Oakmont, Inc. et al. v. Prodigy Services Company, et al - Internet Library of Law and Court Decisions".www.internetlibrary.com. Retrieved2025-11-08.
  11. ^Dickinson, Gregory M. (2024)."Section 230: A Juridical History".Stanford Technology Law Review.28: 1.
  12. ^"An Exercise in Moderation". Santa Clara University. Retrieved2025-11-08.
  13. ^"Santa Clara Principles on Transparency and Accountability in Content Moderation".Santa Clara Principles. Retrieved2025-11-08.
  14. ^abCai, Adelin; Tsao, Clara (2020-08-28)."The Trust & Safety Professional Association: Advancing The Trust And Safety Profession Through A Shared Community Of Practice".Techdirt. Retrieved2025-11-09.
  15. ^tspa-production (2020-06-17)."A Pre-History of the Trust & Safety Professional Association (TSPA)".Trust & Safety Professional Association. Retrieved2025-11-09.
  16. ^"Databite No. 134: Origins of Trust and Safety with Alexander Macgillivray and Nicole Wong".Data & Society. Retrieved2025-11-09.
  17. ^Menking, Amanda; Elswah, Mona; Grüning, David J.; Hansen, Lasse H.; Huang, Irene; Kamin, Julia; Normann, Catrine (2025-07-17),Bridging Boundaries: How to Foster Effective Research Collaborations Across Affiliations in the Field of Trust and Safety,arXiv:2507.13008, retrieved2025-11-09
  18. ^"Trust and Safety Research Conference 2025".cyber.fsi.stanford.edu. Retrieved2025-11-09.
  19. ^Hendrix, Justin (2022-09-25)."Trust and Safety Comes of Age? | TechPolicy.Press".Tech Policy Press. Retrieved2025-11-09.
  20. ^Weedon, Jen; Nuland, William; Stamos, Alex (2017),Information operations and Facebook
  21. ^Castell, Michelle (April 2013),Mitigating Online Account Takeovers: The Case for Education(PDF), Retail Payments Risk Forum Survey Paper, Federal Reserve Bank of Atlanta, archived fromthe original(PDF) on 2021-09-25
  22. ^Bin Sulaiman, Rejwan; Schetinin, Vitaly; Sant, Paul (2022-06-01)."Review of Machine Learning Approach on Credit Card Fraud Detection".Human-Centric Intelligent Systems.2 (1):55–68.doi:10.1007/s44230-022-00004-0.ISSN 2667-1336.
  23. ^Ali, Abdulalem; Abd Razak, Shukor; Othman, Siti Hajar; Eisa, Taiseer Abdalla Elfadil; Al-Dhaqm, Arafat; Nasser, Maged; Elhassan, Tusneem; Elshafie, Hashim; Saif, Abdu (2022-09-26)."Financial Fraud Detection Based on Machine Learning: A Systematic Literature Review".Applied Sciences.12 (19): 9637.doi:10.3390/app12199637.ISSN 2076-3417.
  24. ^Raghavan, Pradheepan; Gayar, Neamat El (December 2019). "Fraud Detection using Machine Learning and Deep Learning".2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE). pp. 334–339.doi:10.1109/ICCIKE47802.2019.9004231.ISBN 978-1-7281-3778-0.
  25. ^PricewaterhouseCoopers."The quest for truth: Content moderation".PwC. Retrieved2023-03-08.
  26. ^Cinelli, Matteo; Pelicon, Andraž; Mozetič, Igor; Quattrociocchi, Walter; Novak, Petra Kralj; Zollo, Fabiana (2021-11-11)."Dynamics of online hate and misinformation".Scientific Reports.11 (1): 22083.Bibcode:2021NatSR..1122083C.doi:10.1038/s41598-021-01487-w.ISSN 2045-2322.PMC 8585974.PMID 34764344.
  27. ^abKlonick, Kate (2018-04-10)."The New Governors: The People, Rules, and Processes Governing Online Speech".Harvard Law Review. Retrieved2025-11-09.
  28. ^"CyberTipline".National Center for Missing & Exploited Children. Archived fromthe original on 2025-08-05. Retrieved2025-11-09.
  29. ^Jang, Yujin; Ko, Bomin (2023-08-19)."Online Safety for Children and Youth under the 4Cs Framework-A Focus on Digital Policies in Australia, Canada, and the UK".Children (Basel, Switzerland).10 (8): 1415.doi:10.3390/children10081415.ISSN 2227-9067.PMC 10453252.PMID 37628414.
  30. ^Thakur, Dhanaraj (2024-11-21)."Real Time Threats: Analysis of Trust and Safety Practices for Child Sexual Exploitation and Abuse (CSEA) Prevention on Livestreaming Platforms".Center for Democracy and Technology. Retrieved2025-11-09.
  31. ^Sujay, Devangana; Kapoor, Vineet; Kumar Shandilya, Shishir (2024-12-27)."A Comprehensive Survey of Technological Approaches in the Detection of CSAM".Taylor & Francis:30–43.doi:10.1201/9781003471103-3.ISBN 978-1-003-47110-3. Archived fromthe original on 2025-04-29.
  32. ^"Grooming in the Digital Age".National Center for Missing & Exploited Children. Archived fromthe original on 2025-09-24. Retrieved2025-11-09.
  33. ^"Exposing Cross-Platform Coordinated Inauthentic Activity in the Run-Up to the 2024 U.S. Election".arxiv.org. Retrieved2025-11-09.
  34. ^Cinelli, Matteo; Cresci, Stefano; Quattrociocchi, Walter; Tesconi, Maurizio; Zola, Paola (2025-03-19),"Coordinated inauthentic behavior and information spreading on Twitter",Decision Support Systems,160 113819,arXiv:2503.15720,doi:10.1016/j.dss.2022.113819, retrieved2025-11-09
  35. ^Mackey, Rory Mir and Aaron (2025-06-26)."How Cops Can Get Your Private Online Data".Electronic Frontier Foundation. Retrieved2025-11-09.
  36. ^"Law Enforcement Response".Trust & Safety Professional Association. Retrieved2025-11-09.
  37. ^abc"Policy Development".Trust & Safety Professional Association. Retrieved2025-11-11.
  38. ^"Key Functions and Roles".Trust & Safety Professional Association. Retrieved2025-11-11.
  39. ^Suzor, Nicolas P.; West, Sarah Myers; Quodling, Andrew; York, Jillian (2019-03-27)."What Do We Mean When We Talk About Transparency? Toward Meaningful Transparency in Commercial Content Moderation".International Journal of Communication.13:18–18.ISSN 1932-8036.
  40. ^Gorwa, Robert (2024).The Politics of Platform Regulation: How Governments Shape Online Content Moderation. Oxford University Press.ISBN 978-0-19-769285-1.
  41. ^Newman, Lily Hay."The Daily Stormer's Last Defender in Tech Just Dropped It".Wired.ISSN 1059-1028. Retrieved2025-11-11.
  42. ^Buckley, Nicole; Schafer, Joseph Scott (August 2, 2021)."Censorship-Free Platforms: Evaluating Content Moderation Policies and Practices of Alternative Social Media".for(e)dialogue.
  43. ^Edelson, Laura (2024), Hurwitz, Justin (Gus); Langvardt, Kyle (eds.),"Content Moderation in Practice",Media and Society After Technological Disruption, Cambridge: Cambridge University Press, pp. 150–160,ISBN 978-1-009-17441-1, retrieved2025-11-11
  44. ^abFrançois, Camille; Shen, Juliet; Roth, Yoel; Lai, Samantha; Povolny, Mariel (July 30, 2025)."Four Functional Quadrants for Trust & Safety Tools: Detection, Investigation, Review & Enforcement (DIRE)".Trust, Safety, and the Internet We Share: Multistakeholder Insights, Edited Volume, Taylor & Francis (Forthcoming).SSRN 5369158.
  45. ^Schaffner, Brennan; Bhagoji, Arjun Nitin; Cheng, Siyuan; Mei, Jacqueline; Shen, Jay L; Wang, Grace; Chetty, Marshini; Feamster, Nick; Lakier, Genevieve; Tan, Chenhao (2024-05-11).""Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies".Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. CHI '24. New York, NY, USA: Association for Computing Machinery:1–16.doi:10.1145/3613904.3642333.ISBN 979-8-4007-0330-0.
  46. ^Mukherjee, Sujata; Eissfeldt, Jan (2023-09-25)."Evaluating the Forces Shaping the Trust & Safety Industry | TechPolicy.Press".Tech Policy Press. Retrieved2025-11-09.
  47. ^"The Silent Partner Cleaning Up Facebook for $500 Million a Year (Published 2021)". 2021-08-31. Retrieved2025-11-09.
  48. ^Shead, Sam (2020-11-12)."TikTok is luring Facebook moderators to fill new trust and safety hubs".CNBC. Retrieved2025-11-09.
  49. ^Seering, Joseph; Dym, Brianna; Kaufman, Geoff; Bernstein, Michael (2022-02-28)."Pride and Professionalization in Volunteer Moderation: Lessons for Effective Platform-User Collaboration".Journal of Online Trust and Safety.1 (2).doi:10.54501/jots.v1i2.34.ISSN 2770-3142.
  50. ^Madio, Leonardo; Quinn, Martin (2025)."Content moderation and advertising in social media platforms".Journal of Economics & Management Strategy.34 (2):342–369.doi:10.1111/jems.12602.hdl:11577/3516499.ISSN 1530-9134.
  51. ^Pinchevski, Amit (2023-01-01). "Social media's canaries: content moderators between digital labor and mediated trauma".Media, Culture & Society.45 (1):212–221.doi:10.1177/01634437221122226.ISSN 0163-4437.
  52. ^Spence, Ruth; Bifulco, Antonia; Bradbury, Paula; Martellozzo, Elena; DeMarco, Jeffrey (2023-09-18)."The psychological impacts of content moderation on content moderators: A qualitative study".Cyberpsychology: Journal of Psychosocial Research on Cyberspace.17 (4).doi:10.5817/CP2023-4-8.ISSN 1802-7962.
  53. ^Gorwa, Robert; Binns, Reuben; Katzenbach, Christian (2020-01-01)."Algorithmic content moderation: Technical and political challenges in the automation of platform governance".Big Data & Society.7 (1) 2053951719897945.doi:10.1177/2053951719897945.ISSN 2053-9517.
  54. ^Teunissen, Coen; Napier, Sarah (July 2022)."Child sexual abuse material and end-to-end encryption on social media platforms: An overview".Trends and Issues in Crime and Criminal Justice (653):1–19.
  55. ^Chen, Thomas M. (2021-10-10)."Automated Content Classification in Social Media Platforms".Taylor & Francis:53–71.doi:10.1201/9781003134527-6.ISBN 978-1-003-13452-7. Archived fromthe original on 2025-05-06.
  56. ^Khan, Zeeshan (2025-02-28)."Natural Language Processing Techniques for Automated Content Moderation".International Journal of Web of Multidisciplinary Studies.2 (2):21–27.ISSN 3049-2424.
  57. ^Haythornthwaite, Caroline (2023-07-01)."Moderation, Networks, and Anti-Social Behavior Online".Social Media + Society.9 (3) 20563051231196874.doi:10.1177/20563051231196874.ISSN 2056-3051.
  58. ^"What is Lantern?".inhope.org. Retrieved2025-11-09.
  59. ^"Inside Facebook: Die geheimen Lösch-Regeln von Facebook".Süddeutsche.de (in German). 2016-12-16. Retrieved2025-11-11.
  60. ^Köver, Chris; Reuter, Markus (2019-12-02)."Discrimination: TikTok curbed reach for people with disabilities".netzpolitik.org (in German). Retrieved2025-11-11.
  61. ^"Inside Facebook's Secret Rulebook for Global Political Speech (Published 2018)". 2018-12-27. Retrieved2025-11-11.