
The first Certification of AI Trustworthiness in the HR industry

At TTA's AI Reliability Center, we have meticulously certified an AI-powered video interview assessment system for the high-stakes recruitment field.


AI Interview Assessment System

 

TTA's AI Reliability Center has certified an AI-powered video interview assessment system for reliability in the high-stakes field of recruitment. The product, referred to below as the 'subject product', completed a rigorous certification process against the requirements for AI reliability enhancement (TTAK.KO-10.1497), reflecting the developer's commitment to building a trustworthy AI system.

 

By presenting this exemplary certification case, we aim to provide practical guidance on achieving reliable AI for stakeholders considering the adoption of AI in recruitment.

 

The subject product, an AI-powered video interview assessment system, evaluates candidates by analyzing their video interviews and responses with AI. It combines an assessment of soft skills and of the communication abilities suited to the organization and job with behavioural event interview (BEI) techniques; BEI assesses past behaviour to predict high-performance job capabilities. Together, these methods provide a comprehensive evaluation to support decision-makers.

 

The subject product's users fall into two main groups: clients (organizations using the product for their recruitment processes) and candidates (individuals applying for positions and undergoing interviews).

 

The certification case details the adherence to the AI reliability enhancement requirements (TTAK.KO-10.1497), documented sequentially by requirement number.

 

1. Risk Management Plan and Implementation for AI Systems

The developer has implemented a comprehensive risk management plan covering risks that may arise during the development and operation stages and involving nearly all departments (planning, development, quality assurance, operations, etc.). The process is documented in detail for effective management, supporting the safe and reliable operation of the AI system.

Each department plays a vital role in risk management, identifying potential risks based on empirical and research-based knowledge of the AI lifecycle. This comprehensive process includes:

  • Risk Factors: Identified based on each department's experiential and research knowledge.
  • Risk Levels: Assessed by each department using their expertise and evidence, categorized as “High,” “Medium,” or “Low.”
  • Mitigation Strategies: Developed through discussions across all departments to address identified risks.
  • Controlled Risk Levels: Evaluated after mitigation actions, using a five-tier scale: “High,” “Medium-High,” “Medium,” “Medium-Low,” and “Low.”
  • Responsible Departments/Persons: Designated to monitor each risk's occurrence, response, and outcomes.

Risk Management Table Template
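
For illustration only, the sketch below shows how one row of such a risk register could be represented in code. The field names, level values, and the example entry are assumptions derived from the bullet list above, not the developer's actual template.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskLevel(Enum):
    HIGH = "High"
    MEDIUM_HIGH = "Medium-High"
    MEDIUM = "Medium"
    MEDIUM_LOW = "Medium-Low"
    LOW = "Low"


@dataclass
class RiskEntry:
    """One row of a hypothetical risk register mirroring the fields above."""
    risk_factor: str             # identified by the owning department
    initial_level: RiskLevel     # assessed as High / Medium / Low
    mitigation: str              # strategy agreed across departments
    controlled_level: RiskLevel  # re-assessed on the five-tier scale
    owner: str                   # responsible department or person
    notes: list[str] = field(default_factory=list)


# Hypothetical example entry, for illustration only.
example = RiskEntry(
    risk_factor="Low-quality interview videos enter the training data",
    initial_level=RiskLevel.MEDIUM,
    mitigation="Apply the 'Information Insufficiency' filtering process",
    controlled_level=RiskLevel.LOW,
    owner="Quality assurance department (hypothetical)",
)
```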

The AI Reliability Governance Committee oversees the subject product's risk management at the organizational level. Its roles and responsibilities are defined in detail, and it plays a central part in ensuring the system's reliability and its adherence to the ethical core requirements.

 

2. AI Governance Structure

 

To ensure AI reliability, the developer has established and operates an "AI Reliability Governance Committee." This committee has undertaken the following key activities:

  • Organizational Structure: Comprises the Governance Committee Chair and seven individual departments organized around the product lifecycle.
  • Defining Ethical Core Requirements: Establishes ethical core requirements for the product's research, development, and operation stages.

AI Ethical Core Requirements

  • Autonomous Ethics Checklist: In collaboration with the Korea Information Society Development Institute (KISDI), the committee has created an ethics checklist to ensure the implementation of ethical core requirements and conducts self-assessments using this checklist.

Example of AI Ethics Checklist - Source: 2023 AI Ethics Standards Self-Assessment Checklist (Draft)

  • Risk Response Process Development and Implementation
    • (Individual Departments): Identify key risk factors for each stage of the AI lifecycle.
    • (Governance Committee Chair): Recognize and communicate risk factors at the organizational level.
    • (Governance Organization, Chair): Analyze and evaluate risk factors and devise risk response strategies.
    • (Individual Departments): Implement response actions.
    • (Governance Organization, Chair): Confirm risk factors are mitigated or eliminated post-response.
    • (Governance Committee Chair): Review outcomes and report to management if necessary.

Governance Risk Response Process

  • Activity Documentation: The committee maintains comprehensive documentation of all activities, including meeting minutes, risk assessment rationale, and risk response records.
  • Activity Frequency: Regular quarterly meetings and inspections, with emergency meetings convened at the Chair's discretion.

 

3. AI System Reliability Testing Plan

 

The operational environment for the subject product includes the operational managers and interviewers (referred to as recruitment experts) from client companies using the product for their recruitment processes, as well as the interview candidates (referred to as applicants).

 

A working group that included these users and the developers was formed to design the tests for the subject product. The tests were conducted in an environment identical to actual video interviews. Below are the details of the tests that were designed and performed.

 

  • Tests to Mitigate Uncertainty in Inference Results
    • Measured the reliability coefficients of the recruitment experts' evaluations and the system's inference results to verify their significance.
    • Assessed the reliability coefficients for each competency individually to predict their contribution to the inference results.
    • Measured the reliability coefficients among recruitment experts to identify human variance.

Case Study on the Significance Verification between AI Systems and Human Interviewers
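
The specific reliability coefficient used in the certification tests is not disclosed; the sketch below only illustrates one common way such an agreement check could be run, using Pearson correlations on hypothetical scores (the candidates, experts, and values are all invented for illustration).

```python
import numpy as np
from scipy import stats

# Hypothetical competency scores for the same eight candidates (illustration only).
expert_a = np.array([3.5, 4.0, 2.5, 4.5, 3.0, 2.0, 4.0, 3.5])
expert_b = np.array([3.0, 4.5, 2.5, 4.0, 3.5, 2.5, 4.0, 3.0])
ai_score = np.array([3.2, 4.3, 2.4, 4.4, 3.1, 2.2, 4.1, 3.4])

# Agreement between the AI system and the averaged expert panel.
expert_mean = (expert_a + expert_b) / 2
r_ai, p_ai = stats.pearsonr(ai_score, expert_mean)

# Agreement among the experts themselves, to gauge human variance.
r_hh, p_hh = stats.pearsonr(expert_a, expert_b)

print(f"AI vs. experts:    r = {r_ai:.2f} (p = {p_ai:.3f})")
print(f"Expert vs. expert: r = {r_hh:.2f} (p = {p_hh:.3f})")
```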

  • Tests to Ensure the Explainability and Interpretability of Expected Outputs:
    • Conducted usability surveys with clients and recruitment experts.
    • Conducted usability surveys with applicants.

Summary of User Survey Results

  • Tests to Identify and Address Vulnerabilities:
    • Identified and tested scenarios of malicious use.

 

4. AI System Traceability and Change History

 

(Traceability of AI Decision-Making) The analysis results displayed on the operational page can be verified against information in the applicant database. In addition, the system can trace the initial input information, such as the storage path of the originally uploaded video, via the applicant key.

 

(Source of Training Data) Only the interview videos of actual applicants using the system were utilized as the source of training data, ensuring a relatively reliable data source. The subject product maintains an internal system (the "training data system") for processing, storing, and managing the training data. This system logs data usage, facilitating easy tracking of changes.

 

(Changes to Training Data) To manage changes to the data used for model training, the development organization maintains "data sheets" and "model cards." These documents enable the reproduction of specific datasets used by models.

  • Data Sheet: Records all information required to reconstruct the same dataset from the training data system, including the models and the research that used the dataset.
  • Model Card: Contains an overview of the model, its description, structure, training dataset, libraries used, input/output details, use cases, benchmark results, known usage issues, provision history and status, reliability records (bias, validity), and related models.
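
As a rough sketch of what these two records might hold, the snippet below expresses a model card and a data sheet as plain Python dictionaries. All identifiers and field names are hypothetical; the developer's actual document schema is not published.

```python
# Hypothetical model card; field names follow the description above,
# not the developer's actual document format.
model_card = {
    "model_id": "interview-assessment-v2 (hypothetical)",
    "overview": "Scores BEI behavioural indicators from interview videos",
    "structure": "video encoder + speech-to-text + competency classifier",
    "training_dataset": "datasheet-2023-10 (hypothetical data sheet ID)",
    "libraries_used": ["pytorch", "opencv"],
    "inputs_outputs": {"input": "interview video", "output": "competency scores"},
    "benchmark_results": {},
    "usage_issues": [],
    "provision_history": [],
    "reliability_records": {"bias": [], "validity": []},
    "related_models": ["interview-assessment-v1 (hypothetical)"],
}

# Hypothetical data sheet; records what is needed to rebuild the same
# dataset from the training data system.
data_sheet = {
    "data_sheet_id": "datasheet-2023-10 (hypothetical)",
    "reconstruction_query": "filters and time range used in the training data system",
    "models_using_dataset": ["interview-assessment-v2 (hypothetical)"],
    "related_research": [],
}
```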

 


 

5. Detailed Information for Data Utilization

 

(Refined Information) Since the interview videos uploaded through the actual system are used as the source of training data, primary data refinement occurs during this process. This is known as the "Information Insufficiency" process, which involves detecting and preventing issues in the videos before and during the interview.

  • Pre-Interview (Environment Check):
    • Cheating Prevention Refinement: Detects face presence, average lighting, multiple persons, mask usage, and frontal face presence.
    • Device Environment Check Error Refinement: Detects frame extraction failures, insufficient face detection rate, low video FPS, face movement out of frame, high face obstruction rate, audio extraction failures, low audio volume, and insufficient speech recognition character count.
  • During Interview (Actual Participation):
    • Ongoing Detection During the Interview: Detects frame extraction failures, an insufficient face detection rate, absence of a face, audio extraction failures, and an insufficient speech recognition character count (a minimal sketch of such checks follows this list).
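
The sketch below illustrates the shape of these environment checks, assuming hypothetical threshold values and input field names; the product's actual criteria are not published.

```python
# Hypothetical thresholds for illustration; the actual criteria are not disclosed.
MIN_FACE_DETECTION_RATE = 0.90   # share of frames containing a detected face
MIN_VIDEO_FPS = 15
MIN_AUDIO_VOLUME_DB = -40.0
MIN_TRANSCRIPT_CHARS = 50


def check_environment(stats: dict) -> list[str]:
    """Return a list of detected issues; an empty list means the check passed."""
    issues = []
    if stats["face_detection_rate"] < MIN_FACE_DETECTION_RATE:
        issues.append("insufficient face detection rate")
    if stats["video_fps"] < MIN_VIDEO_FPS:
        issues.append("low video FPS")
    if stats["num_faces"] > 1:
        issues.append("multiple persons detected")
    if stats["mask_detected"]:
        issues.append("mask usage detected")
    if stats["audio_volume_db"] < MIN_AUDIO_VOLUME_DB:
        issues.append("low audio volume")
    if stats["transcript_chars"] < MIN_TRANSCRIPT_CHARS:
        issues.append("insufficient speech recognition character count")
    return issues


# Example: a well-formed recording passes every check.
print(check_environment({
    "face_detection_rate": 0.97, "video_fps": 24, "num_faces": 1,
    "mask_detected": False, "audio_volume_db": -25.0, "transcript_chars": 320,
}))  # -> []
```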

(Metadata) The training data system automatically attaches metadata to the uploaded data, so no separate specification is required. However, the following information can be accessed:

  • Metadata List and Contents
  • Basic statistical information about the data.
  • Data filtered by each metadata field.

(Protected Variables) No demographic information is tagged in the training data. Therefore, protected variables are not explicitly designated or managed, but other factors that could cause bias in the results are defined and managed separately. This is detailed in Requirement 9.

(Labeling Training and Guidelines) A labelling workshop was conducted to train the labellers. During the workshop, documentation on data labelling standards (inspection criteria) and methods was provided and taught.

Labeling Criteria for BEI Behavioral Indicators

 

 

6. Anomaly Data Inspection for Ensuring Data Robustness

 

The strategy for ensuring the subject product's robustness is to preemptively block the inflow of anomalous data. In addition to the "Information Insufficiency" process described in Requirement 5, the "AI Supervisor" feature was implemented to warn applicants and prevent them from submitting anomalous data. The system detects the anomalies listed below during the interview recording; when anomalies such as capturing, recording, or screen sharing are detected, it displays a warning or forces a logout.

  • Prevent Proxy Testing: Compares the interview video with the profile picture to confirm the applicant's identity.
  • Detection of Accompanied Persons: Checks whether any face other than the applicant's is detected.
  • Prevention of Capturing, Recording, and Screen Sharing: Verifies whether the screen is being captured or recorded.
  • Mask Detection: Detects whether a mask is worn to hide facial expressions.
  • Answer Similarity Check Among Applicants: Detects whether the applicant verbally repeats answers from a consultation or from a previous applicant.

By implementing these measures, the system aims to maintain the integrity and reliability of the data collected during the interview process, ensuring that the results are accurate and trustworthy.
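
The product's actual similarity method is not disclosed; as one way such an answer similarity check could work, the sketch below compares a new answer against earlier answers using TF-IDF cosine similarity (the threshold and texts are invented for illustration).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

SIMILARITY_THRESHOLD = 0.9  # assumed value for illustration

previous_answers = [
    "I led a project team of five and resolved a schedule conflict by ...",
    "In my previous role I improved the onboarding process by ...",
]
new_answer = "I led a project team of five and resolved a schedule conflict by ..."

# Vectorize all answers together so they share one vocabulary.
matrix = TfidfVectorizer().fit_transform(previous_answers + [new_answer])
n = len(previous_answers)
scores = cosine_similarity(matrix[n], matrix[:n])[0]

if scores.max() >= SIMILARITY_THRESHOLD:
    print(f"Possible repeated answer (similarity {scores.max():.2f})")
```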

 

7. Bias Removal from Collected and Processed Training Data

 

(Human Bias in Data Collection) The source training data for the subject product is the interview videos recorded by the applicants. As described in Requirements 5 and 6, this data undergoes a cleaning process before being uploaded to the training system. Since no separate data collector is involved in this process, the intentional selection of source data does not occur.

 

(Physical Bias of Collection Devices) Applicants record their videos using their own devices. During the interview process using the subject product, users are provided with minimum device specifications and guidelines. The system collects inputs from a variety of devices within these specifications.

Device Specifications Guidance for PC Use

Device Specifications Guidance for Mobile Device Use

 

 

8. Security and Compatibility Check of Open Source Libraries

 

The development organization has established rules for using open-source libraries and frameworks and manages these checks through documented verification. To use open-source libraries and frameworks, the following aspects must be verified:

 

  • License Verification
  • Activity Review
  • Compatibility Review
  • Security Vulnerability Review

The management documents record the following details.

  • Total open source usage status: open source name, purpose of use, homepage, license, version, activity level, creation date.
  • Open source usage status by model: model ID, applicable service, department using it, open source name, license, license display obligation, GitHub link, activity level (number of stars), creation date.
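
The management documents themselves are maintained manually by the development organization; as a rough sketch of how part of the "total open source usage status" could be gathered automatically for Python dependencies, the snippet below reads installed package metadata (the field mapping is an assumption, and items such as purpose of use or activity level would still need manual entry).

```python
from importlib import metadata

rows = []
for dist in metadata.distributions():
    info = dist.metadata
    rows.append({
        "open_source_name": info["Name"],
        "version": dist.version,
        "license": info.get("License", "unknown"),
        "homepage": info.get("Home-page", ""),
    })

# Print a few entries of the collected usage status.
for row in sorted(rows, key=lambda r: (r["open_source_name"] or "").lower())[:5]:
    print(row)
```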

 

 

9. Bias Removal in AI Models

 

When updating the AI system or applying new models to it, the company follows its own procedure to verify their validity and check them for bias. The development company has established step-by-step verification datasets for all defined verification elements. The initial construction of several verification sets has been completed, and datasets for the remaining verification elements are being developed for future experiments.

 

(Procedure) Internal Model Verification → Official Validity Verification → Model Deployment Testing

 

(Method) Utilize verification datasets built under actual video interview conditions to validate the significance and validity of verification metrics.

 

(Verification Elements) Define elements that may cause bias as verification elements and check for significant biases in the AI inference results.

Since accurate gender labels were not tagged, a classification model was used to estimate gender and construct the corresponding verification set.

  • Verification Set Construction Completed: brightness, recording equipment, gender, glasses, volume, pitch, speech information, camera angle, recording distance.
  • Verification Set Construction in Progress: resolution, hairstyle, speech speed, pronunciation accuracy, specific behaviours, accessories, lighting, technical jargon, etc.

 

(Verification Metrics) Various metrics are used for each verification element:

  • Quantitative Metrics (Correlation Coefficient)
    • T-test for Difference Verification (p-value < 0.01, 0.05, 0.1)
    • ANOVA for Difference Verification (p-value < 0.01, 0.05, 0.1)
    • SPDD
    • PCC
  • Bias Significance (Inter-Rater Reliability)

  • Interpretation of Selection Tool Validity: Follows the standards recommended for HR by the ETA (Employment and Training Administration) of the U.S. Department of Labor.
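
The verification datasets and pass criteria belong to the developer; the sketch below only illustrates the kind of group-difference tests named above (t-test and ANOVA) on invented scores for two hypothetical verification elements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical composite scores split by a binary verification element (e.g. glasses).
scores_glasses = rng.normal(3.5, 0.5, 200)
scores_no_glasses = rng.normal(3.5, 0.5, 200)
t_stat, t_p = stats.ttest_ind(scores_glasses, scores_no_glasses)
print(f"t-test: t = {t_stat:.2f}, p = {t_p:.3f}")  # large p suggests no significant gap

# ANOVA across a multi-level element (e.g. three recording-device groups).
device_a = rng.normal(3.5, 0.5, 150)
device_b = rng.normal(3.5, 0.5, 150)
device_c = rng.normal(3.5, 0.5, 150)
f_stat, f_p = stats.f_oneway(device_a, device_b, device_c)
print(f"ANOVA:  F = {f_stat:.2f}, p = {f_p:.3f}")
```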

 

10. Defense Measures Against AI Model Attacks

 

The following measures have been established to address potential attacks on the AI system, such as model extraction and model evasion.

 

  • Defence Against Model Extraction in Operational Scenarios
    • Unique Access Codes: Clients are given unique codes to provide to eligible applicants, allowing system access.
    • Single Upload Opportunity: Each code permits only one video upload per applicant (ensuring equal opportunity for all applicants).
    • Restricted Model Results: Applicants cannot access the model's inference results, which are reviewed solely by the client.
  • Defence Against Model Evasion Through Pre-Tests and Feature Implementation
    • QA Test Scenarios: Various evasion scenarios are tested during QA, such as replacing the face video with a drawing or substituting answers with song lyrics.

QA Experiment Records

 

    • "AI Supervisor" Function: Defines types of model evasion (see Requirement 6) and implements notifications or forced logouts when such types are detected.
    • "Information Insufficiency" Process: Automatically filters out data during training that lacks sufficient information for model analysis.

Information Insufficiency Scenarios

 

 

11. AI Model Specifications and Explanation of Inference Results

 

(AI Model Specifications) The subject product's inference model is managed as a pair with its data, using model cards and data sheets (see Requirement 4) to specify and manage model versions.

 

(Inference Basis) Various XAI (Explainable AI) techniques were tested experimentally to identify factors that significantly influence bias (these experiments have not been applied to the production system). The techniques tested include:

    • LIME, SHAP, Feature Ablation

Example of SHAP Experiment Results
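
The actual models and features examined in the SHAP experiments are not published; the sketch below runs SHAP on a stand-in tree model with invented feature names, only to show the general shape of such an experiment.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["speech_rate", "answer_length", "camera_angle", "volume"]  # hypothetical
X = rng.normal(size=(300, len(feature_names)))
# Hypothetical target: a composite score driven mostly by the second feature.
y = 0.1 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.1, 300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model)  # dispatches to a tree explainer for this model
shap_values = explainer(X)         # per-sample feature attributions

# Mean absolute SHAP value per feature: a simple global importance view.
importance = np.abs(shap_values.values).mean(axis=0)
for name, imp in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name:15s} {imp:.3f}")
```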

 

(Interpretation of Inference Results) The development company undertakes the following activities to aid in the interpretation of the model's inference results:

    • Conducted surveys with clients to gather feedback and improve presentation methods.
    • Provided clients with explanatory materials to help interpret the inference results pages.

Example of Result Page Interpretation Materials

 

  • Supplied clients with significant metrics and their explanations for model performance.
    • Measured and visualized correlation coefficients in the same manner as described in Requirement 9.

Example of Significance Explanation: Correlation between AI Interview Composite Score and Face-to-Face Interview Outcomes (Validity)

 

 

12. Removing Bias in AI System Implementation

 

(Bias from User Interface) User input is received through video input devices, and the system displays queries in text. Bias mitigation related to devices is included in the verification elements as part of the company’s validation procedures (see Requirement 9).

 

(Bias from Interaction Methods) The system functions as a one-way communication tool, and efforts have been made to enhance the clarity of questions to reduce bias based on the applicant’s understanding.

Example of Question Clarity Improvement Efforts

 

 

13. Safe Mode Implementation and Issue Notification Procedures for AI Systems

 

(User Error Guidance and Exception Handling Policies) Applicants are preemptively provided with common issues and their solutions. A dedicated channel, which is always open, is available for users to report exceptions or system problems.

Issue Reporting Channel

 

(Defense Measures Against System Attacks) To defend against attacks, the measures outlined in Requirement 10 are implemented. The system undergoes regular vulnerability scans, and actions are taken based on the results.

Example of Action Taken Based on Vulnerability Scans

 

(Human Intervention in AI Inference Results) The primary function of the subject product is to support interviewers’ decisions. The system provides reference screens for final decision-making. The option to replace interviews is available, but the decision is left to the client. If clients use the system to replace interviews, they may incorporate human intervention according to their policies.

 

(System Error Monitoring and Notification) Continuous server monitoring and auto-scaling technology are applied to prevent server anomalies. Notifications are sent to the development team via collaboration tools if a server issue occurs, prompting immediate response based on internal procedures.

Example of Server Error Notification Message

 

 

 

14. User Understanding of AI System Explanations

 

(User Surveys) Surveys were conducted with applicants and client companies to gather feedback, which was then integrated into the system.

Example of Applicant Survey and Response Materials

 

(Providing Interpretation Materials) Guides explaining the inference results page, including the meaning of technical terms and result metrics, are provided to interviewers accessing the AI inference results.

Example of Result Page Interpretation Materials

 

 

15. Explanation of Service Scope and Interaction Targets

 

(Correct Use of Service) Clients are provided with explanatory materials outlining the purpose and goals of the subject product. As the primary operators, clients are guided to explain the service’s limitations and scope to applicants, including the fact that AI analyzes the interview videos.

 

(Human Intervention in Decision-Making) The product’s inferences are meant to support client interviewers' decisions. Adopting these inferences for final decision-making depends on the client's policies. If clients fully replace interview processes with the product, they are strongly advised to disclose this and establish procedures for human intervention.

Explanation and Disclosure Materials for Automated Decisions

Example of Applicant Consent Request

 

