At TTA's AI Reliability Center, we certified an AI-powered video interview assessment system for reliability in the high-stakes field of recruitment. This product, referred to here as the 'subject product,' completed a rigorous certification process against the requirements for AI reliability (TTAK.KO-10.1497), reflecting the developer's commitment to a trustworthy AI system.
By presenting this certification case, we aim to provide practical guidance on achieving reliable AI, which should be valuable for stakeholders considering the adoption of AI in recruitment.
The subject product evaluates candidates by analyzing their video interviews and responses with AI. It combines an assessment of soft skills and communication abilities suited to the organization and the job with behavioural event interview (BEI) techniques, which assess past behaviour to predict high job performance. Together, these methods provide a comprehensive evaluation to support decision-makers.
The subject product's users fall into two main groups: clients (organizations using the product for their recruitment processes) and candidates (individuals applying for positions and undergoing interviews).
This certification case documents adherence to the AI reliability enhancement requirements (TTAK.KO-10.1497), presented sequentially by requirement number.
The developer has implemented a comprehensive risk management plan covering risks that may arise during the development and operation stages. The plan involves nearly all departments (planning, development, quality assurance, operations, etc.), and the process is documented so the risks can be managed effectively.
Each department plays a part in risk management, identifying potential risks based on empirical and research-based knowledge of the AI lifecycle. The process includes:
The AI Reliability Governance Committee oversees the subject product's risk management at the organizational level. Its roles and responsibilities are defined in detail, making it central to ensuring the system's reliability and adherence to core ethical requirements.
To ensure AI reliability, the developer has established and operates an "AI Reliability Governance Committee." This committee has undertaken the following key activities:
The operational environment for the subject product includes the operational managers and interviewers (referred to as recruitment experts) from client companies using the product for their recruitment processes, as well as the interview candidates (referred to as applicants).
A working group including these users and the developers was formed to design the tests for the subject product. The tests were conducted in an environment identical to actual video interviews. The details of the tests created and performed are given below.
(Traceability of AI Decision-Making) The analysis results displayed on the operational page can be verified against information in the applicant database. Additionally, the applicant key can be used to trace the initial input information, such as the storage path of the originally uploaded video.
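As a minimal sketch of how such a lookup could work, assuming a relational store with hypothetical `applicants` and `analysis_results` tables joined on the applicant key (the product's actual schema is not disclosed):

```python
import sqlite3

def trace_applicant(db_path: str, applicant_key: str) -> dict:
    """Resolve an applicant key to the stored analysis result and the
    original video's storage path (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            """
            SELECT a.applicant_key, a.video_storage_path, r.analysis_result
            FROM applicants AS a
            JOIN analysis_results AS r ON r.applicant_key = a.applicant_key
            WHERE a.applicant_key = ?
            """,
            (applicant_key,),
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        raise KeyError(f"unknown applicant key: {applicant_key}")
    return {"applicant_key": row[0], "video_path": row[1], "result": row[2]}
```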
(Source of Training Data) Only the interview videos of actual applicants using the system were utilized as the source of training data, ensuring a relatively reliable data source. The subject product maintains an internal system (the "training data system") for processing, storing, and managing the training data. This system logs data usage, facilitating easy tracking of changes.
(Changes to Training Data) To manage changes to the data used for model training, the development organization maintains "data sheets" and "model cards." These documents enable the reproduction of specific datasets used by models.
| Type | Details |
| --- | --- |
| Data Sheet | Records all information required to reconstruct the same dataset from the training data system, along with the models that used the dataset and the research conducted with it. |
| Model Card | Contains an overview of the model: description, structure, training dataset, libraries used, input/output details, use cases, benchmark results, known usage issues, provision history and status, reliability records (bias, validity), and related models. |
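A minimal sketch of how a model card and data sheet might be represented as a linked pair; the field names paraphrase the table above and are illustrative, not the developer's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class DataSheet:
    """What is needed to reconstruct a dataset from the training data system."""
    dataset_id: str
    reconstruction_query: str                # how to re-extract the same records
    related_models: list[str] = field(default_factory=list)
    related_research: list[str] = field(default_factory=list)

@dataclass
class ModelCard:
    """Summarizes a model and points at the data sheet it was trained on,
    so the model and its data are managed as a pair."""
    model_id: str
    description: str
    training_datasheet: DataSheet
    libraries: list[str] = field(default_factory=list)
    benchmark_results: dict[str, float] = field(default_factory=dict)
    reliability_notes: list[str] = field(default_factory=list)  # bias, validity
```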
(Refined Information) Since the interview videos uploaded through the actual system are used as the source of training data, primary data refinement occurs at upload time. This is handled by the "Information Insufficiency" process, which detects and prevents quality issues in the videos before and during the interview.
(Metadata) The training data system automatically attaches metadata to the uploaded data, so no separate specification is required. However, the following information can be accessed:
(Protected Variables) No demographic information is tagged in the training data. Therefore, protected variables are not explicitly designated or managed, but other factors that could cause bias in the results are defined and managed separately. This is detailed in Requirement 9.
(Labelling Training and Guidelines) A labelling workshop was conducted to educate labellers. During the workshop, documentation on data labelling standards (inspection criteria) and methods was provided and taught.
The strategy for ensuring the subject product's robustness is to preemptively block the inflow of anomalous data. In addition to the "Information Insufficiency" process described in Requirement 5, an "AI Supervisor" feature warns applicants about anomalous data and keeps it out of the system. During the interview recording, the system detects the anomalies listed in the table below; when anomalies such as capturing, recording, or screen sharing are detected, it displays a warning or enforces a logout.
| Feature | Details |
| --- | --- |
| Proxy test prevention | Compares the interview video with the profile picture to confirm that the same person is present. |
| Detection of accompanying persons | Checks whether any face other than the applicant's is detected. |
| Prevention of capturing, recording, and screen sharing | Detects whether the screen is being captured, recorded, or shared. |
| Mask detection | Detects whether a mask is worn to hide facial expressions. |
| Answer similarity check among applicants | Detects whether the applicant verbally repeats answers from a consultation or a previous applicant. |
By implementing these measures, the system aims to maintain the integrity and reliability of the data collected during the interview process, ensuring that the results are accurate and trustworthy.
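A minimal sketch of how the AI Supervisor's detection-to-response mapping and the answer-similarity check could look; the anomaly categories mirror the table above, while the forced-logout rule and the lexical-similarity measure are assumptions for illustration:

```python
from difflib import SequenceMatcher
from enum import Enum, auto

class Anomaly(Enum):
    PROXY_SUSPECTED = auto()   # face does not match the profile picture
    EXTRA_FACE = auto()        # a face other than the applicant's detected
    SCREEN_CAPTURE = auto()    # capturing, recording, or screen sharing
    MASK_DETECTED = auto()     # mask hides facial expressions
    SIMILAR_ANSWER = auto()    # answer repeats a prior applicant's answer

# Anomalies assumed here to trigger a forced logout rather than a warning.
FORCED_LOGOUT = {Anomaly.SCREEN_CAPTURE}

def supervisor_action(anomaly: Anomaly) -> str:
    """Map a detected anomaly to the supervisor's response."""
    return "force_logout" if anomaly in FORCED_LOGOUT else "show_warning"

def answer_similarity(transcript_a: str, transcript_b: str) -> float:
    """Rough lexical similarity (0..1) between two answer transcripts,
    a stand-in for whatever comparison the product actually uses."""
    return SequenceMatcher(None, transcript_a.lower(), transcript_b.lower()).ratio()
```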
(Human Bias in Data Collection) The source training data for the subject product is the interview videos recorded by the applicants. As described in Requirements 5 and 6, this data undergoes a cleaning process before being uploaded to the training system. Since no separate data collector is involved in this process, the intentional selection of source data does not occur.
(Physical Bias of Collection Devices) Applicants record their videos using their own devices. During the interview process using the subject product, users are provided with minimum device specifications and guidelines. The system collects inputs from a variety of devices within these specifications.
The development organization has established rules for using open-source libraries and frameworks and manages these checks through documented verification. To use open-source libraries and frameworks, the following aspects must be verified:
The management documents record the following details:
| Document | Details |
| --- | --- |
| Total open-source usage status | Open-source name, purpose of use, homepage, license, version, activity level, creation date |
| Open-source usage status by model | Model ID, applicable service, department using it, open-source name, license, license display obligation, GitHub link, activity level (number of stars), creation date |
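A minimal sketch of how a per-model usage record could be held in code and used to catch missing license attributions; the field names mirror the table above, and `attribution_gaps` is a hypothetical helper, not part of the developer's tooling:

```python
from dataclasses import dataclass

@dataclass
class OpenSourceEntry:
    """One row of the per-model open-source usage record."""
    model_id: str
    service: str
    department: str
    name: str
    license: str
    attribution_required: bool   # license display obligation
    github_url: str
    stars: int                   # activity level
    created: str

def attribution_gaps(entries: list[OpenSourceEntry], displayed: set[str]) -> list[str]:
    """List libraries whose license must be displayed but currently is not."""
    return [e.name for e in entries
            if e.attribution_required and e.name not in displayed]
```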
When updating models or applying new models to the AI system, the company follows a dedicated procedure to verify their validity and check them for bias. The development company has established step-by-step verification datasets for all defined verification elements: the initial construction of several verification sets is complete, and datasets for the remaining elements are being developed for future experiments.
(Procedure) Verification proceeds in three stages: internal model verification, official validity verification, and model deployment testing.
(Method) Utilize verification datasets built under actual video interview conditions to validate the significance and validity of verification metrics.
(Verification Elements) Define elements that may cause bias as verification elements and check for significant biases in the AI inference results.
Because accurate gender labels were not tagged in the data, a classification model was used to estimate gender and construct the verification set.
| Type | Details |
| --- | --- |
| Verification set construction completed | Brightness, recording equipment, gender, glasses, volume, pitch, speech information, camera angle, recording distance |
| Verification set construction in progress | Resolution, hairstyle, speech speed, pronunciation accuracy, specific behaviours, accessories, lighting, technical jargon, etc. |
(Verification Metrics) Various metrics are used for each verification element.
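As an illustration of one such check, a minimal sketch that measures the gap in mean inference scores across the subgroups of a verification element (e.g., brightness bands or estimated gender); the 0.05 threshold is an assumed placeholder, not the developer's criterion:

```python
from statistics import mean

def subgroup_gap(scores_by_group: dict[str, list[float]]) -> float:
    """Largest difference in mean inference score across subgroups
    of one verification element."""
    means = {group: mean(scores) for group, scores in scores_by_group.items()}
    return max(means.values()) - min(means.values())

def flag_bias(scores_by_group: dict[str, list[float]],
              threshold: float = 0.05) -> bool:
    """Flag the element for review when the subgroup gap exceeds
    the (assumed) threshold."""
    return subgroup_gap(scores_by_group) > threshold
```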
The following measures have been established to address potential attacks on the AI system, such as model extraction and model evasion.
(AI Model Specifications) The subject product's inference model is managed as a pair with its data, using model cards and data sheets (see Requirement 4) to specify and manage model versions.
(Inference Basis) Various XAI (explainable AI) techniques were tested to identify factors that significantly influence bias, although none has yet been applied in production. The techniques tested include:
(Interpretation of Inference Results) The development company undertakes the following activities to aid in the interpretation of the model's inference results:
(Bias from User Interface) User input is received through video input devices, and the system displays queries in text. Bias mitigation related to devices is included in the verification elements as part of the company’s validation procedures (see Requirement 9).
(Bias from Interaction Methods) The system functions as a one-way communication tool, and efforts have been made to enhance the clarity of questions to reduce bias based on the applicant’s understanding.
(User Error Guidance and Exception Handling Policies) Applicants are provided in advance with common issues and their solutions. A dedicated, always-open channel is available for users to report exceptions or system problems.
(Defense Measures Against System Attacks) To defend against attacks, the measures outlined in Requirement 10 are implemented. The system undergoes regular vulnerability scans, and actions are taken based on the results.
(Human Intervention in AI Inference Results) The primary function of the subject product is to support interviewers' decisions, and the system provides reference screens for final decision-making. An option to replace interviews entirely is available, but that decision is left to the client; clients who replace interviews with the system may incorporate human intervention according to their own policies.
(System Error Monitoring and Notification) Continuous server monitoring and auto-scaling technology are applied to prevent server anomalies. Notifications are sent to the development team via collaboration tools if a server issue occurs, prompting immediate response based on internal procedures.
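A minimal sketch of the notification step, assuming the collaboration tool exposes an incoming webhook that accepts a JSON payload (the actual tool and payload format are not disclosed):

```python
import json
import urllib.request

def notify_dev_team(webhook_url: str, message: str) -> None:
    """Post a server-anomaly alert to a collaboration-tool webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire the alert; error handling omitted
```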
(User Surveys) Surveys were conducted with applicants and client companies to gather feedback, which was then integrated into the system.
(Providing Interpretation Materials) Guides explaining the inference results page, including the meaning of technical terms and result metrics, are provided to interviewers accessing the AI inference results.
(Correct Use of Service) Clients are provided with explanatory materials outlining the purpose and goals of the subject product. As the primary operators, clients are guided to explain the service's limitations and scope to applicants, including the fact that AI analyzes the interview videos.
(Human Intervention in Decision-Making) The product’s inferences are meant to support client interviewers' decisions. Adopting these inferences for final decision-making depends on the client's policies. If clients fully replace interview processes with the product, they are strongly advised to disclose this and establish procedures for human intervention.