My Work | Khiem Pham

MOVE Fellow Expert

Nov 2025 - Present, Freelance

LLM Agentic - Associate Researcher

Oct 2025 - Mar 2026, Contract/ Full-time

Finance Research Expert

Nov 2025 - Dec 2025, Freelance (Project-based)

Founder/ Corporate Finance Analyst

Sep 2023 - Present, Part-time

Quality Analyst - Internal QA Team

Feb 2026 - Present, Contract/ Freelance

Project Lead Quality

Aug 2025 - Oct 2025, Contract

Quality Analyst

Jan 2025 - Jun 2026, Contract/ Freelance

GenAI Associate - Red Teaming

Jun 2025 - Aug 2025, Contract

V-Star Internship

Sep 2024 - Nov 2024, Internship

Career History

Founder/ Corporate Finance Analyst at Finstock, Inc., Sep 2023 - Present

- Founded and served as the initial architect of Finstock’s early-stage vision and objectives; managed core financial flows and corporate finance activities during the company’s formative phase.

- Built and supported a finance and trading support offering that provides investment advisory services, in-depth financial analysis, and customized strategies, with a primary focus on U.S. equities and additional support for forex and crypto markets.

- Developed analytical frameworks that combine fundamental analysis, technical analysis, and quantitative modeling (including macro/micro factors) to generate data-driven insights and strategy recommendations.

- Led integration of AI-driven analytics into investment tools: real-time data processing, predictive modeling, automation of routine analysis, NLP-based extraction from unstructured sources (news/financial reports/sentiment), and enhanced risk assessment workflows.

- Contributed to designing and maintaining a multi-product service suite, including TradingView indicators, MT5-supported online investment/robo-advisor tools, risk modeling (beta), and periodic macro/finance reporting products.

Associate Researcher/ Calibrator at Turing, Oct 2025 - Mar 2026

- Designed and iterated complex prompt suites (single-turn + multi-turn) to benchmark LLM reasoning, focusing on correctness, instruction-following, hallucination risk, and reasoning structure using rubric-driven evaluations.

- Performed pairwise comparisons / preference rankings across multiple model completions, documenting failure modes (logic errors, unsupported claims, weak reasoning steps) to generate high-signal evaluation/training data that supports preference/reward modeling workflows.

- Owned the end-to-end data production cycle (drafting prompts, revising for clarity, and executing quality checks) to ensure consistent, high-integrity outputs aligned with strict evaluation guidelines.

V-Star Internship at VIVO, Sep 2024 - Dec 2024

- Analyzed sell-through and sales performance across 10+ key retail chains, using structured tracking to identify gaps by channel/SKU/period, and delivered actionable insights that supported the regional team in improving sell-through by 12%.

- Built and maintained recurring performance summaries for channel stakeholders, translating retail signals into clear recommendations to support vivo’s user- and retail-partner-focused go-to-market approach.

- Supported cross-team execution with retail partners by aligning analysis with in-store experience and partner programs, consistent with vivo’s broader strategy of working closely with major retail chains to enhance customer experience and value.

- Contributed to data-driven commercial decisions in a market where multi-channel retail is increasingly important, helping ensure insights were relevant across offline chains and broader omni-channel conditions.

Sales Specialist/ Manager at FAHASA, Mar 2022 - Mar 2023

- Optimized store inventory and visual merchandising strategies, ensuring 100% product availability during high-traffic periods.

- Analyzed consumer trends to adjust product displays, directly driving a sustained 20% YoY revenue growth for the branch.

Other Contract/Freelance Roles

Finance Research Expert at AfterQuery

MOVE Fellow Expert at Handshake

Project Lead Quality at Telus Digital

GenAI Associate - Red Teaming at Innodata

Quality Analyst at Invisible Technologies

Expert Financial Advisory Cases - Reviewer at Vetto AI

Finance Research Expert at AfterQuery

I design finance-focused questions to evaluate and stress-test AI reasoning models using real, verifiable data from U.S.-listed companies’ public disclosures (e.g., revenue, net income, and details from filings such as Form 10-K/10-Q and related financial statements). Beyond writing the question itself, I also draft a clear reasoning framework so the expected solution is logically sound and traceable to the source data, while still being challenging enough to reveal common model errors in accounting, valuation, market interpretation, and economics.

A key difficulty of the project is that the questions can’t be “generic” or easily solvable with surface-level finance knowledge. I often need to engineer subtle traps, such as timing mismatches (TTM vs. annual), segment vs. consolidated numbers, non-recurring items, share count definitions (basic vs. diluted), or classification differences (operating vs. non-operating), that intentionally push models toward incorrect conclusions. At the same time, the question must remain fair and defensible, with inputs that come from real filings and a reasoning path that can be audited.
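As a rough illustration of how two of these traps (TTM vs. annual, basic vs. diluted) shift an answer, here is a minimal sketch. Every figure and every name in it is hypothetical and not taken from any real filing or from the project itself; diluted EPS is simplified to a share-count change only, with no numerator adjustment.

```python
# Hypothetical figures for illustration only; not from any real filing.

def ttm(latest_annual, current_ytd, prior_ytd):
    """Trailing twelve months: last fiscal year + current YTD - prior-year YTD."""
    return latest_annual + current_ytd - prior_ytd


def eps(net_income, weighted_shares):
    """Earnings per share (simplified: no numerator adjustment for dilution)."""
    return net_income / weighted_shares


annual_net_income = 1_200.0  # last full fiscal year, $M (hypothetical)
ytd_current = 700.0          # first half of the current year, $M (hypothetical)
ytd_prior = 500.0            # first half of the prior year, $M (hypothetical)
basic_shares = 400.0         # weighted-average basic shares, M (hypothetical)
diluted_shares = 420.0       # weighted-average diluted shares, M (hypothetical)

ttm_net_income = ttm(annual_net_income, ytd_current, ytd_prior)

# The "surface-level" answer a model might give: annual income, basic shares.
annual_basic_eps = eps(annual_net_income, basic_shares)

# The answer a question might actually ask for: TTM income, diluted shares.
ttm_diluted_eps = eps(ttm_net_income, diluted_shares)

print(f"TTM net income:   {ttm_net_income:.1f}")
print(f"Annual basic EPS: {annual_basic_eps:.2f}")
print(f"TTM diluted EPS:  {ttm_diluted_eps:.2f}")
```

With these made-up inputs the two EPS figures differ by roughly a third of a dollar, which is the kind of gap that separates a model that read the question carefully from one that did not.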

Another major challenge is balancing multiple models at once: I’m effectively testing around 10 finance reasoning models simultaneously, and I must ensure the question is difficult enough that at least 5 out of 10 models answer incorrectly. That means iterating on difficulty, wording, and numeric setup until the question reliably differentiates strong vs. weak reasoning without becoming ambiguous or misleading. After submission through the platform, my work goes through quality review, and I respond to reviewer feedback by making precise revisions (tightening assumptions, clarifying constraints, or improving the reasoning steps) so the final approved questions meet both accuracy and evaluation objectives.
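The "at least 5 out of 10 models answer incorrectly" criterion can be sketched as a simple check. This is only an assumed shape for such a check: the model names, the 1% relative tolerance, and the function names are hypothetical, and the platform's actual grading logic is not described here.

```python
def wrong_models(model_answers, correct, rel_tol=0.01):
    """Return names of models whose numeric answer misses the reference value.

    rel_tol is an assumed tolerance; a missing answer (None) counts as wrong.
    """
    return [
        name
        for name, answer in model_answers.items()
        if answer is None or abs(answer - correct) > rel_tol * abs(correct)
    ]


def is_hard_enough(model_answers, correct, min_wrong=5):
    """True when at least min_wrong models answered incorrectly."""
    return len(wrong_models(model_answers, correct)) >= min_wrong


# Hypothetical run: ten models, reference answer 3.33.
reference = 3.33
answers = {f"model_{i}": 3.00 for i in range(1, 6)}         # fell for the trap
answers.update({f"model_{i}": 3.33 for i in range(6, 10)})  # answered correctly
answers["model_10"] = None                                  # no usable answer

failed = wrong_models(answers, reference)
accepted = is_hard_enough(answers, reference)
print(f"{len(failed)} of {len(answers)} models wrong; hard enough: {accepted}")
```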

Below are images of some in-depth finance analyses that I worked on during the project:

MOVE Fellow Expert at Handshake

As a MOVE Fellow Expert at Handshake AI in the Business & Finance domain, I validate and improve AI model performance by designing and reviewing finance evaluation content that reflects real-world concepts (markets, accounting, valuation, and economics). I build high-quality questions and accompanying reasoning using verifiable figures from U.S.-listed companies’ public filings (e.g., 10-K/10-Q and financial statements), then test how multiple finance reasoning models respond. A core part of my role is intentionally crafting challenging setups that expose common model mistakes while keeping the question fair, auditable, and grounded in source data, often aiming for consistent failure patterns across several models.

I submit work through the platform, address reviewer feedback, and make targeted revisions to ensure the final deliverables meet strict quality and evaluation standards.

Current Project: Project Phoenix

Project description:

Created original graduate-level mathematical proof problems designed to expose reasoning weaknesses in leading AI models. Wrote complete ground-truth proofs, evaluated model responses, and documented failure cases for tasks that passed review.

Outcome of the project:

Produced high-quality benchmark tasks that identified genuine reasoning errors in AI systems and contributed reviewed proof-based data for AI evaluation and improvement.

Project Lead Quality at Telus Digital

Led end-to-end quality management for AI training datasets (annotation + validation), coordinating a distributed reviewer/QA workflow aligned with TELUS Digital’s “AI Community” delivery model and enterprise data pipelines.

Managed and coached a remote QA team to enforce consistent labeling/validation standards, using structured reviewer feedback loops so contributors applied guidelines the same way across edge cases.

Implemented scalable QC controls commonly used in high-volume annotation programs (e.g., spot-checking, sampling-based audits, and escalation paths for ambiguous items), supporting accuracy at both the instance level and dataset level.

Established calibration and alignment practices (example sets, edge-case clarifications, disagreement resolution) to improve inter-reviewer consistency and reduce rework, while monitoring productivity signals that can indicate drift or fatigue.
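One common way to quantify inter-reviewer consistency is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is an assumption for illustration: the description above does not say which metric was used, and the labels and reviewer data are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items where both reviewers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each reviewer's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical audit sample: ten items, two reviewers.
reviewer_a = ["ok", "ok", "bad", "ok", "bad", "ok", "bad", "bad", "ok", "ok"]
reviewer_b = ["ok", "ok", "bad", "bad", "bad", "ok", "ok", "bad", "ok", "ok"]

kappa = cohens_kappa(reviewer_a, reviewer_b)
print(f"kappa = {kappa:.3f}")
```

Tracking a figure like this before and after a calibration round is one way to see whether example sets and edge-case clarifications are actually bringing reviewers closer together.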

GenAI Associate - Red Teaming at Innodata

Executed human-led adversarial red teaming to surface LLM weaknesses (e.g., unsafe behavior, hallucination patterns, bias/toxicity, security-style failure modes such as prompt injection), aligned with Innodata’s focus on LLM trust & safety, benchmarking, and vulnerability evaluation.

Built and iterated challenge prompt suites / scenarios mapped to risk categories, then recorded reproducible failure cases and severity notes to help teams prioritize mitigations and regression testing.

Collaborated cross-functionally with R&D / ML stakeholders to communicate high-impact logic and safety gaps and support remediation workflows (tight feedback loops from discovery -> triage -> fix validation).

Supported evaluation dataset creation and annotation pipelines by converting discovered vulnerabilities into training-ready artifacts (prompt–response examples, labels, edge-case sets), consistent with Innodata’s broader generative AI services (model testing, red teaming, fine-tuning/RLHF-oriented workflows).

Quality Analyst at Invisible Technologies

Developed 100+ Master’s/PhD-level math prompts (and validation notes) to train and evaluate state-of-the-art LLMs for Tier-1 tech partners, ensuring each item was unambiguous, appropriately difficult, and solvable with a defensible “gold” approach.

As an Expert AI Data Trainer (Mathematics), generated domain questions that genuinely require mathematical expertise, then queried models, iterated prompt wording, and refined test design to remove loopholes and improve coverage depth/breadth.

Evaluated model answers for correctness and reasoning quality, identifying common failure modes (e.g., incorrect assumptions, invalid steps, fragile algebra/calculus logic) and documenting reproducible error cases to inform model improvement.

Performed QA and data validation on training/evaluation outputs (spot checks, guideline compliance checks, consistency audits), helping maintain production-grade quality across iterative dataset updates.

Expert Financial Advisory Cases - Reviewer at Vetto AI

Reviewed complex, real-world client scenarios in investment advisory and financial planning created by other annotators. Assessed case quality, realism, completeness, and consistency of financial reasoning, with a focus on key client information, constraints, risks, and how recommendations evolved across multiple client interactions. Provided structured feedback to improve scenario quality and support the development of AI systems capable of understanding nuanced financial decision-making beyond one-size-fits-all solutions.

Skills

Investment Advisory
Financial Planning
Financial Analysis
Risk Assessment
Quality Review
