Skip to main content
  • Client & Company Experience!

MOVE Fellow Expert

Nov 2025 - Present, Freelance

LLM Agentic - Researcher

Oct 2025 - Present, Contract/ Full-time

Finance Research Expert

Nov 2025 - Dec 2025, Freelance (Project-based)


Founder

Sep 2023 - Present

Quality Analyst

Jan 2025 - Jun 2026, Contract/ Freelance

Project Lead Quality

Aug 2025 - Oct 2025, Contract/ Full-time

V-Star Internship

Sep 2024 - Nov 2024, Internship

GenAI Associate - Red Teaming

Jun 2025 - Aug 2025, Contract/ Full-time

More Details

More details on my responsibilities and achievements at these companies.

Finance Research Expert at AfterQuery
MOVE Fellow Expert at Handshake
LLM Agentic - Researcher at Turing
Founder/ Corporate Finance Analyst at Finstock, Inc.
Project Lead Quality at Telus Digital
GenAI Associate - Red Teaming at Innodata
Quality Analyst at Invisible Technologies
V-Star Internship at VIVO
Finance Research Expert at AfterQuery

I design finance-focused questions to evaluate and stress-test AI reasoning models using real, verifiable data from U.S.-listed companies’ public disclosures (e.g., revenue, net income, and details from filings such as Form 10-K/10-Q and related financial statements). Beyond writing the question itself, I also draft a clear reasoning framework so the expected solution is logically sound and traceable to the source data, while still being challenging enough to reveal common model errors in accounting, valuation, market interpretation, and economics.


A key difficulty of the project is that the questions can’t be “generic” or easily solvable with surface-level finance knowledge. I often need to engineer subtle traps-such as timing mismatches (TTM vs. annual), segment vs. consolidated numbers, non-recurring items, share count definitions (basic vs. diluted), or classification differences (operating vs. non-operating)-to intentionally push models toward incorrect conclusions. At the same time, the question must remain fair and defensible, with inputs that come from real filings and a reasoning path that can be audited.


Another major challenge is balancing multiple models at once: I’m effectively testing around 10 finance reasoning models simultaneously, and I must ensure the question is difficult enough that at least 5 out of 10 models answer incorrectly. That means iterating on difficulty, wording, and numeric setup until the question reliably differentiates strong vs. weak reasoning-without becoming ambiguous or misleading. After submission through the platform, my work goes through quality review, and I respond to reviewer feedback by making precise revisions (tightening assumptions, clarifying constraints, or improving the reasoning steps) so the final approved questions meet both accuracy and evaluation objectives.


Below are images of some in-depth finance analyses that I worked on during the project:

MOVE Fellow Expert at Handshake

As a MOVE Fellow Expert at Handshake AI in the Business & Finance domain, I validate and improve AI model performance by designing and reviewing finance evaluation content that reflects real-world concepts (markets, accounting, valuation, and economics). I build high-quality questions and accompanying reasoning using verifiable figures from U.S.-listed companies’ public filings (e.g., 10-K/10-Q and financial statements), then test how multiple finance reasoning models respond. A core part of my role is intentionally crafting challenging setups that expose common model mistakes-while keeping the question fair, auditable, and grounded in source data-often aiming for consistent failure patterns across several models.

 

I submit work through the platform, address reviewer feedback, and make targeted revisions to ensure the final deliverables meet strict quality and evaluation standards.

LLM Agentic - Researcher at Turing

​Designed and iterated complex prompt suites (single-turn + multi-turn) to benchmark LLM reasoning, focusing on correctness, instruction-following, hallucination risk, and reasoning structure using rubric-driven evaluations.


Performed pairwise comparisons / preference rankings across multiple model completions, documenting failure modes (logic errors, unsupported claims, weak reasoning steps) to generate high-signal evaluation/training data that supports preference/reward modeling workflows. 


Owned the end-to-end data production cycle-drafting prompts, revising for clarity, and executing quality checks-to ensure consistent, high-integrity outputs aligned with strict evaluation guidelines.

Founder/ Corporate Finance Analyst at Finstock, Inc.

Founded and served as the initial architect of Finstock’s early-stage vision and objectives; managed core financial flows and corporate finance activities during the company’s formative phase.

Built and supported a finance + trading support offering that provides investment advisory services, in-depth financial analysis, and customized strategies, with primary focus on U.S. equities and additional support for forex and crypto markets.

Developed analytical frameworks that combine fundamental analysis, technical analysis, and quantitative modeling (including macro/micro factors) to generate data-driven insights and strategy recommendations.

Led integration of AI-driven analytics into investment tools: real-time data processing, predictive modeling, automation of routine analysis, NLP-based extraction from unstructured sources (news/financial reports/sentiment), and enhanced risk assessment workflows.

Contributed to designing and maintaining a multi-product service suite, including TradingView indicators, MT5-supported online investment/robo-advisor tools, risk modeling (beta), and periodic macro/finance reporting products. 

Project Lead Quality at Telus Digital

Led end-to-end quality management for AI training datasets (annotation + validation), coordinating a distributed reviewer/QA workflow aligned with TELUS Digital’s “AI Community” delivery model and enterprise data pipelines. 

Managed and coached a remote QA team to enforce consistent labeling/validation standards, using structured reviewer feedback loops so contributors applied guidelines the same way across edge cases. 

Implemented scalable QC controls commonly used in high-volume annotation programs-e.g., spot-checking, sampling-based audits, and escalation paths for ambiguous items-supporting accuracy at both the instance level and dataset level. 

Established calibration and alignment practices (example sets, edge-case clarifications, disagreement resolution) to improve inter-reviewer consistency and reduce rework, while monitoring productivity signals that can indicate drift or fatigue.

GenAI Associate - Red Teaming at Innodata

Executed human-led adversarial red teaming to surface LLM weaknesses (e.g., unsafe behavior, hallucination patterns, bias/toxicity, security-style failure modes such as prompt injection), aligned with Innodata’s focus on LLM trust & safety, benchmarking, and vulnerability evaluation. 

Built and iterated challenge prompt suites / scenarios mapped to risk categories, then recorded reproducible failure cases and severity notes to help teams prioritize mitigations and regression testing. 
Collaborated cross-functionally with R&D / ML stakeholders to communicate high-impact logic and safety gaps and support remediation workflows (tight feedback loops from discovery -> triage -> fix validation). 

Supported evaluation dataset creation and annotation pipelines by converting discovered vulnerabilities into training-ready artifacts (prompt–response examples, labels, edge-case sets), consistent with Innodata’s broader generative AI services (model testing, red teaming, fine-tuning/RLHF-oriented workflows).

Quality Analyst at Invisible Technologies

Developed 100+ Master’s/PhD-level math prompts (and validation notes) to train and evaluate state-of-the-art LLMs for Tier-1 tech partners, ensuring each item was unambiguous, appropriately difficult, and solvable with a defensible “gold” approach. 

As an Expert AI Data Trainer (Mathematics), generated domain questions that genuinely require mathematical expertise, then queried models, iterated prompt wording, and refined test design to remove loopholes and improve coverage depth/breadth. 

Evaluated model answers for correctness and reasoning quality, identifying common failure modes (e.g., incorrect assumptions, invalid steps, fragile algebra/calculus logic) and documenting reproducible error cases to inform model improvement. 

Performed QA and data validation on training/evaluation outputs (spot checks, guideline compliance checks, consistency audits), helping maintain production-grade quality across iterative dataset updates.

V-Star Internship at VIVO

Analyzed sell-through and sales performance across 10+ key retail chains, using structured tracking to identify gaps by channel/SKU/period, and delivered actionable insights that supported the regional team in improving sell-through by 12%. 

Built and maintained recurring performance summaries for channel stakeholders, translating retail signals into clear recommendations to support vivo’s user- and retail-partner-focused go-to-market approach. 

Supported cross-team execution with retail partners by aligning analysis with in-store experience and partner programs-consistent with vivo’s broader strategy of working closely with major retail chains to enhance customer experience and value.

Contributed to data-driven commercial decisions in a market where multi-channel retail is increasingly important, helping ensure insights were relevant across offline chains and broader omni-channel conditions.