InterviewCoach
Local interview coach with live feedback.
Technical interviews are noisy, fast, and cognitively expensive. A candidate has to listen carefully, identify the actual question, choose the right structure, remember the important concepts, and answer naturally, all in a few seconds.
Interview Coach is built for that exact moment. It listens to the interview audio, extracts the interviewer’s Data Science, Machine Learning, AI, or System Design question, and shows a compact coaching card with important pointers. The goal is not to generate an answer for the candidate or hand-hold them through the interview. The goal is to provide timely reminders so the candidate can cover the important parts of their own answer.
The app also helps after the interview. It extracts question-answer pairs from noisy transcripts, stores them in SQLite, and evaluates the candidate’s answers with structured feedback. That makes it useful both as a live coaching tool and as a practice-review system.
The full setup is designed to run locally on a Mac. This keeps latency low, avoids sending interview audio to a cloud service, and makes the experience feel private and responsive.
At a high level, Interview Coach has two paths: a fast live coaching path and a slower post-session evaluation path.
Live audio
-> Speech-to-text
-> Fast question boundary detection
-> Fine-tuned topic/pattern classifier
-> Coaching hint generator
-> Floating coaching cards
Full transcript
-> Q&A extraction
-> SQLite persistence
-> Evaluation agent
-> Feedback table / CSV export
The live path is optimized for speed. It tries to show the coaching card immediately after the interviewer asks the question, before the candidate has already started answering. The post-session path is optimized for accuracy and reflection, so it can spend more time cleaning up the transcript and evaluating the answer.
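The post-session path ends in SQLite persistence and a feedback table with CSV export. Below is a minimal standard-library sketch of that storage layer. It assumes a three-table layout (`sessions`, `exchanges`, `evaluations`) that mirrors the session, exchange, and evaluation storage described in this post. The table names, columns, and helper functions are hypothetical, not the project's actual schema.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one session has many exchanges; each exchange may have an evaluation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY,
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS exchanges (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    question TEXT NOT NULL,
    answer TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS evaluations (
    id INTEGER PRIMARY KEY,
    exchange_id INTEGER NOT NULL REFERENCES exchanges(id),
    score INTEGER,
    feedback TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn


def save_exchange(conn: sqlite3.Connection, session_id: int, question: str, answer: str) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO exchanges (session_id, question, answer) VALUES (?, ?, ?)",
            (session_id, question, answer),
        )
    return cur.lastrowid


def export_csv(conn: sqlite3.Connection) -> str:
    """Join exchanges with their (optional) evaluations and render the result as CSV text."""
    rows = conn.execute(
        "SELECT e.question, e.answer, v.score, v.feedback "
        "FROM exchanges e LEFT JOIN evaluations v ON v.exchange_id = e.id "
        "ORDER BY e.id"
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["question", "answer", "score", "feedback"])
    writer.writerows(rows)
    return buf.getvalue()


conn = open_db()
with conn:
    session_id = conn.execute("INSERT INTO sessions DEFAULT VALUES").lastrowid
ex_id = save_exchange(conn, session_id, "How does linear regression work?",
                      "It fits a line by minimizing squared error.")
with conn:
    conn.execute(
        "INSERT INTO evaluations (exchange_id, score, feedback) VALUES (?, ?, ?)",
        (ex_id, 7, "Clear core idea; no assumptions discussed."),
    )
print(export_csv(conn))
```

The sketch keeps evaluations in their own table. That lets an answer be re-evaluated without touching the stored transcript.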
The project uses a multi-model approach. Instead of forcing one model to handle every task, each model has a focused responsibility.
| Model | Approx Size | Purpose |
|---|---|---|
| mlx-community/whisper-large-v3-turbo | ~809M parameters | Higher-quality local MLX transcription for live audio |
| build-small-hackathon/interview-coach-3b | 3B base model plus LoRA adapter | Fine-tuned topic and pattern detection |
| Qwen/Qwen2.5-3B-Instruct | ~3B parameters | General reasoning, coaching hint generation, transcript cleanup, and evaluation |
| SQLite | Local database | Session, exchange, and evaluation storage |
This split is important. The fine-tuned model is used for what it is good at: quickly identifying the type and pattern of the question. The general instruction model is used when more flexible reasoning is needed, such as generating hints or extracting structured Q&A from a noisy transcript.
The fine-tuned model was trained to map interview questions to a type and a set of coarse answering steps. For example:
{
"prompt": "How would you handle class imbalance in a fraud detection model?",
"completion": "Type: Data Science\nSteps: Check base rate -> Choose right metric -> Handle imbalance -> Tune threshold -> Evaluate"
}
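The completion format above is simple enough to parse deterministically. The sketch below assumes the model output follows exactly the `Type: ...` / `Steps: a -> b -> c` layout shown in the example. `parse_completion` is a hypothetical helper name, not part of the project.

```python
from typing import List, Optional, Tuple


def parse_completion(text: str) -> Tuple[Optional[str], List[str]]:
    """Split a 'Type: ...' / 'Steps: a -> b' completion into (type, steps)."""
    qtype: Optional[str] = None
    steps: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("type:"):
            qtype = line.split(":", 1)[1].strip()
        elif line.lower().startswith("steps:"):
            # Steps are separated by "->" arrows; empty fragments are dropped.
            steps = [s.strip() for s in line.split(":", 1)[1].split("->") if s.strip()]
    return qtype, steps


completion = (
    "Type: Data Science\n"
    "Steps: Check base rate -> Choose right metric -> Handle imbalance -> Tune threshold -> Evaluate"
)
print(parse_completion(completion))
```

If the model returns something that does not match the format, the parser falls back to `(None, [])`. The live path can treat that as "not a target question" and keep listening.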
This helped in two ways.
First, the coaching system became more reliable at deciding whether a transcript segment was relevant. Greetings, logistics, and generic discussion should not trigger a coaching card. The fine-tuned model helps classify only meaningful DS, ML, AI, or System Design questions.
Second, it gave the live system a fast topic/pattern signal. The app does not need to ask a larger model to deeply reason about the question before showing anything. It can quickly classify the question and then use that classification as context for generating better hints.
The final design uses the fine-tuned 3B model for type and pattern detection, not for the final coaching bullets. The displayed hints are generated separately so they can be more specific and useful.
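The classify-then-generate handoff can be sketched as prompt assembly: the fine-tuned model's type and steps become context for the general instruction model. The wording, `build_hint_prompt`, and `max_bullets` are hypothetical. The actual prompt used by Interview Coach is not shown in this post.

```python
from typing import List


def build_hint_prompt(question: str, qtype: str, steps: List[str], max_bullets: int = 4) -> str:
    """Fold the classifier's type and coarse steps into a hint-generation prompt (hypothetical wording)."""
    return (
        "You are an interview coach. Do not answer the question for the candidate.\n"
        f"Question type: {qtype}\n"
        f"Coarse answering steps: {' -> '.join(steps)}\n"
        f"Question: {question}\n"
        f"Write at most {max_bullets} short reminder bullets the candidate should cover."
    )


prompt = build_hint_prompt(
    "How would you handle class imbalance in a fraud detection model?",
    "Data Science",
    ["Check base rate", "Choose right metric", "Handle imbalance", "Tune threshold", "Evaluate"],
)
print(prompt)
```

The instruction model never has to decide on its own whether the question is in scope or what shape the answer takes. It only has to turn an already-chosen structure into specific reminders.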
The most critical product challenge was timing.
If the coaching card appears after the candidate has already started answering, it is too late. The card has to appear right after the interviewer asks the question and before the candidate has committed to an answer structure.
The first version used a general LLM to extract the clean question before showing the card. That was accurate, but too slow. The fix was to split the live path from the post-processing path.
For the live path, the app now uses:
The live extractor intentionally does not try to produce a perfect transcript. It aims to identify the current question quickly enough to help the candidate. Accuracy cleanup happens later in Process Text.
This tradeoff made the coaching card feel much faster and more useful during an actual interview.
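The live extractor's rules are not spelled out here, but a purely rule-based sketch can illustrate the speed-over-accuracy tradeoff. It splits the rolling transcript into sentences and returns the most recent one that ends in "?" or opens with a common question cue. `latest_question` and the cue list are hypothetical. A heuristic like this can mistake a candidate sentence beginning with "What" for a question, which is why accurate cleanup is deferred to post-processing.

```python
import re
from typing import Optional

# Hypothetical cue list for question-like openings.
QUESTION_STARTS = (
    "how", "what", "why", "when", "which", "can you", "could you",
    "would you", "explain", "describe", "walk me through", "tell me",
)


def latest_question(transcript: str) -> Optional[str]:
    """Return the most recent sentence that looks like a question, or None."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", transcript) if s.strip()]
    for sentence in reversed(sentences):
        if sentence.endswith("?") or sentence.lower().startswith(QUESTION_STARTS):
            return sentence
    return None


text = "Thanks for joining. Let's get started. How does linear regression work? So basically it fits a line."
print(latest_question(text))
```

The scan runs in microseconds and needs no model call. Misfires are tolerable on the live path because the post-session extractor rebuilds the Q&A pairs anyway.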
Live transcripts are messy. A single transcript may contain:
One recurring issue was that the candidate’s answer was being included inside the question. Another issue was that the extracted answer was sometimes shortened too aggressively.
The solution was to separate extraction responsibilities:
This makes the session log more useful. It does not just store the shortest direct answer; it stores what the candidate actually said, lightly cleaned for transcription noise.
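"Lightly cleaned" can be as simple as removing filler tokens and collapsing stuttered repeats while keeping every substantive word. The regex sketch below uses a hypothetical filler list and a hypothetical `light_clean` function. Collapsing repeats will also merge legitimate doubles such as "that that".

```python
import re

# Hypothetical filler tokens; an optional trailing comma and whitespace are removed too.
FILLERS = re.compile(r"\b(?:um+|uh+|erm|hmm+)\b,?\s*", re.IGNORECASE)
# A word immediately repeated one or more times, e.g. "the the".
REPEATS = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)


def light_clean(answer: str) -> str:
    """Remove STT noise without summarizing or shortening the candidate's content."""
    text = FILLERS.sub("", answer)
    text = REPEATS.sub(r"\1", text)
    return re.sub(r"\s{2,}", " ", text).strip()


print(light_clean("Um, so the the model fits a line."))
```

Unlike an LLM rewrite, a rule-based pass cannot drop a point the candidate actually made. That matches the goal of storing what was said rather than the shortest direct answer.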
Another challenge was duplicate cards. Live transcription can produce slightly different versions of the same question:
How does linear regression work?
Can you briefly explain how linear regression works?
Can you explain how linear regression works?
Exact string matching was not enough. The app now creates a dedupe key from the important words in the question and compares similarity between cards. That prevents repeated STT variants from creating multiple coaching cards for the same question.
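The dedupe idea can be sketched with a stopword-filtered word set as the key and Jaccard similarity as the comparison. The post does not name the similarity measure, so Jaccard, the stopword list, the crude "strip a trailing s" normalization, and the 0.8 threshold are all assumptions.

```python
import re
from typing import FrozenSet, List

# Hypothetical stopwords: words that vary between STT variants of the same question.
STOPWORDS = frozenset({
    "a", "an", "the", "can", "could", "would", "you", "please", "briefly",
    "explain", "how", "does", "do", "is", "are", "me", "tell", "about", "what",
})


def dedupe_key(question: str) -> FrozenSet[str]:
    key = set()
    for tok in re.findall(r"[a-z0-9]+", question.lower()):
        if tok in STOPWORDS:
            continue
        if len(tok) > 3 and tok.endswith("s"):  # crude normalization: "works" -> "work"
            tok = tok[:-1]
        key.add(tok)
    return frozenset(key)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    if not a or not b:
        return 0.0  # never merge on an empty key
    return len(a & b) / len(a | b)


def is_duplicate(question: str, shown: List[FrozenSet[str]], threshold: float = 0.8) -> bool:
    key = dedupe_key(question)
    return any(jaccard(key, prev) >= threshold for prev in shown)


shown: List[FrozenSet[str]] = []
for q in [
    "How does linear regression work?",
    "Can you briefly explain how linear regression works?",
    "Can you explain how linear regression works?",
    "How would you design a recommendation system?",
]:
    if not is_duplicate(q, shown):
        shown.append(dedupe_key(q))
        print("show card:", q)
```

All three linear-regression variants reduce to the same key, so only one card is shown for them. The recommendation-system question shares no key words with it and gets its own card.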
The app also filters out non-target topics. If the detected type is too generic or not relevant to DS, ML, AI, or System Design, the live card keeps listening instead of showing noise.
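The filter can be a small allow-list check on the classifier's type label. In this sketch, `TARGET_TYPES` assumes type strings like the "Data Science" label in the training example and may not match the project's exact labels.

```python
from typing import Optional

# Assumed label set; the project's exact type strings may differ.
TARGET_TYPES = {"data science", "machine learning", "ai", "system design"}


def should_show_card(detected_type: Optional[str]) -> bool:
    """Show a card only for in-scope question types; otherwise keep listening."""
    if not detected_type:
        return False
    return detected_type.strip().lower() in TARGET_TYPES


print(should_show_card("Data Science"), should_show_card("General"))
```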
Evaluation had a subtle failure mode. The evaluator generated a benchmark answer so the candidate could learn, but the feedback sometimes looked like it was evaluating the benchmark rather than the candidate’s actual answer.
The fix was to make the evaluator contract explicit:
This made the evaluation more faithful to what the candidate actually said.
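The specific contract rules are not listed in this post. One way to make such a contract explicit is to keep grading and the benchmark in separate fields, and to label the candidate's answer as the only thing being graded. The prompt wording, JSON keys, and `Evaluation` fields below are all assumptions.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Evaluation:
    # Feedback fields describe the candidate's answer only.
    score: int
    strengths: List[str]
    gaps: List[str]
    # Reference answer for learning; it is never the thing being graded.
    benchmark_answer: str


def build_eval_prompt(question: str, candidate_answer: str) -> str:
    return (
        "Grade ONLY the CANDIDATE ANSWER below. Base strengths and gaps on what the "
        "candidate actually said.\n"
        "Then, separately, write a benchmark answer for learning; do not grade it.\n"
        f"QUESTION: {question}\n"
        f"CANDIDATE ANSWER: {candidate_answer}\n"
        'Reply as JSON with keys "score", "strengths", "gaps", "benchmark_answer".'
    )


def parse_evaluation(raw: str) -> Evaluation:
    data = json.loads(raw)
    return Evaluation(
        score=int(data["score"]),
        strengths=list(data["strengths"]),
        gaps=list(data["gaps"]),
        benchmark_answer=str(data["benchmark_answer"]),
    )


raw = ('{"score": 6, "strengths": ["Mentioned least squares"], '
       '"gaps": ["Did not discuss assumptions"], '
       '"benchmark_answer": "Linear regression models the target as a weighted sum of features."}')
print(parse_evaluation(raw))
```

Because the benchmark lives in its own field, the UI can render it under a separate heading. It then cannot be mistaken for feedback on the candidate.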
Another practical challenge was model startup.
The app uses multiple local models, and loading them lazily during the first live question made the product feel slow. The first question is often the most important moment in the demo, but that was exactly when model weights were being loaded into memory.
The fix was to move model warmup into application initialization. When the Gradio app opens, it starts loading the general instruction model, the fine-tuned topic/pattern model, and the speech-to-text model in the background. The UI shows a simple startup status at the top and changes to “All models loaded” when the app is ready.
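Below is a standard-library sketch of background warmup with a single status string. `ModelWarmup` and the fake loaders are hypothetical stand-ins. In practice each loader might wrap loading the instruction model, the LoRA-adapted topic model, or the Whisper model, and the UI would poll `status()`.

```python
import threading
import time
from typing import Callable, Dict, List, Optional


class ModelWarmup:
    """Load several models on background threads and report one status string."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders
        self._models: Dict[str, object] = {}
        self._lock = threading.Lock()
        self._threads: List[threading.Thread] = []

    def start(self) -> None:
        # Called once at app initialization, before the first question arrives.
        for name, loader in self._loaders.items():
            thread = threading.Thread(target=self._load, args=(name, loader), daemon=True)
            thread.start()
            self._threads.append(thread)

    def _load(self, name: str, loader: Callable[[], object]) -> None:
        model = loader()  # error handling omitted for brevity
        with self._lock:
            self._models[name] = model

    def status(self) -> str:
        with self._lock:
            done = len(self._models)
        total = len(self._loaders)
        return "All models loaded" if done == total else f"Loading models ({done}/{total})..."

    def wait(self, timeout: Optional[float] = None) -> None:
        for thread in self._threads:
            thread.join(timeout)


def fake_loader(delay: float) -> Callable[[], object]:
    """Hypothetical stand-in for a slow model load."""
    def load() -> object:
        time.sleep(delay)
        return object()
    return load


warmup = ModelWarmup({
    "instruct": fake_loader(0.05),
    "topic_lora": fake_loader(0.05),
    "whisper": fake_loader(0.05),
})
warmup.start()
print(warmup.status())  # usually still loading at this point
warmup.wait()
print(warmup.status())  # All models loaded
```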
This keeps the complexity away from the user. They do not need to know which model is loading or press a separate warmup button. They just wait for the ready message and then start the interview flow.
I also had to make coaching-card generation non-blocking. Earlier, when the card was being generated, live transcription could pause because the same flow was waiting on the LLM call. The current version lets transcription continue while the coaching card loads in the background. That makes the app feel much more natural during a live conversation.
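The non-blocking pattern can be sketched with `concurrent.futures`: card generation is submitted to a background worker and the transcription loop keeps running. `generate_card` and `transcribe_chunk` are hypothetical stand-ins that only sleep; they are not the app's functions.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def generate_card(question: str) -> str:
    """Hypothetical stand-in for the slow LLM call that writes coaching bullets."""
    time.sleep(0.2)
    return f"Card for: {question}"


def transcribe_chunk(i: int) -> str:
    """Hypothetical stand-in for speech-to-text on one short audio chunk."""
    time.sleep(0.01)
    return f"chunk {i}"


executor = ThreadPoolExecutor(max_workers=1)  # one background worker for card generation
pending: List[Future] = []
transcript: List[str] = []

for i in range(5):
    transcript.append(transcribe_chunk(i))
    if i == 1:
        # Submit and keep going: the transcription loop never waits on the LLM here.
        pending.append(executor.submit(generate_card, "How does linear regression work?"))

# A UI would poll future.done() or use add_done_callback instead of blocking here.
cards = [f.result() for f in pending]
executor.shutdown()
print(transcript, cards)
```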
Running locally and running on Hugging Face Spaces are not the same environment.
Locally, the app can use MLX Whisper on Apple Silicon and system audio routing. On Spaces, the browser microphone is the realistic input path, and model loading, caching, and hardware behavior are different.
The main Spaces challenges were:
requirements.txt.
The app now separates local and Spaces runtime behavior where needed. Locally it can use MLX-based transcription and system audio. On Spaces it uses browser audio and avoids assumptions that only hold on a Mac. This made the demo more portable, even though the best real-time experience is still the local Mac setup.
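The local-versus-Spaces split can be driven by a small runtime check made once at startup. This sketch assumes Hugging Face Spaces exposes a `SPACE_ID` environment variable. It uses the fact that MLX targets Apple Silicon Macs. The returned labels (`"mlx_whisper"`, `"fallback"`) and `detect_runtime` itself are hypothetical, since the post does not say which transcription backend the Space uses.

```python
import os
import platform
from typing import Dict, Mapping, Optional


def detect_runtime(env: Optional[Mapping[str, str]] = None,
                   system: Optional[str] = None,
                   machine: Optional[str] = None) -> Dict[str, str]:
    """Pick audio-input and STT-backend labels for the current environment."""
    env = os.environ if env is None else env
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()

    # Assumption: Hugging Face Spaces sets SPACE_ID in the container environment.
    if "SPACE_ID" in env:
        return {"audio": "browser_microphone", "stt": "fallback"}
    # MLX runs on Apple Silicon Macs, where system audio routing is also available.
    if system == "Darwin" and machine == "arm64":
        return {"audio": "system_audio", "stt": "mlx_whisper"}
    return {"audio": "browser_microphone", "stt": "fallback"}


print(detect_runtime())
```

Taking `env`, `system`, and `machine` as optional parameters keeps the check testable without running on a Mac or inside a Space.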
The entire system can run locally on a Mac. That is a major part of the design.
Local execution gives three practical benefits:
The local setup uses Python, Gradio, SQLite, Hugging Face Transformers, PEFT, and MLX Whisper. For system audio capture on macOS, the app can use a local audio routing setup such as BlackHole.
The app is agentic because it is not a single prompt wrapped in a UI. It is a pipeline of specialized steps, each with a clear role and state handoff.
The main agents and nodes are:
This modular design made it easier to improve one part without breaking the rest. For example, the live coaching path could be optimized for speed while the post-session extractor stayed more careful and LLM-driven.
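The "specialized steps with state handoff" structure can be sketched as plain functions that each take and return a shared state object. The node names, the `CoachState` fields, and the stub logic are hypothetical placeholders for the real extractor, classifier, and hint generator.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CoachState:
    """Shared state handed from one node to the next."""
    transcript: str
    question: Optional[str] = None
    question_type: Optional[str] = None
    hints: List[str] = field(default_factory=list)


Node = Callable[[CoachState], CoachState]


def extract_question(state: CoachState) -> CoachState:
    # Crude stub: take text up to the last "?" and keep the part after the final ". ".
    if "?" in state.transcript:
        upto_question = state.transcript[: state.transcript.rfind("?") + 1]
        state.question = upto_question.split(". ")[-1].strip()
    return state


def classify(state: CoachState) -> CoachState:
    # Stub standing in for the fine-tuned type/pattern model.
    if state.question:
        state.question_type = "Machine Learning"
    return state


def generate_hints(state: CoachState) -> CoachState:
    # Stub standing in for the instruction model; does nothing if no type was found.
    if state.question_type:
        state.hints = [f"Define the core {state.question_type} idea", "Give one concrete example"]
    return state


def run_pipeline(state: CoachState, nodes: List[Node]) -> CoachState:
    for node in nodes:
        state = node(state)
    return state


LIVE_NODES: List[Node] = [extract_question, classify, generate_hints]
result = run_pipeline(CoachState("Thanks for joining. How does linear regression work?"), LIVE_NODES)
print(result.question_type, result.hints)
```

Because every node shares one signature, a node can be swapped, such as a faster extractor on the live path and an LLM-driven one after the session, without changing the others.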
Interview Coach is designed around a simple idea: candidates do not need a model to answer for them, but they can benefit from timely reminders that help them structure their own thinking.
The project combines local speech-to-text, a fine-tuned 3B model, a general instruction model, SQLite persistence, and a Gradio interface into a practical interview practice system. The hardest part was not simply building a chatbot. It was making the coaching card appear at the right moment, extracting useful Q&A from noisy speech, and evaluating the candidate’s actual answer fairly.
The result is a local-first tool that helps during the interview and becomes a feedback system after the interview.