Zaka & BLACKBOX Technical Challenge

BLACKBOX AI Junior ML Engineer Challenge

Test your skills in developer tools, customer support AI, anomaly detection, and trendy ML applications. Complete one project in 24 hours to earn an internship opportunity!

Challenge Overview

Welcome to the BLACKBOX AI 24-Hour ML Challenge! This is your chance to demonstrate your machine learning skills in areas like dev tools, customer support automation, credit card anomaly detection, and other developer-related or trendy topics (e.g., AI for code generation or GitHub workflows). These projects are designed to be completable in 24 hours, using small datasets and pre-trained models to focus on your problem-solving and implementation abilities.

Why this challenge? We're looking for junior ML engineers who can quickly prototype, learn from data, and apply ML to practical scenarios. Successful candidates will earn a paid internship with us.

Instructions

  • Time Limit: You have 24 hours from when you receive this challenge to submit. Start timing yourself as soon as you read this; the final deadline is Sunday at 6 PM.

  • Choose One Project: Select from the 10 options below and focus on building a minimum viable prototype. (Bonus points if you complete more than one.)

  • Tech Stack: Use Python with libraries like Hugging Face Transformers, scikit-learn, PyTorch/TensorFlow, Pandas, etc. Free tools like Google Colab (with GPU) are recommended.

  • Submission: Create a public GitHub repo with: your code (e.g., Jupyter notebook or scripts), a README.md explaining your approach, challenges faced, decisions made, results/metrics, and how it relates to BLACKBOX AI's work. Email the repo link to [email protected] with subject "ML Challenge Submission - [Your Name]".

  • Evaluation Criteria: Code quality (clean, reproducible), ML understanding (correct use of models/metrics), creativity/problem-solving, documentation, and relevance to themes. Bonus for optimizations, visualizations, or ethical considerations (e.g., bias mitigation).

  • Tips: Use subsets of data to save time. Prioritize core functionality over perfection. If stuck, document why and what alternatives you considered. No external help; this is an individual challenge.

Project Options

Each project includes an overview, objectives, requirements, steps, deliverables, skills assessed, tips, and time estimate.

1. Dev Tool: Code Comment Generator

Overview: Build an AI tool that automatically generates descriptive comments for code snippets, aiding developers in maintaining readable codebases. This is trendy in AI-assisted programming, similar to tools like GitHub Copilot.

Objectives: Fine-tune a model to take code as input and output relevant comments. Achieve reasonable generation quality on a small test set.

Requirements: Python, Hugging Face Transformers; subset (~1,000 samples) of CodeSearchNet dataset (public on GitHub/Hugging Face).

Steps to Complete:

  1. Download and preprocess data (pair code with existing comments).

  2. Fine-tune a small LLM like CodeT5 or GPT-2 (use Colab GPU).

  3. Generate comments for 10-20 test snippets.

  4. Evaluate with metrics like BLEU and manual review.
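
The BLEU check in step 4 can be prototyped without extra libraries. Here is a minimal sketch using a clipped unigram-precision stand-in (roughly BLEU-1 without the brevity penalty); the example strings are illustrative, and for real reporting you would use a proper BLEU implementation such as `nltk.translate.bleu_score` or sacrebleu:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision -- a crude stand-in for BLEU-1."""
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    if not cand:
        return 0.0
    # Clip each candidate token's count by its count in the reference.
    matched = sum(min(c, ref[w]) for w, c in Counter(cand).items())
    return matched / len(cand)

# Hypothetical generated comment vs. a ground-truth docstring.
generated = "returns the sum of two numbers"
reference = "return the sum of two numbers"
score = unigram_precision(generated, reference)  # 5 of 6 tokens match
```

Pair a score like this with manual review: n-gram overlap alone rewards parroting the input and misses paraphrases.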

Deliverables: Notebook/script with model code, sample inputs/outputs, metrics, README.

Skills Assessed: LLM fine-tuning for dev tasks, text generation, quick prototyping.

Tips for Success: Start with a pre-trained model to minimize training time. Focus on Python code snippets for simplicity. If generation is noisy, experiment with temperature settings.

Time Estimate: 6h data prep, 10h fine-tuning, 8h eval/docs.

2. Customer Support: Ticket Sentiment Classifier

Overview: Create a classifier to analyze sentiment in customer support tickets, helping prioritize urgent issues in dev support systems.

Objectives: Classify tickets as positive, negative, or urgent with high accuracy on a test set.

Requirements: Hugging Face; subset (~5,000 samples) of support ticket data (e.g., from Kaggle's customer support datasets).

Steps to Complete:

  1. Perform EDA (e.g., word clouds).

  2. Fine-tune DistilBERT.

  3. Evaluate with accuracy, F1-score, and confusion matrix.
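
Before fine-tuning DistilBERT, it helps to have a classical baseline to beat. A minimal sketch with TF-IDF and logistic regression, using a handful of made-up tickets in place of the real Kaggle data (here evaluated on the training set purely for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Tiny illustrative tickets; the real project would use ~5,000 Kaggle samples.
texts = [
    "Thanks, the fix works perfectly!", "Great support, issue resolved.",
    "This bug is still broken after the update.", "Very disappointed with the response time.",
    "Production is down, we need help NOW.", "Urgent: payment API returning 500 errors.",
] * 5
labels = ["positive", "positive", "negative", "negative", "urgent", "urgent"] * 5

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

preds = clf.predict(texts)
macro_f1 = f1_score(labels, preds, average="macro")
```

On real data, hold out a test split and report the same macro-F1 for both the baseline and the fine-tuned model.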

Deliverables: Code, EDA visuals, model predictions, README.

Skills Assessed: NLP for support, evaluation metrics.

Tips for Success: Balance classes if needed. Use pre-trained models to speed up.

Time Estimate: 6h EDA/data prep, 10h modeling, 8h report.

3. Dev Tool: Bug Detection in Code Snippets

Overview: Develop a tool to detect bugs in code snippets, useful for automated code reviews in dev pipelines.

Objectives: Classify snippets as buggy or clean with good precision.

Requirements: Hugging Face CodeBERT; subset (~2,000 samples) of BugsInPy dataset.

Steps to Complete:

  1. Preprocess code data.

  2. Fine-tune model.

  3. Test on holdout set with metrics.
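
The holdout evaluation in step 3 looks the same regardless of the model. A minimal sketch using synthetic numeric features as a stand-in (in the real project the features would be CodeBERT embeddings of the snippets, not random vectors):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in features (imagine token counts, nesting depth, lint flags).
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = buggy, 0 = clean

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
prec = precision_score(y_te, model.predict(X_te))
```

Precision matters here because false positives (clean code flagged as buggy) erode developer trust in automated review.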

Deliverables: Script, sample detections, metrics, README.

Skills Assessed: ML for code analysis.

Tips for Success: Focus on common bug types like syntax errors.

Time Estimate: 5h data loading, 11h fine-tuning, 8h evaluation.

4. Customer Support: FAQ Chatbot

Overview: Build a chatbot for answering developer FAQs, like common API or coding issues.

Objectives: Handle 3-5 intents accurately.

Requirements: Rasa or scikit-learn; ~500 StackOverflow FAQ samples.

Steps to Complete:

  1. Define intents.

  2. Train classifier.

  3. Test with sample conversations.
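
The steps above can be sketched with scikit-learn alone: a TF-IDF intent classifier plus a confidence-threshold fallback (the intents, example queries, and threshold value below are all invented for illustration):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Three illustrative intents; real training data would come from StackOverflow FAQs.
examples = {
    "auth":    ["how do I get an api key", "my token is invalid", "reset my api key"],
    "rate":    ["why am I rate limited", "how many requests per minute", "increase my quota"],
    "install": ["pip install fails", "how do I install the sdk", "setup on windows"],
}
texts = [t for ts in examples.values() for t in ts]
intents = [i for i, ts in examples.items() for _ in ts]

vec = TfidfVectorizer().fit(texts)
clf = LogisticRegression(max_iter=1000).fit(vec.transform(texts), intents)

def answer(query: str, threshold: float = 0.4) -> str:
    """Return the predicted intent, or a rule-based fallback when unsure."""
    probs = clf.predict_proba(vec.transform([query]))[0]
    if probs.max() < threshold:
        return "fallback"  # hand off to a human or a canned response
    return clf.classes_[np.argmax(probs)]
```

The fallback branch is the "rule-based fallback" from the tips: off-topic queries map to a near-uniform probability vector, so they never clear the threshold.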

Deliverables: Chatbot script, test logs, README.

Skills Assessed: Conversational AI.

Tips for Success: Use rule-based fallbacks.

Time Estimate: 5h setup, 11h building, 8h testing.

5. Credit Card Anomaly Detection

Overview: Detect fraudulent transactions in credit card data, a trendy application in fintech security.

Objectives: Flag anomalies with high ROC-AUC.

Requirements: scikit-learn; ~5,000 transaction subset from Kaggle.

Steps to Complete:

  1. Preprocess and balance data.

  2. Train Isolation Forest.

  3. Evaluate.
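
The whole pipeline fits in a few lines of scikit-learn. A minimal sketch on synthetic transactions (a dense "normal" cluster plus scattered outliers) standing in for the Kaggle data:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Synthetic stand-in: 980 normal transactions, 20 scattered frauds.
normal = rng.normal(loc=0.0, scale=1.0, size=(980, 4))
fraud = rng.uniform(low=-6, high=6, size=(20, 4))
X = np.vstack([normal, fraud])
y = np.array([0] * 980 + [1] * 20)  # 1 = fraud

iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
# score_samples: higher = more normal, so negate to get an anomaly score.
anomaly_score = -iso.score_samples(X)
auc = roc_auc_score(y, anomaly_score)
```

Note that Isolation Forest is unsupervised: labels are used only for evaluation, which is exactly the situation with the real fraud dataset's extreme class imbalance.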

Deliverables: Code, ROC curve, README.

Skills Assessed: Unsupervised ML for anomalies.

Tips for Success: Visualize anomalies.

Time Estimate: 5h prep, 11h modeling, 8h analysis.

6. Dev Tool: StackOverflow Question Tagger

Overview: Auto-tag StackOverflow questions by programming language, enhancing dev community tools.

Objectives: Accurate tagging for 3-5 languages.

Requirements: Hugging Face; ~5,000 question subset from Kaggle.

Steps to Complete:

  1. Prep data.

  2. Fine-tune DistilBERT.

  3. Evaluate F1-score.
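
The multi-label variant from the tips can be prototyped with a one-vs-rest wrapper before reaching for DistilBERT. A minimal sketch on made-up questions (three single-tag classes here, but the same code handles questions carrying several tags):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy questions; the real set would be ~5,000 StackOverflow titles/bodies.
questions = [
    "How do I reverse a list in python", "python dict comprehension syntax",
    "segfault when freeing a pointer in cpp", "malloc returns null in cpp",
    "undefined is not a function in javascript", "async await in javascript promises",
] * 5
tags = [["python"], ["python"], ["cpp"], ["cpp"], ["javascript"], ["javascript"]] * 5

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)  # one binary column per tag
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
).fit(questions, Y)
micro_f1 = f1_score(Y, clf.predict(questions), average="micro")
```

Micro-averaged F1 is the usual choice for multi-label tagging, since it weights tags by how often they occur.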

Deliverables: Model code, tagged examples, README.

Skills Assessed: NLP for dev forums.

Tips for Success: Use multi-label if needed.

Time Estimate: 6h prep, 10h fine-tuning, 8h docs.

7. Dev Tool: GitHub Issue Prioritizer

Overview: Prioritize GitHub issues by severity, streamlining dev project management.

Objectives: Classify issues as low/medium/high.

Requirements: scikit-learn or Hugging Face; ~3,000 issues from Kaggle.

Steps to Complete:

  1. Extract features.

  2. Train model.

  3. Test accuracy.
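
The steps above, including the issue-length tip, can be sketched by stacking a hand-made feature next to the TF-IDF matrix (the issues and priority labels below are invented for illustration):

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative issues; the real set would be ~3,000 GitHub issues from Kaggle.
issues = [
    "crash on startup, data loss for all users",
    "security vulnerability allows remote code execution",
    "typo in readme", "update badge color in docs",
    "memory usage grows slowly over long sessions",
    "occasional flaky test on ci",
] * 5
priority = ["high", "high", "low", "low", "medium", "medium"] * 5

vec = TfidfVectorizer()
X_text = vec.fit_transform(issues)
# Tip from above: append issue length (word count) as an extra feature column.
length = np.array([[len(i.split())] for i in issues])
X = hstack([X_text, length / length.max()])

clf = LogisticRegression(max_iter=1000).fit(X, priority)
acc = (clf.predict(X) == np.array(priority)).mean()
```

Scaling the length column before stacking keeps it on the same order of magnitude as the TF-IDF values.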

Deliverables: Script, priority examples, README.

Skills Assessed: Text classification for workflows.

Tips for Success: Incorporate issue length as a feature.

Time Estimate: 6h exploration, 10h implementation, 8h eval.

8. Trendy: Fine-Tune LLM for Code Review Suggestions

Overview: Fine-tune an LLM to provide code review feedback, a hot trend in AI dev assistants.

Objectives: Generate useful suggestions for code pairs.

Requirements: Hugging Face (Phi-2); ~1,000 review pairs from GitHub.

Steps to Complete:

  1. Prep data.

  2. Fine-tune.

  3. Generate and review samples.
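
Step 1 usually means flattening (code, review) pairs into the prompt/completion text format that causal-LM fine-tuning expects. A minimal sketch with two hypothetical pairs and an invented prompt template; the real pairs would be mined from GitHub review comments:

```python
import json

# Hypothetical (code, review) pairs; the real set would be ~1,000 pairs.
pairs = [
    {"code": "def add(a,b): return a+b",
     "review": "Add type hints and a docstring."},
    {"code": "for i in range(len(xs)): print(xs[i])",
     "review": "Iterate directly: for x in xs."},
]

def to_training_record(pair: dict) -> dict:
    """Format one pair as a single training string for causal-LM fine-tuning."""
    prompt = (
        "Review the following code and suggest improvements:\n"
        f"{pair['code']}\nReview:"
    )
    return {"text": prompt + " " + pair["review"]}

records = [to_training_record(p) for p in pairs]
jsonl = "\n".join(json.dumps(r) for r in records)  # ready to write to train.jsonl
```

At generation time you feed the model everything up to "Review:" and let it complete; keeping the template identical between training and inference is what makes the low-epoch run usable.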

Deliverables: Code, sample reviews, README.

Skills Assessed: Generative AI trends, LLMs for dev tasks.

Tips for Success: Use low epochs to fit time. Evaluate manually for quality.

Time Estimate: 6h setup, 10h fine-tuning, 8h generation/eval.

9. Customer Support: Escalation Predictor for Support Tickets

Overview: Predict if a support ticket will escalate, optimizing customer service workflows.

Objectives: Predict escalation (yes/no) with good precision.

Requirements: XGBoost or scikit-learn; ~4,000 ticket logs from Kaggle.

Steps to Complete:

  1. EDA and feature engineering.

  2. Train model.

  3. Evaluate with metrics like precision/recall.
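
The modeling and evaluation steps can be sketched end to end; gradient boosting from scikit-learn is used here as a stand-in for XGBoost, and the features are synthetic placeholders (imagine ticket length, reply count, or keyword flags after your feature engineering):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic engineered features standing in for real ticket-log features.
X = rng.normal(size=(600, 4))
y = ((X[:, 0] > 0.5) | (X[:, 1] > 1.0)).astype(int)  # 1 = escalated

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
prec, rec = precision_score(y_te, pred), recall_score(y_te, pred)
```

Report precision and recall separately: in an escalation setting a missed escalation (low recall) usually costs more than a false alarm.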

Deliverables: Code, feature importance plot, README.

Skills Assessed: Predictive modeling for support.

Tips for Success: Focus on text features like keywords.

Time Estimate: 6h EDA/feature eng, 10h modeling, 8h reporting.

10. Dev Tool: API Usage Recommender

Overview: Recommend related APIs based on usage logs, aiding developers in discovering libraries.

Objectives: Suggest top-3 related APIs with decent hit rate.

Requirements: scikit-surprise; ~2,000 API usage logs from GitHub/Kaggle.

Steps to Complete:

  1. Prep data as user-item matrix.

  2. Build recommender.

  3. Test recommendations.
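
Item-item collaborative filtering over the user-item matrix can be sketched in plain NumPy before bringing in scikit-surprise. The API names and usage matrix below are invented for illustration:

```python
import numpy as np

apis = ["requests", "flask", "numpy", "pandas", "matplotlib"]
# Rows = developers, columns = APIs; 1 = the developer's logs show that API.
usage = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
])

# Item-item cosine similarity over the usage columns.
norms = np.linalg.norm(usage, axis=0, keepdims=True)
sim = (usage.T @ usage) / (norms.T @ norms)
np.fill_diagonal(sim, -np.inf)  # never recommend an API to itself

def recommend(api: str, k: int = 3) -> list:
    """Top-k most similar APIs to the given one."""
    idx = apis.index(api)
    top = np.argsort(sim[idx])[::-1][:k]
    return [apis[j] for j in top]

recs = recommend("numpy")
```

With this toy matrix, "numpy" pulls in "pandas" and "matplotlib" because the same developers co-use them; that co-usage signal is exactly what you would evaluate with a hit-rate on held-out logs.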

Deliverables: Script, recommendation examples, README.

Skills Assessed: Recommendation systems for dev productivity.

Tips for Success: Use collaborative filtering for simplicity.

Time Estimate: 6h data prep, 10h modeling, 8h eval/docs.

Final Notes

Good luck! If you have questions about the challenge (not project help), email [email protected]. Remember, this is about showing your potential under time constraints.
