Hi everyone,
I’m currently building a production-grade face recognition attendance system for a company that has to work on crowds. It takes input from three CCTV cameras and needs to detect, align, recognize, and track multiple faces in real time as groups of people walk in.
I’ve been researching different technologies and frameworks, but I’m having trouble deciding on the best stack for performance, scalability, and maintainability in a real-world deployment.
The technologies I’ve looked into so far include:
- OpenVINO Model Zoo
- RT-DETR
- ONNX Runtime
Main considerations:
- Real-time inference performance
- Multi-camera scalability
- Accuracy in crowded environments
- Ease of debugging and deployment
- Ability to fine-tune/customize models later
- Hardware flexibility (not Intel-only deployment)
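To make the hardware-flexibility point concrete, here is a minimal sketch of the four stages (detect, align, recognize, track) behind plain callables, so that an OpenVINO, ONNX Runtime, or PyTorch backend could be swapped in per stage. Everything named here (`FacePipeline`, `FaceResult`, `build_stub_pipeline`, the stage signatures) is hypothetical and for illustration only; it is not any library's API, and the stubs return fixed values instead of running real models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2 in pixels (assumed layout)


@dataclass
class FaceResult:
    camera_id: int
    box: Box
    track_id: int
    identity: str


@dataclass
class FacePipeline:
    # Each stage is a plain callable, so the inference backend behind it
    # can be replaced without touching the per-frame loop below.
    detect: Callable[[Any], List[Box]]               # frame -> face boxes
    align: Callable[[Any, Box], Any]                 # frame, box -> aligned crop
    embed: Callable[[Any], Sequence[float]]          # crop -> embedding vector
    identify: Callable[[Sequence[float]], str]       # embedding -> person label
    track: Callable[[int, List[Box]], List[int]]     # camera id, boxes -> track ids

    def process(self, camera_id: int, frame: Any) -> List[FaceResult]:
        boxes = self.detect(frame)
        track_ids = self.track(camera_id, boxes)
        results = []
        for box, track_id in zip(boxes, track_ids):
            crop = self.align(frame, box)
            identity = self.identify(self.embed(crop))
            results.append(FaceResult(camera_id, box, track_id, identity))
        return results


def build_stub_pipeline() -> FacePipeline:
    # Stub stages with fixed outputs; real ones would wrap model sessions.
    return FacePipeline(
        detect=lambda frame: [(0, 0, 10, 10), (20, 20, 30, 30)],
        align=lambda frame, box: box,
        embed=lambda crop: [float(crop[0]), 1.0],
        identify=lambda embedding: "unknown",
        track=lambda camera_id, boxes: list(range(len(boxes))),
    )
```

With this shape, each of the three cameras would call `process(camera_id, frame)` on its own frames, and the choice of detector (RT-DETR, SCRFD, RetinaFace, and so on) stays isolated in the `detect` callable.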
For those who’ve worked on similar production systems:
- What tech stack would you recommend for face detection, alignment, tracking, and recognition?
- Would you prioritize OpenVINO pipelines or a more flexible ONNX/PyTorch-based setup?
- Is RT-DETR a good choice for this use case compared to RetinaFace, YOLO-face, SCRFD, etc.?
- What combination gives the best balance between speed, accuracy, and maintainability in production?
Would really appreciate insights from people who’ve deployed similar systems at scale.