Workflow Pipeline Driven by a User Scenario (User Scenario Workflow)


sailorCat 2025. 6. 23. 21:35

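User Scenario

The user uploads a photo of an object whose dimensions they want to know and enters the size of one side. The system then computes the overall proportions and reports the object's actual dimensions, such as height, width, length, and diameter.

The user must enter one dimension, and can choose which part of the object it refers to.

The system analyzes the object in the uploaded photo and marks every place where a length can be measured with a fluorescent line, so that the user can select one.

When the user clicks one of these lines and enters its dimension, the system computes the relative proportions and reports all dimensions.

If the relative proportions of all dimensions cannot be determined from the first uploaded photo, the system asks for another photo.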

1. Front-end Flow (User Scenario)

Image Upload
- The user drops or selects a photo of the object.
Auto-segmentation & Edge Proposal
- The system segments out the object silhouette and runs line/edge detection (e.g. Canny + Hough) to find all candidate “measurable” segments.
- Each candidate is highlighted in translucent neon (e.g. fluorescent green).
User Picks One Reference Edge
- The user clicks the edge they actually measured with a ruler/caliper.
- An inline form pops up asking: “Enter the real-world length of this segment (e.g. 85 mm)”.
Scale Calibration
- Compute mm_per_px = user_length / pixel_length(ref_edge).
Dimension Extraction
- Compute a 2D bounding box or oriented bounding box on the mask ⇒ pixel extents along its two principal axes.
- Multiply by mm_per_px ⇒ two real-world dims (e.g. width and height of that face).
- Shape-specific extras:
  - Rectangular box ⇒ three dimensions (width, height, depth); one face-on view yields only two ⇒ a second view or second measurement is needed to resolve depth.
  - Cylinder ⇒ silhouette minor axis = diameter (can be computed directly).
Check Completeness
- If all requested dims (width, height, depth, diameter…) are resolvable from this view + known shape ⇒ display them.
- Otherwise prompt: “We still need a measurement of the 3rd axis (depth/length). Please upload a second photo showing that axis, or pick a different reference edge from your current photo.”
Final Report
- Show a table: {width: xx mm, height: yy mm, depth: zz mm, diameter: dd mm}
- Overlay the result on the image/mesh for visual confirmation.
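
The Scale Calibration and Dimension Extraction steps above boil down to one ratio and one multiplication. Here is a minimal Python sketch of that arithmetic. The function names, the clicked coordinates, and the 340 px / 380 px extents are made up for illustration, not taken from a real photo.

```python
import math

def pixel_length(p1, p2):
    """Euclidean distance in pixels between two (x, y) endpoints."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def calibrate_scale(ref_edge, real_length_mm):
    """Return mm per pixel from one reference edge and its measured length."""
    length_px = pixel_length(*ref_edge)
    if length_px <= 0 or real_length_mm <= 0:
        raise ValueError("reference edge and real length must be positive")
    return real_length_mm / length_px

def to_mm(extents_px, mm_per_px):
    """Convert named pixel extents to millimetres."""
    return {name: px * mm_per_px for name, px in extents_px.items()}

# Hypothetical input: the user clicked a 340 px edge and typed 85 mm.
mm_per_px = calibrate_scale(((100, 200), (440, 200)), 85.0)
dims_mm = to_mm({"width": 340, "height": 380}, mm_per_px)
print(mm_per_px, dims_mm)
```

A single scale factor like this only holds for edges that lie in the same plane as the reference edge and are roughly parallel to the image plane, which is why the completeness check that follows is needed.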

2. Back-end Pipeline

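┌─────────────────────────────────────────────────────┐
│ 1. Pre-processing                                   │
├─────────────────────────────────────────────────────┤
│ • Resize / normalize image                          │
│ • Mask R-CNN / U^2-Net ⇒ object mask                │
│ • Perspective correction (vanishing-pt)             │
│ • Canny → HoughLines → cluster by θ                 │
│   → candidate edges                                 │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│ 2. Edge Proposal                                    │
├─────────────────────────────────────────────────────┤
│ • Filter for longest / most “bold”                  │
│ • Store: (p1,p2,pixel_length,θ,face_id)             │
│ • Return segments to front-end UI                   │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│ 3. User Input                                       │
├─────────────────────────────────────────────────────┤
│ • ref_edge_id                                       │
│ • real_length_mm                                    │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│ 4. Scale Calib                                      │
├─────────────────────────────────────────────────────┤
│ mm_per_px = real_length_mm / pixel_length(ref_edge) │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│ 5. Dim. Extract                                     │
├─────────────────────────────────────────────────────┤
│ • Oriented BBox on mask ⇒ (w_px, h_px)              │
│ • w_mm = w_px * mm_per_px                           │
│ • h_mm = h_px * mm_per_px                           │
│                                                     │
│ • If shape=="cylinder":                             │
│     diameter_mm = minor_axis_px * mm_per_px         │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│ 6. Completeness                                     │
├─────────────────────────────────────────────────────┤
│ • If shape=="box" and we need depth:                │
│     depth unresolved ⇒ request 2nd view             │
│ • Else compile final dims                           │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│ 7. Render & Resp                                    │
├─────────────────────────────────────────────────────┤
│ • Return JSON {w_mm, h_mm, d_mm?, dia_mm?}          │
│ • Front-end overlays results on image               │
└─────────────────────────────────────────────────────┘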
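
To make the Edge Proposal stage a bit more concrete, here is a rough Python sketch of the "cluster by θ, keep the longest" idea. It assumes the line segments have already been detected (in practice they would come from something like OpenCV's `cv2.HoughLinesP`). The `Segment` class, the `propose_edges` function, the 10° bin size, and the sample coordinates are all my own illustrative choices.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    p1: tuple
    p2: tuple
    face_id: int = 0

    @property
    def pixel_length(self):
        return math.hypot(self.p2[0] - self.p1[0], self.p2[1] - self.p1[1])

    @property
    def theta(self):
        """Orientation in degrees, folded into [0, 180)."""
        dx, dy = self.p2[0] - self.p1[0], self.p2[1] - self.p1[1]
        return math.degrees(math.atan2(dy, dx)) % 180.0

def propose_edges(segments, bin_deg=10.0, top_k=5):
    """Bucket segments by (face, orientation), keep the longest per bucket,
    and return at most top_k of them, longest first."""
    n_bins = round(180.0 / bin_deg)
    best = {}
    for seg in segments:
        # The modulo makes 179 degrees and 1 degree fall into the same bucket.
        key = (seg.face_id, round(seg.theta / bin_deg) % n_bins)
        if key not in best or seg.pixel_length > best[key].pixel_length:
            best[key] = seg
    ranked = sorted(best.values(), key=lambda s: s.pixel_length, reverse=True)
    return ranked[:top_k]

# Hypothetical detections on a single face.
candidates = [
    Segment((0, 0), (300, 0)),    # long horizontal edge
    Segment((0, 10), (120, 12)),  # shorter, nearly horizontal
    Segment((0, 0), (0, 200)),    # vertical edge
    Segment((5, 0), (6, 50)),     # shorter, nearly vertical
]
proposals = propose_edges(candidates)
print([(s.p1, s.p2, s.pixel_length) for s in proposals])
```

Fixed-width angle bins are the simplest possible clustering. A real implementation would probably also merge collinear fragments before ranking.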

3. Key Components & Why You Might Need a 2nd Photo

A single fronto-parallel shot of a box shows only one face, which gives two dimensions; the third axis (depth) is hidden or, in an oblique shot, foreshortened.
For a cylinder, height and diameter can both be obtained from one view.
Perspective correction can help flatten the visible face, but you still can’t recover the hidden axis without another angle or a second measurement.

When the “depth unresolved” flag is set, your UI should automatically re-prompt: “Depth not measured yet. Please either measure a second edge on this photo OR upload a new image showing the object from the side.”
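
The flag-and-re-prompt behaviour could look roughly like the sketch below. The `REQUIRED_DIMS` table and the `check_completeness` function are hypothetical names of mine. The JSON keys follow the `{w_mm, h_mm, d_mm?, dia_mm?}` response from the pipeline, and which keys each shape requires is my assumption.

```python
# Hypothetical mapping: which dimensions each shape class must report.
REQUIRED_DIMS = {
    "box": ("w_mm", "h_mm", "d_mm"),
    "cylinder": ("h_mm", "dia_mm"),
}

def check_completeness(shape, dims):
    """Return (response, prompt); prompt is None when nothing is missing."""
    required = REQUIRED_DIMS[shape]
    missing = [key for key in required if dims.get(key) is None]
    response = {key: dims.get(key) for key in required}
    response["complete"] = not missing
    if not missing:
        return response, None
    prompt = (
        "Not measured yet: " + ", ".join(missing) + ". "
        "Please either measure a second edge on this photo "
        "or upload a new image showing the object from the side."
    )
    return response, prompt

# A box seen face-on: width and height are known, depth is not.
response, prompt = check_completeness("box", {"w_mm": 85.0, "h_mm": 95.0})
print(response)
print(prompt)
```

Returning the partial result together with the prompt lets the front end show what is already known while it asks for the rest.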

4. Next Steps

Prototype the edge-detection + UI overlay so users can click on any proposed line.
Wire up the scale calibration and oriented-bbox measurement.
Build the “need second view” logic—simple flag in your dimension‐output step.
Polish UX:
- show the live “mm_per_px” value as the user types
- allow switching the reference edge if the measurement was imprecise
- cache previous uploads so the second-view step is seamless
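
For the oriented-bbox measurement mentioned in the steps above, a dependency-free approximation is to align the box with the principal axes of the mask points. The sketch below does that with a closed-form 2×2 PCA. Note that this is a PCA-aligned box, not a guaranteed minimum-area box (OpenCV's `cv2.minAreaRect` would be the more usual tool), and `oriented_extents` / `rotated_rect` are illustrative names of mine.

```python
import math

def oriented_extents(points):
    """Return (major_extent_px, minor_extent_px, angle_deg) of a PCA-aligned box."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Direction of the major principal axis of the 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(angle), math.sin(angle)
    along = [(x - mx) * c + (y - my) * s for x, y in points]
    across = [-(x - mx) * s + (y - my) * c for x, y in points]
    return max(along) - min(along), max(across) - min(across), math.degrees(angle)

def rotated_rect(w, h, deg):
    """Corner points of a w x h rectangle rotated by deg (test data only)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x * c - y * s, x * s + y * c) for x, y in corners]

w_px, h_px, angle_deg = oriented_extents(rotated_rect(400, 200, 30))
print(round(w_px, 3), round(h_px, 3), round(angle_deg, 3))
```

Multiplying `w_px` and `h_px` by the calibrated mm-per-pixel factor then gives the two real-world dimensions of the visible face.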

For objects that already exist in the metadata, the dimensions can be returned directly, without going through the measurement and ratio calculation.

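Object Recognition & Retrieval
- Match the user's uploaded image against an indexed dataset using a CNN (or image embeddings + FAISS).
- Example: “mug_001” → the mug in the YCB set; “tube_30x50” → a 30×50 mm rectangular tube.
- Check the match confidence (e.g. process automatically if > 0.9).
Scale Check (optional)
- Dataset images carry calibrated pixel→mm information, so this step can usually be skipped.
- However, resolution varies with the actual shooting environment, so if needed, ask the user for the length of one axis (e.g. the mug's height) and recalibrate the scale.
Dimension Lookup
- Read every required dimension field, such as width/depth/height and diameter, from the matched object's JSON/CSV metadata.
- Example: { width:85, depth:80, height:95 }
User Feedback
- “Is this object mug_001 (mug)? → Yes/No”
- If the user selects No, fall back to the general edge-based measurement flow.
Return Final Result
- Display the result in the UI directly as { width: 85 mm, depth: 80 mm, height: 95 mm }.
- Overlay the dimensions along the corresponding principal axes on the image.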
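
A toy version of this lookup-with-fallback flow is sketched below. Plain cosine similarity over tiny hand-written vectors stands in for the CNN embedding + FAISS index, so the embedding values are invented. Only the mug dimensions and the 0.9 threshold come from the notes above, and the field names for the tube are illustrative.

```python
import math

# Toy catalog: embeddings are invented stand-ins for real image embeddings.
CATALOG = {
    "mug_001": {
        "embedding": (0.9, 0.1, 0.4),
        "dims_mm": {"width": 85, "depth": 80, "height": 95},
    },
    "tube_30x50": {
        "embedding": (0.1, 0.95, 0.2),
        "dims_mm": {"width": 30, "height": 50},  # illustrative field names
    },
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def lookup_dimensions(query_embedding, threshold=0.9):
    """Return the best catalog match, or None to signal the edge-based fallback."""
    best_id, best_score = max(
        ((oid, cosine(query_embedding, rec["embedding"])) for oid, rec in CATALOG.items()),
        key=lambda pair: pair[1],
    )
    if best_score > threshold:
        return {
            "object_id": best_id,
            "confidence": round(best_score, 3),
            "dims_mm": CATALOG[best_id]["dims_mm"],
        }
    return None

print(lookup_dimensions((0.88, 0.12, 0.41)))  # close to the mug embedding
print(lookup_dimensions((0.5, 0.5, 0.3)))     # ambiguous, so fall back
```

Even on a confident match, the Yes/No confirmation step still applies before the dimensions are shown.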

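Deep-Learning-Based Depth Assistance
- Add a single-image depth estimation network such as MiDaS or DPT, and convert the whole depth map to real-world scale using the reference length entered by the user (or an ArUco marker) for scale calibration.
- The resulting 3D point cloud makes it possible to predict, to some extent, the depth of faces that are not visible.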
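
As a rough illustration of the scale conversion, the sketch below back-projects the two endpoints of the reference edge with a pinhole model and derives a single scale factor from the user's length. It rests on strong assumptions: the network output has already been turned into a depth that is proportional to true depth (MiDaS-style models typically output relative inverse depth, so a conversion step would be needed first), and the camera intrinsics `fx, fy, cx, cy` are known. All function names and numbers are made up.

```python
import math

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def metric_scale(px_a, px_b, z_a, z_b, real_length_mm, fx, fy, cx, cy):
    """mm per relative-depth unit, from one edge of known real length."""
    a = backproject(px_a[0], px_a[1], z_a, fx, fy, cx, cy)
    b = backproject(px_b[0], px_b[1], z_b, fx, fy, cx, cy)
    relative_length = math.dist(a, b)
    if relative_length == 0:
        raise ValueError("reference endpoints coincide")
    return real_length_mm / relative_length

# Hypothetical numbers: 1000 px focal length, principal point at (500, 500),
# both endpoints at relative depth 2.0, user-entered length 80 mm.
scale = metric_scale((400, 500), (600, 500), 2.0, 2.0, 80.0,
                     fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
depth_mm = 2.5 * scale  # any other relative depth can now be converted
print(scale, depth_mm)
```
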
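Using Shape Priors per Object Category
- Predefine a “basic shape (primitive, CAD template)” for each general category such as cups, desks, and chairs.
- Fit the primitive to the silhouette or edges in the photo, optimizing the scale, translation, and rotation parameters.
- This way, the template's real dimensions can be retrieved even for faces that are not visible.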
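Photometric Stereo / Shape-from-Shading
- Even for a single view, taking just two shots or so (flash on and off) lets you estimate the object's surface normal vectors (curvature) from the brightness differences caused by the lighting change.
- Reconstruct the geometry from the normal map to predict the dimensions of curved objects without edges (spheres, cylinders, etc.).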
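Using EXIF Metadata
- Extracting the field of view (FOV), focal length, and sensor size from the photo's EXIF data lets you automatically infer some of the intrinsic parameters needed for pixel→real-distance conversion.
- Asking the user only for the “smartphone model” or “focal length (mm)” can already reduce the calibration burden.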
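
The pinhole relations behind this idea are short enough to write down. The sketch below takes the focal length and sensor width as plain numbers (as they might be read from EXIF or looked up per phone model) and derives the focal length in pixels and the horizontal FOV. Converting pixels to millimetres additionally needs the camera-to-object distance, which EXIF usually does not provide, so `distance_mm` here is an extra assumption. All names and values are illustrative.

```python
import math

def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Focal length expressed in pixels (pinhole model)."""
    return focal_mm * image_width_px / sensor_width_mm

def horizontal_fov_deg(focal_mm, sensor_width_mm):
    """Horizontal field of view in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))

def mm_per_px_at(distance_mm, fx_px):
    """Scale on a fronto-parallel plane at the given distance from the camera."""
    return distance_mm / fx_px

# Hypothetical camera: 4 mm lens, 8 mm wide sensor, 4000 px wide image.
fx = focal_length_px(4.0, 8.0, 4000)
fov = horizontal_fov_deg(4.0, 8.0)
scale = mm_per_px_at(500.0, fx)  # object assumed to be 500 mm away
print(fx, fov, scale)
```
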
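Stronger User Interaction
- “Click a face + drag” UI instead of “two-point click”: when the user drags over an approximate face (rectangle/circle), the size of that region is measured automatically in pixels.
- Automatically cluster the candidate measurement segments → show only the 3 to 5 most representative ones → the user selects one.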
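Multi-View Video Assistance
- Since a single photo is not enough, suggest that the user record a short 3-second video (360° rotation).
- Estimate the camera track with SLAM (e.g. ORB-SLAM) → reconstruct an accurate 3D point cloud from the distances to object points across the video frames.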
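Linking Context and Prior Knowledge
- Read brand logos or text printed on the object with OCR, crawl the matching model name and spec information, and map the dimensions directly as metadata.
- Example: “IKEA” logo → “IKEA POÄNG” → look up the size specs on the web automatically.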
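Quality Assessment & Retake Prompt
- Automatically compute a “measurement confidence” (photometric condition, edge density, amount of perspective distortion, etc.) and show a “Please retake the photo” message when it falls below a certain threshold.
- Provide a shooting-guide overlay (placing a marker in a corner of the frame, keeping a fixed distance).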
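
One simple way to turn those signals into a retake decision is a weighted score compared against a threshold, as in the sketch below. The weights, the 0.6 threshold, and the assumption that every input is already normalized to [0, 1] are placeholders of mine that would have to be tuned on real photos.

```python
# Hypothetical weights and threshold; tune them on real photos.
WEIGHTS = {"lighting": 0.4, "edge_density": 0.4, "perspective": 0.2}
RETAKE_THRESHOLD = 0.6

def clamp01(value):
    """Clamp a value into the [0, 1] range."""
    return min(max(value, 0.0), 1.0)

def measurement_confidence(lighting, edge_density, perspective_distortion):
    """Weighted score in [0, 1]; perspective distortion counts against it."""
    parts = {
        "lighting": clamp01(lighting),
        "edge_density": clamp01(edge_density),
        "perspective": 1.0 - clamp01(perspective_distortion),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())

def needs_retake(score):
    """True when the photo should be retaken."""
    return score < RETAKE_THRESHOLD

good = measurement_confidence(0.9, 0.8, 0.1)
poor = measurement_confidence(0.2, 0.3, 0.7)
print(round(good, 2), needs_retake(good))
print(round(poor, 2), needs_retake(poor))
```
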
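Mobile SDK / ARKit Integration
- Use the measurement features of iOS ARKit or Android ARCore as the backend, so that simple object measurements are handled by the device's built-in capabilities.
- Within the app, fall back to the server pipeline when “more precise measurement is needed”.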
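Automating Final Parameter Settings

Tune the parameters that matter in each pipeline (e.g. Canny threshold, Hough minimum line length) with AutoML, or provide presets per user device and environment.

The fastest option to implement is EXIF usage + ArUco marker scaling. To push accuracy further, use Shape Prior + Depth Estimation. To improve the user experience, add mobile ARKit/ARCore integration and the quality feedback feature.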
