Project 2
Project 2 integrates file I/O, string parsing, validation, and traversal into one cohesive system. You will build a simplified HTML parser that extracts tags and links, verify structural correctness with a proper stack discipline (not simple counting), and implement a DFS or BFS crawler that counts the unique reachable pages without double-counting duplicates or failing on links to missing files.
The most important advice is to
- design your data structures before coding,
- store parsed results by filename so you never reparse unnecessarily,
- separate parsing from balance checking,
- track visited pages during crawling so that cyclic links cannot cause infinite recursion or an endless loop,
- and thoroughly test edge cases (especially malformed nesting and broken links) with your own HTML files rather than relying only on the provided examples.
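To see why simple counting is not enough for the balance check, the following minimal sketch compares the two approaches on a crossed pair of tags. It is written in Python purely for illustration, and it assumes tags have already been extracted into strings such as "b" and "/b". That representation, and both function names, are hypothetical rather than part of the project specification. Tags that never take a closing tag are not handled here.

```python
def is_balanced_by_counting(tags):
    """Naive check: only compares how many opens and closes each tag has."""
    counts = {}
    for tag in tags:
        if tag.startswith("/"):
            counts[tag[1:]] = counts.get(tag[1:], 0) - 1
        else:
            counts[tag] = counts.get(tag, 0) + 1
    return all(n == 0 for n in counts.values())


def is_balanced_by_stack(tags):
    """Stack check: every closing tag must match the most recent open tag."""
    stack = []
    for tag in tags:
        if tag.startswith("/"):
            # A close with nothing open, or with the wrong tag on top, fails.
            if not stack or stack.pop() != tag[1:]:
                return False
        else:
            stack.append(tag)
    return not stack  # leftover open tags also mean unbalanced


crossed = ["b", "i", "/b", "/i"]  # <b><i></b></i>
nested = ["b", "i", "/i", "/b"]   # <b><i></i></b>

print(is_balanced_by_counting(crossed))  # True: counting misses the bad nesting
print(is_balanced_by_stack(crossed))     # False
print(is_balanced_by_stack(nested))      # True
```

The crossed case is a good one to reproduce in your own test HTML files, since it is exactly the kind of malformed nesting that a counter accepts and a stack rejects.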
Overview
Data Structure Design
Your parser must store data so that:
- isBalanced() does not reparse the file
- visitPageAmount() can access links efficiently
- Files are not reparsed unnecessarily
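One way to meet these requirements is to parse each file once and keep the result in a map keyed by filename. The sketch below is in Python for illustration only. `ParsedPage`, `parse_html`, `PageStore`, and the `files_read` counter are hypothetical names, not part of the assignment. It also assumes that `href` values are double-quoted with no spaces around `=`, and that `>` never appears inside an attribute value.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedPage:
    tags: list = field(default_factory=list)   # e.g. ["html", "a", "/a", "/html"]
    links: list = field(default_factory=list)  # href targets, in document order


def parse_html(text):
    """Scan text one character at a time, collecting tag names and href values."""
    page = ParsedPage()
    inside_tag = False
    buffer = []
    for ch in text:
        if ch == "<":
            inside_tag, buffer = True, []
        elif ch == ">" and inside_tag:
            inside_tag = False
            body = "".join(buffer).strip()
            if not body:
                continue
            name = body.split()[0]  # "a" for <a href="...">, "/a" for </a>
            page.tags.append(name)
            if name == "a" and 'href="' in body:
                start = body.index('href="') + len('href="')
                end = body.find('"', start)
                if end != -1:
                    page.links.append(body[start:end])
        elif inside_tag:
            buffer.append(ch)
    return page


class PageStore:
    """Caches one ParsedPage per filename so each file is read at most once."""

    def __init__(self):
        self._pages = {}     # filename -> ParsedPage, or None if the file is missing
        self.files_read = 0  # instrumentation for this sketch only

    def get(self, filename):
        if filename not in self._pages:
            try:
                with open(filename, encoding="utf-8") as f:
                    text = f.read()
                self.files_read += 1
                self._pages[filename] = parse_html(text)
            except OSError:
                # Remember missing files too, so they are not retried.
                self._pages[filename] = None
        return self._pages[filename]
```

With a store like this, a balance check would read `store.get(name).tags` and a crawler would read `store.get(name).links`, so neither needs to touch the file again. This also keeps parsing separate from balance checking.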
Final Implementation Checklist
Read file character-by-character
Extract tags correctly
Handle <a href="...">...</a> carefully
Store parsed data by filename
Implement stack-based balance check
Implement DFS or BFS for crawling
Avoid double parsing
Avoid double counting
Handle missing files correctly
Create additional test HTML files
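The crawling items in the checklist can be sketched as a breadth-first search with a visited set. This Python example is illustrative only and rests on assumptions the project specification may define differently: the start page counts toward the total, a link to a missing file is skipped and not counted, and `get_links` is a hypothetical callback that returns a page's links or `None` when the file does not exist. Here a plain dictionary stands in for the parsed files.

```python
from collections import deque


def count_reachable_pages(start, get_links):
    """Count unique existing pages reachable from start using BFS."""
    if get_links(start) is None:
        return 0  # assumption: a missing start page yields zero
    visited = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in get_links(page):
            if target in visited:
                continue  # already counted; this also breaks cycles
            if get_links(target) is None:
                continue  # broken link: assumed not to count
            visited.add(target)
            queue.append(target)
    return len(visited)


# A stand-in for parsed files: filename -> list of links on that page.
site = {
    "index.html": ["a.html", "b.html", "missing.html"],
    "a.html": ["index.html", "b.html"],  # cycle back to index
    "b.html": ["b.html"],                # self-link
    "orphan.html": ["index.html"],       # never reached from index
}

print(count_reachable_pages("index.html", site.get))  # prints 3
```

Marking a page as visited when it is enqueued, rather than when it is dequeued, is what prevents the same page from being queued and counted twice. A DFS version would use the same visited set with a stack or recursion in place of the queue.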