
Project 2

Project 2 integrates file I/O, string parsing, validation, and traversal into one cohesive system. You will build a simplified HTML parser that extracts tags and links, verify that tags are properly nested using a stack (not just by counting opening and closing tags), and implement a DFS or BFS crawler that counts the unique pages reachable from a starting page, without double-counting pages or following links to missing files.
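To see why counting alone is not enough, here is a minimal sketch in Python (the project's own language and required signatures may differ). It assumes the tags have already been extracted into a list of strings with no attributes and no self-closing tags; the names is_balanced and counts_match are illustrative, not the required interface.

```python
def is_balanced(tags):
    """Return True if every opening tag is closed in the right order.

    `tags` is a list such as ["<html>", "<b>", "</b>", "</html>"].
    Assumption: no attributes and no self-closing tags.
    """
    stack = []
    for tag in tags:
        name = tag.strip("<>")
        if name.startswith("/"):
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != name[1:]:
                return False
        else:
            stack.append(name)
    # Anything left on the stack was opened but never closed.
    return not stack


def counts_match(tags):
    """Naive check: only compares how many opening and closing tags exist."""
    opens = sum(1 for t in tags if not t.startswith("</"))
    closes = sum(1 for t in tags if t.startswith("</"))
    return opens == closes


# Crossed nesting: the counts agree, but the order is wrong.
crossed = ["<b>", "<i>", "</b>", "</i>"]
print(counts_match(crossed))  # the naive check accepts this input
print(is_balanced(crossed))   # the stack check rejects it
```

The crossed input has two opening and two closing tags, so a counter is satisfied, while the stack finds "i" on top when "</b>" arrives.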

The most important advice is to

  • design your data structures before coding,
  • store parsed results by filename so you never reparse unnecessarily,
  • separate parsing from balance checking,
  • track visited pages during crawling so that cyclic links cannot cause infinite recursion (DFS) or an endless loop (BFS),
  • and thoroughly test edge cases (especially malformed nesting and broken links) with your own HTML files rather than relying only on the provided examples.
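The visited-page advice can be sketched as a small BFS in Python. This is an illustration under stated assumptions, not the required implementation: links_by_file is a hypothetical dictionary standing in for your stored parse results, a filename absent from it represents a missing file, and the starting page is assumed to count toward the total (check the project specification for the exact rule).

```python
from collections import deque


def visit_page_amount(start, links_by_file):
    """Count unique pages reachable from `start`, including `start` itself.

    `links_by_file` maps a filename to the list of filenames it links to.
    A filename missing from the map stands in for a file that does not exist.
    """
    if start not in links_by_file:
        return 0  # assumption: a missing start page counts as zero pages
    visited = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links_by_file[page]:
            # Skip broken links and pages that were already seen.
            if target in links_by_file and target not in visited:
                visited.add(target)
                queue.append(target)
    return len(visited)


site = {
    "index.html": ["a.html", "b.html", "missing.html"],
    "a.html": ["index.html", "b.html"],  # links back to index: a cycle
    "b.html": [],
}
print(visit_page_amount("index.html", site))  # 3
```

Marking a page as visited when it is enqueued, rather than when it is dequeued, keeps the same page from entering the queue twice.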

Overview

Data Structure Design

Your parser must store data so that:

  • isBalanced() does not reparse the file

  • visitPageAmount() can access links efficiently

  • Files are not reparsed unnecessarily
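One possible layout that meets these three requirements is a map from filename to parsed results, filled the first time a file is parsed. The Python sketch below is hypothetical throughout: ParsedPage, HtmlParser, parse_fn, and the parse_calls counter are illustrative names, and the real project interface may look different.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedPage:
    tags: list = field(default_factory=list)   # tags in document order
    links: list = field(default_factory=list)  # href targets found in the file


class HtmlParser:
    def __init__(self, parse_fn):
        self._parse_fn = parse_fn  # hypothetical: filename -> ParsedPage
        self._pages = {}           # cache: filename -> ParsedPage
        self.parse_calls = 0       # only here to demonstrate the caching

    def parse(self, filename):
        # Parse a file at most once; later calls reuse the stored result.
        if filename not in self._pages:
            self.parse_calls += 1
            self._pages[filename] = self._parse_fn(filename)
        return self._pages[filename]

    def links(self, filename):
        return self.parse(filename).links


def fake_parse(filename):
    # Stand-in for real file parsing so the sketch runs without any files.
    return ParsedPage(tags=["<html>", "</html>"], links=["a.html"])


parser = HtmlParser(fake_parse)
parser.parse("index.html")
parser.links("index.html")
print(parser.parse_calls)  # 1
```

With this layout, a balance check can read the stored tags and a crawler can read the stored links, and neither needs to touch the file again.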

Final Implementation Checklist

Read file character-by-character

Extract tags correctly

Handle <a href="...">...</a> carefully

Store parsed data by filename

Implement stack-based balance check

Implement DFS or BFS for crawling

Avoid double parsing

Avoid double counting

Handle missing files correctly

Create additional test HTML files
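The first three checklist items can be sketched as a single character-by-character scan. The Python example below makes several assumptions: it reads from a string rather than a file so that it runs on its own, it treats links as written exactly in the form <a href="target">, and it discards attributes when recording a tag. The function name extract is illustrative.

```python
def extract(text):
    """Scan `text` one character at a time and return (tags, links)."""
    tags, links = [], []
    inside = False  # True while between "<" and ">"
    current = ""    # characters collected for the tag being read
    for ch in text:
        if ch == "<":
            inside = True
            current = ""
        elif ch == ">" and inside:
            inside = False
            # Split the tag name from any attributes that follow it.
            name, _, rest = current.partition(" ")
            tags.append("<" + name + ">")
            # Assumption: links look exactly like <a href="target">.
            if name == "a" and 'href="' in rest:
                start = rest.index('href="') + len('href="')
                end = rest.find('"', start)
                if end != -1:  # ignore an href with no closing quote
                    links.append(rest[start:end])
        elif inside:
            current += ch
    return tags, links


tags, links = extract('<html><a href="b.html">B</a></html>')
print(tags)   # ['<html>', '<a>', '</a>', '</html>']
print(links)  # ['b.html']
```

Recording the anchor as a plain "<a>" lets its closing "</a>" match during the balance check, while the href target is stored separately for the crawler.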