| # Chrome's URL library |
| |
| ## Layers |
| |
| There are several conceptual layers in this directory. Going from the lowest |
| level up, they are: |
| |
| ### Parsing |
| |
| The `url_parse.*` files are the parser. This code does no string |
| transformations. Its only job is to take an input string and splits out the |
| components of the URL as best as it can deduce them, for a given type of URL. |
| Parsing can never fail, it will take its best guess. This layer does not |
| have logic for determining the type of URL parsing to apply, that needs to |
| be applied at a higher layer (the "util" layer below). |
| |
| Because the parser code is derived (_very_ distantly) from some code in |
| Mozilla, some of the parser files are in `url/third_party/mozilla/`. |
| |
| The main header to include for calling the parser is |
| `url/third_party/mozilla/url_parse.h`. |
| |
| ### Canonicalization |
| |
| The `url_canon*` files are the canonicalizer. This code will transform specific |
| URL components or specific types of URLs into a standard form. For some |
| dangerous or invalid data, the canonicalizer will report that a URL is invalid, |
| although it will always try its best to produce output (so the calling code |
| can, for example, show the user an error that the URL is invalid). The |
| canonicalizer attempts to provide as consistent a representation as possible |
| without changing the meaning of a URL. |
| |
| The canonicalizer layer is designed to be independent of the string type of |
| the embedder, so all string output is done through a `CanonOutput` wrapper |
| object. An implementation for `std::string` output is provided in |
| `url_canon_stdstring.h`. |
| |
| The main header to include for calling the canonicalizer is |
| `url/url_canon.h`. |
| |
| ### Utility |
| |
| The `url_util*` files provide a higher-level wrapper around the parser and |
| canonicalizer. While it can be called directly, it is designed to be the |
| foundation for writing URL wrapper objects (The GURL later and Blink's KURL |
| object use the Utility layer to implement the low-level logic). |
| |
| The Utility code makes decisions about URL types and calls the correct parsing |
| and canonicalzation functions for those types. It provides an interface to |
| register application-specific schemes that have specific requirements. |
| Sharing this loigic between KURL and GURL is important so that URLs are |
| handled consistently across the application. |
| |
| The main header to include is `url/url_util.h`. |
| |
| ### GURL and Origin |
| |
| At the highest layer, a C++ object for representing URLs is provided. This |
| object uses STL. Most uses need only this layer. Include `url/gurl.h`. |
| |
| Also at this layer is also the Origin object which exists to make security |
| decisions on the web. Include `url/origin.h`. |
| |
| ## Historical background |
| |
| This code was originally a separate library that was designed to be embedded |
| into both Chrome (which uses STL) and WebKit (which didn't use any STL at the |
| time). As a result, the parsing, canonicalization, and utility code could |
| not use STL, or any other common code in Chromium like base. |
| |
| When WebKit was forked into the Chromium repo and renamed Blink, this |
| restriction has been relaxed somewhat. Blink still provides its own URL object |
| using its own string type, so the insulation that the Utility layer provides is |
| still useful. But some STL strings and calls to base functions have gradually |
| been added in places where doing so is possible. |