The HRTIM has a resolution of 2.5 ns, and the shortest measurable pulse is 2.5 ns. Since it has two "capture" channels, it should be able to measure both the pulse arrival time and the pulse width (by capturing the rising and the falling edge). But here it’s important to know what pulse widths and resolution you’re targeting.
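To make the two-capture idea concrete, here is a minimal sketch of the arithmetic, not actual firmware. It assumes one capture register latches the rising edge and the other the falling edge of the same pulse, and that both read a free-running 16-bit counter with a 2.5 ns tick. The function name and the constants are only illustrative.

```python
TICK_NS = 2.5        # capture resolution quoted above
COUNTER_BITS = 16    # assumption: 16-bit free-running timer counter


def pulse_from_captures(rise_ticks, fall_ticks):
    """Return (arrival_ns, width_ns) from two raw capture values.

    The modulo handles a single counter wrap between the two edges.
    """
    modulus = 1 << COUNTER_BITS
    width_ticks = (fall_ticks - rise_ticks) % modulus
    return rise_ticks * TICK_NS, width_ticks * TICK_NS
```

The arrival time here is relative to the last counter rollover, so pulses longer than one counter period would need an additional overflow count.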
The TDC-GP22 has a minimum start-stop interval of 3.5 ns (which probably wouldn’t bother you). I've tested the AS6500 (which had too low a data throughput for my purposes) and the TDC-GPX2, which turned out to contain a bug that complicates the measurement of times longer than about 500 ns (if needed, I can find a detailed description).
The solution involving the sweeping of the transmit and read pulses also works for long times. Exactly as you suggest, you generate the coarse step using a timer from the MCU and the fine step by configuring the delay line. We currently operate just such a "dual delay pulser" to compensate for photon delays in optical fibers (see the schematic). The HRTIM generates two PWM signals with selectable delay (2.5 ns step), and the NB6L295 dual delay line implements the fine delay (<2.5 ns) and also adjusts the pulse width here (two slightly shifted signals are combined at an AND gate).
Adjustable delay pulser
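The coarse/fine split described above can be sketched as follows. This only computes the two settings for a requested delay; it does not talk to any hardware. The 2.5 ns coarse step is the HRTIM step mentioned above, while the 10 ps fine step is an assumption taken from the approximate figure in this post, so check the delay line datasheet for the real value. The function name is hypothetical.

```python
COARSE_STEP_PS = 2500   # timer step (2.5 ns)
FINE_STEP_PS = 10       # assumption: approximate delay-line step


def split_delay(target_ps):
    """Split a non-negative delay in ps into (coarse_steps, fine_steps, achieved_ps)."""
    if target_ps < 0:
        raise ValueError("target delay must be non-negative")
    coarse, remainder = divmod(target_ps, COARSE_STEP_PS)
    # Round the remainder to the nearest fine step using integer arithmetic.
    fine = (remainder + FINE_STEP_PS // 2) // FINE_STEP_PS
    # If rounding reaches a full coarse step, carry it into the timer setting.
    if fine * FINE_STEP_PS >= COARSE_STEP_PS:
        coarse += 1
        fine = 0
    achieved = coarse * COARSE_STEP_PS + fine * FINE_STEP_PS
    return coarse, fine, achieved
```

For example, a request for 12347 ps becomes 4 timer steps plus 235 delay-line steps, which gives 12350 ps.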
However, I can also see a solution using the STM32F334. It’s equipped with an HRTIM with an output (!) resolution of about 220 ps. It can sweep a test pulse, and to detect the arrival time, you could theoretically use the timer input itself or an external flip-flop circuit (minimum of external components). As far as I know, the HRTIM outputs on the 334 are a bit jittery, but that shouldn’t be a problem if you can “average” across multiple measurements. I believe it’s possible to achieve a resolution below 1 ns with this. But as I said, it depends on what time resolution you’re aiming for. If you’re aiming for fractions of a nanosecond, then it will definitely be easier to use a ready-made TDC (like the GP22, assuming it doesn’t have the same bug as the GPX2, if that bothers you at all).
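To illustrate why averaging can turn the jitter into an advantage, here is a small simulation, not measured data. It assumes an ideal flip-flop that reports whether the swept test edge came after the signal edge, a 220 ps sweep step, and Gaussian jitter of 200 ps RMS on the test edge. The jitter value, the Gaussian model, and all names are assumptions made for this sketch.

```python
import random

STEP_PS = 220.0      # sweep step of the test pulse
JITTER_PS = 200.0    # assumption: RMS jitter of the test edge


def flip_flop_fraction(true_arrival_ps, delay_ps, trials, rng):
    """Fraction of trials in which the jittery test edge arrives after the signal."""
    hits = 0
    for _ in range(trials):
        test_edge = delay_ps + rng.gauss(0.0, JITTER_PS)
        if test_edge > true_arrival_ps:
            hits += 1
    return hits / trials


def estimate_arrival(true_arrival_ps, n_steps=40, trials=2000, seed=1):
    """Sweep the test edge over n_steps delays and estimate the arrival time.

    The fractions sample a smeared step (a cumulative distribution) centred
    on the arrival time, so its centre is recovered by summing them.
    """
    rng = random.Random(seed)
    fractions = [
        flip_flop_fraction(true_arrival_ps, k * STEP_PS, trials, rng)
        for k in range(n_steps)
    ]
    last_delay = (n_steps - 1) * STEP_PS
    return last_delay + STEP_PS / 2 - STEP_PS * sum(fractions)
```

In this model the estimate lands well inside one sweep step, because the jitter spreads the flip-flop decision over several steps and the averaged fractions interpolate between them. Real hardware would add flip-flop metastability and non-Gaussian jitter, so treat the numbers as qualitative only.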
Just to be clear, I’ve never actually built a TDC this way myself. But I’ve experimentally verified that a flip-flop circuit (MC100EP29 / EP51 ...) can distinguish whether the “clock” or “data” signal arrived first with 95% certainty within a worst-case interval of 40 ps. I’ve also verified that the above-mentioned procedure/schematic can generate two pulses whose interval is adjustable in steps of approximately 10 ps over a range of ±50 ns (and likely even more).