Nice post, it got me thinking. However, I see a few disadvantages to this approach.
First, good luck convincing source maintainers to push their files into a streaming service…
Second, if you can convince them to do so, why not push the files directly to your data lake and skip the complexity of streaming?
Once you start bothering source maintainers with this kind of thing, you are likely to create an extra bottleneck for data ingestion when the process fails. I would only go in this direction if your team has full permissions and responsibility for maintaining the whole process.