What
This is a tool that lets you build a personal index of web pages and then search it with keywords. It is composed of a web app and a Firefox extension: the web app lets you add a page to your index given its URL, while the extension uploads the page currently open in your browser. The tool is currently in beta and I'm not the best programmer, so please be patient if you encounter any bugs. If you have any suggestions, please let me know on GitHub.
Why & Philosophy
- TODO: add some screenshots to the README
- TODO: list alternatives
How to use
- Create an account: any email works, including a temporary address from a service like Temp-Mail or an alias from Firefox Relay. I just need a way to tell users (and their indexes) apart.
- Log in: Enter your credentials to access your account.
- Add URLs to scrape: Provide a URL to scrape. I understand that it might be considered unethical to take content from others, but I couldn't think of a better approach. The scraped content will be added to your index.
- Repeat the process: you can repeat step 3 multiple times to add more content to your index. However, please note that this functionality is currently limited due to the risk of IP banning or blacklisting. If you exceed the limit, pages will be buffered.
- Alternatively, use the Firefox extension: I created a Firefox extension that uploads your current page directly to your index. This makes it much more powerful: there is no limit on the number of pages, and it can also upload personal data displayed on dynamic pages. Please be cautious, as I am still a random person on the internet.
- Manage your pages: you can add categories to your pages in the "/me/pages" section.
- Search for pages:
- Visit the website and search with "keywords" (I mean... the query is sent directly to MeiliSearch). You will get a list of pages from your index that match the query.
- ✨ If you have the extension installed, search results can also be displayed on top of Google results.
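The extension's core flow described above (grab the current page's text and POST it to your index) can be sketched roughly like this. The endpoint URL and payload fields here are my guesses for illustration, not the real API:

```javascript
// Hypothetical sketch of the extension's upload flow.
// Endpoint and payload shape are assumptions, not the real API.

// Build the payload the extension would send for the current page.
function buildPagePayload(url, title, text) {
  return {
    url,
    title,
    // Collapse whitespace so the index stores clean text.
    content: text.replace(/\s+/g, " ").trim(),
  };
}

// In the extension this would run in a content script, e.g.:
//   const payload = buildPagePayload(location.href, document.title,
//                                    document.body.innerText);
//   await fetch("https://example.invalid/api/pages", {   // hypothetical endpoint
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify(payload),  // the JWT cookie rides along automatically
//   });
```

Since the content script reads the rendered DOM, this is also why the extension can index pages that a plain GET request cannot (logged-in or dynamic pages).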
Limitations
Right now, the main limitation is the number of pages that can be scraped: I have set a limit of 5 pages per month, because I don't want to get my IP blacklisted. As a stopgap, I am currently routing the requests through a proxy, but I am still working on a proper solution. Suggestions are welcome. Also, what I call "scraping" is a simple GET request to the page (just to extract the text), so if you need to log in to see the content, you can't scrape it (but you can use the extension).
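Since "scraping" here is just a GET followed by text extraction, the idea can be sketched in a few lines. The naive tag-stripping below is only an illustration; the real extractor presumably does something smarter:

```javascript
// Naive sketch of "scraping": fetch the HTML, keep only the text.

function extractText(html) {
  return html
    // Drop script/style blocks entirely, including their contents.
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ")
    // Strip all remaining tags.
    .replace(/<[^>]+>/g, " ")
    // Collapse whitespace.
    .replace(/\s+/g, " ")
    .trim();
}

// Usage (Node 18+, which ships a global fetch):
//   const html = await (await fetch(url)).text();
//   const text = extractText(html);
```

Because this is an unauthenticated GET, anything behind a login wall comes back empty or as a login page, which is exactly the limitation described above.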
Privacy
Well, of course, if I want to index every uploaded web page, I think there is no alternative but to store the clear/plain text (note that this also applies if you index personal data through the extension). If there is a way to protect the indexed text as well, please let me know. I ask only for an email and set only 2 cookies (that I know of): a JWT and an id. There are no analytics. The extension is 1.6 kB of JavaScript; you can look it up yourself (just rename the file from .xpi to .zip).
Security
I'm trying my best, but I'm still learning. If you find serious issues, please help me on GitHub.
(: