declutters url lists for crawling/pentesting

uro

Using a URL list for security testing can be painful as there are a lot of URLs that have uninteresting/duplicate content; uro aims to solve that.

It doesn't make any HTTP requests to the URLs and removes:

  • human-written content, e.g. blog posts
  • URLs with the same path that differ only in parameter values
  • incremental URLs, e.g. /cat/1/ and /cat/2/
  • image, JS, CSS and other static files
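
The filters listed above can be approximated without any network access. The following is a minimal sketch under stated assumptions, not uro's actual implementation: the function name `declutter`, the `STATIC_EXTENSIONS` set and the `{n}` placeholder are all hypothetical, and the "human-written content" filter is not covered.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical extension list for illustration; not uro's actual blacklist.
STATIC_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "ico", "css", "js", "woff", "woff2"}


def declutter(urls):
    """Return the URLs that survive three simple, request-free filters."""
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        path = parts.path
        # 1. Skip static files, judged only by the extension of the last path segment.
        last_segment = path.rsplit("/", 1)[-1]
        if "." in last_segment and last_segment.rsplit(".", 1)[-1].lower() in STATIC_EXTENSIONS:
            continue
        # 2. Collapse purely numeric path segments so /cat/1/ and /cat/2/ look alike.
        shape = re.sub(r"/\d+(?=/|$)", "/{n}", path)
        # 3. Ignore parameter values; only the set of parameter names matters.
        names = frozenset(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
        key = (parts.netloc, shape, names)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

The first URL seen for each (host, path shape, parameter names) combination is kept and later look-alikes are dropped, so the input order decides which representative survives.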

Usage

First, install uro with pip:

pip3 install uro

Now, there's just one way to use it, no args, no bullshit.

cat urls.txt | uro

Comments
  • ImportError: cannot import name 'SIGPIPE' from 'signal'

    D:\uro>uro
    Traceback (most recent call last):
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.2', 'console_scripts', 'uro')())
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 25, in importlib_load_entry_point
        return next(matches).load()
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\metadata.py", line 77, in load
        module = import_module(match.group('module'))
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "", line 1014, in _gcd_import
      File "", line 991, in _find_and_load
      File "", line 975, in _find_and_load_unlocked
      File "", line 655, in _load_unlocked
      File "", line 618, in _load_backward_compatible
      File "", line 259, in load_module
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\site-packages\uro-0.0.2-py3.8.egg\uro\uro.py", line 4, in <module>
    ImportError: cannot import name 'SIGPIPE' from 'signal' (C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\signal.py)

    opened by umar98 3
  • Error install uro

    I have this error:

    WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behavior with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

    I've followed the steps above but still haven't found a solution :(

    Can anyone help me?

    invalid 
    opened by mjulda 2
  • When using uro on subdomains it leaves :// in front?

    When uro is run on a list of subdomains, it leaves :// in front of each entry.

    Example:

    cat subs.txt | uro

    subs.txt example:

    site.com
    sub.site.com
    sub123.site.com

    For any entry without http:// or https:// in front, it leaves :// in front.

    opened by gprime31 2
  • ERROR

    I just can't get this to work. I have cloned the repo and run the install command, but when I try "cat file.txt | uro" it doesn't work. Do I have to run any additional commands? Is there an installation video? :)

    invalid 
    opened by spector012 2
  • Please solve this

    └─# cat params.csv | uro | wc -l
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.9/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.9/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.9/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.9/sre_parse.py", line 962, in parse
        raise source.error("unbalanced parenthesis")
    re.error: unbalanced parenthesis at position 68
    6547

    opened by r3dpars3c 2
  • It doesn't delete paths

    The two URLs below differ only in the numeric path segment (43935976 vs. 43935989).

    ~# cat urls.txt
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw

    uro should remove one of them, but it doesn't.

    ~# cat urls.txt | uro
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    
    bug 
    opened by Phoenix1112 1
  • error handling

    So I added uro to my workflow and after a while I got this error:

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 139, in main
        if matches_patterns(path):
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 369
    

    It happens to me with different inputs, so it seems to be a common problem.

    invalid 
    opened by marcelo321 1
  • Uro error

    λ cat newfile222.txt | uro
    Traceback (most recent call last):
      File "C:\Users\Yaseen\AppData\Local\Programs\Python\Python39\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.1', 'console_scripts', 'uro')())
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 139, in main
        if matches_patterns(path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 379
    cat: write error: No space left on device

    Can you help? It reports a space issue, but I have a lot of free space.

    bug invalid 
    opened by hellofresh01 1
  • Improvement Request

    Hi Somdev,

    1. I'd like to suggest adding the following extensions to the blacklist. I gathered all of them manually and I think it would be good to omit them:
    'svg','img','gif','mp4','flv','ogv','webm','webp','mov','mp3','m4a','m4p','ppt','pptx','pdf','scss','tif','tiff','ttf','otf','woff','woff2','eot','htc','swf','rtf','image'

    2. Also, I would like to ask for the js extension to be whitelisted and allowed, as there are lots of interesting features/endpoints to be found in JS files and I don't think they should be considered "useless".

    Thanks!

    Kind Regards, HolyBugx

    enhancement 
    opened by HolyBugx 1
  • More extension to declutter

    It may be useful to add these extensions to the ones being decluttered; at least, that's what I usually do:

    .doc
    .docx
    .mp3
    .mp4
    .exe
    .tif
    .ttf
    .woff
    .woff2
    .ico
    .zip
    
    duplicate 
    opened by leorac 0
  • Bad character range P-C at position 31

    cat urls.txt | uro

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 598, in _parse
        raise source.error(msg, len(this) + 1 + len(that))
    re.error: bad character range P-C at position 31
    
    bug 
    opened by remonsec 0
  • uro error

    cat urls.txt | uro > test

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/uro/uro.py", line 123, in main
        for line in sys.stdin:
      File "/usr/lib/python3.10/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

    @s0md3v

    bug 
    opened by Iamsajidkhan 0
  • Error

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.4', 'console_scripts', 'uro')())
      File "/usr/local/bin/uro", line 25, in importlib_load_entry_point
        return next(matches).load()
    StopIteration

    opened by umarahmad125 0
  • broken pipe

    I have been encountering this issue:

      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 151, in main
        print(host + path + dict_to_params(param))
    BrokenPipeError: [Errno 32] Broken pipe
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 161, in main
        print(host + path)
    BrokenPipeError: [Errno 32] Broken pipe
    

    Any idea why this happens?

    opened by marcelo321 0
  • enhanced filtration

    For example, I want to filter "/A/embed?url=" and "/B/embed?url=", which return similar data. Likewise, I want to filter "/A.php" and "/A.php/", which return similar data.

    enhancement 
    opened by LztCode 1
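
Several tracebacks in the comments above end in `re.error` raised from `re.search(pattern, path)`. One plausible cause, which the source does not confirm, is that text taken from a URL reaches the regex engine unescaped. The sketch below shows the general defence; the helper names `path_matches_literal` and `is_valid_pattern` are hypothetical and are not part of uro.

```python
import re


def path_matches_literal(fragment, path):
    """Search for `fragment` in `path`, treating `fragment` as plain text."""
    # re.escape neutralises characters such as "(", "[" and "-" that would
    # otherwise be parsed as regex syntax.
    return re.search(re.escape(fragment), path) is not None


def is_valid_pattern(pattern):
    """Report whether `pattern` compiles, instead of letting re.error propagate."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

A fragment such as "/a(b" fails to compile as a pattern but is matched safely once escaped, and a reversed range like "[P-C]" is rejected up front rather than crashing the whole run.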
Releases(0.0.4)
  • 0.0.4(Mar 19, 2022)

  • 0.0.3(Feb 27, 2022)

    • removed redundant imports and code
    • added more extensions to blacklist
    • less memory and time consumption
    • fixed 'broken pipe' error when piping the output to utilities like head
    • fixed an error where similar urls were not getting filtered when they had any parameters
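
The release notes do not say how the 'broken pipe' error was fixed. As a hedged illustration only, a common approach is to restore the default SIGPIPE behaviour where the platform has that signal, and to catch `BrokenPipeError` otherwise. `signal.SIGPIPE` does not exist on Windows, which matches the ImportError reported in the comments. The function names `restore_default_sigpipe` and `emit` are hypothetical.

```python
import signal
import sys


def restore_default_sigpipe():
    """Let the process exit quietly when the reader of stdout goes away.

    Returns True if the handler was installed, False on platforms
    (such as Windows) where the signal module has no SIGPIPE.
    """
    sigpipe = getattr(signal, "SIGPIPE", None)
    if sigpipe is None:
        return False
    signal.signal(sigpipe, signal.SIG_DFL)
    return True


def emit(lines, out=sys.stdout):
    """Print lines one by one, and stop writing if the pipe is closed."""
    try:
        for line in lines:
            print(line, file=out)
    except BrokenPipeError:
        # The consumer (e.g. head) has exited; there is nothing left to write to.
        pass
```

Looking the signal up with `getattr` instead of importing it by name keeps the module importable on every platform.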
  • 0.0.2(Sep 1, 2021)

Owner
Somdev Sangwan
I make things, I break things and I make things that break things.