List of content farm sites like g.penzai.com.

Overview

Content Farm Site List

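A sizable share of Google's Chinese-language search results are content-farm entries such as 「小 X 知识网」 and 「小 X 百科网」. These links usually 302-redirect to the farm's main site. The pages are auto-generated, stuffed with keywords, and mixed with scraped content, so they have no readability or reference value.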

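Worse, a site of this kind may have thousands of alias domains indexed by Google, which seriously degrades the search experience. See the community feedback from early October 2021: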

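  1. GitHub: How to block 「小搭百科网」?
  2. V2EX: Content-farm results such as 小 X 知识网 keep showing up in Google searches; what can be done?
  3. V2EX: Google's Chinese search results are awful; has Google given up on Chinese search?
  4. HOSTLOC: This scraper site network is way too powerful
  5. HOSTLOC: Which big shot is behind the 小*知识网 site network?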

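Matching result titles with regular expressions cannot block these sites completely, so this list was compiled to help users filter their search results.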

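Because 「小搭百科网」, the site at the center of this incident, shut itself down after the impact it caused, the list will go on to track and include other similar content farms.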

Usage

uBlacklist

Install uBlacklist

Chrome Web Store / Firefox Add-ons / App Store (for macOS and iOS)

After installing, open the Options page, click Add a subscription, and enter the following:

  • Name: content-farm-list
  • URL: https://raw.githubusercontent.com/wdmpa/content-farm-list/main/uBlacklist.txt

  • Name: content-farm-list
  • URL: https://wdmpa.org/content-farm-list/uBlacklist.txt

Click the 'Add' button.

Google Hit Hider

http://www.jeffersonscher.com/gm/google-hit-hider/

Install

Greasy Fork / OpenUserJS.org

Manage lists

http://www.jeffersonscher.com/gm/google-hit-hider/manage-lists.php

Subscription files

File                                               Description
uBlacklist.txt                                     All uBlacklist rules
Surge.txt                                          All Surge rules
uBlacklist/spam/g.penzai.com.txt                   uBlacklist-only domain set for 小搭百科网
Surge/spam/g.penzai.com.txt                        Surge-only domain set for 小搭百科网
uBlacklist/machine-translated/stackoverflow.txt    uBlacklist-only domain set for machine-translated StackOverflow sites
Surge/machine-translated/stackoverflow.txt         Surge-only domain set for machine-translated StackOverflow sites
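As a rough illustration of why a domain list catches alias hosts that a title regex misses, the sketch below checks a hostname against a set of blocked domains by suffix. It assumes the rules are suffix-based (a host is blocked when it equals a listed domain or is a subdomain of one); `is_blocked` and the example domains are hypothetical and not part of this repository.

```python
def is_blocked(host, blocked_domains):
    """Return True if host equals a blocked domain or is a subdomain of one."""
    labels = host.lower().rstrip(".").split(".")
    # Test the host itself, then each parent domain: a.b.c -> a.b.c, b.c, c
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


blocked = {"farm.example"}
print(is_blocked("www.farm.example", blocked))  # subdomain of a listed domain
print(is_blocked("notfarm.example", blocked))   # merely shares a text suffix
```

Comparing whole labels rather than raw string suffixes keeps `notfarm.example` from being treated as a subdomain of `farm.example`.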

Search engine settings

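Because results matching the list are removed, a results page may be left with too few entries to browse comfortably. We recommend signing in and setting the search engine to show 100 results per page.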

What can we do?

I. Submit a PR to add domains

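  1. Export the domain list from your local uBlacklist extension
  2. Try long-tail keywords in a search engine to discover more farm domains whose ranking is still low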

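Add new category files to the domains directory, following its existing structure. Match the format of the entries already in a file; new entries can go anywhere in it. (Either fork this repository, edit, and push, or edit directly on the web page.)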
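A minimal sketch of turning an exported uBlacklist rule list into bare domains ready for a PR. It assumes the export contains simple match patterns of the form `*://*.example.com/*`, that lines starting with `#` are comments, and that files under domains hold one bare domain per line; `rules_to_domains` is a hypothetical helper, not a script from this repository.

```python
import re

# Matches simple match patterns such as "*://*.example.com/*" or "*://example.com/*".
_PATTERN = re.compile(r"^\*://(?:\*\.)?([a-z0-9.-]+)/\*$")


def rules_to_domains(exported_text, known=()):
    """Return the sorted bare domains in exported rules that are not already known."""
    known = {d.strip().lower() for d in known}
    found = set()
    for line in exported_text.splitlines():
        line = line.strip().lower()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _PATTERN.match(line)
        if match and match.group(1) not in known:
            found.add(match.group(1))  # regex rules and other forms are ignored
    return sorted(found)


exported = "*://*.farm-a.example/*\n*://farm-b.example/*\n/some-regex/"
print(rules_to_domains(exported, known=["farm-b.example"]))
```

Rules that are not plain domain patterns (for example regular expressions) are skipped on purpose, since they cannot be reduced to a single domain.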

File                                            Description
domains/spam/g.penzai.com.txt                   Domain set for 小搭百科网
domains/machine-translated/stackoverflow.txt    Domain set for machine-translated StackOverflow sites

After you submit, a script automatically updates the subscription files.
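The repository's actual update script is not shown here. The following is only a sketch of how such a step might derive both subscription formats from domain lines, assuming one bare domain per line, a `*://*.domain/*` match pattern for uBlacklist, and a `DOMAIN-SUFFIX,domain` line for Surge; `build_subscriptions` is a hypothetical name.

```python
def build_subscriptions(lines):
    """Build (uBlacklist text, Surge text) from an iterable of domain lines."""
    domains = sorted(
        {
            line.strip().lower()
            for line in lines
            if line.strip() and not line.strip().startswith("#")
        }
    )
    # Assumed output formats; adjust to whatever the real subscription files use.
    ublacklist = "\n".join(f"*://*.{d}/*" for d in domains)
    surge = "\n".join(f"DOMAIN-SUFFIX,{d}" for d in domains)
    return ublacklist, surge


ub_text, surge_text = build_subscriptions(["farm-b.example", "farm-a.example"])
print(ub_text)
print(surge_text)
```

Deduplicating and sorting before writing keeps the generated files stable, so a PR that adds one domain yields a one-line diff in each output.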

II. Report

Report the abuse to the cloud service provider that the site uses.

Owner
WDMPA
World Developer Mood Protection Association