1、气候监测数据集 http://cdiac.ornl.gov/ftp/ndp026b
2、几个实用的测试数据集下载的网站
http://www.cs.toronto.edu/~roweis/data.html
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/the o-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/the o-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
在下面的网址可以找到reuters数据集http://www.research.att.com/~lewis/reuters21578.ht ml
以下网址上有各种数据集:
http://kdd.ics.uci.edu/summary.data.type.html
进行文本分类,还有一个数据集是可以用的,即rainbow的数据集
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www /naive-bayes.html
3、找了很多测试数据集,写论文的同志们肯定需要的,至少能用来检验算法的效果
可能有一些不能访问,但是总有能访问的吧:
UCI收集的机器学习数据集
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm
statlib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/
样本数据库
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html
关于基金的数据挖掘的网站
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
reuters数据集
http://www.research.att.com/~lewis/reuters21578.ht ml
各种数据集:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=dat asets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resource s/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/
进行文本分类&WEB
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www /naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html# AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/ba la2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/the o-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/Ti meCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.ht ml
时间序列数据的网址
http://www.stat.wisc.edu/~reinsel/bjr-data/
apriori算法的测试数据
http://www.almaden.ibm.com/cs/quest/syndata.html
数据生成器的链接
http://www.cse.cuhk.edu.hk/~kdd/data_collection.ht ml
http://www.almaden.ibm.com/cs/quest/syndata.html
关联:
http://flow.dl.sourceforge.net/sourceforge/weka/re gression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resource s/datasets/syndata.html#assocSynData
WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/re gression-datasets.jar
1。A jarfile containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-U CI.jar
2。A jarfile containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-n umeric.jar
3。A jarfile containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression -datasets.jar
癌症基因:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.c gi
金融数据:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
另一个人提供的
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/the o-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/the o-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
在下面的网址可以找到reuters数据集
http://www.research.att.com/~lewis/reuters21578.ht ml
以下网址上有各种数据集:
http://kdd.ics.uci.edu/summary.data.type.html
进行文本分类,还有一个数据集是可以用的,即rainbow的数据集
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www /naive-bayes.html
Download the Financial Data (~17.5M zipped file, ~67M unzipped data)
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
kdnuggets 相关链接数据集(借花献佛了):
http://www.kdnuggets.com/datasets/index.html UCI KDD Database Repository for large datasets used machine learning and knowledge discovery research.
UCI Machine Learning Repository.
Delve, Data for Evaluating Learning in Valid Experiments
FEDSTATS, a comprehensive source of US statistics and more
FIMI repository for frequent itemset mining, implementations and datasets.
Financial Data Finder at OSU, a large catalog of financial data sets
GeneSifter Data Center, access to microarray datasets through the GeneSifter microarray data analysis system.
GEO (GEO Gene Expression Omnibus), a gene expression/molecular abundance repository supporting MIAME compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.
Grain Market Research, financial data including stocks, futures, etc.
Investor Links, includes financial data
Microsofts TerraServer, aerial photographs and satellite images you can view and purchase.
MIT Cancer Genomics gene expression datasets and publications, from MIT Whitehead Center for Genome Research.
National Government Statistical Web Sites, data, reports, statistical yearbooks, press releases, and more from about 70 web sites, including countries from Africa, Europe, Asia, and Latin America.
National Space Science Data Center (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
PubGene(TM) Gene Database and Tools, genomic-related publications database
SMD: Stanford Microarray Database, stores raw and normalized data from microarray experiments.
SourceForge.net Research Data, includes historic and status statistics on approximately 100,000 projects and over 1 million registered users activities at the project management web site.
STATOO Datasets part 1 and part 2
UCR Time Series Data Mining Archive, offering datasets, papers, links, and code.
United States Census Bureau.