```python
import re

# content_raw is the raw input text, defined before this section.
# Strip whitespace, punctuation, Latin letters and digits, keeping only the Chinese text.
content = re.sub(
    r"[\s+.!/_,$%^*(\"']+|[a-zA-Z0-9+——,。!、:~@#¥%……&*()]+",
    "",
    content_raw,
)
print(content_raw)
print(content)
```

```python
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(ngram_range=(2, 2))
# Build the vocabulary
cv_fit = cv.fit_transform([content])
# Get the bag of words
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
print(cv.get_feature_names_out())
```

```python
# Standalone file
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
# Build the vocabulary
cv_fit = cv.fit_transform([content])
# Get the bag of words
print(cv.get_feature_names_out())
print(cv_fit.toarray())
```

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tv = TfidfVectorizer()
tv_fit = tv.fit_transform([content])
print(tv.get_feature_names_out())
print(tv_fit.toarray())
```

```python
import jieba

# Word segmentation: jieba.cut returns a generator, so consume it once
# and join the words with spaces so the vectorizers can tokenize them.
words = jieba.cut(content)
res = " ".join(words)
print(res)
```

```python
# Reuses the segmented text `res` from the previous block
from sklearn.feature_extraction.text import TfidfVectorizer

# Build the vocabulary
tv = TfidfVectorizer()
tv_fit = tv.fit_transform([res])
# Get the bag of words
print(tv.get_feature_names_out())
print(tv_fit.toarray())
```

```python
# Reuses the segmented text `res`
from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer()
cv_fit = cv.fit_transform([res])
print(cv.get_feature_names_out())
print(cv_fit.toarray())
```

```python
# Reuses the segmented text `res`
from sklearn.feature_extraction.text import HashingVectorizer

# Hash the tokens into 6 feature columns
hv = HashingVectorizer(n_features=6)
hv_fit = hv.fit_transform([res])
# HashingVectorizer is stateless and stores no vocabulary,
# so there are no feature names to print; only the hashed matrix is available.
print(hv_fit.toarray())
```

---

If you run into any problems, please email naer7403@gmail.com and state the module name. Thank you.
Copyright © 2025