[Mil MYS-8MMX] Mil MYS-8MMQ6-8E2D-180-C Application 2 - A Preliminary Study on NLP
Processing natural language (NL) with machines is currently a popular direction. One of its branches is enabling a machine to recognize a sentence of human speech, including its context, semantics, emotion, and so on.
The most important part is word segmentation, so today we try word segmentation on the Mil MYS-8MMQ6-8E2D-180-C.
The NLP library I tried today is jieba. First, install the library. Because a direct installation may fail with connection errors, you need to specify a mirror source:
pip3 install jieba -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
Similarly, to install jieba for Python 2, the command is:
pip install jieba -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
Let's try a simpler one first: "The bicycle was about to fall over, so I grabbed it by the handlebars." The Chinese character "把" appears several times in this sentence with different pronunciations and meanings: as the preposition 把 (third tone) in "grabbed", and as the noun 把 (fourth tone), referring to the handlebars of the bicycle.
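A minimal way to reproduce this looks roughly like the sketch below. The exact Chinese sentence is not quoted above, so the classic "把" tongue-twister in the sketch is only my assumption of the input; jieba.cut itself is the real API call:

# -*- coding: utf-8 -*-
from __future__ import print_function
import jieba

# Assumed test sentence: "The bicycle was about to fall over, so I grabbed
# it by the handlebars", with several readings of the character "把".
sentence = u"自行车快倒了，我一把把把把住了"
print("/".join(jieba.cut(sentence)))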
What is more interesting is that the code runs normally under Python 2 but fails under Python 3; the error seems to come from the re module.
Next, let's count the words in a famous literary work and list the 20 most frequent ones. Again, Python 3 fails, but it runs successfully under Python 2.
For our experiment we chose "War and Peace"; its content is familiar to everyone, so we will not spend words retelling it here.
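The counting itself only needs jieba plus Python's collections.Counter. A minimal sketch, assuming the text sits in a UTF-8 file whose name is a placeholder here (the attached getkeyword.py is presumably the script actually used):

# -*- coding: utf-8 -*-
from __future__ import print_function
import io
from collections import Counter
import jieba

# "war_and_peace.txt" is a placeholder file name, not the actual path used.
with io.open("war_and_peace.txt", encoding="utf-8") as f:
    text = f.read()

# Segment the whole text and print the 20 most frequent tokens.
words = jieba.lcut(text)
for word, count in Counter(words).most_common(20):
    print(word, count)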
From the above example we can see that jieba also cuts out the punctuation marks as separate tokens. Single-character tokens are not very meaningful, so we can simply discard every token of length 1 (which also removes the punctuation). Following common practice for Chinese, we can additionally apply a stop word list, which can be downloaded from https://gitee.com/chen_kailun/stopwords. There are four commonly used Chinese stop word lists (a filtering sketch follows the table):
Vocabulary name | Vocabulary file
Chinese stop word list | cn_stopwords.txt
HIT stop word list | hit_stopwords.txt
Baidu stop word list | baidu_stopwords.txt
Stop word library of the Machine Intelligence Laboratory of Sichuan University | scu_stopwords.txt
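Filtering by token length and by the Baidu list can be sketched like this (file paths are assumptions; adjust them to wherever the downloaded lists live):

# -*- coding: utf-8 -*-
from __future__ import print_function
import io
from collections import Counter
import jieba

# Load the Baidu stop word list, one word per line (path is an assumption).
with io.open("baidu_stopwords.txt", encoding="utf-8") as f:
    stopwords = set(line.strip() for line in f)

with io.open("war_and_peace.txt", encoding="utf-8") as f:
    text = f.read()

# Drop length-1 tokens (this also removes punctuation) and stop words.
words = [w for w in jieba.lcut(text) if len(w) > 1 and w not in stopwords]
for word, count in Counter(words).most_common(20):
    print(word, count)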
Select "Baidu Stop Words List" and directly call the functions textrank and extract_tags in jieba to obtain keywords and compare them with the high-frequency words we selected.
It can be seen that there is some overlap, for example "Duke" (Andrei is indeed the real protagonist), but there are also quite a few keywords that differ. How jieba selects keywords is not clear to me, but it is probably not a simple, crude pick of the most frequently occurring words.
In addition, I feel that the performance of single-board computers is still far behind laptops: the same code takes only a few seconds to a dozen or so seconds on my laptop. For comparison I again turned to the Raspberry Pi: I installed jieba under Python 2 on the Raspberry Pi 4 at hand and ran the same code:
Comparing the above two results, we find that the Raspberry Pi 4B needs only about half the time or less to do the same work (71 vs. 177, 174 vs. 466, 16 vs. 34). This differs from our earlier benchmark against the Pi, which found the MYS-8MMQ6-8E2D-180-C only slightly weaker than the Raspberry Pi 4B (see: https://en.eeworld.com/bbs/thread-1175554-1-1.html).
In addition, in the output on the MYS-8MMQ6-8E2D-180-C, "East:530" strangely became ":11679". I don't know whether this is an encoding error.
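For anyone repeating the timing comparison, a plain wall-clock wrapper is enough; the sketch below is my own, not necessarily how the attached script measures it:

# -*- coding: utf-8 -*-
from __future__ import print_function
import io
import time
import jieba

with io.open("war_and_peace.txt", encoding="utf-8") as f:  # placeholder path
    text = f.read()

# Time only the segmentation step using wall-clock time.
start = time.time()
words = jieba.lcut(text)
print("segmentation took %.1f s" % (time.time() - start))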
Attachment: getkeyword.py