
... HYBRID and RARELY in the book The Origin of Species. According to the subject of the book, HYBRID is an important word and RARELY is an irrelevant one; both have the same frequency, 45. RARELY is distributed uniformly in the text, but HYBRID is clustered. doi:10.1371/journal.pone.0130617.g

Fig 6. Results of box counting for HYBRID and RARELY. The dashed line and the dash-dotted line show the power-law regression. The fractal dimension is about 0.4 for HYBRID and close to 0.8 for RARELY. The box counting results for CELL and ACTUALLY are also shown. The fractal dimension is about 0.4 for CELL and close to 0.8 for ACTUALLY. doi:10.1371/journal.pone.0130617.g

PLOS ONE | DOI:10.1371/journal.pone.0130617 | June 19 | The Fractal Patterns of Words in a Text

Fig 7. Results of box counting for the distribution of HYBRID in the original and shuffled text. HYBRID is an important word in the book The Origin of Species, so there is a considerable difference between the box counting of this word in the original and in the shuffled text. doi:10.1371/journal.pone.0130617.g

... unimportant. The difference between the patterns of a word in the original and shuffled text can be considered an indication of its importance. The degree of fractality, defined in Eq 4, measures this difference. Fig 8 shows the degree of fractality for two words, HYBRID and CELL. It is clear from this figure that CELL is more important than HYBRID: the degree of fractality is 8.21 for HYBRID and 12.71 for CELL. Now we can rank all of the words according to the degree of fractality. Table 1 reports the twenty top-ranked words and, for comparison, the twenty most frequent words. According to the subject of the book, words such as SLAVES, ILLEGITIMATE, SALIVA, and PEDICELLARIAE are important words. They also have a higher degree of fractality than other words.
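The box counting and the original-versus-shuffled comparison described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the excerpt: Eq 4 is not reproduced here, so the degree of fractality is taken as the sum over box sizes of log(N_shuffled / N_original), one reading of "the area bounded between two curves" in the box counting diagram; the shuffled text is simulated by drawing random word positions; and the box sizes are powers of two. The function names and the toy data are hypothetical.

```python
import math
import random


def box_count(positions, box_size):
    """Count boxes of `box_size` consecutive words holding at least one occurrence."""
    return len({p // box_size for p in positions})


def fractal_dimension(positions, sizes):
    """Negative least-squares slope of log N_b(s) against log s over `sizes`."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(box_count(positions, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return -num / den


def degree_of_fractality(positions, text_length, sizes, seed=0):
    """ASSUMED form of Eq 4: sum over scales of log(N_shuffled(s) / N_original(s))."""
    rng = random.Random(seed)
    # Assumption: a shuffled text is modelled by random distinct word positions.
    shuffled = rng.sample(range(text_length), len(positions))
    return sum(
        math.log(box_count(shuffled, s) / box_count(positions, s)) for s in sizes
    )


# Toy data (not taken from the book): 45 occurrences in a 4096-word text.
text_length = 4096
sizes = [2 ** k for k in range(13)]      # 1, 2, 4, ..., 4096
clustered = list(range(1000, 1045))      # one tight cluster of occurrences
spread = [91 * i for i in range(45)]     # evenly spaced occurrences

d_clustered = degree_of_fractality(clustered, text_length, sizes)
d_spread = degree_of_fractality(spread, text_length, sizes)
print(d_clustered > d_spread)
```

The fitted slope in `fractal_dimension` depends on the range of box sizes passed in. The excerpt does not state which range the regression uses, so this sketch is not expected to reproduce the reported values of 0.4 and 0.8.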
Irrelevant words such as THE, OF, AND, and IN have a lower degree of fractality, though they are very frequent in the book. It is useful to point out that function words have the lowest degree of fractality overall, but unimportant content words still have lower fractality than important keywords. For small texts, word frequency becomes increasingly important. To take the effect of frequency into account, we multiply the degree of fractality by log(M). This changes ranks mostly in the middle of the list, while words at the top of the list change rank only slightly. Other choices may change the ranks of words significantly in all parts of the list. Table 2 presents another retrieved list of words according to this Combined Measure. Now words like SLAVES, WAX, HYBRIDS, and INSTINCTS are placed at the top. In this new ranking, HYBRID moves from rank 321 to 48, and RARELY moves from rank 2203 to 1011.

In addition to the degree of fractality, there exist several methods that assign an importance value to each word in a given text. We can list the words in descending order of importance; the words in the top ranks are assumed to be keywords. By choosing a threshold value we can identify the list of keywords. In the following section we evaluate our proposed method on the keyword detection task.

Fig 8. Area bounded between the two curves for CELL and HYBRID in the box counting diagram. The c.
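The frequency-weighted ranking and threshold step described above can be sketched in a few lines of Python. The assumptions here are not in the excerpt: M is read as the number of occurrences of the word, the logarithm is natural (the base only rescales every score by a constant, so it does not affect the ranking), and the threshold is applied to the score rather than to the rank. All names and the toy numbers are hypothetical and are not values from Table 1 or Table 2.

```python
import math


def combined_measure(fractality, frequency):
    """Degree of fractality weighted by log(M); M is ASSUMED to be the word frequency."""
    return fractality * math.log(frequency)


def rank_words(scores):
    """Words sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


def select_keywords(scores, threshold):
    """Ranked words whose score reaches `threshold` (assumed to be a score cutoff)."""
    return [w for w in rank_words(scores) if scores[w] >= threshold]


# Toy (degree of fractality, frequency) pairs, invented for illustration only.
toy_stats = {
    "alpha": (12.0, 10),
    "beta": (8.0, 100),
    "gamma": (2.0, 1000),
    "delta": (9.0, 1),
}

fractality_only = {w: d for w, (d, m) in toy_stats.items()}
combined = {w: combined_measure(d, m) for w, (d, m) in toy_stats.items()}

by_fractality = rank_words(fractality_only)
by_combined = rank_words(combined)
keywords = select_keywords(combined, threshold=20.0)

print(by_fractality)
print(by_combined)
print(keywords)
```

Under this reading, a word that occurs only once gets a combined score of zero, because log(1) = 0, regardless of its degree of fractality.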


Author: PDGFR inhibitor