mercredi 3 mai 2023

Laravel MySQL Full text search InnoDB Relevancy Calculation for multiple rows with same number of occurences in total

using mysql from xaamp v8.1.10 along with laravel v5.5.3.

Full text Search Index enabled in table 'ex1' for column 'des' and 'ft_min_word_len = 1'.

laravel query:

$string='pink item product';

$result = ex::active()->selectRaw("*,match(des) against('$string' in boolean mode)  as relevance")->whereRaw("match(des) against('$string' in boolean mode)")->orderBy('relevance', 'desc')->get();

I am getting the result as

( results are truncated )

id Relevance des
64 1.3543391227722 amazing pink petite lace item is item.
182 1.3543391227722 this pink tall polyester item is item.
598 1.3543391227722 which pink large polyester item is item.
653 1.3543391227722 amazing pink plus size linen item is item.
808 1.3543391227722 what pink tall cashmere item is item.
19 1.344024181366 different pink large silk product is item.
134 1.344024181366 that pink large wool product is item.
140 1.344024181366 this pink tall lace item is product.
233 1.344024181366 new pink extra-small lace item is product.
273 1.344024181366 this pink tall cashmere item is product.
625 1.344024181366 amazing pink medium lace product is item.
893 1.344024181366 pink extra-small leather item is product.
908 1.344024181366 which pink tall lace product is item.
992 1.344024181366 this pink large cotton product is item.
21 1.3337094783783 pink extra-small cashmere product is product.

My question is :

id Relevance des
64 1.3543391227722 amazing pink petite lace item is item.
21 1.3337094783783 that pink extra-small cashmere product is product.

though having the same number of occurences:

column 64 has : item-2 times, pink-1 time and column 21 has : product-2 times pink-1 time.

Initial Thoughts

I thought the calculation is like this:

(refrence from MySQL FUll Text Search Doc, which says:)

${rank} = ${TF} * ${IDF} * ${IDF} + ${TF} * ${IDF} * ${IDF}

SELECT (1*log10(8/6)*log10(8/6)) + (2*log10(8/2)*log10(8/2));

which is :

(number of occurences of word 1* log10(total records in table/total matching records where the word is found) + (number of occurences of word 2* log10(total records in table/total matching records where the word is found) + ... + (number of occurences of word n * log10(total records in table/total matching records where the word n is found)

my question is how is the relevance calulated ?

also, even though column 64 and column 182 has same set of occureneces, both has pink-1 time and item-2 times, the relevance for column 64 is higher. why ?

initially i though this was based on row number in database, the order in which it was created. But in that case, my first question should be invalid as column 21 should come before column 64. I cant understand how it is calculated.

i did calculation this far with MySQL doc on full text search and i cant find a documnetation covering this kind of scenario.

Thanks in Advance.



via Chebli Mohamed

Aucun commentaire:

Enregistrer un commentaire