Make sure agencies leave "Homogeneity" switched OFF when running statistics in memoQ! (MemoQ support)

Thread poster: Michael Beijer

Samuel Murray

Netherlands
Local time: 18:58
Member (2006)
English to Afrikaans
+ ...

@Rossana, DVX doesn't have it

Aug 21, 2015

Rossana Triaca wrote:
First of all, y'all making me feel really old, because good ol' Déjà vu had it first over a decade ago...

According to Mr Pivard, DVX doesn't have this feature, even now. Are you sure DV had it previously? Does DV really have it currently? What is it called?

DVX does have something called "internal repetition", but according to this thread, it's not the same as internal fuzzy matching at all, but rather relates to the repetition of very short phrases.

[Edited at 2015-08-21 10:53 GMT]

Michael Beijer

United Kingdom
Local time: 17:58
Member (2009)
Dutch to English
+ ...

TOPIC STARTER

Ought becomes is, and it is very important to know where you stand.

Aug 21, 2015

Hi Rossana,

Rossana Triaca wrote:

So many points, so little time!

[…]

It is not standard industry practice as far as I am concerned (...)

I don't think that's quite how it works...

Firstly, let me rephrase my,

It is not standard industry practice as far as I am concerned (...)

This should have been:

It should not be standard industry practice as far as I am concerned (...)

And in reply to your, "I don't think that's quite how it works..." … I actually think this is how it works: "standard industry practice" is formed by us all, together, over time.

If enough translators (and agencies) say, "Sure, I see how it can be fair to offer agencies a discount on internal fuzzy matches" (like you, Samuel and Patrick), then, over time, this could become standard practice. However (and this is where my post and my attempts to raise awareness come in), if enough people disagree with this idea, and say, "No, I don't think it is fair…", over time, the idea will not turn into standard practice. Ought becomes is, and it is very important to know where you stand. I am a translator, so it is in my interest not to offer discounts on internal fuzzy matches. This is why it also annoys me when fellow translators assume the defeatist position that some of you here have taken.

Somewhere else, Samuel also said he doesn't mind offering discounts on internal fuzzies, because he then just increases his word rate on every job ("so I can simply adjust my per-word rate for every project accordingly"). Interesting solution, but with most of the agencies I work for, we agree on a word rate in advance, and this can't be changed for each project. This would be way too much work for both parties. Also, by admitting that he has to increase his word rate if they demand discounts on internal fuzzies, isn’t Samuel saying that he doesn't accept internal fuzzy matches? So why not say so from the start and make your position on the matter clear?

Rossana Triaca wrote:

Don't shoot the messenger, I'm just relaying what is actually now the standard for agencies that demand a specific CAT as part of their workflow (keyword: demand). Then again, there's a whole other market outside of these, and I obviously support you 100% to send them packing if it doesn't make financial sense to you to accept their terms. However, I think we can agree that an agency that uses this option nowadays is not trying to embezzle you and doesn't deserve public shaming for it.*

Hmm, I also don't agree with you on this point. I work for a ton of agencies that demand a particular CAT tool, and not one of them asks for discounts on internal fuzzies/homogeneity. One of them tried it (the unnamed greedy entity that occasioned this thread), and I dropped them. If more translators did the same, then over time not discounting internal fuzzies would become (or stay) standard practice.

Public shaming, sadly, is one of the few tools we translators have at our disposal to effect a change, which is why I have no qualms about using it.

Also note (Samuel) that I am not annoyed because the agency in question had switched on Homogeneity without telling me, but because it is their company policy to do so in all memoQ projects. This is just another reason why I think memoQ and SDL Studio are bad for our industry.

I invest an inordinate amount of my own time and energy in my CAT tools, terminology tools, translation memories, bilingual corpora, and all kinds of other digital and paper resources and workflows. I should be the one reaping the rewards, not the agency, if indeed there are even any to be reaped in the first place from all these fuzzy matches, etc. Just to get by these days, you need to own 2-3, if not more CAT tools, and god knows how many other programs. It's not like the agency is sending me on paid CAT tool and translation courses. That's all my responsibility. They just want to swoop in, grab the money, and run.

And I can hear you (not you specifically, Rossana) saying, "Yes, well, obviously it's because your rates are too low. Just raise your rates, and then offering all manner of discounts wouldn't be a problem." I do not agree. My per word rate is something different. I am talking about internal fuzzy matches.

Let me ask you: do any of us here even REALLY, FULLY understand how memoQ's homogeneity algorithm even works? I doubt it. (Gergely, István, are you reading this? If so, care to explain it to us laypeople?) Then why are we so eager to offer discounts based on it? CafeTran has a homogeneity thingee in its statistics module too. I sometimes look at it (and the one in memoQ, when I run analyses in memoQ), but so far they have never successfully predicted how long it ended up taking me in practice to complete a job. Not once.

Michael

[Edited at 2015-08-21 11:29 GMT]

Manuel Arcedillo
Spain
Local time: 18:58
English to Spanish

Bug affecting regular leverage and homogeneity

Aug 21, 2015

There is also currently (build 7.8.53) a known bug which affects homogeneity and regular TM leverage: if you set a minimum match threshold of 74 or above in TM settings, in the Editor grid you will not get the leverage reported. For example, you could have 86% matches reported in Statistics, but when you move to that segment the automatic lookup does not inform you of any leverage and you would need to translate it from scratch.

This only affects automatic lookup (pre-translation works as expected regardless of the threshold you set), so beware if you are planning an extensive use of internal leverage. To be on the safe side, you would need to set that threshold to 73 or below in build 7.8.53 and to 64 or below in previous builds.

Samuel Murray

Netherlands
Local time: 18:58
Member (2006)
English to Afrikaans
+ ...

@Michael

Aug 21, 2015

Michael Beijer wrote:
Samuel said he doesn't mind offering discounts on internal fuzzies, because he then just increases his word rate on every job...
Interesting solution, but with most of the agencies I work for, we agree on a word rate in advance, and this can't be changed for each project.

I also prefer to have a standard per-word rate that applies to all jobs, and most of my agency clients have such a policy. Most of my agency clients also have their own discount grid that they apply to all jobs (i.e. they don't apply my grid, but instead they apply theirs). This is also fine, for most jobs, as long as the translator knows how the agency's grid works, because then he can adjust his per-word rate accordingly, if he feels that the agency's grid is unreasonable.

It is not ideal if the agency sets the conditions, but that's just the world I (we) live in. In reality, I have a much greater objection to discounts for external matches than for internal matches, for reasons mentioned previously. And I have greater objection to low fuzzy match discounts than to high fuzzy match discounts. And I have an even greater objection to pre-inserted fuzzy matches. Still, most of my clients expect discounts for external matches, and many of their grids go as low as 60%, and some of them pre-insert fuzzy matches, and I oblige, because that is the industry that I'm in. I just change the variables that I do have control over.

Also, by admitting that he has to increase his word rate if they demand discounts on internal fuzzies, isn’t Samuel saying that he doesn't accept internal fuzzy matches? So why not say so from the start and make your position on the matter clear?

I don't increase my per-word rate merely if they demand internal match discounts. I increase my per-word rate if the agency's weighted word count is lower than the weighted word count that I consider reasonable for the job... regardless of what kind of matching was used.
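To make the arithmetic behind that adjustment concrete, here is a minimal sketch. The band names and the pay fractions in both grids are hypothetical illustrations, not figures from any agency or from memoQ; the only assumption carried over from the discussion is that a weighted word count is the sum of each band's words multiplied by the fraction of the full rate paid for that band.

```python
from typing import Dict

# Hypothetical discount grids: fraction of the full rate paid per match band.
# These numbers are illustrative only, not taken from any agency or tool.
GRID_WITHOUT_INTERNAL = {"new": 1.0, "fuzzy": 1.0, "repetition": 0.3}
GRID_WITH_INTERNAL = {"new": 1.0, "fuzzy": 0.6, "repetition": 0.3}


def weighted_word_count(counts: Dict[str, int], grid: Dict[str, float]) -> float:
    """Sum the word counts per band, each scaled by that band's pay fraction."""
    return sum(words * grid[band] for band, words in counts.items())


def neutralising_rate(base_rate: float, reasonable_wwc: float, agency_wwc: float) -> float:
    """Per-word rate at which the agency's weighted count pays the same total
    as the translator's own weighted count would at the base rate."""
    return base_rate * reasonable_wwc / agency_wwc
```

With made-up counts of 6,000 new words, 3,000 internal fuzzy words and 1,000 repeated words, the first grid gives 9,300 weighted words and the second 8,100, so a base rate of 0.10 would need to rise to roughly 0.115 for the job total to stay the same.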

Also note (Samuel) that I am not annoyed because the agency in question had switched on homogeneity without telling me, but because it is their company policy to do so in all MemoQ projects.

Oh, sorry for the misunderstanding.

You can be angry about this, or you can meet them halfway.

The fact is that internal fuzzy matches do save you time (right?), so if there are a lot of internal fuzzy matches, your speed will increase a bit. So then, offer a slightly higher rate for jobs that involve internal fuzzy matching... knowing that in most cases, the higher rate will neutralise the agency's discount grid, and in those few cases that it doesn't, it will be offset by an increase of speed anyway.

Let me ask you: do any of us here even REALLY, FULLY understand how memoQ's homogeneity algorithm even works?

No, but I don't understand how any of my CAT tools' match algorithms work. I assume the internal match algorithm is the same as the external match algorithm (why would developers go out of their way to develop two, when the same can be used for both?).

I sometimes look at [CafeTran's homogeneity thingy], but so far it has never successfully predicted how long it ended up taking me in practice to complete a job.

My experience is similar -- the internal fuzzy match statistics help me gain an understanding of how much faster I'll be in general, but every job is different, and internal fuzzy matching is just one of the variables that affect how long it takes to do the translation.

Chunyi Chen
United States
Local time: 09:58
English to Chinese

Internal fuzzy matches can also mislead in estimating how long it takes to finish a job

Aug 21, 2015

Hi Michael,

Like you, when I see a MemoQ project analysis sent to me with the Homogeneity option enabled, I raise a red flag and know that this is not the kind of client I want to establish a long-term relationship with.

Homogeneity not only lowers the weighted word count but also gives wrong information on how long a job would take. Enabling this feature lowers the number of new words and increases the number of fuzzy words. Just as you said, sometimes it takes longer to translate fuzzy matches.
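The shift from "new" to "fuzzy" can be illustrated with a rough sketch. This is not memoQ's homogeneity algorithm, which is not public; it stands in Python's `difflib.SequenceMatcher` for the real matcher, uses an arbitrary 75% threshold, and assumes exact repetitions are counted either way. The function name `analyse` and its parameters are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def analyse(segments: List[str], internal: bool, threshold: float = 0.75) -> Dict[str, int]:
    """Count words per band against an empty TM.

    Each segment is compared with the segments before it in the same document.
    Identical earlier segments count as repetitions; similar ones count as
    fuzzy only when internal matching is switched on.
    """
    counts = {"new": 0, "fuzzy": 0, "repetition": 0}
    seen: List[str] = []
    for seg in segments:
        words = len(seg.split())
        # Best similarity to any earlier segment (0.0 if there is none yet).
        best = max((SequenceMatcher(None, seg, prev).ratio() for prev in seen), default=0.0)
        if best == 1.0:
            counts["repetition"] += words
        elif internal and best >= threshold:
            counts["fuzzy"] += words
        else:
            counts["new"] += words
        seen.append(seg)
    return counts
```

Running the same segments through `analyse` with `internal=False` and then `internal=True` shows words moving out of the "new" band and into the "fuzzy" band while the text to be translated stays exactly the same.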

I want my CAT tool to work for me. And I feel homogeneity does exactly the opposite.

Michael Beijer wrote:

And in reply to your, "I don't think that's quite how it works..." … I actually think this is how it works: "standard industry practice" is formed by us all, together, over time.

If enough translators (and agencies) say, "Sure, I see how it can be fair to offer agencies a discount on internal fuzzy matches" (like you, Samuel and Patrick), then, over time, this could become standard practice. However (and this is where my post and my attempts to raise awareness come in), if enough people disagree with this idea, and say, "No, I don't think it is fair…", over time, the idea will not turn into standard practice. Ought becomes is, and it is very important to know where you stand. I am a translator, so it is in my interest not to offer discounts on internal fuzzy matches.

Rossana Triaca

Uruguay
Local time: 13:58
English to Spanish

Of Unions and Algorithms...

Aug 22, 2015

Michael, your position is clear but it's really at odds with what I see is happening (which, granted, is anecdotal too). Translators are a diverse bunch, from all extractions and backgrounds, and hoping they'll take a stand on a matter is akin to wishing upon a star.

Moreover, it never occurred to me I was taking a position or being defeatist in accepting internal fuzzies because I just *love* them, since they play in my favor most of the time and allow me to accept projects twice as big in the same time slot - projects that I routinely turned down in the good old days before having this valuable information.

It also encourages transparency between both parties, and for direct clients is a huge selling point - again, a win-win situation.

That being said, how you treat them business-wise to estimate a project's cost is entirely a different issue, and I see no problem with asking an agency up front if they use them or not when negotiating terms. Basically, I do have two rates depending on this - hell, I have an infinite amount of rates depending on a lot of factors (I'm really against blanket rates), but it does happen that the same agency tends to process and send the same type of work with the same terms and expectations, so in the end I think I simply end up having a per-client rate (which again, doesn't mean I don't evaluate each project individually as they come).

Technically speaking, I don't think anyone can really know the algorithms used in each tool without some serious reverse engineering, because they are (generally) closely guarded proprietary trade secrets. But the general idea of approximate string matching has historically been to measure the edit distance (as in, the number of character insertions, deletions and substitutions needed to convert one string into another), and if you're mathematically inclined you can read more about Levenshtein distance and Wagner's and Sellers' work in pretty much any computer science text dealing with approximate matching. It's a fascinating subject, but really outside our scope, and as Samuel says, you don't need to know how it *really* works to use it in practice. (But by all means, if you're interested do read up on it! I wish all translators would have a crash course on character encoding and string metrics, markup languages, databases & project management before even opening up their first CAT).
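For anyone who wants to see the edit-distance idea in action, here is a minimal sketch of the textbook Levenshtein distance, computed row by row with dynamic programming. It is only an illustration of the general principle: the `similarity` helper, which turns the distance into a percentage-style score, is an assumption made for this example and is not how any particular CAT tool is known to compute its match rates.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Real tools typically work on words and tags rather than raw characters and apply their own penalties, so a score from this sketch should not be expected to agree with the match percentage shown in any given CAT tool.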

@Samuel, Mr. Pivard's missed the boat on this one, because the option you're searching for in DVX is "Intra-Project Analysis" (when you analyse a file). It had a different name before DV became DVX, but it escapes my memory (or it was the default when you ran the analysis against a blank TM, I tend to mix my CATs). It's not the same as the "internal repetition" index - which is also useful, but completely different.

*Edited to add, don't get mixed up with "Inter-project Analysis", which is kinda the same for 2 projects... really bad terminology choices if you ask me. Atril had a winning ticket a decade ago and really dropped the ball...

[Edited at 2015-08-22 11:39 GMT]

Samuel Murray

Netherlands
Local time: 18:58
Member (2006)
English to Afrikaans
+ ...

@Rossana

Aug 24, 2015

Rossana Triaca wrote:
@Samuel, Mr. Pivard's missed the boat on this one, because the option you're searching for in DVX is "Intra-Project Analysis" (when you analyse a file). ... It's not the same as the "internal repetition" index - which is also useful, but completely different.

Yes, I can confirm this (I saw it in a test project in DVX3). DVX3 has internal fuzzy matching.
