Translating a website: Tool for downloading hundreds of files and counting words
Thread poster: Rajan Chopra

Rajan Chopra India Local time: 10:34 Member (2008) English to Hindi + ...
Hi friends,

A translation agency wants me to translate a website that contains dozens of links and in every link there are many links and sub-links. If I download the files one by one, it will take a great deal of time. Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?

Secondly, there is no problem in counting the words in MS Word files as I can go to Tools and ascertain the word count but is there any tool to count words in pdf files, html pages etc as counting them manually will kill a lot of time.

Thanks in advance for your precious help.

Regards,
Chopra

Laurent KRAULAND (X) France Local time: 07:04 French to German + ... Unprofessional way of dealing | Dec 5, 2010 |
Hi langclinic,

there is just no way anybody (agency or direct client for that matter) could make me download a whole website, with links, downloadable documents and the like - it is just unprofessional to say the least; and even more given the structure you describe.

I would insist on manageable work, i.e. the agency contacts the client and asks them to send the content they really want to have translated. For you, it will be insurance against doing too little or too much work.

That said, I use Anycount to count the words in a PDF file. But the PDF file must be a genuine PDF (such as pages created in DTP software or through an office application), not a scanned file. A scanned file is just images placed in a PDF, so you would have to count the words manually there too.

Good luck!

Riadh Muslih (X) Local time: 22:04 Arabic to English + ...
Laurent KRAULAND wrote: Hi langclinic, there is just no way anybody (agency or direct client for that matter) could make me download a whole website, with links, downloadable documents and the like - it is just unprofessional to say the least; and even more given the structure you describe.

I fully agree with Krauland. Not only on the point of professionalism, and perhaps copyright, but also because I will not do the client's work. The client must send me what he/she wants me to translate; I should not have to go fishing for it, with or without pay.

jyuan_us United States Local time: 01:04 Member (2005) English to Chinese + ... I think the question is still relevant and worth looking into | Dec 5, 2010 |
Suppose you meet a direct client who doesn't have an IT department but just wants you to translate their entire website, and who doesn't know how to download the files either. In this case, you may have to figure out how to download all the files.
I have a piece of advice | Dec 5, 2010 |
1. To download a site, you can use Teleport Pro. You just specify the URL, and the program does not go beyond the limits you have set (it is very important not to download the external pages that the website links to). Downloaded files are stored in a separate folder. Be aware that the program downloads everything (images and so on) and stores all these files in one folder.
2. FineCount is a very powerful tool for counting words in HTML files, PDFs, etc. You just select the folder where your downloaded files are stored, and then select HTML files only (to add them to the list).
3. You translate the files in TagEditor.
4. You then look through the online version of your translation to find any errors, slips of the pen, etc.

That is all. I have successfully translated and localized several sites using this method. Of course, only rather small websites, those of individuals or small companies, can be handled this way. With a large one, you will get lost in the piles of pages, images, etc., and all of that takes time (which means money). Large companies will of course never ask a single freelancer to translate their whole website.
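The "select HTML files only" step in point 2 can be roughly sketched in Python for anyone who wants to see what a downloaded folder actually contains before counting. This is only a minimal sketch, not how Teleport Pro or the counting tool works internally; the function names `inventory_by_extension` and `files_with_extensions` are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(folder):
    """Count the files under `folder` (including subfolders) per extension."""
    counts = Counter()
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Lowercase so that .HTM and .htm are counted together.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


def files_with_extensions(folder, extensions=(".html", ".htm")):
    """Return the files under `folder` whose extension is in `extensions`."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

Calling `inventory_by_extension("downloaded_site")` (a hypothetical folder name) shows how many images, PDFs and HTML pages the download produced, and `files_with_extensions` gives the list to hand to a word counter or CAT tool.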
[Edited at 2010-12-05 06:52 GMT]

Laurent KRAULAND (X) France Local time: 07:04 French to German + ...
jyuan_us wrote: Suppose you meet a direct client who doesn't have an IT department but just wants you to translate their entire website, and who doesn't know how to download the files either. In this case, you may have to figure out how to download all the files.

But a website does not appear ex nihilo somewhere on the Internet. Someone *must* be in possession of the original files. It is like the plague some of us deal with when handling scanned PDFs: you'd be surprised how fast some clients manage to get the originals when you say that processing scanned PDFs comes at a surcharge of X%.

And how does one download Flash-generated content?

Samuel Murray Netherlands Local time: 07:04 Member (2006) English to Afrikaans + ... Three sets of tools | Dec 5, 2010 |
langclinic wrote: Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?

Yes, you need an "offline browser". I recommend Oleg Chernavin's Web Downloader 2.2 (google for webdown.exe and look on abandonware sites).

langclinic wrote: Secondly, there is no problem in counting the words in MS Word files as I can go to Tools and ascertain the word count but is there any tool to count words in pdf files, html pages etc as counting them manually will kill a lot of time.

You can try Anycount: http://www.anycount.com/download.html
Emma Goldsmith Spain Local time: 07:04 Member (2004) Spanish to English

Joakim Braun Sweden Local time: 07:04 German to Swedish + ... Original files | Dec 5, 2010 |

"Someone must be in possession of the original files". Yes, but they may be server-side scripts querying databases and contain no actual HTML at all. (That still doesn't make it the translator's problem, of course.)
As various previous posters have said, the right way to do this is to get the files (and instructions!) from the company's IT person. If the site is not made up exclusively of static HTML pages, you can't possibly translate it by just trying to download the site "from outside".

If you know that it's all static HTML, it's still better to get the files from the webmaster, but it's possible to grab them from the net. The "right" tools for that are httrack and wget. I use wget; I believe the command to download (mirror) a site is

wget -m -np -P outputfolder -p http://www.site/address.com

-m: mirror the site
-np: no parent folders
-P: specify the name of the output folder
-p: get page dependencies such as images

Word counts shouldn't be an issue with HTML. You should do HTML with a CAT tool anyway, and your CAT tool will give you a word count.

BTW, both downloading and translating these files take a fair bit of IT knowledge. I'm not sure I myself would take it on without the client's guidance and support.
[Edited at 2010-12-05 12:22 GMT]
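For a rough estimate before the files go into a CAT tool, the word count of a mirrored folder of static HTML can be sketched with nothing but Python's standard library. This is only a sketch under stated assumptions: it counts whitespace-separated tokens in the visible text, skips script and style content, ignores text in attributes such as alt and title, and assumes UTF-8 files. Its figures will therefore differ from Anycount or a CAT tool. The names `count_words_in_html` and `count_words_in_folder` are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the content of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words_in_html(markup):
    """Return the number of whitespace-separated tokens in the page text."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return len(" ".join(parser.chunks).split())


def count_words_in_folder(folder, extensions=(".html", ".htm"), encoding="utf-8"):
    """Map each HTML file under `folder` to its approximate word count."""
    totals = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            # errors="replace" keeps going if a file is not valid UTF-8.
            markup = path.read_text(encoding=encoding, errors="replace")
            totals[str(path)] = count_words_in_html(markup)
    return totals
```

Running `sum(count_words_in_folder("outputfolder").values())` on the folder passed to wget's -P option gives a ballpark total. Navigation menus and footers repeated on every page are counted every time, so the total overstates the unique text.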
Jack Doughty United Kingdom Local time: 06:04 Russian to English + ... In memoriam Translator's Abacus | Dec 5, 2010 |
Looked at "Anycount" and wondered if there was anything similar but free. Came across "Translator's Abacus" at http://www.globalrendering.com/download.html and downloaded it. I've tried it and it seems quite useful.

Webreaper & Anycount | Dec 5, 2010 |
langclinic wrote: Hi friends, Is there any method to download all files available on a website (Word files, pdf files, html pages and scanned pages etc.) in a quick and convenient manner?

WebReaper 10.0 (Freeware)

langclinic wrote: Secondly, there is no problem in counting the words in MS Word files as I can go to Tools and ascertain the word count but is there any tool to count words in pdf files, html pages etc as counting them manually will kill a lot of time.

Anycount

Samuel Murray Netherlands Local time: 07:04 Member (2006) English to Afrikaans + ...