Data acquisition

250.0 EUR peopleperhour Technology & Programming Overseas
9 hours ago

Description

We are a software company. For one of our projects we need to download information from a website containing articles on medical topics. The website contains approx. 10,000 HTML pages of paginated article listings, in Czech. Each listing shows article titles, and each title links to a detail HTML page with the article text.

We need someone to write wget and other scripts that download the titles of all articles, parse the links from those titles, download the articles' detail pages, and distill the text shown on each page. The listing pages and the detail pages mostly share the same structure, which allows the work to be automated. This is not true in 100% of cases, however: there may be several structural variants, so some care is needed to extract the correct information.

The result of this work will be a set of static HTML files. The expected structure can be viewed at https://fomenot.com/z/dwld24/main.html and consists of the article content separated into paragraphs of normal text and captions (nothing else: no images or other text). We only want the main article text that is visible on screen to the user, with no other text or HTML content.

The second result will be the raw HTML output for each detail page.

To accept the output, we will check the result. If we find errors, we will give examples of them and expect the vendor to fix all such errors in the result, not just the examples. If there are only a few errors we may not find them, and that is OK; but any errors we do find must be corrected. We expect the raw HTML files to be 100% error-free (for these we will not give examples; we will simply require them to be fixed). For the text-based results we will give examples before requiring fixes.
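As a rough illustration of the distillation step described above (keeping only the paragraphs and captions of the visible article body), here is a minimal sketch using only Python's standard-library `html.parser`. It assumes, hypothetically, that the article body is a `<div class="article-body">`, that captions are `<h2>`/`<h3>` elements and paragraphs are `<p>` elements; the real site's markup is not known yet, and because the structure varies between pages, several such rules may be needed. The output layout produced by the hypothetical `render_static_html` is also an assumption, not the format of the reference page.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical markup assumptions; adjust them to the real site once it is known.
BODY_CLASS = "article-body"  # class of the <div> holding the visible article text
CAPTION_TAGS = {"h2", "h3"}  # elements treated as captions
SKIP_TAGS = {"script", "style", "noscript", "figure"}  # text inside these is dropped


class ArticleExtractor(HTMLParser):
    """Collects ("caption" | "p", text) blocks from the article body only."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive already decoded
        self.blocks = []
        self._body_depth = 0  # nesting depth of <div>s inside the article body (0 = outside)
        self._skip_depth = 0  # >0 while inside an element whose text must be ignored
        self._current = None  # kind of the block being collected, or None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if not self._body_depth:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and BODY_CLASS in classes:
                self._body_depth = 1
            return
        if tag == "div":
            self._body_depth += 1
        elif tag in SKIP_TAGS:
            self._skip_depth += 1
        elif not self._skip_depth and (tag == "p" or tag in CAPTION_TAGS):
            self._current = "caption" if tag in CAPTION_TAGS else "p"
            self._buf = []

    def handle_endtag(self, tag):
        if not self._body_depth:
            return
        if tag == "div":
            self._body_depth -= 1
        elif tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._current and (tag == "p" or tag in CAPTION_TAGS):
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.blocks.append((self._current, text))
            self._current = None

    def handle_data(self, data):
        if self._body_depth and self._current and not self._skip_depth:
            self._buf.append(data)


def extract_article(page_html):
    """Returns the list of (kind, text) blocks found in one detail page."""
    parser = ArticleExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.blocks


def render_static_html(title, blocks):
    """Builds a minimal static page: captions as <h2>, paragraphs as <p> (assumed layout)."""
    parts = [f"<h1>{escape(title)}</h1>"]
    for kind, text in blocks:
        tag = "h2" if kind == "caption" else "p"
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    body = "\n".join(parts)
    return f'<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n{body}\n</body></html>\n'
```

A page whose markup does not match the assumed container yields an empty block list, which is a convenient signal for setting aside pages with a different structure for manual attention.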
An example of such a source page is https://www.idnes.cz/onadnes/zdravi/2 where you can see a list of articles, each with a link leading to its detail page, followed by a paging control that loads more articles from the next page. This is NOT the page we need to download, but a similar one; we include it only so that you understand the task.
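The crawling side (walking the paged listing and collecting the detail links) could be sketched as follows, again in standard-library Python. The paging scheme (the base URL for page 1, then base + "/2", "/3", ...) is only inferred from the example URL, the `art-link` class name is hypothetical, and all function names are illustrative. `fetch_raw` returns the response bytes unchanged so that the raw HTML deliverable is exactly what the server sent; a wget-based script could play the same role.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Hypothetical assumption, not taken from the real site: verify before use.
LINK_CLASS = "art-link"  # class of the <a> element wrapping each article title


def listing_page_urls(base_url, last_page):
    """Page 1 is the base URL itself; page n (n >= 2) is base_url + "/n" (assumed scheme)."""
    base = base_url.rstrip("/")
    return [base] + [f"{base}/{n}" for n in range(2, last_page + 1)]


class TitleLinkParser(HTMLParser):
    """Collects (absolute_url, title) pairs for article links on one listing page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []
        self._href = None  # absolute href of the article <a> currently open, or None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if LINK_CLASS in classes and attrs.get("href"):
            self._href = urljoin(self.page_url, attrs["href"])  # resolve relative links
            self._text = []

    def handle_data(self, data):
        if self._href:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            title = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, title))
            self._href = None


def article_links(page_html, page_url):
    """Returns unique article links in page order (first title wins for duplicates)."""
    parser = TitleLinkParser(page_url)
    parser.feed(page_html)
    parser.close()
    seen, unique = set(), []
    for url, title in parser.links:
        if url not in seen:
            seen.add(url)
            unique.append((url, title))
    return unique


def fetch_raw(url, timeout=30):
    """Downloads a page and returns the body bytes unchanged (for the raw HTML deliverable)."""
    request = Request(url, headers={"User-Agent": "article-archiver/0.1"})  # hypothetical UA
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

Deduplicating by URL matters because the same article can show up on two adjacent listing pages if new articles are published while the crawl is running.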
Let us know whether you can do it and at what price. We will provide the real links to the selected candidate.

Similar Teleworks

Experience Level: Expert
Company: British Medical Experts
Website: British Medical Experts

About Us
British Medical Experts is a leading organization specializing in medical reporting and expert witness services. We are committed to delivering exceptional service and ensuring the highest standards of professionalism. We’re looking for a skilled Web Developer to elevate our online presence and maintain a seamless digital experience for our clients and stakeholders.

Key Responsibilities
• Website Development & Maintenance: Design, develop, and maintain a responsive, user-friendly website that aligns with our branding and mission.
• Collaboration: Work closely with the operations team and the CEO to implement features that enhance user experience.
• Performance Optimization: Ensure optimal website performance, speed, and search engine rankings through effective optimization techniques.
• Troubleshooting & Security: Identify and resolve website issues, ensuring secure and seamless functionality at all times.
• Content Management: Update and maintain website content, integrating new tools and features as needed.
• Hosting & Server Management: Configure and manage hosting environments, servers, and database integrations for smooth website operations.
• Innovation: Stay up to date on the latest web development trends and propose improvements to keep the website competitive and innovative.

Requirements
• Proven experience as a Web Developer with a strong portfolio of completed projects, ideally in the healthcare or professional services sector.
• Expertise in front-end technologies, including HTML5, CSS3, and JavaScript frameworks (e.g., React, Angular, or Vue.js).
• Proficiency in back-end technologies such as PHP, Node.js, or Python, and experience with WordPress customization.
• Strong knowledge of SEO best practices and tools to improve website visibility.
• In-depth understanding of website security protocols and compliance standards, particularly for handling sensitive information.
• Experience with version control systems (e.g., Git) and database management (e.g., MySQL, MongoDB).
• Excellent communication and organizational skills, with the ability to collaborate effectively in a cross-functional team.

Preferred Qualifications
• Experience in the healthcare or medical reporting industry.
• Familiarity with design tools like Adobe XD or Figma for creating wireframes and prototypes.
• Knowledge of e-commerce platforms such as Shopify or WooCommerce for potential integrations.
• API development experience for seamless third-party integrations.

Why Join Us?
At British Medical Experts, you’ll have the opportunity to work on impactful projects that make a difference in the healthcare and legal sectors. We foster a collaborative and innovative environment, where your expertise will play a vital role in shaping our digital presence. If you’re an experienced Web Developer passionate about creating impactful and user-centric websites, we’d love to have you join our team at British Medical Experts. Apply now with your resume, portfolio, and a brief cover letter highlighting your relevant experience.

Note: Please do not send messages about your application through our website; doing so will automatically disqualify you.
27.0 GBP Technology & Programming peopleperhour Overseas
1 day ago
Project Summary: Resolving Magento 2.4 Environment Issues and Plesk-Related Conflicts

Live Server
1. Current Status:
   - The live version of Magento 2.4 works without any issues.
2. Problematic Behavior:
   - Developer instances using the exact same codebase have been returning Error 500/503 since this morning.
   - Potential Cause: A recent Plesk update is suspected of causing these issues, as this has been a recurring problem after updates.
3. Cross-Domain Interference:
   - An unrelated domain on the same server, with a completely different codebase, also stops working unexpectedly.
   - Running a custom script to fix ownership and privileges in the vhost directory resolves some symptoms but does not fully address the root cause.
4. Configuration Discrepancies:
   - A comprehensive comparison (diff) of configuration files on the live server reveals discrepancies in the Nginx/Apache, PHP-FPM, and SSL/TLS configurations, which may be contributing to the issue.

Development Server
1. Current Status:
   - Development instances running on different servers fail to work with the same Magento 2.4 codebase that functions perfectly on the live server.
2. Problematic Behavior:
   - Despite adjustments to permissions and ownership and service restarts, the development server continues to return Error 500/503.
3. Configuration Differences:
   - Configuration files on the development server are inherently different due to its purpose, but they seem to be misaligned with the live server setup.

Objective
1. Diagnose and resolve the 500/503 errors affecting the developer instances.
2. Ensure the developer instances replicate the live server environment as closely as possible while maintaining their independent configurations.
3. Fix any Plesk 18-related issues causing conflicts or cross-domain interference.

Deliverables
1. Fully functioning developer instances of Magento 2.4 that mirror the live environment.
2. Isolation of domains on the same server to prevent cross-domain interference.
3. Comprehensive documentation of the solution, covering:
   - Configuration changes.
   - Any Plesk-related fixes.
   - Steps to prevent recurrence after future Plesk updates.
18.0 USD Technology & Programming peopleperhour Overseas
14 hours ago