MLabs

Web Scraping Specialist

Posted 5 Days Ago
In-Office or Remote
2 Locations
75K-100K Annually
Mid level
Build and maintain high-performance web scraping pipelines: develop extraction code, handle dynamic content and pagination, ensure data quality and storage, monitor and scale distributed scraping infrastructure, and optimize processes for large-scale AI data ingestion.
The summary above was generated by AI

Location: Remote, with a 6-hour overlap with EST

Remote | Full-time

Compensation: $75K - $100K

We are hiring on behalf of our client, who is seeking a Web Scraping Specialist to join a specialized technical team focused on building the infrastructure that delivers vast amounts of web data for training advanced AI models. This organization operates a large-scale distributed crawler and manages complex pipelines for ingesting, segmenting, and annotating billions of data points, including videos, transcripts, and audio files.

The successful candidate will lead efforts to gather and analyze data, optimize scraping processes, and support the scaling of high-quality public web data accessibility. This role is ideal for a lean, technical builder who thrives in a fast-paced environment without bureaucratic red tape.

Key Responsibilities:

  • Code Development: Write, test, and refine high-performance code to extract data from various online sources, ensuring maximum reliability and efficiency.
  • Data Retrieval: Manage complex data retrieval tasks, including handling pagination and dynamic content loaded via AJAX.
  • Data Quality: Clean and format extracted data to ensure it meets rigorous quality standards for downstream analysis and processing.
  • Database Management: Store and manage scraped data in appropriate databases, optimizing for both access speed and long-term data integrity.
  • Monitoring and Maintenance: Regularly monitor scraping processes and infrastructure to identify and resolve issues, ensuring a continuous and stable data flow.

Requirements
  • Extraction Expertise: Demonstrated ability to extract data from complex websites with minimal supervision, supported by a portfolio of past projects.
  • Technical Proficiency: Advanced skills in Python or JavaScript, specifically with libraries and frameworks such as BeautifulSoup, Scrapy, or Selenium.
  • Advanced Programming: Strong knowledge of asynchronous programming, multithreading, and distributed scraping architectures.
  • Web Fundamentals: In-depth knowledge of HTML, CSS, JavaScript, and the Document Object Model (DOM).
  • Data Storage: Experience with NoSQL databases (e.g., MongoDB, Cassandra), including the ability to design efficient storage solutions.
  • Cloud Infrastructure: Experience deploying and managing large-scale scraping jobs using cloud services such as AWS, Google Cloud, or Azure.
  • Preferred Skills: Ability to apply machine learning algorithms for data cleaning, categorization, or predictive analysis; active participation in relevant open-source projects.

Benefits
  • Competitive Compensation: A highly competitive salary ranging from $75,000 to $100,000, complemented by a comprehensive benefits and equity package.
  • Impactful Work: The opportunity to work at the forefront of AI development and web-scale knowledge graph creation.
  • High-Output Culture: A professional environment that prioritizes low ego, technical autonomy, and rapid execution.
  • Remote Flexibility: This is a remote position requiring a 6-hour overlap with the core team's schedule.

Due to the high volume of applications we anticipate, we regret that we are unable to provide individual feedback to all candidates. If you do not hear back from us within 4 weeks of your application, please assume that you have not been successful on this occasion. We genuinely appreciate your interest and wish you the best in your job search.

Commitment to Equality and Accessibility:

At MLabs, we are committed to offering equal opportunities to all candidates. We do not discriminate, we make our job adverts accessible, and we provide information in accessible formats. Our goal is to foster a diverse, inclusive workplace with equal opportunities for all. If you need any reasonable adjustments during any part of the hiring process, or you would like to see the job advert in an accessible format, please let us know at the earliest opportunity by emailing [email protected].

MLabs Ltd collects and processes the personal information you provide such as your contact details, work history, resume, and other relevant data for recruitment purposes only. This information is managed securely in accordance with MLabs Ltd’s Privacy Policy and Information Security Policy, and in compliance with applicable data protection laws. Your data may be shared only with clients and trusted partners where necessary for recruitment purposes. You may request the deletion of your data or withdraw your consent at any time by contacting [email protected].

Similar Jobs

7 Hours Ago
Remote
USA
70K-140K Annually
Mid level
Artificial Intelligence • Software
The Web Scraping Specialist will extract data from websites, optimize scraping processes, and manage data integrity, ensuring high-quality outputs.
Top Skills: AWS, Azure, BeautifulSoup, Cassandra, GCP, JavaScript, MongoDB, NoSQL, Python, Scrapy, Selenium
Senior level
Artificial Intelligence • Cloud • HR Tech • Information Technology • Productivity • Software • Automation
Lead global HR service delivery strategy, drive AI-first transformation and autonomous HR workflows, own HRSD roadmap, improve operations via process redesign and automation, build team capabilities and governance, and influence cross-functional stakeholders to scale ServiceNow HR service delivery.
Top Skills: Agentic Workflows, AI, Case Management Tools, Employeeworks, Intelligent Automation, Machine Learning, Now Assist For HR, Payroll Platforms, SailPoint, ServiceNow HRSD, Virtual Agent, Workday
An Hour Ago
Remote
USA
Entry level
Insurance • Financial Services
Handle inbound cancellation calls and outbound lapse recovery to retain life insurance policyholders. Assess customer needs, recommend solutions, document interactions, meet call center KPIs, and maintain product knowledge while delivering empathetic, accurate service.
Top Skills: Excel, Microsoft Outlook, Microsoft Word

What you need to know about the Los Angeles Tech Scene

Los Angeles is a global leader in entertainment, so it’s no surprise that many of the biggest players in streaming, digital media and game development call the city home. But the city boasts plenty of non-entertainment innovation as well, with tech companies spanning verticals like AI, fintech, e-commerce and biotech. With major universities like Caltech, UCLA, USC and the nearby UC Irvine, the city has a steady supply of top-flight tech and engineering talent — not counting the graduates flocking to Los Angeles from across the world to enjoy its beaches, culture and year-round temperate climate.

Key Facts About Los Angeles Tech

  • Number of Tech Workers: 375,800; 5.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Snap, Netflix, SpaceX, Disney, Google
  • Key Industries: Artificial intelligence, adtech, media, software, game development
  • Funding Landscape: $11.6 billion in venture capital funding in 2024 (Pitchbook)
  • Notable Investors: Strong Ventures, Fifth Wall, Upfront Ventures, Mucker Capital, Kittyhawk Ventures
  • Research Centers and Universities: California Institute of Technology, UCLA, University of Southern California, UC Irvine, Pepperdine, California Institute for Immunology and Immunotherapy, Center for Quantum Science and Engineering